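Question: Predicting on new data using locally weighted regression (LOESS / LOWESS)

How do I fit a locally weighted regression in Python so that it can be used to predict on new data?

There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns estimates only for the original data set; so it seems to do fit and predict together, rather than separately as I expected.

scikit-learn always has a fit method that allows the object to be used later on new data with predict; but it doesn't implement lowess.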

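Lowess works great for predicting (when combined with interpolation)! I think the code is pretty straightforward; let me know if you have any questions!

[Figure: Matplotlib plot]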

import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import statsmodels.api as sm

# in a Jupyter notebook, also run the magic: %matplotlib inline

# introduce some floats in our x-values
x = list(range(3, 33)) + [3.2, 6.2]
y = [1,2,1,2,1,1,3,4,5,4,5,6,5,6,7,8,9,10,11,11,12,11,11,10,12,11,11,10,9,8,2,13]

# lowess returns our "smoothed" data as an (n, 2) array of (x, y) pairs
# sorted by x, with one fitted y-value for every original x-value
lowess = sm.nonparametric.lowess(y, x, frac=.3)

# unpack the lowess smoothed points into their x and y columns
lowess_x = lowess[:, 0]
lowess_y = lowess[:, 1]

# run scipy's linear interpolation over the smoothed points
# (interp1d can also extrapolate via fill_value="extrapolate")
f = interp1d(lowess_x, lowess_y, bounds_error=False)

xnew = [i/10. for i in range(400)]

# this generates y-values for our new x-values with the interpolator
# values outside of the x window (less than 3, greater than 32) come back as NaN
# to hold the end values instead, pass fill_value=(lowess_y[0], lowess_y[-1])
# to interp1d together with bounds_error=False
ynew = f(xnew)


plt.plot(x, y, 'o')
plt.plot(lowess_x, lowess_y, '*')
plt.plot(xnew, ynew, '-')
plt.show()

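Consider using kernel regression instead.

statsmodels has an implementation.

If you have too many data points, why not use scikit-learn's RadiusNeighborsRegressor and specify a tricube weight function?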
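To make the kernel regression suggestion concrete, here is a minimal NumPy-only sketch of a Nadaraya-Watson estimator with a tricube kernel, which is roughly what a radius-based neighbors regressor with tricube weights computes. It does not use the statsmodels or scikit-learn APIs; the names `tricube`, `kernel_regression_predict` and the `radius` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)


def kernel_regression_predict(x_train, y_train, x_new, radius):
    """Tricube-weighted mean of the training targets within `radius`
    of each query point (hypothetical helper; returns NaN when no
    training point falls inside the radius)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))

    # weights[i, j] = kernel weight of training point j for query point i
    weights = tricube((x_new[:, None] - x_train[None, :]) / radius)
    totals = weights.sum(axis=1)

    preds = np.full(x_new.shape, np.nan)
    ok = totals > 0
    preds[ok] = (weights[ok] @ y_train) / totals[ok]
    return preds


# usage on the data from the answer above
x = list(range(3, 33)) + [3.2, 6.2]
y = [1,2,1,2,1,1,3,4,5,4,5,6,5,6,7,8,9,10,11,11,12,11,11,10,12,11,11,10,9,8,2,13]
y_hat = kernel_regression_predict(x, y, [4.5, 17.25, 31.0], radius=5.0)
```

Because the training data is stored separately from the query points, this gives the fit-then-predict split the question asks for. Note that it fits a local weighted mean, not a local line, so it tends to be biased near the ends of the x range.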

This is not what lowess is for. Lowess is for smoothing, not predicting. - Jesse Bakker

@JesseBakker It certainly can be used for prediction. stat.ethz.ch/R-manual/R-devel/library/stats/html/.... See also stackoverflow.com/questions/12822069/.... - max

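This would use linear interpolation. While that is not unreasonable, it is not quite the same as "using lowess to predict". Lowess is defined as a weighted linear regression on a subset of the training points. Its prediction for a new point should be based on the result of that regression, rather than on predicting for two nearby points of the training set and then connecting them with a line. For a dense dataset the difference is, of course, trivial. Points outside the range should also be predicted with the weighted LR of the corresponding neighborhood, rather than with a fixed value. - max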
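What the comment above describes, predicting each new point from its own weighted linear regression over the nearest training points, can be sketched in plain NumPy. This is a minimal illustration under stated assumptions: 1-D x, tricube weights, a neighborhood of `frac * n` points, and no robustness iterations. The class `LocalLinearRegressor` and its attributes are hypothetical names, not part of statsmodels or scikit-learn.

```python
import numpy as np


class LocalLinearRegressor:
    """Locally weighted linear regression with separate fit and predict
    (hypothetical helper; 1-D x only, no robustness iterations)."""

    def __init__(self, frac=0.3):
        self.frac = frac

    def fit(self, x, y):
        # "fitting" only stores the training data; the regression
        # itself is run per query point in predict
        self.x_ = np.asarray(x, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        n = len(self.x_)
        # neighborhood size: a fraction of the data, at least 3 points
        self.k_ = min(n, max(3, int(np.ceil(self.frac * n))))
        return self

    def _predict_one(self, x0):
        dist = np.abs(self.x_ - x0)
        idx = np.argsort(dist)[: self.k_]
        h = dist[idx].max()  # bandwidth: distance to the farthest neighbor
        if h == 0:
            return self.y_[idx].mean()
        w = (1.0 - (dist[idx] / h) ** 3) ** 3  # tricube weights
        # weighted least squares for y ~ b0 + b1 * (x - x0);
        # the intercept b0 is the prediction at x0
        X = np.column_stack([np.ones(len(idx)), self.x_[idx] - x0])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], self.y_[idx] * sw, rcond=None)
        return beta[0]

    def predict(self, x_new):
        x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
        return np.array([self._predict_one(x0) for x0 in x_new])


# usage: fit once, then predict on new x-values, including ones
# outside the training range
model = LocalLinearRegressor(frac=0.5).fit(np.arange(10), 2 * np.arange(10) + 1)
preds = model.predict([4.5, 12.0, -1.0])
```

Points outside the training range are predicted by the line fitted to the nearest neighborhood, so they extrapolate instead of being held at a fixed value. The cost is one small least-squares solve per query point.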

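@max I just came across this question with a similar problem. Although sklearn does not implement LOESS, it does have a RANSAC implementation, which looks similar to my untrained eye. Hope this is useful to someone: scikit-learn.org/stable/modules/generated/... - Aleksander Lidtke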

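@max It is not unreasonable at all; I have been using a similar method to scale metabolomics data in a non-parametric way for some time. I scale points outside the range to the maximum or minimum of the LOWESS curve and linearly interpolate all the other values. If there are not enough points for a proper linear interpolation, then there are not enough points for a proper LOWESS curve in my opinion. As another note, I have been using the R library for LOWESS rather than the Python library. The Python library has some edge-effect issues that I could not reconcile. Gotta love RPy2 - Daniel Hitchcock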
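One reading of the approach in the comment above (linear interpolation inside the range, with out-of-range points held at the curve's value at the nearest end) can be sketched with `np.interp`, which clamps to the first and last curve values by default. The function name `interpolate_clamped` is hypothetical, and the smoothed curve is assumed to be already available as two arrays.

```python
import numpy as np


def interpolate_clamped(curve_x, curve_y, x_new):
    """Linearly interpolate a smoothed curve at x_new; queries outside
    the curve's x range get the curve value at the nearest end
    (hypothetical helper built on np.interp's default behavior)."""
    curve_x = np.asarray(curve_x, dtype=float)
    curve_y = np.asarray(curve_y, dtype=float)
    # np.interp expects increasing x-coordinates, so sort the curve first
    order = np.argsort(curve_x, kind="stable")
    return np.interp(x_new, curve_x[order], curve_y[order])


# usage with a small, unsorted stand-in for a smoothed curve
clamped = interpolate_clamped([3, 1, 2], [30, 10, 20], [0.0, 1.5, 5.0])
```

If the smoothed curve contains repeated x-values, they should be averaged or dropped first, since `np.interp` is only well defined for increasing x-coordinates.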

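@DanielHitchcock "If there are not enough points for a proper linear interpolation, then there are not enough points for a proper LOWESS curve in my opinion": I'm sympathetic to your argument, but that is because I'm personally biased in favor of super-simple techniques. I certainly wouldn't try to convince data scientists who use LOWESS that they should abandon it in favor of linear interpolation. I didn't downvote your answer, but I can see how SO users might feel that it does not answer my original question. - max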

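@AleksanderLidtke RANSAC can be used in situations similar to those where LOWESS (or linear interpolation) can be used. But it is definitely a very different algorithm (it is non-deterministic; it is less sensitive to outliers, since it tries to remove them). - max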

@David_R, this answer would be great if you gave a clearer sense of what you mean (actually showing your implementation). Just a suggestion. - benjaminmgross

@benjaminmgross, thanks for the note. Maybe I'll find some time later this week or this weekend to elaborate. - David R
