Cross-Validation: Difference between cross-validated prediction and final interpolated prediction?

Elijah · 04-14-2022

I am referring to the highlighted portion of the attached document, which contains some of your comments in response to a cross validation question.

I am not able to wrap my head around it yet. Could you please elucidate, perhaps with an example? For instance, how do the cross-validated prediction and the final interpolated prediction differ? Cross validation conceptually removes a measured point and then predicts the value at that point using all the other points. The difference between the predicted and the measured values is the error. I can see only one prediction here, which, in my view, is the final prediction. How can there be both a "cross-validated prediction" and a "final interpolated prediction"? Please explain.

EricKrause · 04-14-2022

Hi Elijah, I'll try with an example of Inverse Distance Weighting (IDW) with five points: p1, p2, p3, p4, and p5. Each of these points has a location and a measured value.

Cross validation would start by removing p1. It would then use p2, p3, p4, and p5 to predict the value of p1. In IDW, this means taking the weighted average of the values of p2 to p5 (weighted by inverse distance). This will result in some prediction (called the cross validation prediction) that can be compared to the measured value of p1.
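Written out, the cross validation prediction at p1 could look like the following. This is only a sketch: the symbols z_i (measured value at p_i), d (distance between two locations), and the power k are notation I am assuming for illustration, and k = 2 is a common choice rather than something stated above.

```latex
\hat{z}_{\mathrm{cv}}(p_1) = \frac{\sum_{i=2}^{5} w_i \, z_i}{\sum_{i=2}^{5} w_i},
\qquad
w_i = \frac{1}{d(p_1, p_i)^{k}}
```

The cross validation error for p1 is then the difference between this prediction and the measured value z_1. Note that z_1 itself never appears in the sum.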

Next, p2 would be removed, and p1, p3, p4, and p5 would be used to predict the value at the location of p2 (note that p1 is added back to the dataset after being cross validated). The same is done for p3, p4, and p5, each using the other four points. This would produce five cross validation errors (predicted minus measured) that would be used to calculate, among other things, the root mean square error of the IDW model.
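The leave-one-out loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the Geostatistical Analyst implementation: the coordinates and values are made up, the function names are hypothetical, and it assumes plain IDW with power 2 and no search neighborhood.

```python
import math

# Hypothetical sample data: (x, y, measured value) for p1..p5.
points = [
    (0.0, 0.0, 10.0),  # p1
    (1.0, 0.0, 12.0),  # p2
    (0.0, 1.0, 14.0),  # p3
    (1.0, 1.0, 16.0),  # p4
    (0.5, 0.5, 15.0),  # p5
]

def idw_predict(x, y, neighbors, power=2.0):
    """Inverse-distance-weighted average of the neighbors' values at (x, y).

    Assumes (x, y) does not coincide with any neighbor, which holds in
    cross validation because the target point has been removed.
    """
    weights = [1.0 / math.hypot(x - nx, y - ny) ** power for nx, ny, _ in neighbors]
    weighted_sum = sum(w * value for w, (_, _, value) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

def cross_validate(pts, power=2.0):
    """Return one error (predicted minus measured) per point."""
    errors = []
    for i, (x, y, measured) in enumerate(pts):
        others = pts[:i] + pts[i + 1:]  # remove point i, keep the other four
        predicted = idw_predict(x, y, others, power)
        errors.append(predicted - measured)
    return errors

errors = cross_validate(points)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print("cross validation errors:", errors)
print("RMSE:", rmse)
```

Each pass through the loop puts the previous point back and removes the next one, so every prediction is made from exactly four points.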

But when actually making the prediction surface (after cross validation), all points are used to make the predictions. The surface also predicts values everywhere, including at the input point locations. So, what will it predict at, say, the location of p3? The prediction is the weighted average of all the points p1, p2, p3, p4, and p5, weighted by the inverse distance to p3. But the distance from p3 to itself is zero, which gives the value of p3 an infinite weight, so it completely dominates the weighted average. This forces the predicted value to be exactly equal to the measured value at p3. This is what makes IDW an "exact" interpolation method, and it is why the final interpolated prediction at p3 differs from the cross validation prediction at p3, which was made without p3.
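To see the difference numerically, here is a small Python sketch that predicts at the location of p3 twice: once with all five points (the final surface) and once with p3 left out (the cross validation prediction). The data and function name are hypothetical, power 2 is an assumption, and the zero-distance case is handled by returning the measured value directly, which is the limit of the infinite weight described above.

```python
import math

# Hypothetical sample data: (x, y, measured value) for p1..p5.
points = [
    (0.0, 0.0, 10.0),  # p1
    (1.0, 0.0, 12.0),  # p2
    (0.0, 1.0, 14.0),  # p3
    (1.0, 1.0, 16.0),  # p4
    (0.5, 0.5, 15.0),  # p5
]

def idw_surface(x, y, pts, power=2.0):
    """IDW prediction at (x, y) using every point in pts."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in pts:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            # Zero distance means infinite weight: the measured value wins outright.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

x3, y3, measured_p3 = points[2]

# Final interpolated prediction: all five points, including p3 itself.
final_at_p3 = idw_surface(x3, y3, points)

# Cross validation prediction: p3 removed, only the other four points used.
cv_at_p3 = idw_surface(x3, y3, points[:2] + points[3:])

# A location very close to p3 is already dominated by p3's huge weight.
near_p3 = idw_surface(x3, y3 - 1e-6, points)

print("measured:", measured_p3)
print("final surface at p3:", final_at_p3)
print("cross validation at p3:", cv_at_p3)
print("final surface just next to p3:", near_p3)
```

With this made-up data, the final surface reproduces the measured value at p3 exactly, while the cross validation prediction does not, and that gap is the cross validation error.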

Please let me know if that still is not clear.

-Eric

Elijah · 04-19-2022

Hi @Eric,

Your explanation makes sense to me. I didn't quite grasp this before now. Many thanks.

Elijah.
