variogram and outliers

11-17-2011 04:48 AM
MichaelMcManus
New Contributor
Does the GA extension include robust measures for handling outliers when producing sample variograms?  Is there something similar to the "cressie=TRUE" argument available in the R package gstat?
Thanks
3 Replies
EricKrause
Esri Regular Contributor
We do not have that option.  We decided long ago that the best way to proceed with kriging is to start with ESDA, then proceed to variography.  Outliers should be detected and removed in the ESDA step.
MichaelMcManus
New Contributor
Can you help me understand the effect of outliers on the semivariogram?  For example, on p. 653 of Konstantin Krivoruchko's Spatial Statistical Data Analysis for GIS Users, there is a histogram of well depth with the 30 deepest wells highlighted, showing the skewness of the distribution.  The semivariogram model of well depth for all 333 wells shows mainly a nugget effect and little relationship between gamma and distance.  This pattern is attributed to the deep wells being distributed somewhat randomly throughout the study area.  Is it the combination of the outliers having values in the tail of the histogram and being randomly distributed in space that causes the lack of spatial dependence in the semivariogram?  Once the deep wells are removed, the semivariogram model (Figure 15.7) shows strong spatial dependence.  I am struck by how much the statistical and spatial distribution of 10% of the data affects the semivariogram model.
Thanks
EricKrause
Esri Regular Contributor
The empirical semivariogram is created by averaging the squared differences between pairs of points that are approximately the same distance apart, then taking half of that average.  Even a few outliers can heavily influence this average, particularly because the differences are squared.
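
As a rough illustration of the averaging described above, here is a minimal Python sketch (not Geostatistical Analyst code). It bins point pairs by separation distance and computes the classical estimate, half the mean squared difference per bin. For comparison it also includes the Cressie-Hawkins robust estimator, which I am assuming is what gstat's "cressie=TRUE" corresponds to. The function name `empirical_semivariogram`, the `robust` flag, and the toy data are all made up for this example.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges, robust=False):
    """Semivariance per distance bin (hypothetical helper, for illustration).

    robust=False: classical estimator, half the mean squared difference.
    robust=True:  Cressie-Hawkins estimator, based on the mean of the
                  square-rooted absolute differences raised to the 4th power.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    diff = np.abs(values[i] - values[j])

    gamma = np.full(len(bin_edges) - 1, np.nan)       # NaN marks empty bins
    for b in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        n = in_bin.sum()
        if n == 0:
            continue
        if robust:
            mean_root = np.mean(np.sqrt(diff[in_bin]))
            gamma[b] = 0.5 * mean_root ** 4 / (0.457 + 0.494 / n)
        else:
            gamma[b] = 0.5 * np.mean(diff[in_bin] ** 2)
    return gamma

# Toy data: points on a line, with one value replaced by an outlier.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
clean = np.arange(10.0)
dirty = clean.copy()
dirty[4] = 100.0                                      # a single outlier

edges = np.array([0.5, 1.5, 2.5])
for label, vals in [("clean", clean), ("with outlier", dirty)]:
    print(label,
          "classical:", empirical_semivariogram(coords, vals, edges),
          "robust:", empirical_semivariogram(coords, vals, edges, robust=True))
```

Because the classical estimate squares each difference, the single outlier dominates the bins it falls into. The robust version averages square roots of the differences before raising to the fourth power, so it should be inflated much less.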

Outliers are almost always problematic for kriging, but they're particularly bad when they are scattered randomly throughout the study region rather than clustered together.  Randomly scattered outliers affect the empirical semivariances at small distances, because an outlier might be right next to low values.  If the outliers are clustered, the squared difference between two neighboring outliers might still be small, which allows for accurate semivariogram estimation at small distances.  That is the most important part of the semivariogram, because closer neighbors get the highest weights.
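
To make the scattered-versus-clustered point concrete, here is a small synthetic experiment in Python. Everything in it is an assumption for illustration: the 20 x 20 grid, the smooth sine/cosine surface standing in for a spatially dependent variable, the size of the outlier shift, and the helper name `short_lag_semivariance`. It is not the well-depth data from the book. The same 40 outliers (10% of the points) are added either at random locations or in one compact block, and the classical semivariance is computed for nearest-neighbor pairs only.

```python
import numpy as np

def short_lag_semivariance(coords, values, max_dist):
    """Classical semivariance over all pairs closer than max_dist."""
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    close = dist <= max_dist
    return 0.5 * np.mean((values[i][close] - values[j][close]) ** 2)

rng = np.random.default_rng(0)

# 20 x 20 grid of sample points with 0.5 spacing (assumed layout).
ix, iy = np.meshgrid(np.arange(20), np.arange(20))
ix, iy = ix.ravel(), iy.ravel()
coords = 0.5 * np.column_stack([ix, iy]).astype(float)

# Smooth surface: nearby points have similar values.
base = np.sin(coords[:, 0] / 2) + np.cos(coords[:, 1] / 2)

shift = 10.0   # how far the outliers sit above the surface (assumed)
n_out = 40     # 10% of the 400 points

# Case 1: outliers scattered at random locations.
scattered = base.copy()
scattered[rng.choice(len(base), size=n_out, replace=False)] += shift

# Case 2: the same number of outliers in one compact 5 x 8 block.
block = (ix >= 12) & (ix < 17) & (iy >= 10) & (iy < 18)
clustered = base.copy()
clustered[block] += shift

# max_dist=0.6 keeps only the four nearest grid neighbors (spacing 0.5).
results = {
    name: short_lag_semivariance(coords, vals, max_dist=0.6)
    for name, vals in [("no outliers", base),
                       ("scattered", scattered),
                       ("clustered", clustered)]
}
for name, gamma in results.items():
    print(f"{name:12s} short-lag semivariance = {gamma:.3f}")
```

In the clustered case only the pairs that straddle the edge of the block see the large difference, so the short-lag semivariance rises far less than in the scattered case, where almost every outlier sits next to ordinary values.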