How to do interpolation of missing data in grid cells?

11-12-2017 07:38 PM
FrederikMadsen
New Contributor

Hi.

I have created grid cells over Liberia in a shapefile. I then added data from a household survey and joined the georeferenced data based on spatial location. The trouble is that only around 100 of the 1,300 cells have data; the rest have the value 0.

How can I do an interpolation that fills the missing values in my attribute table? I would prefer an inverse distance weighted (IDW) interpolation in the Geostatistical Wizard that gives me the predicted values. But the huge number of zeroes biases the result too much, and when I change the zeroes to <Null>, those cells are not included in the interpolation.

What am I doing wrong?

Thank you!

Frederik

6 Replies
DanPatterson_Retired
MVP Emeritus

A large number of your points are clustered to the west. Even if you did an interpolation to a raster format, that information would be aggregated severely and essentially lost. The rest of your data doesn't lend itself well to other interpolators.

The big question is this: you are trying to interpolate household survey information. Do any of the empty cells have characteristics similar to those of cells with data? In other words, do the empty cells have populations similar to those of the cells with data? Are some cells devoid of people, or without a significant population? Cells that have no representative population should be masked out, because they could not have any survey data.

I would mask out the areas that can't feasibly be considered in the survey in the first place. Then you can start from that point.

FrederikMadsen
New Contributor

Hi Dan,

The data should be representative of the whole country, given the survey design and weights I use. The data in each grid cell represents average wealth. I would like to use an inverse distance weighted interpolation to fill in the missing values in my attribute table for all grid cells. But as I said, the large number of zeros is a problem. How do I make them NOT count in the interpolation? I have tried changing the values to <Null>, but that completely excludes those rows from the interpolation, which is not what I want.

DanPatterson_Retired
MVP Emeritus

Think of it this way. You have a pre-chosen fishnet: size, shape and alignment. If you summarized the values within those 'zones', you would then have to set the zones without data to NoData. Taking the centroids of the fishnet cells would give you a point pattern from which you could do a coarse interpolation to raster, using a cell size equal to the fishnet cell size.
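As a rough illustration of the centroid idea, here is a minimal pure-Python sketch of IDW: cells without data are left out of the inputs, but a value is still predicted at their centroids. It is not what the Geostatistical Wizard does internally. The function names (`idw_predict`, `fill_missing`), the sample coordinates, and the rule that a 0 means "no data" are all assumptions made for the example; that rule only holds if a real average wealth of exactly 0 cannot occur.

```python
import math

def idw_predict(known, target, power=2.0):
    """IDW estimate at `target` (x, y) from a list of (x, y, value) samples."""
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value  # target coincides with a sample point
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * value
        den += w
    return num / den

def fill_missing(cells, power=2.0):
    """cells: (x, y, value) centroids, value is None where there is no survey data."""
    # Only cells with data act as inputs; empty cells are prediction targets.
    known = [(x, y, v) for x, y, v in cells if v is not None]
    return [
        (x, y, v if v is not None else idw_predict(known, (x, y), power))
        for x, y, v in cells
    ]

# Hypothetical row of three cell centroids; the middle cell has no data.
raw = [(0.0, 0.0, 10.0), (1.0, 0.0, 0.0), (2.0, 0.0, 20.0)]
# Assumption: 0 in the joined table means "no data", not a measured zero.
cells = [(x, y, None if v == 0.0 else v) for x, y, v in raw]
filled = fill_missing(cells)
```

The point is that "excluded from the inputs" and "not predicted" are separate things: the empty cells contribute nothing to the weights, yet each still receives an estimate.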

I strongly doubt that IDW, with any of its options, is going to give you an interpolated surface that would be useful in the vast areas of NoData. Getting numbers isn't the problem; getting meaningful numbers is. Have you looked at some of the other interpolation options in the Geostatistical Analyst (GA)?

I will flag slynch-esristaff, since he usually has good recommendations about appropriate interpolation methods given a pattern of points.

FrederikMadsen
New Contributor

Thank you for your reply.

Yes, it's a pre-chosen fishnet with data, size, shape and alignment. It's a shapefile from another study that I'm trying to replicate, just with new data. My plan was simply to use it and join the new data from another layer based on spatial location. If it would change anything, I could create my own fishnet instead of reusing the one from the study, but I don't know whether that would solve the problem of not all cells being represented.

Citation from the study

"A potential challenge to our study is that the DHS surveys are only representative for that location and cannot inform us about nonsampled locations. We present below our method to interpolate measurements between locations. This method must necessarily underestimate variability within locations, and we do not see any obvious source of systematic bias. If so, our imperfect data should attenuate our results, leading to nonsignificant findings...() we interpolate data on wealth levels using a method designed for spatial data in GIS. The “Inverse Distance Weighted” (IDW) method".

The interpolation from the original study

The original variable and my own new

DanPatterson_Retired
MVP Emeritus

Steve has given a suggestion that could be explored if you also check whether there is a correlation between the results of the previous study and your new variable.
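One quick way to check this is a Pearson correlation computed only over cells where both the earlier study's value and the new survey value exist. The sketch below is pure Python; the `pearson` helper and the sample values are made up for illustration and are not taken from either dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-cell pairs: (value from the earlier study, new survey value or None)
pairs = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (4.0, 7.9), (5.0, None)]

# Keep only cells that have a value in both datasets.
both = [(old, new) for old, new in pairs if new is not None]
r = pearson([old for old, _ in both], [new for _, new in both])
```

With only around 100 populated cells, clustered in one part of the country, any such coefficient should be read with caution.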

SteveLynch
Esri Regular Contributor

If the zero values are changed to nulls they will be ignored, that is by design.

The pattern of your points tells me that you'll need covariates (explanatory variables): cokriging or, better still, EBKRegressionKriging in Pro 1.2+.
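To show why a covariate helps, here is a deliberately simplified sketch: an ordinary least-squares line is fitted on the cells that have survey data, then used to predict the empty cells from a covariate that is known everywhere. This is not cokriging and not what the EBK regression tool does (there is no spatial model of the residuals here). The covariate, the `fit_line` helper, and the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical cells: (covariate known for every cell, wealth or None if not surveyed)
cells = [(1.0, 3.0), (2.0, None), (3.0, 7.0), (4.0, None), (5.0, 11.0)]

# Fit only on surveyed cells, then predict wherever wealth is missing.
sampled = [(c, w) for c, w in cells if w is not None]
slope, intercept = fit_line([c for c, _ in sampled], [w for _, w in sampled])
predicted = [w if w is not None else intercept + slope * c for c, w in cells]
```

Unlike IDW, the prediction for an empty cell here depends on what is known about that cell itself rather than only on how far it is from the sampled ones.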