Correlation between 2 rasters

10-20-2016 11:30 PM
RodNielson1
New Contributor II

I have two raster datasets, crop yield and elevation, for the same field. I want to find out whether there is a correlation between them and, if possible, to produce a scatter plot or a table that I can plot with a line of best fit. I haven't been able to find a tool in ArcGIS that lets me compare two raster datasets. Any ideas?

6 Replies
DanPatterson_Retired
MVP Emeritus

There are tools in the Spatial Statistics toolbox:

Exploratory Regression—Help | ArcGIS for Desktop 

but you need to assess spatial autocorrelation

Spatial Autocorrelation tool graphical output—Help | ArcGIS for Desktop 
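To make the autocorrelation point concrete, here is a minimal sketch of global Moran's I computed directly on a small grid held as nested Python lists. It assumes binary rook (edge-sharing) neighbours and no NoData cells. The function name `morans_i` and the sample grid are made up for illustration; the ArcGIS tool works on feature classes and offers other conceptualizations of spatial relationships, so this is not a reproduction of that tool.

```python
def morans_i(raster):
    """Global Moran's I for a 2-D grid using binary rook-neighbour weights."""
    rows, cols = len(raster), len(raster[0])
    values = [v for row in raster for v in row]
    n = len(values)
    mean = sum(values) / n
    dev = [[v - mean for v in row] for row in raster]

    cross = 0.0        # sum of w_ij * dev_i * dev_j over neighbouring cells
    weight_total = 0   # sum of all weights (each neighbour link counted in both directions)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_total += 1

    denom = sum(d * d for row in dev for d in row)
    return (n / weight_total) * cross / denom


# Hypothetical 4 x 4 surface that rises steadily from west to east.
gradient = [[0, 1, 2, 3] for _ in range(4)]
i_gradient = morans_i(gradient)

# Hypothetical checkerboard: every cell differs from all of its rook neighbours.
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
i_checker = morans_i(checker)
```

Values near +1 mean neighbouring cells are alike, which is what you would expect from interpolated yield or a LiDAR DEM, and that is why neighbouring cells cannot be treated as independent observations.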

There is a caveat: regression on spatial data, even at the best of times, comes with a large number of conditions.

Geographically Weighted Regression (GWR)—Help | ArcGIS for Desktop 

For an overview, see An overview of the Spatial Statistics toolbox—Help | ArcGIS for Desktop 

But I suspect that you are really looking to see whether there is a pattern in the results rather than needing a predictive model, because the basic tenets of simple regression are not met by using two rasters as input: the 'observation points' are not independent of one another.

If you classify your inputs into nominal classes, you could use the Combine—Help | ArcGIS for Desktop tool to show which combinations of the classes actually occur, and so whether there is an association between the variables. The class divisions, however, will affect the outcomes.
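As a rough illustration of that idea outside ArcGIS, the sketch below classifies two small grids and counts how often each pair of classes coincides, which is essentially the cross-tabulation you would read off a Combine output table. The grids, the class breaks and the function names (`classify`, `combine_counts`) are all invented for the example; real rasters would first have to be read into Python (for example with `arcpy.RasterToNumPyArray`).

```python
from collections import Counter


def classify(value, breaks):
    """Return the 1-based class of value, given ascending upper class breaks."""
    for index, upper in enumerate(breaks, start=1):
        if value <= upper:
            return index
    return len(breaks) + 1


def combine_counts(raster_a, raster_b, breaks_a, breaks_b, nodata=None):
    """Count cells for every (class_a, class_b) combination, skipping NoData cells."""
    counts = Counter()
    for row_a, row_b in zip(raster_a, raster_b):
        for a, b in zip(row_a, row_b):
            if a == nodata or b == nodata:
                continue
            counts[(classify(a, breaks_a), classify(b, breaks_b))] += 1
    return counts


# Hypothetical elevation (m) and yield (t/ha) grids covering the same cells.
elevation = [[10.0, 10.2, 10.5],
             [10.1, 10.4, 10.6]]
crop_yield = [[60, 75, 90],
              [62, 80, 95]]

counts = combine_counts(elevation, crop_yield,
                        breaks_a=[10.15, 10.45],  # low / mid / high elevation
                        breaks_b=[70, 85])        # low / mid / high yield

for (elev_class, yield_class), cells in sorted(counts.items()):
    print(elev_class, yield_class, cells)
```

If most cells fall on a few class pairs (here low with low, mid with mid, high with high), the variables are associated. Moving the breaks changes the table, which is the caution in the last sentence above.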

I would venture forth carefully to avoid suggesting that there is a 'correlation' between these two variables using interval/ratio data, since you could end up implying that an increase in elevation of 5 meters resulted in a 2% decrease in crop yield. A ridiculous statement, even if the numbers supported the results. How to Lie with Statistics - Wikipedia and Spurious correlation - Wikipedia are required reading, and there are many other examples of the misappropriation or inappropriate use of statistical tests: Spurious correlations: Margarine linked to divorce? - BBC News 

XanderBakker
Esri Esteemed Contributor

In addition to all the good suggestions made by Dan Patterson, there is another similar thread here: Scatter plot of two rasters. I posted a sample there that extracts some random points and creates a scatter plot: https://community.esri.com/thread/184249-re-perform-raster-calculation-from-multiple-sub-folder-usin... (it takes just a sample of the pixels, not all of them)
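This is not the sample from the linked thread, just a minimal sketch of the same idea in plain Python: draw a random sample of co-located cell pairs from two grids, then compute Pearson's r and a least-squares line of best fit for the sample. The grids and the names `sample_pairs` and `pearson_and_fit` are made up for illustration, and the yield values are generated from elevation so the result is easy to check.

```python
import random
from math import sqrt


def sample_pairs(raster_x, raster_y, n, nodata=None, seed=0):
    """Return up to n randomly chosen (x, y) pairs from cells valid in both grids."""
    pairs = [(x, y)
             for row_x, row_y in zip(raster_x, raster_y)
             for x, y in zip(row_x, row_y)
             if x != nodata and y != nodata]
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(pairs, min(n, len(pairs)))


def pearson_and_fit(pairs):
    """Return (r, slope, intercept) for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), slope, mean_y - slope * mean_x


# Hypothetical relative elevation (m) and a yield grid built as 50 + 20 * elevation.
elevation = [[0.0, 0.1, 0.2],
             [0.3, 0.4, 0.5]]
crop_yield = [[50 + 20 * e for e in row] for row in elevation]

pairs = sample_pairs(elevation, crop_yield, n=4)
r, slope, intercept = pearson_and_fit(pairs)
```

The sampled pairs can be written to a table or passed to any plotting package for the scatter plot. Sampling also thins out neighbouring cells a little, although it does not remove spatial autocorrelation.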

RodNielson1
New Contributor II

A change in elevation producing a change in yield is not a ridiculous statement when you consider the hundreds of thousands to millions of dollars spent every year on laser levelling and forming to enhance field drainage. If the numbers support the results, you have to determine whether elevation is a factor or whether something else may be causing the effect.

Drainage is an issue: a 50mm depression in a field may pool water for extended periods during the wet season, severely reducing yield and even killing a proportion of the crop. An almost imperceptible ridge through a field, even 150mm or 200mm above the rest of the field, often produces a higher yield. Every grower in the industry is aware of this. The intent is to find out whether there is actually a correlation between these slight changes in in-field elevation and yield. Of course this will not be the end of the story: finding a positive or negative correlation is not an end in itself, but it provides an indication of one variable to consider.

One of the papers at the recent international conference for precision agriculture in St. Louis, advised producing a scatterplot between soil electromagnetic conductivity and yield, as a correlation has been found between these variables. In our part of the world, given our seasonal rainfall, drainage and therefore in-field elevation does make a difference in yield.

Unfortunately none of the tools mentioned are giving me the option of using a raster as an input layer, and none of them are allowing me to add more than one FC at a time.

I have an interpolated raster of crop yield, based on yield monitor and GPS data, and an elevation raster developed from LiDAR collected at 1 point per sqm. I would like to avoid going back to the original point data due to the high numbers of points to deal with.

I am currently limited to ArcGIS Advanced with Spatial Analyst. A couple of tools I have looked at require 3D analyst, and if I have to purchase that down the track some time then so be it, but for now I have this limitation.

Sorry for the long-winded explanation, but for decades there has been a suspected association between changes in in-field elevation, even small ones, and yield. Until recently we have not had the tools to measure or collect the data, or to analyse the data that we now have available to us. As we do our analysis, our results are reviewed by statisticians/biometricians to maintain rigour.

Regards

Rod Nielson

GIS Officer

Herbert Cane Productivity Services Ltd.

Ph: 07 4776 5660

Mob: 0403 215 594

Email: rnielson@hcpsl.com.au

Fax: 07 4776 1811

DanPatterson_Retired
MVP Emeritus

Given those comments, use your data to delineate depressions in the landscape. That seems to be the key, not the actual elevation, or perhaps the slope. In this way, if you have an elevation of X, you can differentiate between X in a depression, on a steep slope, on a flat surface, on a gently sloping surface, facing north, on sandy, silty or clay soils, on drained vs undrained fields, etc. Once the data are partitioned, perhaps some patterns can be extracted from the raw elevations, which in themselves will not be correlated with yield... at least without some adjective attached to them.

As for determining depressions, there are some tools within the Hydrology toolset that you might look at, such as the Sink tool. You can detect ridges or domes by inverting the terrain and repeating. These are just a few ideas. Have you explored any of these?
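The "invert and repeat" trick can be sketched on a tiny grid. The code below flags interior cells that are strictly lower than all eight neighbours, then runs the same test on the inverted surface to find high points. It is a deliberate simplification: the Hydrology Sink tool works from a flow direction raster and can delineate multi-cell sinks, whereas this finds single-cell pits only. The grid values and the function names are hypothetical.

```python
def local_minima(dem):
    """Return (row, col) of interior cells strictly lower than all 8 neighbours."""
    rows, cols = len(dem), len(dem[0])
    pits = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = dem[r][c]
            neighbours = [dem[r + dr][c + dc]
                          for dr in (-1, 0, 1)
                          for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(centre < value for value in neighbours):
                pits.append((r, c))
    return pits


def invert(dem):
    """Flip the surface so that high points become low points."""
    top = max(max(row) for row in dem)
    return [[top - value for value in row] for row in dem]


# Hypothetical 4 x 4 field (m): a 50 mm depression at (1, 1), a 200 mm rise at (2, 2).
dem = [[10.0, 10.0, 10.0, 10.0],
       [10.0, 9.95, 10.0, 10.0],
       [10.0, 10.0, 10.2, 10.0],
       [10.0, 10.0, 10.0, 10.0]]

depressions = local_minima(dem)
high_points = local_minima(invert(dem))
```

The resulting cell lists could then be used to partition the yield raster (depression, rise, neither) before looking for patterns.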

MervynLotter
Occasional Contributor III

Hi Rod

Did you try running the Band Collection Statistics tool with the Compute covariance and correlation matrices option selected? It takes any number of rasters as input. I quickly tested it on two input rasters (elevation and NDVI, not knowing whether it would run on only two rasters), and the resulting output gives a correlation coefficient of about -0.33 for my example area, i.e. a weak negative correlation. See below.

# CORRELATION MATRIX

# Layer 1 2
# --------------------------------------------------------------------------
1 1.00000 -0.33215
2 -0.33215 1.00000
# ==========================================================================
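For anyone wondering what sits behind that matrix, here is a minimal sketch that computes a Pearson correlation matrix over the co-located cells of any number of small grids. It is not the tool's implementation, only the textbook formula, and the two layers and the function names are made up. It assumes every layer has the same shape and no NoData cells.

```python
from math import sqrt


def flatten(raster):
    """Turn a 2-D grid into a flat list of cell values, row by row."""
    return [value for row in raster for value in row]


def correlation_matrix(layers):
    """Return the k x k Pearson correlation matrix for k same-shaped grids."""
    columns = [flatten(layer) for layer in layers]
    n = len(columns[0])
    means = [sum(column) / n for column in columns]

    def covariance(i, j):
        return sum((a - means[i]) * (b - means[j])
                   for a, b in zip(columns[i], columns[j])) / (n - 1)

    k = len(columns)
    return [[covariance(i, j) / sqrt(covariance(i, i) * covariance(j, j))
             for j in range(k)]
            for i in range(k)]


# Two hypothetical 2 x 2 layers covering the same cells.
layer_1 = [[1, 2],
           [3, 4]]
layer_2 = [[4, 3],
           [1, 2]]

matrix = correlation_matrix([layer_1, layer_2])
for row in matrix:
    print(" ".join(f"{value:9.5f}" for value in row))
```

The off-diagonal value is the correlation coefficient r, which ranges from -1 to +1. Squaring it gives the share of variance one layer explains in the other in a simple linear fit, so an r of -0.33 corresponds to roughly 11%.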

As Dan suggested, you could then use your DEM to create slope, aspect, depressions, topographical positional index, etc. and correlate these with your yield raster. 

MervynLotter
Occasional Contributor III

You may want to try running the Band Collection Statistics geoprocessing tool. You can supply any number of rasters and then make sure you select the Compute the covariance and correlation matrices option. The output is a text file that will indicate how closely the two rasters are correlated. 

After running a PCA, I used this approach to determine the extent to which the informing input rasters are correlated to the resulting PCA axes. 
