Logarithmic Averaging / Geometric Mean

11666
12
10-04-2010 10:33 AM
KoreenMillard
New Contributor II
Hello, I want to calculate the logarithmic average of the values of points as a raster. 

Has anyone ever tried to do this? 

-k

FYI:

Geometric mean explained by wikipedia:  "The geometric mean, in mathematics, is a type of mean or average, which indicates the central tendency or typical value of a set of numbers. It is similar to the arithmetic mean, which is what most people think of with the word "average", except that the numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the resulting product is taken."

http://en.wikipedia.org/wiki/Geometric_mean
12 Replies
KoreenMillard
New Contributor II
Thanks very much!
0 Kudos
MichaelAugust
Occasional Contributor III

Hi Koreen - did you figure out how to calculate the geometric mean with the raster calculator? Thanks!

0 Kudos
curtvprice
MVP Esteemed Contributor

The way you'd do this using the 10x Raster Calculator tool would be:

("grid1" * "grid2" * "grid3" ) ** (1 / 3.)

Don't forget to include the "." after the number of rasters (3), as Python division is integer if both inputs are integer.

>>> 1 / 3

0

>>> 1 / 3.

0.3333333333333333

0 Kudos
MichaelAugust
Occasional Contributor III

Hi Curtis - I meant calculate the geometric mean of all the individual cells contained in one raster, like this:

Geometric Mean

with each of those being an individual cell of my raster.  I think we realized that doing this to 1.2M cells in a raster simply implodes in Python, we couldn't get the script to do more than 50 or so values at one time.  Can someone out there with more programming experience explain why, or if there's a better way to do geometric mean in Python or ArcGIS?
thanks!

0 Kudos
ShaunWalbridge
Esri Regular Contributor

Not a spatial problem per-se, and the SciPy library does a pretty good job for problems like this. It'll be included in the 10.3 release, but can be downloaded and installed separately in 10.x. With that in place, convert your raster to a NumPy array, then use the statistics built in to SciPy to do the calculation:

import scipy.stats

rast_path = 'C:/my_input_raster.tif')

raster_as_numpy_array = arcpy.RasterToNumPyArray(rast_path)

raster_geometric_mean = scipy.stats.stats.gmean(

    raster_as_numpy_array, axis=None)

Which for a raster of 10M values takes a few seconds for me to run.

Hope that helps,

Shaun

Edit: Michael August I've updated my answer to include 'axis=None' which will compute the geometric mean for the whole matrix (instead of along one axis). With this change, it should work without any further steps.

MichaelAugust
Occasional Contributor III

>>> raster_as_numpy_array = arcpy.RasterToNumPyArray('C:\Users\xxxxx\Downloads\q47121g52be.tif\q47121g52be.tif')

>>> raster_geometric_mean = scipy.stats.mstats.gmean(raster_as_numpy_array)

>>> raster_geometric_mean

array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float16)

So I tried the above on integer and float rasters and am not sure what I did wrong there...how do you get to the gm as a single reported value? If you can't tell I am fairly new at this point to Python -thanks you guys for all your help!

0 Kudos
DanPatterson_Retired
MVP Emeritus

once you have the array...check my examples in this thread

0 Kudos
MichaelAugust
Occasional Contributor III

So finish it with this part:

>>> cp = np.cumproduct(a)

  • >>> cp
  • array([    1,      2,      6,    24,    120,    720504040320
  • 362880]) 
  • >>> N = np.size(cp)
  • >>> N
  • >>> gm = (cp[-1])**(1.0/N) 
  • >>> gm
  • 4.1471662743969127
0 Kudos
DanPatterson_Retired
MVP Emeritus

In my thread, I outlined two options...one using the cumulative product approach...and the other..the cumulative sum of the logs...  the cumulative product approach will fail if you have a large array and/or the numbers in the array contain large numbers due to numeric overflow.  The log approach suffers less from this.  The only difference is the way in which the final GM is determined.  I documented this for teaching purposes and to provide some context on GM use and the calculation pitfalls.  I am pretty sure that spreadsheet geometric mean functions don't use the cumulative product approach but rollout the log summation method behind the scenes..... So read my thread carefully...