Skip navigation
All People > Dan_Patterson > Py... blog > 2016 > January
2016

Introduction

This blog post is for the two other people in the world that might be remotely interested in rasters/arrays and dimensionality as it can be applied to reshaping for subsequent analysis.  The topic was introduced in my previous blog about block statistics.  I am gently warming the readers (both of you) to moving/sliding window functions and how they work on a mechanical level.

 

The code below is for the record.  This is one incarnation of what is out there on the web.  Most of the comments and posts deal with the 'speed' and efficacy of determining what you will see.  In my opinion, those discussions don't matter... I have only needed primes and combinations and permutations once outside of an academic environment.  I will rely on this code should the need arise again.  So here it is with an output example.

 

Nerd Note:

As of worlds largest prime as of this date,   , the worlds largest prime is ... 274207281-1 .... or 22,338,618 digits the number is called M74207281

which is the 49th Mersenne Prime discovered.     http://www.mersenne.org/primes/?press=M74207281

 

In the next post, I will use this foundation to provide more background on working with block and moving window operations.

 

"""
Script:  primes_combos_demo.py
Author:  Dan.Patterson@carleton.ca
Purpose: Long forgotten, but it is done.
"""
import numpy as np
import itertools

def primes(n):
    """Returns a array of primes, p < n 
    http://stackoverflow.com/questions/2068372/fastest-way-to-list-
    all primes-below-n-in-python/3035188#3035188
    """
    assert n>=2
    sieve = np.ones(n/2, dtype=np.bool)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = False
    p_num = np.r_[2, 2*np.nonzero(sieve)[0][1::]+1]
    return p_num


def divisor(n, prn=False):
    """Determine the divisors, products, cumulative products and
       divisors given a set of primes < n
    """
    p_vals = primes(n)
    divs = [p for p in p_vals if (n % p == 0)]
    cump = np.cumproduct(divs).tolist()
    divs = divs + cump
    prods = list(set(i*j for i in divs for j in divs))
    all_div = list(set(divs+prods))
    all_div.sort()
    if prn:
        print("all",all_div)
    return all_div 

def perm_sums(n,max_dim=3):
    """Determine the permutations that sum to n for an array of given size.
       A verbose workflow follows...   """
    ##divs, subs, prods = divisor(n)
    perms = divisor(n)
    p =[]
    for i in range(1,max_dim+1):
        p.append(list(itertools.product(perms, repeat=i)))
    combos = []
    for i in range(len(p)):
        vals = [v for v in p[i] if np.product(v)==n ]
        combos.extend(vals) #.append(vals)
    perms = np.array(perms)
    ppp = np.product(perms,axis=-1)   
    wh = np.where(ppp == n)
    perm_vals = perms[wh]
    return combos

def demo(n=36, prn=True):
    """ demo perm_sums"""
    p = primes(n)
    print("\nPrimes:...{}".format(p))
    all_div = divisor(n)
    print("Number:...{}\nAll:...{}".format(n, all_div))
    ndim = 4
    combos = perm_sums(n, max_dim=ndim)
    if prn:
        print("\nProducts for... {} - max_dim = {}".format(n, ndim))
        for i in combos:
            print(i)
    return combos  

if __name__=="__main__":
    """primes demo"""
    n = 36
    combos = demo(n, prn=True)

 

For the example in my last post for a 6x6 array for 36 unique numbers, these list the possible ways that that data can be reconfigure.

 

Primes:...[ 2  3  5  7 11 13 17 19 23 29 31]
Number:...36
All:...[2, 3, 4, 6, 9, 12, 18, 36]

Products for... 36 - max_dim = 4

2D3D4D

(36,)

(2, 18)

(3, 12)

(4, 9)

(6, 6)

(9, 4)

(12, 3)

(18, 2)

(2, 2, 9)

(2, 3, 6)

(2, 6, 3)

(2, 9, 2)

(3, 2, 6)

(3, 3, 4)

(3, 4, 3)

(3, 6, 2)

(4, 3, 3)

(6, 2, 3)

(6, 3, 2)

(9, 2, 2)

(2, 2, 3, 3)

(2, 3, 2, 3)

(2, 3, 3, 2)

(3, 2, 2, 3)

(3, 2, 3, 2)

(3, 3, 2, 2)

If you specify a minimum window size you can get possible combinations.

Minimum window...[3, 3]

[(3, 12), (4, 9), (6, 6), (9, 4), (12, 3), (2, 3, 6), (2, 6, 3), (3, 3, 4), (3, 4, 3), (4, 3, 3), (2, 2, 3, 3)]

 

That's all for now.  The cases of being able to reshape a raster in 1, 2, 3 or 4 dimensions are given above.

Block functions in rasters ...

Modified Jan 17,2015

 

The background..Results...

To begin, we will construct a simple array consisting of 36 numbers from 0 and 35.

The array is broken down into 3x3 blocks of numbers as can be seen in the output array sample.

 

Block functions are not used as often as they could be in GIS.  This type of function moves or "jumps" by steps in the X and Y direction specified by the block size.

 

Moving or sliding window functions are more common in GIS.  The window slides by one cell in the X and Y direction regardless of the shape, size and configuration of the window.

 

The block functions will be demonstrated in this post since they are simpler to comprehend.  The guiding principles for both types of functions are the same, differing only in their movement characteristics.

 

The input raster to the right, is being represented in array format so that one can see the values that each "cell" represents.  Each cell is of equal dimensions, varying only in the value that it possesses.  When using block function, the raster is tiled into subarrays/rasters using a "window".  In this example, the simple 3x3 window will be used.  Since the raster is to be broken down by this window in a jumping fashion, we end up... nicely ... nicely with 4 sub rasters each 3x3 in size.  Each sub-raster/array consists of 9 values.  It should be noted, that they could contain nodata values which should not be used in calculations.  Currently all sub-rasters all contain valid integer values

 

The values for the statistics shown represent the values for each array sub-block.  For instance, the minimum has values of 0, 3, 18, and 21.  If you examine the original array, or the samples sub-raster/arrays, you will see these values are indeed correct.  The difference is that each number now represents the 9 cells making up the 3x3 block.

 

The statistical parameters were derived using numpy functions and are shown below without explanation (for now).  it is suffice to say, that the input array represents dimension 0 and 1 and each sample array represents dimension 2 and 3, ergo, the values are calculated on the last 2 dimensions so that they represent each block.

 

The formulae for the code-ophiles are as follows:

    a_min = a.min(axis=(2,3))           a_max = a.max(axis=(2,3))

    a_mean = a.mean(axis=(2,3))     a_sum = a.sum(axis=(2,3))

    a_std = a.std(axis=(2,3))             a_var = a.var(axis=(2,3))

    a_ptp = a_max - a_min

 

There are the equivalents for the above for rasters/arrays that contain nodata values (ie. np.nanmin, np.nanmax, np.nanmean, np.nansum, np.nanstd, np.nanvar).

 

Now, if we take the results for the minimum, we can expand it back to its original form by expanding is both the X and Y directions.

 

    expanded = arr_expand(a_min,3,3)

 

There are the minor omissions of dealing with nodata values and input arrays that aren't evenly divisable by the window size...but they are minor details and will be covered elsewhere.

 

The observer should also note, that the array can be exported to a comma, space or tab delimited file  and given an ascii format header to give it real world coordinates for use/import into arcmap.  The whole process can also be accompanied with useage of RasterToNumPy array for export to NumPy and NumPyArrayToRaster if desired.  The subarray types can also be unraveled and presented in tabular format, concatenated and brought back into arcmap for use in graphing or in processes involving other arrays.

 

Input array...

[[ 0  1  2  3  4  5]

[ 6  7  8  9 10 11]

[12 13 14 15 16 17]

[18 19 20 21 22 23]

[24 25 26 27 28 29]

[30 31 32 33 34 35]]

 

Output array sample...

[[[[ 0  1  2]

   [ 6  7  8]

   [12 13 14]]

 

  [[ 3  4  5]

   [ 9 10 11]

   [15 16 17]]]

 

[[[18 19 20]

   [24 25 26]

   [30 31 32]]

 

  [[21 22 23]

   [27 28 29]

   [33 34 35]]]]

 

Results...............

Minimum...a_min

[[ 0  3]

[18 21]]

 

Maximum...a_max

[[14 17]

[32 35]]

 

Mean...a_mean

[[  7.  10.]

[ 25.  28.]]

 

Sum...a_sum

[[ 63  90]

[225 252]]

 

Std...a_std

[[ 4.97  4.97]

[ 4.97  4.97]]

 

Var...a_var

[[ 24.67  24.67]

[ 24.67  24.67]]

 

Range...a_ptp

[[14 14]

[14 14]]

 

expanded array...expanded

[[ 0  0  0  3  3  3]

[ 0  0  0  3  3  3]

[ 0  0  0  3  3  3]

[18 18 18 21 21 21]

[18 18 18 21 21 21]

[18 18 18 21 21 21]]

Summary: .more examples will be given in the upcoming weeks...

Some more information on finding python, its modules and whether they are built-in, modules of the standard distribution or those in the site-packages folder

References:

  29.12. inspect — Inspect live objects — Python 3.4.4 documentation look at the table on Types and Members

  http://stackoverflow.com/questions/247770/retrieving-python-module-path just one of hundreds of examples.

 

TypeAttributeDescription
module__doc__documentation string
__file__filename (missing for built-in modules)

 

TipsExample

A standard installation compatible with ArcMap and arcpy is assumed.

These:

  • dir(module_name)
  • help(module_name)

are your friends.

 

Look for a __file__ property in the function/property list.

The os module does...

>>> import  os
>>> dir(os)
[ ... snip ... '__doc__', '__file__',
'__loader__', '__name__', ... snip ... ]
>>> os.__file__
'C:\\Python34\\lib\\os.py'
>>>

 

The time module doesn't

>>> import time
>>> dir(time)
[ ... snip ... '__doc__', '__file__',
'__loader__', '__name__', ... snip ... ]
>>> time.__file__   # I will try anyway
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'module' object has no attribute '__file__'

Modules in the

C:\PythonXX\lib\

folder will have a __file__ property.

 

>>> # from ... C:\Python34\Lib   ... folder
>>> import collections
>>> collections.__file__
'C:\\Python34\\lib\\collections\\__init__.py'
Modules that aren't part of the standard distribution and have to be installed via pip or other means, get put in the site-packages folder and they will have a __file__

How about arcpy?

>>> import arcpy
>>> arcpy.__file__
'C:\\Program Files\\ArcGIS\\Pro\\Resources\\ArcPy\\arcpy\\__init__.py'
>>>

 

And my fav...

>>> import numpy as np
>>> np.__file__
'C:\\Python34\\lib\\site-packages\\numpy\\__init__.py'
You can use the inspect module as well... all the cool programmers do ...

Numpy was previously imported as np.

Notice the results are the same as the np.__file__ approach.

>>> import inspect
>>> inspect.getfile(np)
'C:\\Python34\\lib\\site-packages\\numpy\\__init__.py'
>>>

 

Lets try time.

>>> inspect.getfile(time)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python34\lib\inspect.py", line 518, in getfile
    raise TypeError('{!r} is a built-in module'.format(object))
TypeError: <module 'time' (built-in)> is a built-in module
>>>

No... it is a built-in.

But... you can another way...

>>> import time
>>> inspect.getmodule(time)
<module 'time' (built-in)>
>>>

Original:    Aug 2013

Modified:  Sept 2015  to include Numpy version

Reposted:  Jan 2016

 

General references

-   http://webspace.ship.edu/pgmarr/Geo441/Lectures/Lec%2016%20-%20Directional%20Statistics.pdf

-   http://ncss.wpengine.netdna-cdn.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Circular_Data_Analysis.pdf

-   SciPy implementation  scipy/morestats.py at master · scipy/scipy ·GitHublines 2601 - 2642  using complex numbers

 

When working with angles or time (minutes, days, months or years) the mean (average) of a list of values can't be determined in the same fashion as we normally do.  This example show one how to calculate the values of angles (azimuths specifically).  The function can be used in any script once the circular_mean.py script is imported.

 

I will get around to other descriptive measures for circular data but they are covered in the references

 

"""
CircularMean
Author: Dan.Patterson@carleton.ca
Date:     Aug 2013
Modified: Sept 2015
Purpose:
  To calculate the mean (average) for points along a circle using the 
  angular measure to these points from the center of the circle
Notes:
  If the angles are given in degrees then there is no need to convert
  If the units are time, such as hours (24 hour clock) or months (12) then
  these need to be converted by
    angle in degrees = time quantity / number of equal intervals
    ie 0.5 = (12 hours / 24 hours per day)
    or 180 degrees
required modules
numpy, math
"""
import math
import numpy as np
def circ_mean(angles):
    """angles in degrees"
    """
    cosList = []
    cosSum = 0.0
    sinSum = 0.0
    for i in angles:
        theCos = math.cos(math.radians(float(i)))
        theSin = math.sin(math.radians(float(i)))
        cosSum += theCos
        sinSum += theSin
    N = len(angles)
    C = cosSum/N
    S = sinSum/N
    theMean = math.atan2(S,C)
    if theMean < 0.0:
        theMean += math.radians(360.0)
    return theMean
def circ_mean_np(angles,azimuth=True):
    """ numpy version of above"""
    rads = np.deg2rad(angles)
    av_sin = np.mean(np.sin(rads))
    av_cos = np.mean(np.cos(rads))
    ang_rad = np.arctan2(av_sin,av_cos)
    ang_deg = np.rad2deg(ang_rad)
    if azimuth:
        ang_deg = np.mod(ang_deg,360.)
    return ang_rad, ang_deg
if __name__ == "__main__":

    sampleData = [ [350,10],
                   [350, 5],
                   [355, 10],
                   [90,270],
                   [180,0],
                   [180,360],
                   [0,360],
                   [90,180,270]
                 ]
    print("\nCircular Mean Demo...python version\n")
    frmt ="{:15s}{:15s}{:15s}"
    print(frmt.format("angles", "Mean radians", "Mean degrees"))
    for angles in sampleData:
        theMean = circ_mean(angles)
        args = (angles, theMean, math.degrees(theMean))
        print("{!s:>15}{:>12.4}{:>15.4}".format(*args))
    print("\nCircular Mean Demo...numpy version\n")
    frmt ="{:15s}{:15s}{:15s}"
    print(frmt.format("angles", "Mean radians", "Mean degrees"))
    #
    for angles in sampleData:
        result = circ_mean_np(angles)
        print("{!s:>15}{:>12.4f}{:>15.1f}".format(angles, result[0],result[1]))

 

Results from both

angles         Mean radians   Mean degrees   
      [350, 10]       6.283          360.0
       [350, 5]        6.24          357.5
      [355, 10]     0.04363            2.5
      [90, 270]       3.142          180.0
       [180, 0]       1.571           90.0
     [180, 360]       4.712          270.0
       [0, 360]       6.283          360.0
 [90, 180, 270]       3.142          180.0

I have moved a substantial amount of my materials to a new place... the NumPy Repository ... which will open up soon.

The move is to place issues relating to numpy, scipy, matplotlib and arcpy to a more cohesive location.   I will continue to post less targeted materials here if the need arises.  So if some of your links are dead to the previous materials, they will be found on the new site.