Numpy Snippets... # 1... array basics and calculating values for arrays

1668
0
09-09-2016 08:47 AM
Labels (1)
DanPatterson_Retired
MVP Emeritus
2 0 1,668

Numpy_Snippets

Updated: 2016-09-09

Previous snippets:

None                   Jan 5, 2015

Documentation   Jan 30

Edits                    May 5

Documentation

NumPy Reference — NumPy v1.9 Manual

Tentative NumPy Tutorial -

for numpy python packages

http://www.lfd.uci.edu/~gohlke/pythonlibs/

other links

http://rintintin.colorado.edu/~wajo8931/docs/jochem_aag2011.pdf

-------------------------------------------------------------------------------------------------

As a companion to the Numpy Lessons series, I have posted within my blog, I have decided to maintain a series of snippets that don't comfortably fit into a coherent lesson.  They, like the lessons, will be sequentially numbered with links to the previous ones kept in the top section.  Contributions and/or corrections.

All samples assume that the following imports are made.  Other required imports will be noted when necessary.

# default imports used in all examples whether they are or not
import numpy as np
import arcpy
‍‍‍

This is a bit of a hodge-podge, but the end result is produce running means for a data set over a 10-year time period.
Simple array creation is shown using two methods, as well as how to convert array contents to specific data types.

>>> year_data = np.arange(2005,2015,dtype='int')   # 10 years worth of records from 2005 up to, but not 2015
>>> some_data = np.arange(0,10,dtype='float')      # some numbers...sequential and floating point in this case
>>> result = np.zeros(shape=(10,),dtype='float')   # create an array of 0's with 10 records
>>> result.fill(-999)                              # provide a null value and fill the zero's with null values
>>> result_list = zip(year_data,some_data,result)  # zip the 3 arrays together
>>> 
>>> dt = np.dtype([('year','int'), ('Some_Data', 'float'),('Mean_5year',np.float64)]) # combined array type
>>> result_array = np.array(result_list,dtype=dt)  # produce the final array with the desired data type
>>> result_array
array([(2005, 0.0, -999.0), (2006, 1.0, -999.0), (2007, 2.0, -999.0),
       (2008, 3.0, -999.0), (2009, 4.0, -999.0), (2010, 5.0, -999.0),
       (2011, 6.0, -999.0), (2012, 7.0, -999.0), (2013, 8.0, -999.0),
       (2014, 9.0, -999.0)], 
      dtype=[('year', '<i4'), ('Some_Data', '<f8'), ('Mean_5year', '<f8')])
>>>
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The result_array now consists of a three columns, which can be accessed by names using array slicing.

>>> result_array['year']                           # slicing the year, data and result column values
array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014])
>>> result_array['Some_Data']
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> result_array['Mean_5year']
array([-999., -999., -999., -999., -999., -999., -999., -999., -999., -999.])
>>>
‍‍‍‍‍‍‍

If this array, an ndarray, is converted to a recarray, field access can also be achieved using 'array.field' notation.

>>> result_v2 = (result_array.view(np.recarray))   # convert it to a recarray to permit 'array.field access'
>>> 
>>> result_v2.year
array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014])
>>> result_v2.Some_Data
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> result_v2.Mean_5year
array([-999., -999., -999., -999., -999., -999., -999., -999., -999., -999.])
>>>
‍‍‍‍‍‍‍‍‍

The remainder of the demonstration basically shows some of the things that can be done with ndarrays and recarrays.  As an example, the 5-year running mean will be calculated and the Mean_5year column's null values replaced with valid data.  The 'np.convolve' method will be used to determine the running means for no other reason than I hadn't used it before.  Since the input data are sequential numbers from 0 to 9, it will be pretty easy to do the mental math to figure out whether the running mean is indeed correct.  The steps entail:

  1. decide upon the mean step to use (eg. N=5),
  2. run the convolve method on the 'Some_Data' column in the result_v2 recarray,
  3. pad the resultant array so that the sizes of the running mean calculation array and the column array are equal.

Here it goes...

>>> N = 5                                          # five year running mean step, see the help on convolve
>>> rm = np.convolve(result_v2['Some_Data'],np.ones((N,))/N, mode='valid')  # a mouth-full
>>> rm                                             # however, there are only values for the mid-point year
array([ 2.,  3.,  4.,  5.,  6.,  7.])              # so we need to pad by 2 on either end of the output
>>> 
>>> pad_by = N/2                                   # integer division...this has change in python 3.x
>>>
>>> new_vals = np.pad(rm,pad_by,mode='constant',constant_values=-999)   # padding the result to new_vals
>>> new_vals
array([-999., -999.,    2.,    3.,    4.,    5.,    6.,    7., -999., -999.])
>>>
>>> result_v2.Mean_5year = new_vals                # set the new_vals into the correct column
>>> 
>>> result_v2                                      # voila
rec.array([(2005, 0.0, -999.0), (2006, 1.0, -999.0), (2007, 2.0, 2.0),
       (2008, 3.0, 3.0), (2009, 4.0, 4.0), (2010, 5.0, 5.000000000000001),
       (2011, 6.0, 6.0), (2012, 7.0, 7.000000000000001),
       (2013, 8.0, -999.0), (2014, 9.0, -999.0)], 
      dtype=[('year', '<i4'), ('Some_Data', '<f8'), ('Mean_5year', '<f8')])
>>> 
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍


A bit messy with that floating point representation thing appearing for a few numbers....let's clean it up by changing the dtype to limit the number of decimal points in the array showing up in the 'Mean_5year column.  This will be done incrementally.

>>> x = np.round(result_v2.Mean_5year,decimals=2)
>>> result_v2.Mean_5year = x
>>> result_v2
rec.array([(2005, 0.0, -999.0), (2006, 1.0, -999.0), (2007, 2.0, 2.0),
       (2008, 3.0, 3.0), (2009, 4.0, 4.0), (2010, 5.0, 5.0),
       (2011, 6.0, 6.0), (2012, 7.0, 7.0), (2013, 8.0, -999.0),
       (2014, 9.0, -999.0)], 
      dtype=[('year', '<i4'), ('Some_Data', '<f8'), ('Mean_5year', '<f8')])
>>>
‍‍‍‍‍‍‍‍‍

So these snippets have shown some of the things that can be done with arrays and the subtle but important distinctions between numpy's array, ndarray and recarray forms.

About the Author
Retired Geomatics Instructor at Carleton University. I am a forum MVP and Moderator. Current interests focus on python-based integration in GIS. See... Py... blog, my GeoNet blog...
Labels