Numpy Snippets... # 1... array basics and calculating values for arrays

DanPatterson_Retired · ‎09-09-2016

Numpy_Snippets

Updated: 2016-09-09

Previous snippets:

None Jan 5, 2015

Documentation Jan 30

Edits May 5

Documentation

NumPy Reference — NumPy v1.9 Manual

Tentative NumPy Tutorial

for numpy python packages

http://www.lfd.uci.edu/~gohlke/pythonlibs/

other links

http://rintintin.colorado.edu/~wajo8931/docs/jochem_aag2011.pdf

-------------------------------------------------------------------------------------------------

As a companion to the Numpy Lessons series that I have posted on my blog, I have decided to maintain a series of snippets that don't fit comfortably into a coherent lesson. Like the lessons, they will be numbered sequentially, with links to the previous ones kept in the top section. Contributions and/or corrections are welcome.

All samples assume that the following imports are made. Other required imports will be noted when necessary.

# default imports assumed for all examples, whether or not a given example uses them
import numpy as np
import arcpy

This is a bit of a hodge-podge, but the end result is to produce running means for a data set over a 10-year time period.
Simple array creation is shown using two methods, as well as how to convert array contents to specific data types.

>>> year_data = np.arange(2005,2015,dtype='int')   # 10 years of records from 2005 up to, but not including, 2015
>>> some_data = np.arange(0,10,dtype='float')      # some numbers...sequential and floating point in this case
>>> result = np.zeros(shape=(10,),dtype='float')   # create an array of 0's with 10 records
>>> result.fill(-999)                              # fill the zeros with -999 to act as a null value
>>> result_list = list(zip(year_data,some_data,result))  # zip the 3 arrays together; list() is needed in python 3.x
>>> 
>>> dt = np.dtype([('year','int'), ('Some_Data', 'float'),('Mean_5year',np.float64)]) # combined array type
>>> result_array = np.array(result_list,dtype=dt)  # produce the final array with the desired data type
>>> result_array
array([(2005, 0.0, -999.0), (2006, 1.0, -999.0), (2007, 2.0, -999.0),
       (2008, 3.0, -999.0), (2009, 4.0, -999.0), (2010, 5.0, -999.0),
       (2011, 6.0, -999.0), (2012, 7.0, -999.0), (2013, 8.0, -999.0),
       (2014, 9.0, -999.0)], 
      dtype=[('year', '<i4'), ('Some_Data', '<f8'), ('Mean_5year', '<f8')])
>>>
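
As a minimal sketch of another way to get the same kind of array, assuming only NumPy is imported, the structured array can be preallocated with the combined dtype and each field filled by name, which skips the zip step. The names `alt` and `null_value` are made up for this illustration and are not part of the example above.

```python
import numpy as np

# the same three fields as the combined dtype used above
dt = np.dtype([('year', 'int'), ('Some_Data', 'float'), ('Mean_5year', np.float64)])

null_value = -999.0                      # hypothetical name for the null marker
alt = np.zeros(10, dtype=dt)             # preallocate 10 records, every field zero
alt['year'] = np.arange(2005, 2015)      # fill each field by name
alt['Some_Data'] = np.arange(0, 10, dtype='float')
alt['Mean_5year'] = null_value           # a scalar is broadcast to every record

print(alt[0])
print(alt.dtype.names)
```

Whether this reads better than zipping separate arrays is a matter of taste; it mainly helps when the columns are produced at different times.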

The result_array now consists of three columns (fields), which can be accessed by name using the field name as the index.

>>> result_array['year']                           # slicing the year, data and result column values
array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014])
>>> result_array['Some_Data']
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> result_array['Mean_5year']
array([-999., -999., -999., -999., -999., -999., -999., -999., -999., -999.])
>>>
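
Field access by name can be mixed with ordinary row slicing. The following is a small sketch on a two-field stand-in array (`arr`, a hypothetical name), showing a row slice, a boolean mask driven by one field, and a list of names used to pick several fields at once.

```python
import numpy as np

dt = np.dtype([('year', 'int'), ('Some_Data', 'float')])
arr = np.zeros(10, dtype=dt)             # small stand-in for result_array
arr['year'] = np.arange(2005, 2015)
arr['Some_Data'] = np.arange(0, 10, dtype='float')

first_three = arr[:3]                    # row slicing returns whole records
recent = arr[arr['year'] >= 2010]        # a field can drive a boolean mask
pair = arr[['year', 'Some_Data']]        # a list of names selects several fields

print(first_three)
print(recent['Some_Data'])
print(pair.dtype.names)
```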

If this array, an ndarray, is converted to a recarray, field access can also be achieved using 'array.field' notation.

>>> result_v2 = (result_array.view(np.recarray))   # convert it to a recarray to permit 'array.field access'
>>> 
>>> result_v2.year
array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014])
>>> result_v2.Some_Data
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> result_v2.Mean_5year
array([-999., -999., -999., -999., -999., -999., -999., -999., -999., -999.])
>>>
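
One point worth checking is that `view` does not copy the data, so anything assigned through the recarray also shows up in the original ndarray. A short sketch, using a small stand-in array (`arr` and `rec` are illustrative names):

```python
import numpy as np

dt = np.dtype([('year', 'int'), ('Some_Data', 'float')])
arr = np.zeros(3, dtype=dt)              # small stand-in for result_array
rec = arr.view(np.recarray)              # a view onto the same data, not a copy

rec.year = np.arange(2005, 2008)         # assign through 'array.field' access
shares = np.shares_memory(arr, rec)      # True when both use the same buffer

print(arr['year'])                       # the change is visible in the ndarray
print(shares)
```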

The remainder of the demonstration shows some of the things that can be done with ndarrays and recarrays. As an example, the 5-year running mean will be calculated and the null values in the Mean_5year column replaced with valid data. The 'np.convolve' function will be used to determine the running means, for no other reason than I hadn't used it before. Since the input data are sequential numbers from 0 to 9, it is easy to do the mental math to check whether the running mean is correct. The steps entail:

decide upon the mean step to use (e.g. N=5),
run the convolve method on the 'Some_Data' column in the result_v2 recarray,
pad the resultant array so that the sizes of the running mean calculation array and the column array are equal.
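
The three steps can be sketched as one small function. This is only an illustration: `running_mean` is a hypothetical helper name, and it assumes an odd step size so that the padding is the same on both ends and the output length matches the input.

```python
import numpy as np

def running_mean(values, n=5, null=-999.0):
    """Centered running mean of odd width n, padded with null at both ends."""
    kernel = np.ones(n) / n                          # equal weights that sum to 1
    rm = np.convolve(values, kernel, mode='valid')   # len(values) - n + 1 results
    pad_by = n // 2                                  # floor division, same in python 2.x and 3.x
    return np.pad(rm, pad_by, mode='constant', constant_values=null)

data = np.arange(0, 10, dtype='float')
out = running_mean(data, n=5)
print(out)
```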

Here it goes...

>>> N = 5                                          # five year running mean step, see the help on convolve
>>> rm = np.convolve(result_v2['Some_Data'],np.ones((N,))/N, mode='valid')  # a mouth-full
>>> rm                                             # however, there are only values for the mid-point year
array([ 2.,  3.,  4.,  5.,  6.,  7.])              # so we need to pad by 2 on either end of the output
>>> 
>>> pad_by = N//2                                  # floor division...a plain N/2 returns 2.5 in python 3.x
>>>
>>> new_vals = np.pad(rm,pad_by,mode='constant',constant_values=-999)   # padding the result to new_vals
>>> new_vals
array([-999., -999.,    2.,    3.,    4.,    5.,    6.,    7., -999., -999.])
>>>
>>> result_v2.Mean_5year = new_vals                # set the new_vals into the correct column
>>> 
>>> result_v2                                      # voila
rec.array([(2005, 0.0, -999.0), (2006, 1.0, -999.0), (2007, 2.0, 2.0),
       (2008, 3.0, 3.0), (2009, 4.0, 4.0), (2010, 5.0, 5.000000000000001),
       (2011, 6.0, 6.0), (2012, 7.0, 7.000000000000001),
       (2013, 8.0, -999.0), (2014, 9.0, -999.0)], 
      dtype=[('year', '<i4'), ('Some_Data', '<f8'), ('Mean_5year', '<f8')])
>>> 
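
For those who would rather not trust the mental math, here is a quick cross-check: the mean of every complete 5-value window is computed the slow, obvious way and compared with the convolve result. The variable names are illustrative only.

```python
import numpy as np

data = np.arange(0, 10, dtype='float')
N = 5

# mean of each complete N-value window, computed one window at a time
explicit = np.array([data[i:i + N].mean() for i in range(len(data) - N + 1)])
convolved = np.convolve(data, np.ones(N) / N, mode='valid')

print(explicit)
print(np.allclose(explicit, convolved))  # allclose tolerates tiny floating point differences
```

The comparison uses `np.allclose` rather than `==` because the convolve result can differ from the exact mean in the last decimal place.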

A bit messy, with floating point representation error showing up for a few numbers. Let's clean it up by rounding the values in the 'Mean_5year' column to limit the number of decimal places that show up. Note that rounding changes the stored values, not the dtype, which stays as a float. This will be done incrementally.

>>> x = np.round(result_v2.Mean_5year,decimals=2)
>>> result_v2.Mean_5year = x
>>> result_v2
rec.array([(2005, 0.0, -999.0), (2006, 1.0, -999.0), (2007, 2.0, 2.0),
       (2008, 3.0, 3.0), (2009, 4.0, 4.0), (2010, 5.0, 5.0),
       (2011, 6.0, 6.0), (2012, 7.0, 7.0), (2013, 8.0, -999.0),
       (2014, 9.0, -999.0)], 
      dtype=[('year', '<i4'), ('Some_Data', '<f8'), ('Mean_5year', '<f8')])
>>>
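
A small sketch to confirm what rounding does and does not do, using a few stand-alone values like those seen in the output earlier (`vals` and `rounded` are illustrative names): the values are tidied up, but the dtype is untouched.

```python
import numpy as np

vals = np.array([5.000000000000001, 7.000000000000001, -999.0])
rounded = np.round(vals, decimals=2)

print(rounded.dtype)                     # still a floating point dtype
print(rounded)
print(np.isclose(vals, rounded).all())   # original and rounded values agree within tolerance
```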

So these snippets have shown some of the things that can be done with arrays and the subtle but important distinctions between numpy's array, ndarray and recarray forms.