
List comprehensions... 2

Posted by Dan_Patterson Champion Feb 25, 2016

The other blog on list comprehensions (or comprehensions more generally) is getting too big to edit and maintain easily, so consider this part 2.  Part 1 can be found at List comprehensions...

 

Standard field calculator function

Code block

def func(a,b):
    if (a == -99):
        if (b != -99):
            ans = -97
        else:
            ans = -99
    else:
        if (b == -99):
            ans = -98
        else:
            ans = float(a) - b
    return ans

Expression box    func( !Fld_0!, !Fld_1!)

 

Readings:

Review:

  • [ list comprehension ]
    • lc = [ x for x in 'abcd']
    • ['a', 'b', 'c', 'd']
  • ( generator)
    • gen = ( x for x in list('abcd') )
    • >>> next(gen)
    • 'a'
  • { set comprehension }
    • sc = { x for x in 'abcdaabbccdd'}
    • set(['a', 'c', 'b', 'd'])
  • { dictionary comprehension }
    • dc ={ x: x*2 for x in 'abcd'}
    • {'a': 'aa', 'c': 'cc', 'b': 'bb', 'd': 'dd'}
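
A quick check of the four forms, as a sketch that should run on Python 2.7 and 3.x alike.  It uses the built-in next() to pull the first value from the generator, and it compares contents rather than printed order, since sets and dictionaries make no promise about display order.

```python
# The four comprehension flavours from the review list.
lc = [x for x in 'abcd']                 # list comprehension
gen = (x for x in list('abcd'))          # generator expression
sc = {x for x in 'abcdaabbccdd'}         # set comprehension
dc = {x: x * 2 for x in 'abcd'}          # dictionary comprehension

first = next(gen)                        # first value from the generator

print(lc)                   # ['a', 'b', 'c', 'd']
print(first)                # a
print(sorted(sc))           # ['a', 'b', 'c', 'd']
print(sorted(dc.items()))   # [('a', 'aa'), ('b', 'bb'), ('c', 'cc'), ('d', 'dd')]
```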

 

Even when it is written as a list comprehension, the expression still has to go in a code block, since the field calculator doesn't support iterable objects etc.  You can also use a generator like in the second example.

Since a list comprehension returns a list, you need to append an index ...  [0] ... to indicate that the first element in the list is returned.   Obviously, this isn't an issue since the data are processed one row at a time and there will only be one value returned.  You can use a generator expression if you wish to skip the indexing.  Remember, square brackets for list comprehensions, round brackets for a generator and curly brackets for set and dictionary comprehensions.

Now this thread, Script to return multiple missing values based on other criteria, posed the interesting question of what to do when you have been conscientious and used multiple null values to represent several conditions of null.  In the example, -99, -98 and -97 were used.  The desire was to provide a reclassification of the nulls whilst doing "something" with the remaining values in the two fields that needed to be examined.  In the code examples above, it was decided to retain the null value ranges while performing a simple subtraction for non-null values.  The results are shown in the figure.

As a cautionary note, this example only works IF the subtraction itself never returns a value in the range -99 to -97.  Solution??  Instead of using a null code of -98, I could have easily substituted -9998 (or something similar), which would have retained the intent of the nulls yet permitted the subtraction to proceed.

It should be obvious that all you need to do is find a base value for the null that the real results can never reach.
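
A minimal sketch of that idea, assuming output codes of -9999, -9998 and -9997 in place of -99, -98 and -97.  The names func_wide, NULL_BOTH, NULL_B_ONLY, NULL_A_ONLY and IN_NULL are made up for illustration, and the input fields are still assumed to use -99 as their null marker.

```python
# Hypothetical output codes, far outside any plausible difference.
NULL_BOTH = -9999    # both inputs null   (was -99)
NULL_B_ONLY = -9998  # only b is null     (was -98)
NULL_A_ONLY = -9997  # only a is null     (was -97)
IN_NULL = -99        # null marker assumed in the input fields


def func_wide(a, b):
    """Same branching as func, but with wider output null codes."""
    if a == IN_NULL:
        return NULL_A_ONLY if b != IN_NULL else NULL_BOTH
    if b == IN_NULL:
        return NULL_B_ONLY
    return float(a) - b


print(func_wide(1, 99))      # -98.0  a real difference, no longer ambiguous
print(func_wide(5, -99))     # -9998
print(func_wide(-99, 5))     # -9997
print(func_wide(-99, -99))   # -9999
```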

 

 

List comprehension code blocks

Code block

def LC_func(a,b): 
   return [(a+abs(a-b)) if ((a-b)<=1) else min(a,(a-b))][0]
   

Expression box     LC_func( !Fld_0!, !Fld_1!)

OR

Code block 

def LC_func2(a,b): 
  return ( (a+abs(a-b)) if ((a-b)<=1) else min(a,(a-b)) )

Expression box      LC_func2( !Fld_0!, !Fld_1!)
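
Both code blocks can be tried outside the field calculator.  The check below is only a sketch with made-up input pairs.  One observation: the round-bracket form in LC_func2 is, strictly speaking, a parenthesized conditional expression rather than a generator, which is why nothing has to be sliced or iterated to get the value out.

```python
def LC_func(a, b):
    # one-element list, then index out the only value
    return [(a + abs(a - b)) if ((a - b) <= 1) else min(a, (a - b))][0]


def LC_func2(a, b):
    # same expression with no list around it, so no index is needed
    return ((a + abs(a - b)) if ((a - b) <= 1) else min(a, (a - b)))


pairs = [(5, 2), (2, 5), (3, 3)]   # hypothetical field values
results = [(LC_func(a, b), LC_func2(a, b)) for a, b in pairs]
print(results)   # [(3, 3), (5, 5), (3, 3)]
```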

 

Tabular output
LC_demo.png

Where did that file go?

Posted by Dan_Patterson Champion Feb 13, 2016

I know ... I know... you can do that in Windows Explorer ...

find stuff ... sort stuff ... check modification date ...

but nothing is nicer than having a hardcopy in your hand to document where things are at a point in time.

 

To this end, I began poking around in the os module and the arcpy.da module looking at the options for traversing paths.  This led me to some amazing threads on GeoNet and elsewhere about documenting data sources and types.  Many people went to great lengths to develop the Amazing-o-rama that would document everything ... everywhere and at any time.

 

Me?  I know where all my data are.

My needs are simple...I have a couple of folders of shapefiles, a couple of folders of grids, some *.tif and *.sid things and more recently...from cajoling by some nameless cohorts on this forum...a real file geodatabase.

 

I generally don't make maps, I generally don't need real world data and when it comes to most things, a geometry type is a geometry type, maybe with its own personality, but the same-ish is good enough.

What I really need to do is find simple things. Python scripts...text files...*.png images for manuals, *.wpd (yes WordPerfect) and the odd *.doc/docx file.

 

This led me to the distraction ... simply explore the os module, specifically os.walk, which of course led me to arcpy.da.Walk and somewhere along the path...the time and datetime modules.  I got sidetracked into Python's Mini Formatting Language, the collections module (again!) and base python object properties and how some properties are not as overtly exposed as others.  The latter will be the focus of my next post.

 

So for now...here is my documentation and the script that meets my simple needs.

In the if __name__ == '__main__':  section, you can change the source folder and the file extension you are looking for.  All the other 'crap'...such as whether you want the creation/modification/some-other-time option...has been stripped back.

I have written it in the style noted in the script header.  I could turn it into a toolbox tool...but why...I have emailed it to myself...could use "the cloud", but that sounds pretentious and downright '60's.

 

"""
Script:    files_in_path.py
Path:      F:\A0_Main\
Author:    Dan.Patterson@carleton.ca
Created:   2015-05-25
Modified:  2015-05-25  (last change date)
Purpose:   To find all scripts with an 'ending' in a path
Notes:
  As verbose as possible. No error checking is intentional, to minimize
  bloat to permit verbosity (If you follow that...good).
References:
Size .......
  PEP 8: "For sequences, (strings, lists, tuples), use the fact that empty
          sequences are false."
- this applies to any sequence that has a len, __len__ method, such as
  collections.Counter, collections.OrderedDict etc
  - see empty_tests.py for more discussion
  >>> objs = [ [],(),{},"",[1],(1),{1:"one"},"1",None,True,1,False,0]
  >>> is_empty = [ not i for i in objs ]  # not used, but useful
  >>> print is_empty .....
  -
 - for NumPy arrays use size:
     x = np.array([[],[]]) ==> len(x) == 2, x.size == 0
Time ......
http://stackoverflow.com/questions/237079/
http://gis.stackexchange.com/questions/48537/how-to-make-a-gis-inventory
"""
import sys
import os
import datetime
def mod_time(folder,case):
    """obtain the modified time for the file and path
       get the file name, the modification time,
       convert to readable time, then format
    """
    f = os.path.join(folder,case)            
    t =os.path.getmtime(f)                    
    out = datetime.datetime.fromtimestamp(t)
    out = '{:%Y_%m_%d %H:%M:%S}'.format(out)
    return out
def os_walk(src_folder,ending="py"):
    """folder, file type ending, return creation time... returns a message
       Now walk the walk, getting the files that have the specified ending
       (ie. *.py) in the folder and its subfolders.
    """
    msg = ("\n.....File path: ..... {}".format(src_folder))
    for folder, subfolders, files in os.walk(src_folder):
        case_files = [ f for f in files if f.endswith(ending)]
        if case_files:
            counter = 0
            msg += ("\nFolder:... {}".format(folder))
            for case in case_files:
                counter+=1
                t = mod_time(folder,case)
                msg += ("\n ({:02d}) {: <25s} {}".format(counter,case,t))
            del case,case_files
    return msg
#----------------------------------------------------------
if __name__ == '__main__':
    """change the folder, ending etc below """
    src_folder =  r"F:\Writing_Projects" #r"F:\\"
    ending = "py"
    get_time = True
    #
    msg = os_walk(src_folder,ending=ending)
    print(msg)

 

Oh yes... I forgot... highlight...copy...paste the output into Notepad++ or your word processor and print/save from there.  I could write the code, but someone has to have homework.
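
For those who would rather skip the homework, one possible answer, as a sketch only.  The helper name save_report and the output file name are made up, and a short stand-in string is used in place of the text that os_walk returns so that the snippet runs on its own.

```python
import os
import tempfile


def save_report(msg, out_file):
    """Write the report text to out_file and return the path."""
    with open(out_file, "w") as f:
        f.write(msg)
    return out_file


# Stand-in for the string returned by os_walk(src_folder, ending=ending).
msg = "\n.....File path: ..... (example)\nFolder:... (example)"

# Hypothetical output location; the temp folder is used so this runs anywhere.
out_file = os.path.join(tempfile.gettempdir(), "files_in_path_report.txt")
print(save_report(msg, out_file))
```

In the script itself, the last line of the __main__ section would call save_report(msg, some_path) after (or instead of) printing.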

Enjoy

Inspiration... Re: Calculate field conditional statement using a date formula

Background...

List comprehensions...

List comprehensions... 2

Bisect method

 

>>> dts = [1,28,29,91,92,182,183,365,366] 
>>> bins = [28,91,182,365,400]
>>> import bisect
>>> a = [ bisect.bisect_left(bins,i) + 1  for i in dts]
result
a = [1, 1, 2, 2, 3, 3, 4, 4, 5]

Numpy method

 

>>> dts = [1,28,29,91,92,182,183,365,366] 
>>> bins = [28,91,182,365,400]
>>> import numpy as np
>>> b = np.digitize(dts, bins, right=True) + 1
result
list(b) = [1, 1, 2, 2, 3, 3, 4, 4, 5]

 

It really bugs me that doing reclassification is such a pain whether you are using plain python or other software.  This situation is only amplified if you are trying to do this in the field calculator.

 

Things can be made much easier using either numpy or the bisect module...whether it be text or numbers.  The table above shows how to do this with the same data set using the bisect and the numpy modules.

 

The intent in the cited thread was to reclassify some values to an ordinal/interval scale depending upon threshold values.  This of course normally leads to a multi-tiered if statement.  A simple list comprehension isn't much help either, due to the nature of the classification.  To expedite matters, some method of parsing the data into an appropriate class is needed.

 

The bisect module and its methods allow for this, provided you can supply a sequential (sorted) classification scheme.  The data can be fed into the method within a list comprehension to return the values for a list of data all at once.  If this method is to be used in the field calculator, a generator expression needs to be used, since the field calculator, and cursors in general, read their values sequentially from rows in a table.
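
A minimal sketch of how the bisect approach might look as a field calculator code block, handling one value per call.  The function name reclass is made up, and the bins are the ones from the bisect example; the list comprehension at the end only stands in for the rows that the field calculator would feed in one at a time.

```python
import bisect

bins = [28, 91, 182, 365, 400]   # upper class limits, must be sorted


def reclass(val):
    """Return the 1-based class number for a single value."""
    return bisect.bisect_left(bins, val) + 1


dts = [1, 28, 29, 91, 92, 182, 183, 365, 366]
print([reclass(i) for i in dts])   # [1, 1, 2, 2, 3, 3, 4, 4, 5]
```

The expression box would then hold something like reclass( !Fld_0! ), with the field name being a placeholder.  Note that any value above the last limit (400) falls into class 6.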

 

Reclass methods... numpy and bisect module

Numpy reclass method

np_reclass0.png

Numpy results

np_reclass.png

bisect_reclass.png

The results and the code are shown to the right.

 

 

Oh yes... forgot about text...here are two examples, one from text to numbers and the second from text to text (apologies to dog owners)
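
The same trick works on text because strings sort alphabetically.  The sketch below is not the code from the images; the break points, labels and words are all made up to show text-to-number and text-to-text reclassification with bisect.

```python
import bisect

# Hypothetical alphabetical break points and class labels.
breaks = ['h', 'p']                      # classes: < 'h', 'h' up to 'p', >= 'p'
to_number = [1, 2, 3]                    # text -> number
to_text = ['early', 'middle', 'late']    # text -> text

words = ['apple', 'house', 'maple', 'pear', 'zebra']

# bisect_right puts a word equal to a break point into the upper class
idx = [bisect.bisect_right(breaks, w) for w in words]

print([to_number[i] for i in idx])   # [1, 2, 2, 3, 3]
print([to_text[i] for i in idx])     # ['early', 'middle', 'middle', 'late', 'late']
```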

 

Homework
bisect_reclass1.png
bisect_reclass2.png

Should you have any other unique situations, please forward them on to me.


List comprehensions...

Posted by Dan_Patterson Champion Feb 3, 2016

List comprehensions (LCs)    Note:  Resurrected from another one of my places.

  • Do you wince when you use map and filter because you can't figure out LCs?
  • Do you rail against modernity and stick with for and if 's?
  • Do you always have to have that smoking one-liner?  Are you ashamed when you can't get it?
  • Did you know that you can put comments in LCs ?
  • Did you know that LCs can be split on to separate lines?  Is that sacrilege?
  • Which do you like better ... form (a), (b) or (c) ?

Basic LC forms

Traditional for loop

c = []
for i in range(20):
    if (i > 5) and (i < 10):
        c.append(i**2)

 

The smokin' cryptic one-liner

a = [i**2 for i in range(20) if i > 5 and i < 10]

 

The stacked, commented multi-liner

b = [i**2               # do this
     for i in range(20) # using these
     if (i > 5) and     # where this and
        (i < 10)        # this is good
     ]
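
A quick way to convince yourself that the three forms are interchangeable is to run them side by side.  The fourth form, d, is an extra added here for comparison; it uses Python's chained comparison and is not one of the three forms above.

```python
c = []
for i in range(20):
    if (i > 5) and (i < 10):
        c.append(i**2)

a = [i**2 for i in range(20) if i > 5 and i < 10]

b = [i**2               # do this
     for i in range(20) # using these
     if (i > 5) and     # where this and
        (i < 10)        # this is good
     ]

d = [i**2 for i in range(20) if 5 < i < 10]   # chained comparison

print(a)                 # [36, 49, 64, 81]
print(a == b == c == d)  # True
```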

 

 

Condition checking with numbers and conditions

Again, the three methods, but this time the condition checking is done upfront with an if-else clause.  This query simply checks whether each element in a list is a number.  If True, return the number; if False, assign -999.

 

  nums = [0,1,None,3,None,5,6]   # the list of numbers to check

 

conventional approach

>>> good = []
>>> for val in nums:
...  if isinstance(val, (int,float)):
...   good.append(val)
...  else:
...   good.append(-999)
...

 

basic list comprehension

>>> good = [ val if isinstance(val,(int,float)) else -999 for val in nums]
>>> 

 

stacked list comprehension

>>> good = [
... val         # return the value
... if isinstance(val,(int,float))   # if this is true
... else -999   # return this otherwise
... for val in nums  # for each value in the list
... ]

 

in all cases

>>> good
[0, 1, -999, 3, -999, 5, 6]
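
One caveat with this check: in Python, bool is a subclass of int, so True and False pass the isinstance test as numbers.  If that matters for your data, a slightly stricter test can be used.  The sketch below uses a made-up input list and a made-up helper name, is_number.

```python
nums = [0, 1, None, 3, True, 5.5, 'six']   # hypothetical mixed input


def is_number(val):
    """True for int/float, but not for bool (a subclass of int)."""
    return isinstance(val, (int, float)) and not isinstance(val, bool)


loose = [val if isinstance(val, (int, float)) else -999 for val in nums]
strict = [val if is_number(val) else -999 for val in nums]

print(loose)    # [0, 1, -999, 3, True, 5.5, -999]
print(strict)   # [0, 1, -999, 3, -999, 5.5, -999]
```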

 

 

More on condition checking

Sometimes you want to perform an operation given certain conditions.  In the example below, a check is made to ensure that "b" is not zero to avoid division by zero.  If it is zero, then an alternate value is supplied.

 

There are two variants of the LC shown and the results are compared as a final step...examine closely.

outer = [1,2]
inner = [2,0,4]
c = [[a, b, a*b, a*b/1.0]  # divide 2 numbers (outer/inner)
     if b                  # if != 0 (0 is a boolean False)
     else [a,b,a*b,"N/A"]  # if equal to zero, do this
     for a in outer        # for each value in the outer list
     for b in inner        # for each value in the inner list
     ]
for val in c:
    # val[0],val[1],val[2]))
    print("a({}), b({}), a*b({}) a/b({})".format(*val )) 

d = [[[a,b,a*b,"N/A"],           # do if False
      [a,b,a*b,a*b/1.0]][b!=0]   # do if True ... then slice
     for a in outer
     for b in inner
     ]
print("d == c??? {}".format(d==c))

a(1), b(2), a*b(2) a/b(2.0) 
a(1), b(0), a*b(0) a/b(N/A) 
a(1), b(4), a*b(4) a/b(4.0) 
a(2), b(2), a*b(4) a/b(4.0) 
a(2), b(0), a*b(0) a/b(N/A) 
a(2), b(4), a*b(8) a/b(8.0) 

d == c??? True
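
Something worth noticing when you examine closely: the listing computes a*b/1.0, so nothing is ever divided by b, and that is why the second (index) form survives b == 0.  With a real division the two forms stop being equivalent.  The conditional form only evaluates the branch it needs, while the index form builds both candidate lists before picking one.  A sketch, with a/float(b) swapped in as the assumed intent:

```python
outer = [1, 2]
inner = [2, 0, 4]

# Conditional form: the division is only evaluated when b is non-zero.
c = [[a, b, a * b, a / float(b)] if b else [a, b, a * b, "N/A"]
     for a in outer
     for b in inner]

# Index form: both candidate lists are built first, then one is picked,
# so a real division by b fails as soon as b == 0.
try:
    d = [[[a, b, a * b, "N/A"],
          [a, b, a * b, a / float(b)]][b != 0]
         for a in outer
         for b in inner]
except ZeroDivisionError as err:
    d = None
    print("index form failed: {}".format(err))

print(c[0])   # [1, 2, 2, 0.5]
print(c[1])   # [1, 0, 0, 'N/A']
```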

 

The examples shown here are just the beginning.  The reader should be aware that there are dictionary and set comprehensions as well.  This topic will also serve as a gentle introduction to generators.

 

If you would like to see other examples, please send me a note.

 

Additions

What is the difference between a list comprehension and a set comprehension?  Sometimes, just one processing step.  As seen below, the set_comp result is complete as soon as the comprehension has run, whereas the list_comp result still needs to have a set() operation applied after the fact, should a set truly be required.

 

>>> a = [1,2,1,1,2,3,3,2,1,1,1,2,3,4,9,8,1,7,7,7,7,7]
>>> list_comp = [ i for i in a if i >3 ]
>>> set_comp = { i for i in a if i > 3 }
>>> list_comp
[4, 9, 8, 7, 7, 7, 7, 7]
>>> set_comp
set([8, 9, 4, 7])

 

>>> a = [5,6,5,6,5,6,5,6,5,6,4,4] 
>>> b = [5,6] 
>>> lc = [ x*y for x in a for y in b] 
>>> sc = { x*y for x in a for y in b } 
>>> lc
[25, 30, 30, 36, 25, 30, 30, 36, 25, 30, 30,
 36, 25, 30, 30, 36, 25, 30, 30, 36, 20, 24, 20, 24]
>>> sc
{24, 25, 36, 20, 30}
>>>
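
The "one extra step" can be checked directly.  A small sketch using the first data set from above:

```python
a = [1, 2, 1, 1, 2, 3, 3, 2, 1, 1, 1, 2, 3, 4, 9, 8, 1, 7, 7, 7, 7, 7]

list_comp = [i for i in a if i > 3]    # keeps duplicates and order
set_comp = {i for i in a if i > 3}     # unique values, no guaranteed order

print(set(list_comp) == set_comp)   # True
print(sorted(set_comp))             # [4, 7, 8, 9]
```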

---- Topic: Query and summarize revisited ----

This post, problem "summarize" tool, brought up the sometimes difficult task of finding information within tabular data.  As usual, there are many ways to do the same thing... this is but one.

 

Pre-tasks: Use TableToNumPyArray to convert your tabular information into an array.  It is pretty fool-proof.

 

TableToNumPyArray—Help | ArcGIS for Desktop

 

The trick to remember is that each record in an array is treated as unique unless found otherwise.  This makes it particularly easy to sort records on multiple columns and summarize and/or extract what you need from within.  In this example, I concentrated on printing.

 

I used a simple list comprehension to show how to pull the records out according to the first column which we can treat as a class field.  Once the classes were determined the list comprehension grouped the data into the classes and each class could have further information extracted/summarized.

 

 

The output produced can be expanded upon.  Should you want to partition the data into separate datasets, you can do it while it is in array form.  Hope you get some ideas on breaking down problems into their constituent parts and using the tools available to you.  For ArcGIS Pro, the new python suite contains Pandas, which provides another alternative for the same process.

 

I am also sure that someone can come up with an sql statement that does some...but not all of the tasks outlined here.

 

Summarizing data

 

>>> # ---- simplify a step or two ----
>>> # - The data... just in array format
>>> import numpy as np
>>> arr_data = [('a', 50, 4), ('c', 20, 1),
...             ('a', 15, 5), ('e', 40, 4),
...             ('a', 35, 2),('b', 100, 5),
...             ('c', 80, 3), ('d', 100, 3), ('e', 60, 2)]
>>> # 'U5' = unicode text, width 5 (the np.str alias is gone in current NumPy)
>>> dt =[('col_0', 'U5'),
...      ('col_1','<i4'),
...      ('col_2','<i4')]
>>> a = np.array(arr_data, dtype=dt)
>>> a.reshape((-1,1))
array([[('a', 50, 4)],
       [('c', 20, 1)],
       [('a', 15, 5)],
       [('e', 40, 4)],
       [('a', 35, 2)],
       [('b', 100, 5)],
       [('c', 80, 3)],
       [('d', 100, 3)],
       [('e', 60, 2)]],
      dtype=[('col_0', '<U5'),... ])
>>>
>>> uni_info = np.unique(a, return_index=True, return_inverse=True)
>>> vals, idx, inv = uni_info 
>>> # vals is the data,
>>> # the others are for reshaping the array   

 

Now for some output

>>> vals.reshape((-1,1))
array([[('a', 15, 5)],
       [('a', 35, 2)],
       [('a', 50, 4)],
       [('b', 100, 5)],
       [('c', 20, 1)],
       [('c', 80, 3)],
       [('d', 100, 3)],
       [('e', 40, 4)],
       [('e', 60, 2)]],
      dtype=[('col_0', ... ])
>>> # returns array(['a', 'b', 'c', 'd', 'e'],dtype='<U5')
>>> uni = np.unique(vals['col_0'])     
>>> subs = [ vals[vals['col_0']==i] for i in uni ]
>>>
>>> for sub in subs:
...  n = len(sub['col_0'])
...  t = sub['col_0'][0]
...  val_max = np.max(sub['col_1'])
...  val_min = np.min(sub['col_1'])
...  frmt = "type {} -N: {} -max: {} -min: {}\n  -sub {}"
...  print(frmt.format(t, n, val_max, val_min, sub))
... 
type a -N: 3 -max: 50 -min: 15
  -sub [('a', 15, 5) ('a', 35, 2) ('a', 50, 4)]
type b -N: 1 -max: 100 -min: 100
  -sub [('b', 100, 5)]
type c -N: 2 -max: 80 -min: 20
  -sub [('c', 20, 1) ('c', 80, 3)]
type d -N: 1 -max: 100 -min: 100
  -sub [('d', 100, 3)]
type e -N: 2 -max: 60 -min: 40
  -sub [('e', 40, 4) ('e', 60, 2)]
>>>
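
For comparison, the same class summary can be sketched without NumPy, using itertools.groupby from the standard library.  This is an alternative added for illustration, not the method used above; it works on the plain list of tuples and skips the structured array entirely.

```python
from itertools import groupby
from operator import itemgetter

arr_data = [('a', 50, 4), ('c', 20, 1), ('a', 15, 5), ('e', 40, 4),
            ('a', 35, 2), ('b', 100, 5), ('c', 80, 3), ('d', 100, 3),
            ('e', 60, 2)]

summary = {}
# groupby only groups adjacent records, so sort on the class column first
for key, grp in groupby(sorted(arr_data), key=itemgetter(0)):
    rows = list(grp)
    col_1 = [r[1] for r in rows]
    summary[key] = (len(rows), max(col_1), min(col_1))   # N, max, min

for key in sorted(summary):
    print("type {} -N: {} -max: {} -min: {}".format(key, *summary[key]))
```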

NOTE:

A pdf version was added.  It contains more commentary on the process.  A second example was added on 2015-02-18