Using Python Dictionaries with Update Cursors instead of "Join and Calculate"

11168
8
11-16-2012 04:09 AM
by Anonymous User
Not applicable
Original User: Caleb1987

Like many others, I have become frustrated with the amount of time it takes to use the "Add Join" tool and "Calculate field" tool to get values from one table into another while working with Python and ArcMap in general. To increase the performance, I started playing around with dictionaries and saw that many other people have done this as well to create a dictionary to use with an update cursor.  By using this method, I do not have to do the "join and calculate" method to copy records from one table into another. The dictionary method is a very efficient way to update records (since update cursors cannot work with joined tables) and performs this operation about 100x times faster.

I decided to write a couple functions that will use the dictionary and update cursor method to update tables since it is a LOT faster. I wrote one function that will create a dictionary and update one field in an attribute table based on a dictionary created from another table somewhere else on disk.  This works for updating only one field (AttributeUpdate).

Next I created another function that will accept a list of fields to update in the source table, and a list of value fields from another table (join table) that will be used to update the fields in the source table. It creates a new dictionary for each join table field and updates all input fields in the source table. The only catch with this is there has to be the same amount of update fields and value fields from both tables (MultipleAttributeUpdate).

Anyways, these two functions work great and I am attaching the script that contains them to this post in case anyone is interested. Also attached is the tool help documentation I hastily put together. They have already saved me a ton of time. If anyone wants an idea of how fast the new tools work,  I have a script where I add 4 fields to a table and do a temporary join of that table with another and do 4 separate field calculations on a table to copy those records with about 20,000 records (the process that used to take about 8 minutes).  The process has now been reduced to only 33 seconds with these new tools I created!  Quite a performance enhancement!  They are attached to this email if anyone is interested.

One more note, there are two versions, one for Arc 10.0 and one for 10.1. The only difference is that the 10.1 version uses the da cursors and is faster than the 10.0 version for that reason. Enjoy!

I have re-posted this since I put some of the wrong files in the folders on the previous post.
8 Replies
RDHarles
Occasional Contributor
Interesting.  I'm doing more web development stuff now and not as much python, but this has got me thinking about how I can make some of my older processes much faster.  Nice job on the documentation, will give this code and/or this technique a try in the near future.
0 Kudos
by Anonymous User
Not applicable
Original User: apodidae

This is great!  Thank you for making it available.  I had a little trouble at first, calling it from a loop, but I think it was a schema lock which was the prob.

Thanks again.
0 Kudos
JohnHansen
New Contributor III
:cool:Awesome! I am updating 15 fields for about 880,000 rows. What was taking well over an hour using "Join and Calcuate" now takes 14 minutes. Caleb - if you will be at the UC this year (2014), I would like to buy you a cold one.
0 Kudos
by Anonymous User
Not applicable
Original User: bruce_harold

Hi

Please see also:  http://www.arcgis.com/home/item.html?id=da1540fb59d84b7cb02627856f65a98d

Regards
0 Kudos
by Anonymous User
Not applicable
Wow, I haven't looked at this thread in a long time.  I see the code samples I posted have been downloaded many times.  I wrote that a long time ago and have updated it way too many times since...In case anyone is interested, here is the code for my latest version which should handle both 10.0 and 10.x versions of ArcGIS:

import arcpy, os, sys

# test version for cursor type (data access module available @ 10.1 +)
ver = arcpy.GetInstallInfo()['Version']
dataAccess = False
if ver != '10.0':
    dataAccess = True
        
def Message(msg):
    print str(msg)
    arcpy.AddMessage(str(msg))
    return

def create_field_name(fc, new_field):
    '''Return a valid field name that does not exist in fc and
    that is based on new_field.

    fc: feature class, feature layer, table, or table view
    new_field: new field name, will be altered if field already exists

    Example:
    >>> fc = 'c:\\testing.gdb\\ne_110m_admin_0_countries'
    >>> createFieldName(fc, 'NEWCOL') # NEWCOL
    >>> createFieldName(fc, 'Shape') # Shape_1
    '''

    # if fc is a table view or a feature layer, some fields may be hidden;
    # grab the data source to make sure all columns are examined
    fc = arcpy.Describe(fc).catalogPath
    new_field = arcpy.ValidateFieldName(new_field, fc)

    # maximum length of the new field name
    maxlen = 64
    dtype = arcpy.Describe(fc).dataType
    if dtype.lower() in ('dbasetable', 'shapefile'):
        maxlen = 10

    # field list
    fields = [f.name.lower() for f in arcpy.ListFields(fc)]

    # see if field already exists
    if new_field.lower() in fields:
        count = 1
        while new_field.lower() in fields:

            if count > 1000:
                raise bmiError('Maximum number of iterations reached in uniqueFieldName.')

            if len(new_field) > maxlen:
                ind = maxlen - (1 + len(str(count)))
                new_field = '{0}_{1}'.format(new_field[:ind], count)
                count += 1
            else:
                new_field = '{0}_{1}'.format(new_field, count)
                count += 1

    return new_field
      
def AttributeUpdate(source_table, in_field, update_fields, join_table, join_key, join_values):
    
   
    # Check input parameters
    if not arcpy.Exists(source_table):
        print '"%s" not found!\nPlease verify that full path of table exists' %source_table
        sys.exit()
    if not arcpy.Exists(join_table):
        print '"%s" not found!\nPlease verify that full path of table exists' %join_table
        sys.exit()
    if in_field not in [f.name for f in arcpy.ListFields(source_table)]:
        print "'%s' not found in \"%s\""  %(in_field,os.path.basename(source_table))
        sys.exit()
    if join_key not in [f.name for f in arcpy.ListFields(join_table)]:
        print "'%s' not found in \"%s\"" %(join_key,os.path.basename(join_table))
        sys.exit()
    for fld in update_fields:
        if fld not in [f.name for f in arcpy.ListFields(source_table)]:
            print "'%s' not found in \"%s\"" %(fld,os.path.basename(source_table))
            print 'Please verify that field names match in "%s"' %os.path.basename(source_table)
            sys.exit()
    for fldb in join_values:
        if fldb not in [f.name for f in arcpy.ListFields(join_table)]:
            print "'%s' not found in \"%s\"" %(fldb,os.path.basename(join_table))
            print 'Please verify that field names match in "%s"' %os.path.basename(join_table)
            sys.exit()
    if not isinstance(update_fields, list):
        update_fields = list(update_fields)
    if not isinstance(join_values, list):
        join_values = list(join_values)


    # Make sure there is matching number of join and update fields
    if len(update_fields) == len(join_values):
        up_flds_list = [f for f in arcpy.ListFields(source_table) if f.name in update_fields]
        up_flds_dict = dict([(f.name,f.type) for f in up_flds_list])
        jn_flds_list = [f for f in arcpy.ListFields(join_table) if f.name in join_values]
        jn_flds_dict = dict([(f.name,f.type) for f in jn_flds_list])
        types_dict = {'String': str, 'Single':float, 'Double':float, 'Integer':long, 'SmallInteger':int}
        path_dict = {}
        
        if dataAccess:
            
            # Create Dictionary
            join_values.insert(0, join_key)
            with arcpy.da.SearchCursor(join_table, join_values) as srows:
                for srow in srows:
                    path_dict[srow[0]] = tuple(srow for i in range(len(join_values[1:])))
            
            # Update Cursor
            update_index = list(range(len(update_fields)))
            row_index = list(x+1 for x in update_index)
            update_fields.insert(0, in_field)
            with arcpy.da.UpdateCursor(source_table, update_fields) as urows:
                for row in urows:
                    if row[0] in path_dict:
                        allVals =[path_dict[row[0]] for i in update_index]
                        for r,v in zip(row_index, allVals):
                            row = v
                            urows.updateRow(row)
                        
            print 'Fields in "%s" updated successfully' %(os.path.basename(source_table))
            
        else:
            # version 10.0
            srows = arcpy.SearchCursor(join_table)
            for row in srows:
                path_dict[row.getValue(join_key)] = tuple(row.getValue(i) for i in join_values)
            del srows
            
            # Update Cursor
            rows = arcpy.UpdateCursor(source_table)
            for row in rows:
                theVal = row.getValue(in_field)
                if theVal in path_dict:
                    for i in range(len(update_fields)):
                        try:
                            theType = types_dict[jn_flds_dict[join_fields]]
                            row.setValue(update_fields,theType(path_dict[theVal]))
                        except:
                            row.setValue(update_fields, path_dict[theVal])
                    rows.updateRow(row)
                   
            del rows
    else:
        print 'ERROR:  Number of update fields and value fields does not match'
    return
    

def CopyFields(source_table, in_field, join_table, join_key, join_values=[]):
    '''
    Copies field(s) from one table to another

    source_table: table in which to add new fields
    in_field: a field that has common values to a field in the join_table.
              think of this as a "join_field"
    join_table: table with fields to add to source_table
    join_key: field to match values of the "in_field"
    join_values: fields to add to source_table (list)
    '''

    # Get Catalog path (for feature layers and table views)
    cat_path = arcpy.Describe(source_table).catalogPath
        
    # Find out if source table is NULLABLE                
    if not os.path.splitext(cat_path)[1] in ['.dbf','.shp']:
        nullable = 'NULLABLE'
    else:
        nullable = 'NON_NULLABLE'
   
    # Add fields to be copied
    update_fields = []
    join_list = arcpy.ListFields(join_table)
    for field in join_list:
        ftype = field.type
        name = field.name
        length = field.length
        pres = field.precision
        scale = field.scale
        alias = field.aliasName
        domain = field.domain
        for fldb in join_values:
            if fldb == name:
                name = create_field_name(source_table, fldb)
                arcpy.AddField_management(source_table,name,ftype,pres,scale,length,alias,nullable,'',domain)
                Message("Added '%s' field to \"%s\"" %(name, os.path.basename(source_table)))
                update_fields.insert(join_values.index(fldb), name.encode('utf-8'))
                                        
    # update new fields
    path_dict = {}
    if dataAccess:
            
# Create Dictionary
        join_values.insert(0, join_key)
        with arcpy.da.SearchCursor(join_table, join_values) as srows:
            for srow in srows:
                path_dict[srow[0]] = tuple(srow for i in range(1,len(join_values)))

        # Update Cursor
        update_index = list(range(len(update_fields)))
        row_index = list(x+1 for x in update_index)
        update_fields.insert(0, in_field)
        with arcpy.da.UpdateCursor(source_table, update_fields) as urows:
            for row in urows:
                if row[0] in path_dict:
                    try:
                        allVals =[path_dict[row[0]] for i in update_index]
                        for r,v in zip(row_index, allVals):
                            row = v
                        urows.updateRow(row)
                    except: pass
            
    else:
        # version 10.0
        rows = arcpy.SearchCursor(join_table)
        for row in rows:
            path_dict[row.getValue(join_key)] = tuple(row.getValue(join_values) for i in range(len(join_values)))
        del rows
        
        # Update Cursor
        rows = arcpy.UpdateCursor(source_table)
        for row in rows:
            theVal = row.getValue(in_field)
            if theVal in path_dict:
                try:
                    for i in range(len(update_fields)):
                        row.setValue(update_fields,path_dict[theVal])
                    rows.updateRow(row)
                except: pass
        del rows
    Message('Fields in "%s" updated successfully' %(os.path.basename(source_table)))
    return

if __name__ == '__main__':

    source = r'C:\TEMP\PermitTest\test.gdb\Permits'
    join = r'C:\TEMP\PermitTest\test.gdb\orono_test'
    fields = ['BLDG_NUM', 'SALE_DATE', 'APPLICANT']

    # Run it
    CopyFields(source, 'PID_NO', join, 'PID_NO', fields)
    


Also, I will be at the Esri UC this year so I'll take you up on that cold drink!
EricEagle
Occasional Contributor III

Caleb,

I've been using your updated module but can't get it to work -

I'm using it as part of a larger project to run statistics...basically I'm doing some frequency analysis and then want to join those to a single feature class and calculate over the rows.

I'm using the following syntax:

AttributeUpdate(target_table, 'XYZ', ['POS'], posoutfq, 'XYZ', ['FREQUENCY'])

Where target_table is my...well... target table

XYZ is the field in the target table to be joined on

POS is the field I want populated

posoutfq is the geodatabase table with the frequency stats in it

XYZ is the common table in posoutfq

FREQUENCY contains long integers that I need to populate POS with.

Generally everything works fine until it gets to urows.updateRow(row)  in the AttributeUpdate function.  Then I get a runtime error stating that the "value type is incompatible with the field type [POS]."  But both POS and FREQUENCY are "long" data types.  FREQUENCY contains some null values, but POS allows null values.

But when I join them manually in ArcMap and calculate over the values (using the python calculator) they update just fine.

So I'm guessing that somehow, the module is either reading the wrong field or reading one of the two fields as a string, or something like that.

Any ideas or assistance would be appreciated - and your work on this is already much appreciated... this will save me enormous amounts of time on what will be a regular process!

0 Kudos
by Anonymous User
Not applicable

Hi Eric,

Glad to see you are getting some use out of this.  It looks like I have an updated version (as of 6 months ago) so maybe replace the function with the code below:

def AttributeUpdate(source_table, in_field, update_fields, join_table, join_key, join_values):
    '''
    Copies field(s) from one table to another using using field matching lists

    source_table: table in which to add new fields
    in_field: a field that has common values to a field in the join_table.
              think of this as a "join_field"
    update_fields: fields to be updated (must be in same order as join_values)
    join_table: table with fields to add to source_table
    join_key: field to match values of the "in_field"
    join_values: fields to add to source_table (list)
    '''
    # test version for cursor type (data access module available @ 10.1 +)
    ver = arcpy.GetInstallInfo()['Version']
    if ver != '10.0':
        dataAccess = True
    else:
        dataAccess = False


    # Check input parameters
    if not isinstance(update_fields, list):
        update_fields = list(update_fields)
    if not isinstance(join_values, list):
        join_values = list(join_values)
    if not arcpy.Exists(source_table):
        print '"%s" not found!\nPlease verify that full path of table exists' %source_table
        sys.exit()
    if not arcpy.Exists(join_table):
        print '"%s" not found!\nPlease verify that full path of table exists' %join_table
        sys.exit()
    source_fields = arcpy.ListFields(source_table)
    join_fields = arcpy.ListFields(join_table)
    for fld in [in_field] + update_fields:
        if fld not in [f.name for f in source_fields]:
            print "'%s' not found in \"%s\"" %(fld,os.path.basename(source_table))
            print 'Please verify that field names match in "%s"' %os.path.basename(source_table)
            sys.exit()
    for fldb in [join_key] + join_values:
        if fldb not in [f.name for f in join_fields]:
            print "'%s' not found in \"%s\"" %(fldb,os.path.basename(join_table))
            print 'Please verify that field names match in "%s"' %os.path.basename(join_table)
            sys.exit()


    # Make sure there is matching number of join and update fields
    if len(update_fields) == len(join_values):
        up_flds_list = [f for f in arcpy.ListFields(source_table) if f.name in update_fields]
        up_flds_dict = dict([(f.name,f.type) for f in up_flds_list])
        jn_flds_list = [f for f in arcpy.ListFields(join_table) if f.name in join_values]
        jn_flds_dict = dict([(f.name,f.type) for f in jn_flds_list])
        types_dict = {'String': str, 'Single':float, 'Double':float, 'Integer':long, 'SmallInteger':int}
        path_dict = {}

        if dataAccess:

            # Create Dictionary
            join_values2 = join_values[:]
            join_values2.insert(0, join_key)
            with arcpy.da.SearchCursor(join_table, join_values2) as srows:
                for srow in srows:
                    path_dict[srow[0]] = srow[1:]

            # Update Cursor
            update_fields2 = update_fields[:]
            update_fields2.insert(0, in_field)
            with arcpy.da.UpdateCursor(source_table, update_fields2) as urows:
                for row in urows:
                    if row[0] in path_dict:
                        try:
                            row[1:] = path_dict[row[0]]
                            urows.updateRow(row)
                        except:
                            pass

        else:
            # version 10.0
            srows = arcpy.SearchCursor(join_table)
            for row in srows:
                value = row.getValue(join_key)
                path_dict[value] = tuple(row.getValue(i) for i in join_values)
            del srows

            # Update Cursor
            rows = arcpy.UpdateCursor(source_table)
            for row in rows:
                theVal = row.getValue(in_field)
                if theVal in path_dict:
                    for i in range(len(update_fields)):
                        try:
                            theType = types_dict[jn_flds_dict[join_fields]]
                            row.setValue(update_fields,theType(path_dict[theVal]))
                            rows.updateRow(row)
                        except:
                            try:
                                row.setValue(update_fields, path_dict[theVal])
                                rows.updateRow(row)
                            except:
                                pass

            del rows
        del path_dict
        print 'Fields in "%s" updated successfully' %(os.path.basename(source_table))
    else:
        print 'ERROR:  Number of update fields and value fields does not match'
    return

That error is strange though, you should never get an error that says the value is incompatible with the field type if both fields are long.  I haven't looked at this code in a while, but a lot of it looks pretty sloppy to me. Maybe I will go back and do a full rewrite when I have time.

TimHopper
Occasional Contributor III

Hi Chris,

I just found this thread, i hope its still active. I am faced with a task to do a spatial join of points to polygon and read from the points fc, the rows with a null street name and copy values from 6 fields in the join table to the original feature class. My code looks like this right now:

arcpy.env.workspace = r"path"
arcpy.env.overwriteOutput = True


GaR = "points"
Parcel = "Parcels"
Join_table = "Join_table"

arcpy.MakeFeatureLayer_management("points", GaR)
arcpy.MakeFeatureLayer_management("Parcels", Parcel)


arcpy.SpatialJoin_analysis (GFR, Parcel, Join_table, "", "", "", "WITHIN")

where = "STREET_NAME" + " IS NULL"
with arcpy.da.SearchCursor(GFR, ["STREET_NAME"], where) as cursor:
for row in cursor:
   CopyFields (GaR, "ID", Join_table, "ID", fields)

Join_table has ID, Name, street, location, state and zip and i am trying to copy these into the fields in GaR and then delete the Join-table. 

Looking forward to hearing from you

Thanks

Ayo

0 Kudos