14 Replies Latest reply on Jul 20, 2014 12:41 PM by filipkral

    Update cursor with joined tables work around w/ dictionaries

    mzcoyle
      This post could easily be called "How I fell in love with dictionaries"

      Drawing the idea from this post http://forums.arcgis.com/threads/52511-Cool-cursor-dictionary-constructor-one-liner

      I've come up with a solution to a nagging problem I know I have been having, and I believe some others have as well, of not being able to reliably use an update cursor when dealing with joined tables. I was really happy with my first foray into dictionaries, and I thought I'd share my work around for anyone looking to optimize some tedious processing with joins. My data was ~900k rows of forest stand data in one table, and a strata reference table of ~50 rows to calculate volumes. My previous method of using a permanent JoinField, processing, then deleting those fields, took approximately 3.5 hours. Temporary joins never worked for me in the manner I needed. Using dictionaries instead of joins, that time was reduced to under 15 minutes.

      This code goes through any table and creates a list of field names for every field other than OID and the key field you want to reference.

      Here is the fairly complete code to create the dictionary
          print "Starting function"
          # Define and setup variables, tables, key field etc
          calc_table = "calc_view"  # view name is an assumption; MakeTableView needs an output name
          arcpy.MakeTableView_management(table_path, calc_table)
          vol_tab = join_table_path
          strata_tab = "in_memory/temp"
          arcpy.MakeTableView_management(vol_tab, strata_tab)
          joinField = "STRATA"

          # Create list of value fields, leaving out OID field and key/join field
          flistObj = arcpy.ListFields(strata_tab)
          flist = []
          for f in flistObj:
              if f.type != "OID" and f.name != joinField:
                  flist.append(f.name)

          # Create empty dict object then populate each key with a sub dict by row using value fields as keys
          strataDict = {}

          for r in arcpy.SearchCursor(strata_tab):
              fieldvaldict = {}
              for field in flist:
                  fieldvaldict[field] = r.getValue(field)
              strataDict[r.getValue(joinField)] = fieldvaldict

          del strata_tab, flistObj
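For anyone who wants to try the dictionary-building idea without a geodatabase at hand, here is a minimal sketch in plain Python. The sample values are made up, and `build_lookup` is a hypothetical helper, not part of arcpy. It assumes each row is a tuple in the same order as the field list, which is how `arcpy.da.SearchCursor` returns rows.

```python
def build_lookup(field_names, rows, key_field):
    """Return {key value: {field name: value}} for every row.

    field_names -- names in the same order as the values in each row
    rows        -- any iterable of tuples (e.g. rows from a search cursor)
    key_field   -- name of the join/key field; left out of the sub-dicts
    """
    key_index = field_names.index(key_field)
    lookup = {}
    for row in rows:
        sub = {}
        for i, name in enumerate(field_names):
            if i != key_index:
                sub[name] = row[i]
        lookup[row[key_index]] = sub
    return lookup


# Hypothetical strata reference table: key field plus two value fields
strata_fields = ["STRATA", "SW_STEMS", "PJ_STEMS"]
strata_rows = [
    ("A1", 120.0, 35.0),
    ("B2", 80.0, 60.0),
]

strataDict = build_lookup(strata_fields, strata_rows, "STRATA")
sw_stems_a1 = strataDict["A1"]["SW_STEMS"]
```

If two rows share the same key, the later row overwrites the earlier one, so this shape only suits reference tables where the key is unique.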


      In the update cursor you can then either explicitly reference dictionary objects like this
          rows = arcpy.UpdateCursor(calc_table, "\"%s\" IS NOT NULL" % joinField)
          for row in rows:
              strata = row.getValue(joinField)
              variable = strataDict[strata]["sub_key_field"]
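One thing to watch with the direct lookup: `strataDict[strata]` raises a `KeyError` if a row holds a strata value that is missing from the reference table. A minimal sketch of guarding against that, using plain dictionaries as stand-ins for cursor rows (the values and the `unmatched` list are assumptions for illustration):

```python
strataDict = {"A1": {"SW_STEMS": 120.0}, "B2": {"SW_STEMS": 80.0}}

# Stand-in for the big table: one dict per row (hypothetical values)
stand_rows = [
    {"STRATA": "A1", "SW_STEMS": None},
    {"STRATA": None, "SW_STEMS": None},   # skipped, like the IS NOT NULL clause
    {"STRATA": "Z9", "SW_STEMS": None},   # key missing from the lookup
]

unmatched = []
for row in stand_rows:
    strata = row["STRATA"]
    if strata is None:
        continue
    values = strataDict.get(strata)       # None instead of KeyError
    if values is None:
        unmatched.append(strata)
        continue
    row["SW_STEMS"] = values["SW_STEMS"]
```

Collecting the unmatched keys rather than failing on the first one makes it easier to see which strata codes need adding to the reference table.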


      What I did was use a reference list to index into the dictionary, to keep things legible and so I could remember what went where. This may not even be necessary for some people, but it helped me conceptually. Without getting into too much detail, here's essentially my update cursor sans the actual calculations.
          species = [
              ("C","Fb","FB_STEMS"),("C","Sw","SW_STEMS"),("C","Pj","PJ_STEMS"), # 0,1,2
              ("C","Pl","PJ_STEMS"),("C","Lt","LT_STEMS"),("C","Sb","SB_STEMS"), # 3,4,5
              ("D","Bw","BW_STEMS"),("D","Aw","AW_STEMS"),("D","Pb","PB_STEMS")  # 6,7,8
          ]
          sp_fields = [("SP1","SP1_PER"),("SP2","SP2_PER"),("SP3","SP3_PER"),
                       ("SP4","SP4_PER"),("SP5","SP5_PER")]
          print "Beginning updates"
          rows = arcpy.UpdateCursor(calc_table, "\"%s\" IS NOT NULL" % joinField)
          for row in rows:
              strata = row.getValue(joinField)
              for sp, per in sp_fields:
                  sp_type = row.getValue(sp)
                  spp_f = float(row.getValue(per))
                  if spp_f > 0:
                      for grp, spec, stem in species:
                          stem_f = strataDict[strata][stem]
                          (...)


      Hopefully that didn't get too convoluted, anyone else have anything that might contribute in terms of optimization?
        • Re: Update cursor with joined tables work around w/ dictionaries
          kimo
          I am using dictionaries to update tables instead of a join more as well.
          I tried refactoring my clumsy lines to use the oneline list comprehension but it turned out to be marginally slower.
          222565 dictionary count 0:00:46.594000
          222565 dictionary count 0:00:48.125000
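For readers who have not seen the two styles side by side, here is a rough sketch of what an explicit loop and a one-line constructor look like. The exact code from the linked thread is not reproduced here; the sample rows are made up and stand in for what a search cursor would yield.

```python
# Hypothetical rows: (key, value1, value2)
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]

# Explicit loop
loop_dict = {}
for r in rows:
    loop_dict[r[0]] = r[1:]

# One-liner: dict() over a generator of (key, values) pairs
one_liner_dict = dict((r[0], r[1:]) for r in rows)
```

Both produce the same dictionary, so a small timing difference either way is plausible; the choice is mostly about readability.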

          I note that you do not bother to specify a subset of fields when opening the cursor. If you have a lot of fields it apparently helps a lot to only list the relevant fields for the calculations. Not so easy to generalise I suppose, but it may help with memory management too.
          Has anyone done some tests on the 10.1 da module that has  rewritten cursors? Maybe we will not need dictionaries after all.
          • Re: Update cursor with joined tables work around w/ dictionaries
            rafaelr
            Thanks for this!
            had lots of troubles with processing/updating joined tables, took ages within arcmap/didn't work at all with updatecursors.
            with your suggested dictionaries-route i've managed to get it working and really sped things up.
            • Re: Update cursor with joined tables work around w/ dictionaries
              csny490
              Related to this post: http://forums.arcgis.com/threads/58348-Large-Dictionary-Compression, I am having troubles when the dictionaries get too big!

              Although it's slower, especially for multiple fields, I am finding the ole' "Join and Calc" method is much more memory efficient.
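If memory is the limit, one option that may help is storing a plain tuple per key instead of a sub-dictionary per key, with a single shared map from field name to position. This is only a sketch under that assumption; the field names, sample values and the `lookup` helper are hypothetical, and it will not rescue a dataset that simply does not fit in a 32-bit process.

```python
import sys

fields = ["SW_STEMS", "PJ_STEMS", "LT_STEMS", "SB_STEMS"]
field_index = dict((name, i) for i, name in enumerate(fields))  # built once

# Hypothetical rows: key followed by the four value fields
rows = [("A1", 120.0, 35.0, 10.0, 5.0), ("B2", 80.0, 60.0, 0.0, 15.0)]

# One tuple per key instead of one sub-dictionary per key
compact = dict((r[0], r[1:]) for r in rows)


def lookup(key, field):
    """Fetch one field value for a key from the compact structure."""
    return compact[key][field_index[field]]


pj_b2 = lookup("B2", "PJ_STEMS")

# Rough per-row comparison of the two layouts (CPython only)
as_dict = dict(zip(fields, rows[0][1:]))
as_tuple = rows[0][1:]
sizes = (sys.getsizeof(as_dict), sys.getsizeof(as_tuple))
```

In CPython the tuple container is noticeably smaller than the equivalent dictionary, and the saving is multiplied by the number of keys.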
              • Re: Update cursor with joined tables work around w/ dictionaries
                mzcoyle
                Yes, I can imagine when you get into storing multiple million tuple datasets to memory on a 32-bit process, you're going to have a bad time. When I implemented mine it was only ~50 rows to reference to the main table, which worked out quite well. I have another process with a 1:1 relationship on the 900k row dataset that I use a join and export process to run calculations on. I hope Esri bites the bullet this decade and converts desktop to a 64-bit application. It's not like datasets or file complexity are shrinking.

                Maybe, as a quick fix, Esri could develop some easier-to-use interfaces between Desktop and Server for submitting large geoprocessing jobs to Server, which from 10.1 onward uses 64-bit Python.
                http://forums.arcgis.com/threads/54612-arcpy-is-using-32bit-Python-installation-how-about-64bit
                • Re: Update cursor with joined tables work around w/ dictionaries
                  brucejr312
                  It's funny I'm reading this post today.....I just switched one of my scripts from a join and select method to a dictionary method and processing time went down from 2 hours to 8 minutes.  Long live the dictionary!
                  • Re: Update cursor with joined tables work around w/ dictionaries
                    brucejr312
                    Here's a neat, pythonesque way of removing unwanted field names.  Not sure if it will be faster, but it looks cooler!

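                    flistObj = arcpy.ListFields(strata_tab)
                    flist = [f.name for f in flistObj]
                    # 'OID' is a field type, not a field name, and joinField is a variable, not a literal
                    oid_names = [f.name for f in flistObj if f.type == 'OID']
                    for exclude in oid_names + [joinField]:
                        flist.remove(exclude)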
                    • Re: Update cursor with joined tables work around w/ dictionaries
                      brucejr312
                      I think this will work, too...even shorter

                      flistObj = arcpy.ListFields(strata_tab)
                      # Filter the OID field by type and the join field by the joinField variable
                      flist = [f.name for f in flistObj if f.type != 'OID' and f.name != joinField]
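A note on the exclusion list: `'OID'` is a field type rather than a field name (the object ID field is usually called something like OBJECTID or FID), and `'joinField'` in quotes is a literal string rather than the variable. A small sketch of the filtering in plain Python, where the `Field` namedtuple is a stand-in I made up for the objects `arcpy.ListFields` returns (only `.name` and `.type` are used):

```python
from collections import namedtuple

# Stand-in for arcpy field objects; field names and types are sample data
Field = namedtuple("Field", ["name", "type"])

flistObj = [
    Field("OBJECTID", "OID"),
    Field("STRATA", "String"),
    Field("SW_STEMS", "Double"),
    Field("PJ_STEMS", "Double"),
]
joinField = "STRATA"

# Drop the OID field by type and the join field by name
flist = [f.name for f in flistObj if f.type != "OID" and f.name != joinField]
```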
                      • Re: Update cursor with joined tables work around w/ dictionaries
                        Playa
                        Hi Mathew

                        I came across your thread and hope that you are able to assist me to use python dictionaries to accomplish what I'm trying to do. Please note that I'm new to Python and would need some assistance to understand your code if you don't mind and have the time.

                        I have 7.5 million parcels saved as a feature class. Within the feature class I have a field called "SG_Code". I also have two tables called WARMS (i.e. WARMS_DW760 & WARMS_DW764). They each have the fields "SG_Code" & "TIT_DEED_NUM". I then have two additional tables called RED (i.e. Redistribution) and REST (i.e. Restitution). The RED and REST tables have two fields, "SG_CODE" and "TIT_DEED_NUM".

                        I need to create a subset feature class of the 7.5 million parcels. First, I need to find matches on "SG_Code" between the parcels feature class and each WARMS table separately (i.e. WARMS_DW760, then WARMS_DW764). I then need to find matches on "SG_Code" between the original 7.5 million parcels and the RED and REST tables. Finally, for the parcels already matched to WARMS_DW760 and WARMS_DW764, I need to compare their "TIT_DEED_NUM" with the "TIT_DEED_NUM" found in the RED and REST tables to see if I find additional matches, as not all the records in the REST and RED tables have "SG_Codes".

                        In short, what I'm trying to accomplish is to identify where I find a match between the parcels and WARMS, then a match between the parcels and RED and REST.

                        I've used Add Joins so far to accomplish this, but it's running forever. I've attached my model that I've built so far to better understand what I'm trying to accomplish.

                        Regards
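One possible way to approach this kind of multi-table matching with dictionaries and sets, sketched in plain Python. This is my reading of the requirement, not a tested solution for the actual data: the sample codes are invented, the two WARMS tables are merged into one list, and RED and REST are merged into another. In practice the lists would be filled from search cursors.

```python
# Hypothetical sample data; real values would come from search cursors
parcel_sg_codes = ["P1", "P2", "P3", "P4"]
warms = [("P1", "T100"), ("P2", "T200")]            # (SG_Code, TIT_DEED_NUM)
red_rest = [("P1", "T100"), (None, "T200"), ("P4", "T400")]

# SG_Code -> set of title deed numbers in the WARMS tables
warms_deeds = {}
for sg, deed in warms:
    warms_deeds.setdefault(sg, set()).add(deed)

red_rest_sg = set(sg for sg, deed in red_rest if sg is not None)
red_rest_deeds = set(deed for sg, deed in red_rest if deed is not None)

# Matches on SG_Code
in_warms = [sg for sg in parcel_sg_codes if sg in warms_deeds]
in_red_rest = [sg for sg in parcel_sg_codes if sg in red_rest_sg]

# Extra matches: parcels found in WARMS whose deed number appears in RED/REST
# even though RED/REST has no SG_Code for them
deed_only = [sg for sg in in_warms
             if sg not in red_rest_sg and warms_deeds[sg] & red_rest_deeds]
```

Set membership tests are constant time on average, so a single pass over the 7.5 million parcels should be far cheaper than repeated joins. The resulting code lists could then drive a selection or an update cursor.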
                        • Re: Update cursor with joined tables work around w/ dictionaries
                          lisaesri
                          I'm creating a dictionary from featureclasses - emassDict =
                          {u'1': [2009621.0, 2009622.0, 2009624.0, 2009623.0, 2009625.0, 2009626.0, 2009627.0]}
                          {u'2': [2009633.0]}
                          {u'3': [2009632.0, 2009631.0, 2009630.0, 2009629.0, 2009628.0]}
                          {u'4': [2009617.0, 2009611.0, 2009610.0, 2009614.0, 2009620.0, 2009612.0, 2009616.0, 2009615.0, 2009613.0, 2009618.0, 2009607.0, 2009605.0, 2009619.0, 2009609.0, 2009606.0, 2009608.0]}
                          {u'5': [2009604.0, 2009601.0, 2009600.0, 2009603.0, 2009602.0]}
                          {u'6': [2009100.0]}
                          {u'7': [2009009.0]}
                          {u'8': [2009004.0, 2009005.0, 2009007.0, 2009008.0, 2009001.0, 2009003.0, 2009002.0, 2009006.0]}
                          {u'9': [2009500.0]}

                          In this same script I want to update one of the fields "iField" with the values from the dictionary - if the key matches the values in another field "eZoneName".    The dictionary is being created, but "iField" is not being populated.  I'm not receiving any error messages so it has to be in the logic, but I can't see it.  Please help, the total script is here:

                          eZones = r"C:\temp\NLF.gdb\NLF_EM_2009_Dissolve"
                          eZoneName = str("UniqueID")
                          iField = "All_EM_List"
                          
                          eIncidents = r"C:\temp\NLF.gdb\NLF_EM_2009_Identity"
                          emNameField = ("E_MASS")
                          joinField = "Dissolve_FID"
                          arcpy.MakeFeatureLayer_management(eIncidents, "eIncidentsLayer")
                              
                          with arcpy.da.UpdateCursor(eZones, (eZoneName, iField)) as zoneRows:
                              for zone in zoneRows:
                                  eZoneNameString = zone[0]
                                  queryString = '"' + eZoneName + '" = ' + "'" + eZoneNameString + "'"
                          
                                  arcpy.MakeFeatureLayer_management(eZones, "CurrenteZonesLayer", queryString)
                          
                                  try:
                                      arcpy.SelectLayerByLocation_management("eIncidentsLayer", "CONTAINED_BY", "CurrenteZonesLayer")
                                      
                                      emassDict = dict()
                                      for row in arcpy.SearchCursor("eIncidentsLayer"):
                                          emName = row.getValue(emNameField)
                                          snName = row.getValue(joinField)
                                          
                                          if snName in emassDict:
                                              emassDict[snName].append(emName)
                                          else:
                                              emassDict[snName] = [emName]
                                              
                                      print emassDict
                          
                          
                                      if  eZoneNameString == [snName]:
                                          zone[1] = [emName]
                                          zoneRows.updateRow(zone)
                                          
                                  except arcpy.ExecuteError:
                                      print(arcpy.GetMessages(0))
                          
                                  finally:
                                      arcpy.Delete_management("CurrenteZonesLayer")
                          
                          arcpy.Delete_management("eIncidentsLayer")
                          del zone, zoneRows
                          • Re: Update cursor with joined tables work around w/ dictionaries
                            mzcoyle
                            This is never true so you are not stepping into the update line.

                            if  eZoneNameString == [snName]:


                            I also think you may be confusing lists and dictionaries.
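To spell out why that condition never fires: `[snName]` is a one-element list, and a string is never equal to a list, whatever the list contains. A minimal sketch of looking the zone up by key instead. It assumes the target field is a text field and that a comma-separated string is an acceptable way to store the values; both are assumptions, since the question does not say.

```python
# One zone's dictionary, shaped like the printed output in the question
emassDict = {u"1": [2009621.0, 2009622.0, 2009624.0]}
eZoneNameString = u"1"
snName = u"1"

# A string never equals a one-element list, so this is always False
always_false = (eZoneNameString == [snName])

# Look the zone up by key instead, then turn the list into text for the field
new_value = None
if eZoneNameString in emassDict:
    values = emassDict[eZoneNameString]
    new_value = ", ".join(str(int(v)) for v in values)
```

In the original script the equivalent change would be to assign the joined string to `zone[1]` before calling `zoneRows.updateRow(zone)`.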
                            • Re: Update cursor with joined tables work around w/ dictionaries
                              lauzent

                              This is how I've implemented the new DA cursors. The cursor is limited to the fields you want to update with the join field located at index 0, the dictionary is created with row[0] as the key and easily accessed by the update cursor.

                               

                              #Define fields to update, and the field to use as join field
                              Fields = ['Direction', 'Cost', 'year', 'Color']
                              Key = "UniqueID"
                              Fields.insert(0, Key)

                              #Create Dictionaries ; The dictionaries store the values from the update table in memory
                              x = len(Fields)

                              UpdateDict = {}

                              #Iterates through all values in the table and stores them in the update dictionary
                              #Dictionary format; Join Field value : dict of field index : field value
                              with arcpy.da.SearchCursor(Table, Fields) as cursor:
                                  for row in cursor:
                                      FieldValDict = {}
                                      for y in range(1,x):
                                          FieldValDict[y] = row[y]
                                      UpdateDict[row[0]] = FieldValDict

                              #Updates the FC from the Update Dictionary
                              #Uses the Join Field value to look up update values
                              with arcpy.da.UpdateCursor(Input, Fields) as cursor:
                                  for row in cursor:
                                      if row[0] not in UpdateDict:
                                          continue  # no match in the update table; leave the row unchanged
                                      for y in range(1,x):
                                          row[y] = UpdateDict[row[0]][y]
                                      #Write the row once, after all of its fields have been set
                                      cursor.updateRow(row)

                              • Re: Update cursor with joined tables work around w/ dictionaries
                                filipkral

                                Hi all,

                                Arcapi has these kinds of functions for joining tables:

                                join_using_dict
                                https://github.com/NERC-CEH/arcapi/blob/master/arcapi.py/#L2052-2153

                                update_col_from_dict
                                https://github.com/NERC-CEH/arcapi/blob/master/arcapi.py/#L1207-1272

                                In many cases they are much faster than the Join Tool.

                                Filip.