Python script in Arc very slow + code improvement

MarkPeacey · ‎03-23-2011

The following script takes 3 min to complete in Python on a shapefile that has 211 features. In Arc as a script in a Toolbox it takes 25 min 😞 Arc sits there when it gets to writing a particular feature and I'm not sure why. Is there extra checking and processing that occurs behind the scenes when you run a script as a tool in Arc?

The script is pretty basic and I know there are some improvements I could do to speed it up; such as utilising a dictionary, so I'm not repeating MakeFeatureLayer for rows that have the same Area_m2 value, and actually working out how to query based on [Shape_Area], and get it to work! However, if it took 3 min in Arc I'd be happy.

Any insights into this would be much appreciated.

import arcpy
import sys

arcpy.env.overwriteOutput = True

# Variables defined by the user

infc = arcpy.GetParameterAsText(0)
FGDBlocation = arcpy.GetParameterAsText(1)
FGDBname = arcpy.GetParameterAsText(2)
outfcname = arcpy.GetParameterAsText(3)
##lyrfile = arcpy.GetParameterAsText(4)

# Set other variables
##lyrfile = "G:\\GIS\\ModelBuilder\\Count_overlaps_tool\\CountOverlaps.lyr"
FGDB = FGDBlocation + "\\" + FGDBname + ".gdb"
output = FGDB + "\\" + outfcname
unionout = FGDB + "\\union"

try:
    # Run repair incase of empty geometries and self intersects
    arcpy.RepairGeometry_management(infc)
##    print "Repaired geometry"
    
    # Create File Geodatabase to hold output featureclass
    arcpy.CreateFileGDB_management(FGDBlocation, FGDBname)
##    print "Created FGDB"
    
    # Create temporary layer in memory for processing
    arcpy.MakeFeatureLayer_management(infc, "in_memory\temp")
##    print "Created temp layer"
    
    # Union featureclass to process overlaps
    arcpy.Union_analysis ("in_memory\temp", unionout)
      
    # Create new field for storing overlap count to be populated
    arcpy.AddField_management(unionout, "Count", "SHORT", "2")
    arcpy.AddField_management(unionout, "Area_m2", "TEXT", "", "", "15")
##    print "Added fields"

    # Populate the new field
    arcpy.CalculateField_management(unionout, "Area_m2", "round(!Shape_Area!, 1)", "PYTHON_9.3")    
        
except:
    print arcpy.GetMessages()
##    print "error"

# Delete unrequired fields    
arcpy.DeleteField_management(unionout, ["Id", "FID_Treatment", "Area", "Hectares"])    
##print "Deleted fields"

# Create temporary layer in memory for processing
arcpy.MakeFeatureLayer_management(unionout, "in_memory\union")

# Create attribute index to speed up querying??
arcpy.AddIndex_management("in_memory\union", "Area_m2", "Area_m2Index")

# Setup cursor for update
rows = arcpy.UpdateCursor("in_memory\union", "", "", "Count; Area_m2")

for row in rows:

    # Get the value from the Shape Area field
    area = row.getValue("Area_m2")
    
    # Clause used to create selection later             
    clause = "\"Area_m2\" = " + "'" + area + "'"
##    print clause

    # Create temporary layer of all features with the same Area_m2 field value
    arcpy.MakeFeatureLayer_management("in_memory\union", "in_memory\temp", clause)

    # Count number of identical features to be used to populate count field
    featcount = arcpy.GetCount_management("in_memory\temp")
##    print featcount

    # Populate the Count field with the number of overlaps
    row.Count = str(featcount)

    # Update the row to write the Count value to the field
    rows.updateRow(row)

else:
    print "End of rows"

try:
    
    # Delete identical features
    arcpy.DeleteIdentical_management(unionout, ["Area_m2"])
##    print "Deleted duplicates"

    # Copy feature class from memory to a physical location
    arcpy.CopyFeatures_management("in_memory\union", output)
##    print "Copied features"

    arcpy.Delete_management(unionout)
    arcpy.Delete_management("in_memory\temp")
    arcpy.Delete_management("in_memory\union")
##    print "Deleted redundant and temporary layers"
    
except:
    print arcpy.GetMessages()

del row
del rows

PhilMorefield · ‎03-25-2011

The fact that your script runs eight times faster outside of AGD narrows things down. When you say "in Arc", do you mean as a script tool?

Right click on the tool and go to the Source tab. Make sure "Run Python script in process" is checked.

MarkPeacey · ‎03-27-2011

Yes it's a script tool and I'm running it "in process" already. I'm thinking I'll have to re-write the script to utilise Arcpy geometry, based on the centroid of the features, to work out the number of polygons that intersect and see if I can get the same result with better speed. I'm still baffled by what Arc is doing, when it runs a script tool, to slow everything down by that order of magnitude.

I've got a few more test shapefiles so will try them when I have time, but I've heard from a colleague that it still runs very slowly in Arc.

ChrisSnyder · ‎03-28-2011

Would this script help? http://arcscripts.esri.com/details.asp?dbid=16700

MarkPeacey · ‎04-03-2011

Would this script help? http://arcscripts.esri.com/details.asp?dbid=16700

Thanks Chris, I did have a look at your script previously which gave me the idea of using a dictionary to prevent duplication. I think I'll implement that plus use the Geometry objects and hopefully that speeds things up.

MarkPeacey · ‎04-28-2011

Below is the drastically improved script which utilises a dictionary. Now takes less than 20 sec to process 10000+ features. There's still a number of improvements that could be made to the code, especially for the field deletes etc. as I've just specified them based on knowing the input files.

# Purpose: Allow for symbolic representation where helicopter bait drop swaths overlap. The more overlaps the higher the concentration applied.

import arcpy
import time
import sys

arcpy.env.overwriteOutput = True

print "Start time = " + time.asctime()

# Variables defined by the user

infc = arcpy.GetParameterAsText(0)
existFGDB = arcpy.GetParameterAsText(1)
existFGDBPath = arcpy.GetParameterAsText(2)
FGDBlocation = arcpy.GetParameterAsText(3)
FGDBname = arcpy.GetParameterAsText(4)
outfcname = arcpy.GetParameterAsText(5)

# Checks if the user has specified an existing geodatabase for the output. If so there's a change
# to the path variables so the existing geodatabase is used.

# Function to define path to existing geodatabase
def FGDBsource():
    global FGDB
    FGDB = existFGDBPath
    print arcpy.AddMessage("Opted to put output in existing file geodatabase")

FGDB = FGDBlocation + "\\" + FGDBname + ".gdb"

# Call definition above if output selected to go into existing geodatabase
if existFGDB == 'true':     #Should be capital "T" but ArcMap returns lower case!
    FGDBsource()
# Otherwise create new geodatabase  
else:
    print arcpy.AddMessage("Opted to create new file geodatabase")
    # Create new File Geodatabase to hold output featureclass
    arcpy.CreateFileGDB_management(FGDBlocation, FGDBname)
    print arcpy.AddMessage("Created new file geodatabase")

# Set script variables
unionout = FGDB + "\\union"
jointbl = FGDB + "\\join_table"
output = FGDB + "\\" + outfcname

try:
    # Run repair incase of empty geometries and self intersects
    arcpy.RepairGeometry_management(infc)
    print arcpy.AddMessage("Repaired geometry")
 
    # Create temporary layer in memory for processing
    arcpy.MakeFeatureLayer_management(infc, "in_memory\temp")
    
    # Union featureclass to process overlaps
    arcpy.Union_analysis ("in_memory\temp", unionout)
    print arcpy.AddMessage("Completed union")

    # Create temporary layer in memory for processing
    arcpy.MakeFeatureLayer_management(unionout, "in_memory\temp")

    # Delete unrequired fields
    arcpy.DeleteField_management("in_memory\temp", ["Id", "Area", "Hectares"])

    # Create new field for storing overlap count to be populated
    arcpy.AddField_management("in_memory\temp", "Count", "SHORT", "2")    

##    # Create attribute index to speed up querying
##    arcpy.AddIndex_management("in_memory\temp", "Shape", "Shape_Index")
##    print arcyp.AddMessage("Completed field changes and index build") 

except:
    print arcpy.GetMessages()

# Dictionary for storing feature areas
countDict = {}

# For getting shapefield name
shapefldname = arcpy.Describe("in_memory\temp").ShapeFieldName

# Setup cursor for update
rows = arcpy.UpdateCursor("in_memory\temp", "", "", "Shape; Count")

print arcpy.AddMessage("Populating dictionary")

# For iterating through features
for row in rows:
    feat = row.getValue(shapefldname)
    # Get area of each feature using geometry object
    area = feat.area
    # If value is not in dictionary add it to dictionary with count = 1
    if countDict.has_key(area) == 0:
        countDict[area] = 1
    # If area value is in dictionary add + 1 to the count value and delete the row as it's a duplicate
    elif countDict.has_key(area) == 1:
        countDict[area] = countDict[area] + 1
        rows.deleteRow(row)
    else:
        print "Error"
        print arcpy.AddMessage("Error: error with Shapefield value")
        print arcpy.GetMessages()

# Setup another cursor for update
table = arcpy.UpdateCursor("in_memory\temp", "", "", "Shape; Count")

arcpy.AddMessage("Populating overlap 'Count' field")

# For iterating through non-duplicate features in order to write Count field
for r in table:
    # Getting area again
    feat = r.getValue(shapefldname)
    area = feat.area
    # Write count if count is not equal to zero
    if countDict[area] in range(1, 10):
        r.Count = int(countDict[area])
        table.updateRow(r)
    else:
        print arcpy.AddMessage("Error: count value not between 1-10")
        print arcpy.GetMessages()

# Copy feature class from memory to a physical location
arcpy.CopyFeatures_management("in_memory\temp", output)
arcpy.AddMessage("Created output featureclass")

# Add the output featureclass to the current map
arcpy.SetParameterAsText(6, output)
arcpy.AddMessage("Adding output to current mxd")

# Clean up redunadant layers and cursors to remove them from memory
arcpy.Delete_management("in_memory\temp")
arcpy.Delete_management(unionout)

del row, rows, r, table, arcpy
    
print "End time = " + time.asctime()