in-process .da searchcursor question

11-20-2012 02:10 PM
ChrisSnyder
Regular Contributor III
I have a script that uses the arcpy.da.SearchCursor method. I am using v10.1 SP1. The script basically runs the SelectByLocation tool in a recursive loop and identifies "spatial clusters" - that is, groups of features that are all less than a certain distance from each other. I originally had this script running as an "in-process" toolbox-based script. However, when run on extremely large datasets, it exhibits some sort of memory leak (I notice the ArcMap screen flashes - I assume every time the SelectByLocation/SelectByAttribute tool fires off), which ultimately causes ArcMap to issue an "out of memory" error at ~3.4 GB of RAM usage (ESRI gets lots of thanks for the now-standard large-address-aware builds in v10.1, BTW).
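(For reference, the cluster-growing logic itself - stripped of the arcpy machinery - can be sketched in plain Python. Here `points` is a hypothetical stand-in for OID/coordinate pairs read from the feature class; the real script does the neighbor test with SelectByLocation instead of a distance calculation:)

```python
from math import hypot

def find_clusters(points, max_dist):
    """Group point IDs into "spatial clusters": connected groups where
    each member is within max_dist of at least one other member.
    `points` maps OID -> (x, y)."""
    unvisited = set(points)
    clusters = []
    while unvisited:
        # Seed a new cluster with any remaining point, then grow it
        seed = unvisited.pop()
        cluster, frontier = {seed}, {seed}
        while frontier:
            current = frontier.pop()
            cx, cy = points[current]
            # Everything not yet assigned that is within max_dist joins
            near = {oid for oid in unvisited
                    if hypot(points[oid][0] - cx, points[oid][1] - cy) <= max_dist}
            unvisited -= near
            cluster |= near
            frontier |= near
        clusters.append(cluster)
    return clusters
```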

So, I noticed that if I run the script as a stand-alone process via PythonWin, the memory usage never becomes an issue (and in fact stays < 300MB), and generally the script finishes much faster... Which is not what I would expect, since I thought running a script "in-process" was always the way to go. So I think, "Hey, I'll just make the script tool run "out-of-process" (via the toolbox), and memory problem solved!". Unfortunately, when I do this, I find that the .da.SearchCursor can't read the feature layer.

Anyone know:

  1. Why does the "in-process" method of running the script via ArcMap exhibit memory-leak behavior, while running the script stand-alone via Python/PythonWin is so much more efficient?

  2. Why does the arcpy.da.SearchCursor blow up when it tries to read my feature layer when the script is run "out-of-process" via ArcToolbox in ArcMap? The pertinent error message is:

Tue Nov 20 15:31:47 2012 - 
*** PYTHON ERRORS *** 
Tue Nov 20 15:31:47 2012 - Python Traceback Info:   File "C:\csny490\clusterId_v3.py", line 92, in <module>
    preSelectionOidSet = set([r[0] for r in arcpy.da.SearchCursor("fl1", ["OID@"])])

Tue Nov 20 15:31:47 2012 - Python Error Info: <type 'exceptions.RuntimeError'>: cannot open 'fl1' 
8 Replies
MikeHunter
Occasional Contributor
Could you maybe have a lock problem here?  If you run scripts out of process, you will sometimes get this type of error when accessing data sources that are open, or have been open, in the current mxd.  On the SelectByLayer issue, I've noticed the blinking on several toolbox tools.  I haven't noticed memory leaks, but I haven't looked for them either.  SelectByLayer and Calculate are two that I've noticed do this.  Started with 10.1.

Mike
ChrisSnyder
Regular Contributor III
I don't think it's a lock problem... My only guess is that the screen flicker is the "in-process" script refreshing the screen upon having a new selection... I believe this "flashing" is perhaps the source of the excessive memory usage...

I changed up my code so that instead of using a da.SearchCursor it uses the arcpy.Describe .fidSet property (BTW: .fidSet is a bit slower than using a da.SearchCursor to get a list of selected OIDs). Running this altered version of the script "in-process" yields the same memory-leak behavior, and running it "out-of-process" via the ArcMap toolbox throws basically the same error as when I was using the da.SearchCursor method. This time:

Tue Nov 20 17:04:32 2012 - 
*** PYTHON ERRORS *** 
Tue Nov 20 17:04:32 2012 - Python Traceback Info:   File "C:\csny490\clusterId_v4.py", line 93, in <module>
    preSelectionOidSet = set(int(fid) for fid in arcpy.Describe("fl1").fidSet.split(";"))
Tue Nov 20 17:04:32 2012 - Python Error Info: <type 'exceptions.RuntimeError'>: ERROR 999999: Error executing function.


So it seems that, for whatever reason, running scripts "out of process" via the ArcMap toolbox can't access the table/selected features of the input feature layer once the script has applied a selection to it.
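(Side note: independent of the out-of-process bug, the fidSet parsing in that traceback is fragile. As I understand it, Describe(...).fidSet comes back as a semicolon-delimited string of OIDs, and as an empty string when there is no selection - in which case int("") raises a ValueError. A minimal, defensive parse, sketched in plain Python:)

```python
def parse_fid_set(fid_set):
    """Parse the semicolon-delimited string returned by
    arcpy.Describe(layer).fidSet into a set of integer OIDs.
    An empty string (no selection) yields an empty set."""
    fid_set = fid_set.strip()
    if not fid_set:
        return set()
    # int() tolerates the stray spaces some versions emit ("1; 2; 3")
    return set(int(fid) for fid in fid_set.split(";"))
```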
ChrisSnyder
Regular Contributor III
Hmmm...

Funny thing is that in my script I run an arcpy.da.SearchCursor on the input feature layer (before it has a selection placed on it) and that works fine, but as soon as the script issues a selection:

arcpy.SelectLayerByAttribute_management("fl1", "NEW_SELECTION", "OBJECTID = " + str(1))

and then tries to run a SearchCursor on "fl1" or arcpy.Describe("fl1").fidSet on the "selection-containing" feature layer, the thing throws a rod... but only if the script is run "out-of-process" via an ArcToolbox script tool. If run "in-process" it's fine (but leaks memory like my grandma)... and if run stand alone it's fine and doesn't exhibit a memory leak.
JamesCrandall
MVP Frequent Contributor
Curious...

Do you have any imports of 3rd party modules in the script?  Perhaps one that attempts to load needed .dll's to function?

EDIT: I just remembered something from some of my ArcObjects/.NET development work where I had some bottleneck issues when doing spatial selections/spatial relationship analysis.  The solution was to be absolute certain that the participating spatial data sets were in the same projection/coordinate system.  Not saying definitively this is the cause of your issue, but perhaps something to double-check.
ChrisSnyder
Regular Contributor III
No 3rd party stuff...

I found this simple code reproduces the issue:

import arcpy
inLayer = arcpy.GetParameterAsText(0)
oidFieldName = arcpy.Describe(inLayer).oidFieldName
oidDict = dict([(r[0], -1) for r in arcpy.da.SearchCursor(inLayer, ["OID@"])])
arcpy.MakeFeatureLayer_management(inLayer, "fl1")
try:
    nextOidSeedValue = (key for key,value in oidDict.items() if value == -1).next()
except:
    pass
arcpy.SelectLayerByAttribute_management("fl1", "NEW_SELECTION", oidFieldName + " = " + str(nextOidSeedValue))
selectedSet = set([r[0] for r in arcpy.da.SearchCursor("fl1", ["OID@"])])
arcpy.AddMessage("Selected set is: " + str(selectedSet))


When this is set up as a script tool and run "in-process", or as a stand-alone script with a hardcoded inLayer variable, it works fine...

However, if it is run as a script tool "out-of-process", it will fail on the line:

selectedSet = set([r[0] for r in arcpy.da.SearchCursor("fl1", ["OID@"])])


With the error:

Python Traceback Info:   File "C:\csny490\temp\ideas\selection_reprod.py", line 20, in <module>
    selectedSet = set([r[0] for r in arcpy.da.SearchCursor("fl1", ["OID@"])])

Python Error Info: <type 'exceptions.RuntimeError'>: cannot open 'fl1'
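(One possible workaround - not from this thread, and untested against this particular bug: skip the selection-bearing feature layer entirely and hand the selected OIDs to the cursor as a where_clause against the dataset path, e.g. arcpy.da.SearchCursor(inLayer, ["OID@"], where_clause). A hypothetical helper to build that clause from a set of OIDs:)

```python
def oid_where_clause(oid_field, oids):
    """Build a SQL where clause selecting the given OIDs, so a
    da.SearchCursor can be run against the dataset path directly
    instead of against a selection-bearing feature layer."""
    if not oids:
        return "1 = 0"  # matches nothing when the selection is empty
    return "%s IN (%s)" % (oid_field, ", ".join(str(o) for o in sorted(oids)))
```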
ChrisMathers
Occasional Contributor III
Have you tried using a non DA cursor there? Problem could be with the DA module.
ThomMackey
New Contributor III
I can't answer the actual question (i.e. why is that behaviour occurring?) but I can say that I've had to do a similar task (find all records in the same feature class within X km of each individual record). I found SelectLayerByLocation too slow, so I went about it a different way:

  • Use the Select tool to extract the current record into a temp IN_MEMORY fc containing a single point

  • Create an empty geometry object

  • Use the Buffer tool to buffer the temporary point FC into the geometry object

  • Clip the original points fc using the geometry object as the clip features.


The results of the clip contain the features within X km of your original point. I guess it depends on what you're doing with the output; this prob won't work if you're looking to calculate a value back into the original FC or something. Although you could build up a dictionary of {source: [neighbours]} which you could then use to re-iterate over the original FC and apply the attributes, I suppose.

Some sample pseudocode:

import arcpy

arcpy.env.workspace = "IN_MEMORY"
buff_geom = arcpy.Geometry()

with arcpy.da.SearchCursor(points_fc,cur_fields) as cur:
    for row in cur:
        current_row_oid = row[0]

        # Select the point we're looking at into "tempTarget"
        current_row_selector = '"{0}" = {1}'.format(oid_field, current_row_oid)
        arcpy.Select_analysis(points_fc, "tempTarget", current_row_selector)

        # Get a geometry containing an $analysis_distance buffer from current point
        buff_geom_list = arcpy.Buffer_analysis("tempTarget",
                                                buff_geom,
                                                "%s Meters"%analysis_distance)

        # Extract the nearby properties into "tempNearby"
        arcpy.Clip_analysis(points_fc, buff_geom_list, "tempNearby")



No idea if that will help you but thought I'd post it out of curiosity. Avoiding using feature layers altogether might mitigate the issue. I also found this noticeably faster (3-5x) than using the SelectByLocation tool. No idea how this will behave when you run it from ArcMap though, I was just using it externally.
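(The {source: [neighbours]} bookkeeping I mentioned can be sketched in plain Python - here with a direct distance test standing in for the Buffer/Clip results, and hypothetical OID/coordinate pairs as input:)

```python
from math import hypot

def neighbour_table(points, analysis_distance):
    """Build {oid: [neighbour oids]} for all points within
    analysis_distance of each other. `points` maps OID -> (x, y),
    a stand-in for what Buffer/Clip would produce per record."""
    table = {}
    for oid, (x, y) in points.items():
        table[oid] = sorted(
            other for other, (ox, oy) in points.items()
            if other != oid and hypot(ox - x, oy - y) <= analysis_distance)
    return table
```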

I'd still be very curious to find out why it's happening and how to fix it.
ChrisSnyder
Regular Contributor III
Temporarily gave up on this... Seems that there is some weird dependency on 'FeatureLayer' data types when run as an in-process vs. an out-of-process script tool. This sort of makes sense, since there is a linkage between the ArcMap TOC and the selected feature set in the script tool... When run out of process, there is some sort of break in that linkage. This is a bug I think, but now with LargeAddressAware in v10.1 SP1 I can process my large dataset "in-process"... although it finishes right about at 3.05GB of RAM usage. I think I will submit this behavior as a bug to ESRI (can't run a search cursor or the .fidSet Describe property on an input feature layer when selections are applied to the feature layer in an 'out-of-process' script tool).

To Chris M: Nope, the old cursors exhibit the same behavior. And to boot, they are too slow (compared with the .fidSet or da.SearchCursor methods of retrieving the OIDs of the selected set).

To Thom: Negatory also... Your solution (which sounds familiar 🙂 BTW) won't work in my situation. This needs to loop tens of thousands of times, and copying stuff to in_memory would slow things down and probably cause more memory usage.