I created a Python script that used the multiprocessing module to take advantage of a multi-core computer. It was written against ArcMap 10.3 and ran fine in IDLE, but when I attempted to wire it into a Script Tool interface so I could expose it as a tool in ArcToolbox, I started to have problems...
With some great help from the community on GIS SE I finally got it working. The solution was rather obscure, so I am documenting it here for others. As you will see, the script does nothing amazing, but it provides a template to get you up and running, and hopefully you won't waste time as I did trying to work out why it was not working...
Scenario
I have a polyline dataset that I want to clip to a polygon dataset. Due to the nature of the polygon dataset, the Clip tool was taking a long time to process, so I decided to use multiprocessing to speed things up.
Solution
My script that is wired up to the Script Tool interface is very simple and is shown below. Note that it imports another module I called multicode; this Python file sits in the same folder as this script.
'''
Title:
    multiexample
Description:
    This simple script is the one that is wired up into the toolbox.
'''
import arcpy
import multicode

# Get parameters (these are Layer objects)
clipper = arcpy.GetParameter(0)
tobeclipped = arcpy.GetParameter(1)

def main():
    arcpy.AddMessage("Calling multiprocessing code...")
    multicode.multi(clipper, tobeclipped)

if __name__ == '__main__':
    main()
Important: when wiring this script up to an interface, make sure Run Python script in process is UNTICKED under the Source tab!
My main multiprocessing code is shown below; take note of the limitations. I've tried to document it so that it is understandable.
"""
Title:
multicode
Description:
The module that does multicore clipping. It will take as input a polygon layer and another layer. For
each polygon it will clip the dataset into a new separate shapefile.
Limitations:
This code expects the folder c:\temp\tc to exist, this is where the output ends up. As geoprocessing objects
cannot be "pickled" the full path to the dataset is passed to the worker function. This means that any selection
on the input clipper layer is ignored.
Author:
Duncan Hornby (ddh@geodata.soton.ac.uk)
Created:
2/4/15
"""
import os,sys
import arcpy
import multiprocessing
from functools import partial
def doWork(clipper,tobeclipped,oid):
"""
Title:
doWork
Description:
This is the function that gets called as does the work. The parameter oid comes from the idList when the
function is mapped by pool.map(func,idList) in the multi function.
Note that this function does not try to write to arcpy.AddMessage() as nothing is ever displayed.
If the clip succeeds then it returns TRUE else FALSE.
"""
try:
# Each clipper layer needs a unique name, so use oid
arcpy.MakeFeatureLayer_management(clipper,"clipper_" + str(oid))
# Select the polygon in the layer, this means the clip tool will use only that polygon
descObj = arcpy.Describe(clipper)
field = descObj.OIDFieldName
df = arcpy.AddFieldDelimiters(clipper,field)
query = df + " = " + str(oid)
arcpy.SelectLayerByAttribute_management("clipper_" + str(oid),"NEW_SELECTION",query)
# Do the clip
outFC = r"c:\temp\tc\clip_" + str(oid) + ".shp"
arcpy.Clip_analysis(tobeclipped,"clipper_" + str(oid),outFC)
return True
except:
# Some error occurred so return False
return False
def multi(clipper,tobeclipped):
try:
arcpy.env.overwriteOutput = True
# Create a list of object IDs for clipper polygons
arcpy.AddMessage("Creating Polygon OID list...")
descObj = arcpy.Describe(clipper)
field = descObj.OIDFieldName
idList = []
with arcpy.da.SearchCursor(clipper,[field]) as cursor:
for row in cursor:
id = row[0]
idList.append(id)
arcpy.AddMessage("There are " + str(len(idList)) + " object IDs (polygons) to process.")
# Call doWork function, this function is called as many OIDS in idList
# This line creates a "pointer" to the real function but its a nifty way for declaring parameters.
# Note the layer objects are passing their full path as layer objects cannot be pickled
func = partial(doWork,clipper.dataSource,tobeclipped.dataSource)
arcpy.AddMessage("Sending to pool")
# declare number of cores to use, use 1 less than the max
cpuNum = multiprocessing.cpu_count() - 1
# Create the pool object
pool = multiprocessing.Pool(processes=cpuNum)
# Fire off list to worker function.
# res is a list that is created with what ever the worker function is returning
res = pool.map(func,idList)
pool.close()
pool.join()
# If an error has occurred report it
if False in res:
arcpy.AddError("A worker failed!")
arcpy.AddMessage("Finished multiprocessing!")
except arcpy.ExecuteError:
# Geoprocessor threw an error
arcpy.AddError(arcpy.GetMessages(2))
except Exception as e:
# Capture all other errors
arcpy.AddError(str(e))
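The core pattern in multi() (bind the fixed arguments with functools.partial, then map the per-item argument across a worker pool) is independent of arcpy. Here is a minimal stdlib-only sketch of the same structure; do_work is a stand-in for the real clip worker:

```python
import multiprocessing
from functools import partial

def do_work(prefix, oid):
    # Stand-in for the real clip: must be a module-level function so it can be pickled.
    return prefix + str(oid)

def multi(id_list):
    # Bind the fixed arguments, leaving only the per-item oid,
    # just as the script does with partial(doWork, clipper, tobeclipped).
    func = partial(do_work, "clip_")

    # One less than the core count, but never fewer than one worker.
    cpu_num = max(1, multiprocessing.cpu_count() - 1)
    pool = multiprocessing.Pool(processes=cpu_num)
    try:
        # res collects whatever do_work returns, in id_list order.
        res = pool.map(func, id_list)
    finally:
        pool.close()
        pool.join()
    return res

if __name__ == '__main__':
    print(multi([1, 2, 3]))
```

Because pool.map pickles the function and its arguments to send them to the workers, everything passed must be picklable; that is exactly why the real script passes dataset paths rather than layer objects.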
When I ran the tool from ArcToolbox it immediately bombed out with a 000714 error, far too quickly for it to be a silly syntax error on my part. Again GIS SE came to the rescue: it turned out to be an issue with which version of Python was being used. If you open the Python command line window in ArcMap and type the following:
import sys
sys.version
You will see something like '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]', which tells you that ArcMap is using the 32-bit version. My script, when run from ArcToolbox, turned out to be running under the 64-bit version, which I assume was installed when I installed 64-bit background geoprocessing. This was upsetting ArcToolbox.
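Rather than reading the bitness out of the version string, you can ask any interpreter directly; the size of a pointer is 4 bytes on a 32-bit build and 8 on a 64-bit one:

```python
import struct

# A C pointer ("P") is 4 bytes on a 32-bit interpreter, 8 bytes on a 64-bit one.
bits = struct.calcsize("P") * 8
print(str(bits) + " bit")
```

Running this in both the ArcMap Python window and a background-geoprocessing session makes any mismatch obvious.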
So how did I solve this? I went to my .py file in File Explorer, right-clicked, went to Open with > Choose default programs, and browsed to C:\Python27\ArcGIS10.3\pythonw.exe. Once the default application for opening Python files was reset to the 32-bit version, the Script Tool ran without error and I could see all the cores on my machine max out.
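A programmatic alternative worth knowing about, which I present only as a sketch rather than something I verified against 10.3, is multiprocessing.set_executable(), which tells the module which interpreter to launch for its worker processes. The path below is an example and should be adjusted for your install:

```python
import multiprocessing
import os

# Example path to the 32-bit ArcGIS interpreter; adjust for your install.
py32 = r"C:\Python27\ArcGIS10.3\python.exe"

if os.path.exists(py32):
    # Worker processes will now be launched with this interpreter
    # instead of the one running the parent script.
    multiprocessing.set_executable(py32)
```

This keeps the fix inside the script itself instead of relying on the system-wide file association.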