Multiprocessing: Arguments and Errors

04-30-2013 08:59 AM
DwightLanier
New Contributor III
Hi all,

Using ArcGIS 10.0 SP3, win 7

This is my first time working with multiprocessing.  What I have right now works up until everything goes into multiprocessing (the CPU use spikes for all processors, etc.), but then none of the subprocesses accomplish anything.  One of the first statements inside the function being called prints a simple line that should ID the data being worked on, or at least let me know that it's alive.  After that come some simple folder creation steps using os, but nothing is printed, no folders are ever created, and no errors are thrown.  The CPU use drops off to 1% after the first couple of seconds and it just sits there, never erroring out or appearing to do anything.

I realize I probably have some fundamental misunderstanding about how processing works with this module, and would greatly appreciate help with a few questions. It may even be something related to the version of ArcMap that has been addressed in newer versions.

First would be how best to pass arguments along to the function when called using multiprocessing pools.

All of the examples I've seen have something like the following (not quite pseudocode :p):

import os
import multiprocessing
import arcpy

# Define function to do work...
def someFunction(shp):
    pass  # do this with shp

# Data to do work on...
fcs = ["shp1", "shp2", "shp3"]

# Call multiprocessing module...
pool = multiprocessing.Pool()
pool.map(someFunction, fcs)
pool.close()
pool.join()


In the examples of this type that I have seen, pool.map() is given the function that will do the work and a single list of items, each of which is handed to a worker process.  My question is, what if I have a function that takes multiple arguments?  Should it work if I submit something like the following?

fcs = [[shp1, 500], [shp2, 350], [shp3, 200]]


So that pool.map() passes a list into the function for each item, holding the item and its arguments?

And in case this was not correct I also tried it this way so that only one item of information is being passed, and then parsed later in the function:

fcs = ["shp1,500", "shp2,350", "shp3,200"]

def someFunction(item):
    # Split once so the distance is not lost when shp is reassigned.
    shp, dist = item.split(",")
    dist = int(dist)


Neither approach works, so I'm hoping someone can shed some light on this for me.
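For reference, here is a minimal sketch of the list-of-pairs approach. Pool.map() hands each item of the list to the function as a single argument, so the function unpacks the pair itself. The name buffer_task, the shapefile names, and the distances are hypothetical placeholders, and the real arcpy work is left out.

```python
import multiprocessing


def buffer_task(task):
    """Receive one (path, distance) pair and unpack it inside the worker."""
    shp, dist = task
    # Placeholder for the real work (an arcpy call would go here).
    return "%s buffered by %d" % (shp, dist)


if __name__ == "__main__":
    # Hypothetical inputs: plain strings and numbers pickle without trouble.
    tasks = [("shp1", 500), ("shp2", 350), ("shp3", 200)]
    pool = multiprocessing.Pool(2)
    results = pool.map(buffer_task, tasks)
    pool.close()
    pool.join()
    print(results)
```

Tuples or lists both work here; the only requirement is that each item can be pickled and that the function expects exactly one argument.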

The second part is a request for clarification on what to expect in terms of error messages, print statements, etc. from inside multiprocessing calls.  If I want to see where things are going wrong, how do I do that?  I have tried print statements inside the function being called, but nothing prints.  I have also tried try/except blocks that catch Exception as e and print e (and tried writing to logfiles), but no luck.
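One way to see what went wrong inside a worker, sketched here under assumptions, is to catch the exception in the worker and send the traceback back to the parent as a string. The name safe_worker, the sample tasks, and the deliberate ValueError are all hypothetical and only stand in for real work that might fail.

```python
import multiprocessing
import traceback


def safe_worker(task):
    """Run the work and return (task, result, error_text) instead of raising."""
    try:
        shp, dist = task
        if dist <= 0:
            raise ValueError("distance must be positive: %r" % (dist,))
        return (task, "%s:%d" % (shp, dist), None)
    except Exception:
        # Return the traceback as a string; strings always pickle.
        return (task, None, traceback.format_exc())


if __name__ == "__main__":
    tasks = [("shp1", 500), ("shp2", -1)]
    pool = multiprocessing.Pool(2)
    outcomes = pool.map(safe_worker, tasks)
    pool.close()
    pool.join()
    for task, result, error in outcomes:
        print(task)
        print(result if error is None else error)
```

Because the parent does the printing, the messages show up wherever the main script's output goes, and one failing item does not stop the others.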

Thanks for any info,

Dwight
2 Replies
DwightLanier
New Contributor III
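Finally got it to run without problems.  The error, on my part of course, was forgetting to guard the main part of the script with if __name__ == '__main__':.  On Windows each child process imports the script, so unguarded module-level code runs again in every child.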

So it should look more like:

import multiprocessing
import arcpy

# Define function to do work...
def someFunction(shp):
    pass  # do this with shp

if __name__ == '__main__':
    # Main part of code: runs only in the parent process

    # Data to do work on...
    fcs = ["shp1", "shp2", "shp3"]

    # Call multiprocessing module...
    pool = multiprocessing.Pool()
    pool.map(someFunction, fcs)
    pool.close()
    pool.join()


Not posting the actual code wasn't entirely fair to those looking in to help...sorry.

With this change, a job that used to take 2 hours now finishes in 7.5 minutes.

Questions still remain: what are the rules governing passing arguments to a target function in multiprocessing, and how do I get print statements out of the functions when they are called through multiprocessing?  I would like to write out to a logfile...
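For the logfile part, one simple option (a sketch, not the only way) is to have each worker append to its own file named after its process ID, so that processes never write to the same file at once. The folder (the system temp directory), the file name pattern, and the name logged_worker are assumptions made for illustration.

```python
import multiprocessing
import os
import tempfile

# Assumption: any writable folder will do; the temp directory is used here.
LOG_DIR = tempfile.gettempdir()


def logged_worker(shp):
    """Append one line to a log file named after the current process ID."""
    path = os.path.join(LOG_DIR, "worker_%d.log" % os.getpid())
    with open(path, "a") as handle:
        handle.write("processing %s\n" % shp)
    return path


if __name__ == "__main__":
    pool = multiprocessing.Pool(2)
    paths = pool.map(logged_worker, ["shp1", "shp2", "shp3"])
    pool.close()
    pool.join()
    # One log file per worker process that handled at least one item.
    print(sorted(set(paths)))
```

The per-process files can be read or merged by the main script after pool.join() returns.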
MikeHunter
Occasional Contributor
I believe the only rule for args is that they are pickleable, so this leaves out a bunch of arcpy objects.  As far as printing goes, it works fine for me running in a Windows cmd window. The only issue is that the output gets jumbled up from time to time if 2 or more processes try to print something at the same time.
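A quick way to check whether a given argument meets that rule is to try pickling it before handing it to the pool. This is only a sketch: is_picklable is a hypothetical helper, and a threading.Lock stands in for any object that cannot be pickled.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    # Strings and numbers are fine to pass to a worker.
    print(is_picklable(("shp1", 500)))
    # A lock object cannot be pickled, so it could not be passed as an argument.
    print(is_picklable(threading.Lock()))
```

Passing file paths as strings and opening the data inside the worker sidesteps the problem for objects that fail this check.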

thanks for posting all this,
Mike