Multiprocessing script errors on geoprocessing line of code - INFADI (Missing Dir)

05-12-2014 07:05 AM
MichaelMitchell1
New Contributor III
I have written a script that uses pool.map to process multiple netCDF files and store information in a table. Each process runs a function that handles one year. Each year has its own file geodatabase, a table within that geodatabase, and an mxd, and I set the default workspace and scratch workspace to that year's geodatabase. For example, when the function loads the year 1979 it accesses the 1979 geodatabase, the 1979 table within that geodatabase, and the 1979 mxd; 1980 would access the 1980 geodatabase, table, and mxd.
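For reference, the driving pattern looks roughly like this (a minimal sketch; doCalc is the worker function shown further down in this thread, and the year range and pool size are assumptions):

import multiprocessing

def doCalc(year):
    # open this year's geodatabase, table, and mxd, then process its netCDF file
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(doCalc, range(1979, 1990))
    pool.close()
    pool.join()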

If I run 1 process everything works fine. If I try to run 2 or more I get Fatal Error (INFADI) Missing Directory. It always happens at a random time in a random process. A try/except block isn't catching the error; the process shuts down completely. All the other processes continue, and the process that crashed starts another year.

It happens here:
try:
    deleteme = Raster(arcpy.gp.ExtractByMask_sa(WSILayerRas, AOIshape))
except:
    continue


Does anyone have any thoughts?
Thanks
8 Replies
MathewCoyle
Frequent Contributor
Multiprocessing is currently poorly supported by gp functions; some work fine, others don't. In each process, try explicitly setting both the workspace and scratchWorkspace environment settings to make sure they are unique. I've run into issues with two processes trying to write to the same scratch workspace using the same file names at the same time, with predictable results. The fact that it crashes randomly on different years seems to imply a collision of this nature, which is just a timing problem between the different processes.
MichaelMitchell1
New Contributor III
I'm currently setting default and scratch workspaces to the year geodatabase I'm pulling information from.
So the 1979 process sets the 1979 geodatabase as default and scratch, and the 1980 process sets the 1980 geodatabase as default and scratch.  Like so:

def doCalc(year):
        yearstr = str(year)
        print('Starting doCalc: ' + yearstr)

        defaultGDB = "D:\\GIS\\projects\\year" + yearstr + ".gdb"
        #Setting environmental variables
        arcpy.env.workspace = defaultGDB
        arcpy.env.scratchWorkspace = defaultGDB
ChrisSnyder
Regular Contributor III (Accepted Solution)
Try adding this code right before the function/child process is called:

import os
import time
time.sleep(1.1)  # insurance: guarantees a unique timestamp for the folder name
newTempDir = r"C:\temp\gptmpenvr_" + time.strftime('%Y%m%d%H%M%S')
os.mkdir(newTempDir)
os.environ["TEMP"] = newTempDir
os.environ["TMP"] = newTempDir


http://forums.esri.com/Thread.asp?c=93&f=1729&t=284041#881375

This is the trick I use to get multiple instances of arcpy (or gp) to run in parallel... Without it, you get random crashes caused by concurrent processes attempting to write to the same TEMP file(s) at the same time.

I use the subprocess module, so this might not (???) work with multiprocessing... Execute this code right before the child process is called, or in the child process, before you import arcpy.
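If you are using multiprocessing, a sketch of how the same trick might be wired in (an assumption on my part, not something I've run) is a Pool initializer that sets a unique TEMP in each worker before arcpy is imported:

import os
import time
import multiprocessing

def setWorkerTemp():
    # runs once per worker process; the PID keeps folder names unique
    # even when several workers start within the same second
    newTempDir = r"C:\temp\gptmpenvr_" + time.strftime('%Y%m%d%H%M%S') + "_" + str(os.getpid())
    os.mkdir(newTempDir)
    os.environ["TEMP"] = newTempDir
    os.environ["TMP"] = newTempDir

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4, initializer=setWorkerTemp)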
MichaelMitchell1
New Contributor III
csny490 - That's a really good idea.  Thanks!  I'll try that and report back.
MichaelMitchell1
New Contributor III
That worked perfectly!  Thank you so much.  I've been struggling with this on and off for about 2 months.

I put it right inside the function that the pool calls, and changed it just a little: I appended my year string, because multiple processes would otherwise try to create a directory with the same timestamp. Adding the variable passed to the function made sure that each directory name was unique.

Does the os.environ command set the temp directory for just the current process?

def doCalc(year):
        yearstr = str(year)
        import os, time  # os is needed for mkdir and environ below
        time.sleep(1.1)
        newTempDir = r"C:\temp\gptmpenvr_" + time.strftime('%Y%m%d%H%M%S') + yearstr
        os.mkdir(newTempDir)
        os.environ["TEMP"] = newTempDir
        os.environ["TMP"] = newTempDir
ChrisSnyder
Regular Contributor III
Does the os.environ command set the temp directory for just the current process?


It depends on where and how you call it. As soon as you call it in a script, it will change the variables just for that script... I think. I use the subprocess module to do parallel processing stuff, and rely on a parent script and a child script. In my experience, the child script will inherit the system variables of the parent script (whatever they may be) at the time the child script was called. 
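A minimal sketch of that parent/child pattern (the child script path is hypothetical):

import os
import subprocess
import time

# set a unique TEMP/TMP in the parent before spawning the child
newTempDir = r"C:\temp\gptmpenvr_" + time.strftime('%Y%m%d%H%M%S')
os.mkdir(newTempDir)
os.environ["TEMP"] = newTempDir
os.environ["TMP"] = newTempDir

# the child inherits the environment as it exists right now; arcpy
# imported inside child.py will see the per-process TEMP folder
subprocess.call(["python", r"C:\scripts\child.py"])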

BTW:
time.sleep(1.1) #sleep for 1.1 seconds

was designed to keep the folder names unique, since the folders are named with YYYYMMDDHHMMSS format.
FilipKrál
Occasional Contributor III

Hello,

I have been setting os.environ["TMP"] and os.environ["TEMP"] to unique folders while multiprocessing for some time, and it worked until now. Now I have some code that keeps using the TMP defined in "User defined variables" in the computer properties. The script is too long and too confidential to present here, but I've been through it many times and could not find anything that should cause a problem.

I wrote a function to dump the os environ variables, workspace, scratch workspace, scratch folder, and scratch gdb to a file and I used this function in several places in the script. Each dump indicates that these variables are set as I intended, i.e. to the unique folders or geodatabases. When I run the script however, many things are written into the TMP folder defined in "User defined variables" and not to the unique folders.

I tried to troubleshoot it for several days, but I am still not entirely sure why this is happening. Based on what I have seen, however, I believe that setting os.environ["TMP"] and os.environ["TEMP"], the workspace, and the scratch workspace to unique folders or geodatabases is necessary, but not always sufficient (in many cases the workspace and scratch workspace can point to the same folder or geodatabase).

It seems that the crashing I've experienced is caused by Spatial Analyst tools that write temporary data representing Raster objects into the temp folder as ESRI Grids (as mentioned in some previous posts). Unless the Raster objects are explicitly saved to a unique workspace or with a unique name, ArcGIS will likely crash. It therefore really depends on how the functions your master script calls, and any deeper functions, are written. What I ended up doing in my scripts is something like:

...
shrunk_raster = arcpy.sa.Shrink(a_raster, 1, [1])  # number_cells and zone_values shown with example values
shrunk_raster.save(os.path.join(wd, 'shrunk'))  # wd is a unique workspace for the process; consider using arcpy.CreateScratchName
...

The problem is that now I have to pass the unique workspace wd down to every function I call, and some functions may not accept that kind of parameter. One option, sketched below, is to generate output names with arcpy.CreateScratchName.
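A possible sketch using arcpy.CreateScratchName, so that each output name is generated inside the process-specific workspace (wd and shrunk_raster come from the snippet above; the "shrunk" prefix is just an example):

import arcpy

# ask arcpy for a name that is guaranteed not to collide within wd
out_name = arcpy.CreateScratchName("shrunk", data_type="RasterDataset", workspace=wd)
shrunk_raster.save(out_name)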

The bottom line is that the crashing would not occur if the deeper functions respected the TMP and TEMP variables in os.environ. Do you have any idea why the os.environ is not honoured?

Here is what remained in the TMP folder defined in computer properties once the whole process was over. The CitrixLogs folder is probably not from ArcGIS.

Filip.

BruceHarold
Esri Regular Contributor

I use a pattern of subprocesses and independent workspaces, here:

http://www.arcgis.com/home/item.html?id=b3c7c6273ef54e91aa57a073aa873eca 
