Running multiple models at the same time

11-09-2011 05:11 AM
by TomGeo
Occasional Contributor III
Dear community,

I have several tens of thousands of objects to buffer and then to intersect. The result of the intersection goes into a summary statistics process and the outcome is appended to a table in an FGDB.

The first thing that bugs me is that I cannot set a workspace for each and every model, instead the workspace is a global thing! 😞

The second thing is that ArcGIS 10 is not capable of multi-threading! Dear ESRI Team, most of us have to work with tons of data and we would like to use the two, four or eight CPU cores that are available even in laptops.

Temporary solution for the latter problem - Start multiple ArcGIS instances and bind each manually to specific cores! 😞

With this setup I can run three different models at the same time. When I start a fourth one, I produce an error in the third instance. All of a sudden the third model is unable to create some output. It might be the output of the buffer process or the output of the intersection. This error is reproducible.

Each model run has its own input folder, its own FGDB, its own ArcGIS instance and its own CPU core - Why isn't it working?


Another problem with the setup described above is a second error, saying the geometry is not M-aware. The geometry is always fine as long as I do not start multiple models at the same time! The only solution I found here in the forum was to change the processing extent in the environment settings to 'Union of inputs'.

I would really appreciate any help in this matter. Later this week I will have to run this task on an eight-core machine where I need every single core to get this job done in time.

A slightly disappointed Thomas
- We are living in the 21st century.
GIS moved on and nobody needs a format consisting of at least three files! No, nobody needs shapefiles, not even for the sake of an exchange format. Folks, use GeoPackage to exchange data with other GIS!
5 Replies
BruceHarold
Esri Regular Contributor
Hello Thomas

Are you able to share your data and model?  Big Data issues that we can reproduce are of interest; while I can't commit to a resolution timeframe I hopefully can take a look.  If you can share data and models please email me at bharold@esri.com and I'll set up an FTP location.

Regards
DuncanHornby
MVP Notable Contributor

Thomas,

If you are happy to convert your models into Python code then you could take advantage of multi-core processors as described in this blog. It does mean you will have to stop working in the model builder environment.
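As a rough idea of what that could look like, here is a minimal sketch using Python's standard `multiprocessing` module. The function names, the folder layout and the `job_000.gdb` naming are made up for illustration, and `run_job` is only a stand-in: the real buffer, intersect and statistics calls from the exported model would go where the comment is. It assumes each worker process can be given its own workspace, so the jobs do not share outputs.

```python
import multiprocessing
import os


def make_jobs(input_folders, scratch_root):
    """Pair every input folder with its own (hypothetical) workspace path."""
    jobs = []
    for index, folder in enumerate(input_folders):
        workspace = os.path.join(scratch_root, "job_{:03d}.gdb".format(index))
        jobs.append((folder, workspace))
    return jobs


def run_job(job):
    """Stand-in worker: replace the body with the code exported from the model."""
    input_folder, workspace = job
    # Assumption: a real worker would import arcpy here, point
    # arcpy.env.workspace at `workspace`, and run the geoprocessing tools.
    return "{} -> {}".format(input_folder, workspace)


def run_all(jobs, processes=None):
    """Run the jobs in a pool of worker processes (one per core by default)."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(run_job, jobs)
```

Call `run_all(make_jobs(folders, scratch_root))` from under an `if __name__ == "__main__":` guard in a script; on Windows the guard is required, because each worker process re-imports the script.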

Duncan

ShitalDhakal__GISP
Occasional Contributor

The link is not working now  

DuncanHornby
MVP Notable Contributor

I have corrected the link. It broke when ESRI changed their website; fortunately the actual blog page still exists.

KimOllivier
Occasional Contributor III

You need to share more of your workflow for us to comment. My initial thought is "Why do you need to run many parallel processes?" Why not do them all at once? It feels like you are 'Reinventing GIS'. By this I mean that the spatial tools are designed to run on whole datasets, so running a tool for each feature, or even for each group, is very inefficient, if that is what you are doing. I can't tell. Partitioning is a good strategy if the data sometimes overloads the tool, and in theory you could run the partitions in parallel, but I find partitioning so successful that just running the tool in a loop over a few partitions (not thousands) is good enough. My goal is to run the tool in a few minutes so that the total time is still reasonable.
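To illustrate the partitioning idea, here is a small sketch in plain Python. The helper names and the `OBJECTID` field are only examples, and it assumes the IDs are sorted, unique integers and the list is not empty. It splits the IDs into a few near-equal chunks and builds one SQL filter per chunk, which you could then use to select the features for each pass of the loop.

```python
def partition(ids, n_parts):
    """Split a list of IDs into at most n_parts contiguous, near-equal chunks."""
    n_parts = max(1, min(n_parts, len(ids)))
    size, extra = divmod(len(ids), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional ID each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(ids[start:end])
        start = end
    return chunks


def where_clause(field, chunk):
    """Build a SQL range filter covering one chunk of sorted integer IDs."""
    return "{0} >= {1} AND {0} <= {2}".format(field, chunk[0], chunk[-1])


def plan(ids, n_parts, field="OBJECTID"):
    """Return one filter per partition, in the order they should be processed."""
    return [where_clause(field, chunk) for chunk in partition(ids, n_parts)]
```

Looping over `plan(ids, 4)` gives four filters, so the tool runs four times on large selections instead of thousands of times on single features.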

I personally do not use ModelBuilder because I do not have enough control of intermediate results. They are always written out to a scratch geodatabase. In Python you can hold selections as views, use SQL queries, store sets in Python dictionaries (which are hash tables), use SpatiaLite, which is much faster for some operations using SQL, and generally avoid some of the elegant but unscalable standard tools. For example, avoid any processing with a joined table.
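As a sketch of the dictionary approach, assuming the intersect output has already been read into `(zone_id, area)` tuples (for example with a search cursor), the summary and the lookup that would otherwise need a joined table can be done in memory. The field meanings and function names here are made up for illustration.

```python
from collections import defaultdict


def summarize(rows):
    """Sum the area per zone, similar to a SUM statistic with a case field."""
    totals = defaultdict(float)
    for zone_id, area in rows:
        totals[zone_id] += area
    return dict(totals)


def attach_names(totals, zone_names):
    """Look up each zone's name in a dictionary instead of joining tables."""
    return [(zone_id, zone_names.get(zone_id), total)
            for zone_id, total in sorted(totals.items())]
```

Zones missing from `zone_names` come back with `None` as the name rather than being dropped, so you can spot gaps before appending the result to the output table.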

I think that a change of approach can make your process run in the time it takes to have a cup of coffee.
