Quickest Way to Spatial Join 250000 Features

11-25-2010 06:09 AM
AndrewJoseph
New Contributor
Hello,

I am trying to look at neighborhood land use characteristics within a quarter mile of every land use polygon in a database of roughly 250,000 polygons.

I'm using a 64-bit Intel Core i7 vPro machine with 8 cores running ArcGIS 10, but even it seems unable to manage spatial joining this dataset, even after I have split the file up into feature classes of only 35,000 features each. Three days of my existence have been wasted on this problem. It seems to get through the entire process but then cannot write the attribute table.

So, I was wondering if anyone had any ideas about the best way to spatial join a dataset of this size, and what factors besides size of dataset affect spatial join speed? 

On a related note, is it faster to first create quarter-mile buffers around each land use polygon, or to simply specify a quarter-mile search radius in the join dialog? I also have a dataset consisting of points spaced 660 ft apart along the street network; would it be quicker to join land uses within a quarter mile of these points rather than of the land use polygons themselves?
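
For reference, the scripted equivalent of what I'm running in the dialog looks roughly like this (just a sketch; the paths and feature class names are placeholders for my own data):

import arcpy

arcpy.env.overwriteOutput = True

landuse = r"C:\data\landuse.gdb\landuse_polys"   # ~250,000 polygons (placeholder path)
out_fc  = r"C:\data\landuse.gdb\landuse_sjoin"

# One output row per (target, neighbor) pair within a quarter mile
arcpy.SpatialJoin_analysis(landuse, landuse, out_fc,
                           join_operation="JOIN_ONE_TO_MANY",
                           join_type="KEEP_ALL",
                           match_option="WITHIN_A_DISTANCE",
                           search_radius="0.25 Miles")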
4 Replies
KimOllivier
Occasional Contributor III
It's a good question we all ask ourselves every day.

I have the "Cup of Coffee Rule". -If any single process takes longer than a cup of coffee, then interrupt the process and Find a Better Way.

I don't bother to run any process for days; it is not likely to produce anything useful, as you have found. Too much is done in memory, then in virtual memory, then paging to disk, and then there is none left to write out the results. This happens to me when I try to create a large XY event table from a route, but that only takes an hour to crash! I watch the CPU spinning with no I/O. It's obviously a very poor algorithm.

I'm not quite sure why you are using a spatial join, presumably through the interactive join tool?
Have you calculated the likely size of that output? 250,000 x 250,000 = 6.25E10 potential comparisons! Yes, it would be a good idea to limit the comparisons to a buffer if you can.
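
A rough sanity check worth scripting before running anything (a sketch; the paths are placeholders):

import arcpy

target = r"C:\data\landuse.gdb\landuse_polys"
join   = r"C:\data\landuse.gdb\landuse_polys"

n_target = int(arcpy.GetCount_management(target).getOutput(0))
n_join   = int(arcpy.GetCount_management(join).getOutput(0))

# Worst case for an unconstrained one-to-many join is every possible pair
print("worst-case output rows:", n_target * n_join)   # 250,000 x 250,000 = 6.25E10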

It's a worry that you cannot get a process to work on a sample of 35,000 polygons. That should work in minutes, so you are missing something. Just to give you a benchmark time, I ran a union between 33,000 parcels and a 428,000-polygon landcover dataset as the hardest process I could think of with similar sizes. That gives me a combined parcel-landcover set of polygons, with all attributes for further statistical summary, in 5 minutes.
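
A sketch of how such a benchmark could be scripted and timed (the dataset names are placeholders, not my actual layers):

import time
import arcpy

parcels   = r"C:\bench\data.gdb\parcels"      # ~33,000 polygons
landcover = r"C:\bench\data.gdb\landcover"    # ~428,000 polygons
out_fc    = r"C:\bench\data.gdb\parcel_landcover"

start = time.time()
arcpy.Union_analysis([parcels, landcover], out_fc)
print("union finished in %.1f minutes" % ((time.time() - start) / 60.0))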

Tune your PC. I have a similar machine, but I know that 8 CPUs are irrelevant; only one is used. Do you have heaps of virtual memory on a separate disk in a contiguous partition, or even better a dedicated disk for it? Do you have a local scratch workspace defined as a file geodatabase (not just a folder)? Have you got a RAID 0 array running at 10,000 RPM? Is there plenty of free, unfragmented disk space? Since the process is equivalent to an old CLEAN, you probably still need about 13 times the size of the source data free before starting. Is all your data in local file geodatabases, not on a network or in SDE? Dragging data across a network to do analysis is likely to be unsuccessful.
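
The environment side of that, as a sketch with placeholder paths:

import arcpy

# Work against local file geodatabases, not a network share or SDE
arcpy.env.workspace        = r"C:\work\analysis.gdb"
arcpy.env.scratchWorkspace = r"C:\work\scratch.gdb"   # a file geodatabase, not just a folder
arcpy.env.overwriteOutput  = True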

Tune your data. Have you run Repair Geometry on the data first? Check for uniform polygon sizes; if there are some very large polygons, split them up. At 10 there is a Dice tool to automate this. Do you have multipart polygons? They are a great evil for geoprocessing. Explode them first.
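
Scripted, those data-tuning steps look roughly like this (a sketch; names, paths and the vertex limit are placeholders):

import arcpy

landuse = r"C:\work\analysis.gdb\landuse_polys"

# Fix self-intersections, null geometries, etc. in place
arcpy.RepairGeometry_management(landuse)

# Explode multipart polygons into singlepart features
singlepart = r"C:\work\analysis.gdb\landuse_single"
arcpy.MultipartToSinglepart_management(landuse, singlepart)

# Split any very large polygons so no feature exceeds the vertex limit
diced = r"C:\work\analysis.gdb\landuse_diced"
arcpy.Dice_management(singlepart, diced, 10000)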

Be reasonable with your requests. ArcGIS does not often warn you if you have not thought through the task. You might get a hint that a join field has no attribute index, but not that you are missing a spatial index, and it will not simply multiply the two sets together to give you a predicted output size.

If I can recast my problem into a simpler one that does not require polygon splitting, then I do. For example, will point-in-polygon do? Extract one layer as centroids. Not the same, but maybe good enough.
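
A sketch of that centroid recast (placeholder names; note Feature To Point needs an ArcInfo-level licence):

import arcpy

landuse   = r"C:\work\analysis.gdb\landuse_polys"
centroids = r"C:\work\analysis.gdb\landuse_centroids"

# "INSIDE" forces each point to fall within its source polygon
arcpy.FeatureToPoint_management(landuse, centroids, "INSIDE")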

Finding a better way would be to use another Python module or another tool such as FME.
ChrisSnyder
Regular Contributor III
Make 1/4 mile buffers.

Intersect the buffers with your landuse polygons (using the Intersect tool).

What happens?

Do you see something to the effect of "Tiling Dataset" in the result info?
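
Scripted, those two steps would look something like this (a sketch; the names are placeholders):

import arcpy

landuse = r"C:\work\analysis.gdb\landuse_polys"
buffers = r"C:\work\analysis.gdb\landuse_buf_qtr_mile"
pairs   = r"C:\work\analysis.gdb\landuse_buf_intersect"

# 1. Quarter-mile buffers around every land use polygon
arcpy.Buffer_analysis(landuse, buffers, "0.25 Miles")

# 2. Intersect the buffers with the original land use polygons
arcpy.Intersect_analysis([buffers, landuse], pairs)
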
ColinLang1
New Contributor III

This is the tiny piece of information I've been searching for, for months. Screw the spatial join; Intersect produces a similar enough result, with a massive performance improvement. I just made a script that takes less than 2 minutes and replaces my previous effort that used spatial joins, which took 16 hours.

Thanks!

EdMontano
New Contributor
Just ran into this problem as well. I read some of these hints in the help for the Spatial Join dialog box.

Make sure all your feature classes are in the same geodatabase.
I am not sure if this helped, but I indexed attribute tables and I spatially indexed the feature classes themselves.
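
Roughly what I did for the indexing, as a sketch (the field and dataset names here are placeholders):

import arcpy

landuse = r"C:\data\all_in_one.gdb\landuse_polys"

# Attribute index on the field used in the join
arcpy.AddIndex_management(landuse, "LANDUSE_CODE", "idx_landuse_code")

# Rebuild the spatial index on the feature class
arcpy.AddSpatialIndex_management(landuse)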

File geodatabases seem to be faster and don't run into the size limits that a personal geodatabase does.

Oh and run it in ArcCatalog, not ArcMap!