Alternative to dissolve.

02-22-2011 11:12 AM
OvidioRivero
New Contributor III
I have been trying to dissolve a very large polygon feature class. This dataset is composed of about 3 million polygons that were generated by buffering points. All attempts in ArcMap have been unsuccessful.

I am trying to see if I can accomplish the same operation in Oracle.  Has anyone in this forum been able to do that?

The basic query I am using now produces the right number of records when I run it on a small dataset, but it is very slow, slower than actually running the dissolve tool. It would seem that dissolving directly in the database should be faster, but it is not. This is my current query.

SELECT TRANSTECH, MAXADDOWN, MAXADUP, HOCONUM, HOCONAME,
       SDO_AGGR_UNION(MDSYS.SDOAGGRTYPE(c.shape, 0.005)) SHAPE
FROM   ADDRESS_EXTRACT_FC_BUFFER_RI c
GROUP BY TRANSTECH, MAXADDOWN, MAXADUP, HOCONUM, HOCONAME;


Any help will be greatly appreciated,
19 Replies
VinceAngelo
Esri Esteemed Contributor
You'd probably want to try it in both environments.  I started writing some accumulator
capabilities into command-line tools, but got distracted by something for a client and
never got back to it.

It would probably be under 150 lines of code to just write some Python to query the
driver table in a loop, query based on the combination of attributes in the big table,
then collect the single-part points in a nested loop, then export the attributes with
a buffer of the point array.
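
Something like this rough sketch, perhaps (untested, arcpy-based; the point feature class name, text field types, and the ~500 ft buffer distance are placeholders, not your actual schema):

import arcpy

# Placeholder names -- substitute your own point feature class and output.
points_fc  = "ADDRESS_EXTRACT_FC_RI"      # assumed source point feature class
driver_tab = "tmp_union_driver"           # one row per attribute combination
out_fc     = "in_memory/dissolved_buffers"
fields     = ["TRANSTECH", "MAXADDOWN", "MAXADUP", "HOCONUM", "HOCONAME"]

sr = arcpy.Describe(points_fc).spatialReference
arcpy.CreateFeatureclass_management("in_memory", "dissolved_buffers",
                                    "POLYGON", spatial_reference=sr)
for f in fields:
    arcpy.AddField_management(out_fc, f, "TEXT")   # assumes text attributes

with arcpy.da.InsertCursor(out_fc, ["SHAPE@"] + fields) as out_rows:
    # Outer loop: one pass per attribute combination in the driver table.
    with arcpy.da.SearchCursor(driver_tab, fields) as combos:
        for combo in combos:
            # Quoting below assumes text columns; adjust for numeric fields.
            where = " AND ".join("%s = '%s'" % (f, v)
                                 for f, v in zip(fields, combo))
            # Inner loop: collect the single-part points for this combination.
            pts = arcpy.Array()
            with arcpy.da.SearchCursor(points_fc, ["SHAPE@XY"], where) as rows:
                for row in rows:
                    x, y = row[0]
                    pts.add(arcpy.Point(x, y))
            if pts.count == 0:
                continue
            # Buffer the whole point array at once; the result is one
            # (possibly multipart) polygon per attribute combination.
            shape = arcpy.Multipoint(pts, sr).buffer(152.4)  # ~500 ft, assumed
            out_rows.insertRow([shape] + list(combo))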

- V
VinceAngelo
Esri Esteemed Contributor
How many distinct values are in each of your aggregation columns?
How many rows are in the tmp_union_driver table?

I could generate a random dataset for reproduction of this task.  If you file an incident on
the failure of dissolve, I can give your TS analyst a couple of tables to use.

- V
OvidioRivero
New Contributor III
Thank you so much for helping us with this!   We do have an open incident about dissolve. The case number is #898667.

The numbers below are from our entire dataset.  Previously it was a small subset.  Let me know if this is the information you need.


select count(1) from tmp_union_driver;

COUNT(1)              
----------------------
404                   


The total of the NUM_ROWS column is 3,339,270.

MIN_VERTS   AVG_VERTS   MAX_VERTS
---------   ---------   ---------
37          305824.2    16070728
VinceAngelo
Esri Esteemed Contributor
OK, so 300k vertices is quite large (as an average), and 16 million vertices (434k *parts*)
is likely to break things (it exceeds the ~500k 2-D vertex limit of SDO_GEOMETRY).

It would probably be worthwhile to break out how many attribute combinations exceed
num_rows of 14169, since you'll need to handle them delicately.

- V
VinceAngelo
Esri Esteemed Contributor
If you could run:
select count (distinct transtech) from tmp_union_driver;
select count (distinct maxaddown) from tmp_union_driver;
select count (distinct maxadup) from tmp_union_driver;
select count (distinct hoconum) from tmp_union_driver;
select count (distinct hoconame) from tmp_union_driver;

I could simulate a rough approximation of your dataset (uniform distribution
across all columns).

If you can zip up an ASCII dump of tmp_union_driver and have it added
to the incident, I can generate a random spatial component for the actual
data distribution.

- V
OvidioRivero
New Contributor III
I am experimenting now with dissolving the points first and then buffering (using ArcGIS).  It worked really well for the initial sample and now I am trying it on a larger dataset.  
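
For reference, the dissolve-then-buffer sequence looks roughly like this (a sketch only; the feature class names and the 500-foot buffer distance are placeholders for our actual values):

import arcpy

points_fc = "ADDRESS_EXTRACT_FC_RI"        # source points (assumed name)
fields    = ["TRANSTECH", "MAXADDOWN", "MAXADUP", "HOCONUM", "HOCONAME"]

# Step 1: dissolve the points on the attribute combination so each
# combination becomes a single multipoint feature (~404 rows instead of ~3M).
arcpy.Dissolve_management(points_fc, "in_memory/points_dissolved",
                          fields, multi_part="MULTI_PART")

# Step 2: buffer the dissolved multipoints, dissolving the overlapping
# circles within each attribute combination.
arcpy.Buffer_analysis("in_memory/points_dissolved", "BUFFERED_DISSOLVED_RI",
                      "500 Feet", dissolve_option="LIST",
                      dissolve_field=fields)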


Answering your questions:

select count(1) from tmp_union_driver where NUM_ROWS > 14169;

COUNT(1)
----------------------
44


I knew about the vertex limit in Oracle, but why is 14,169 significant?

I will create the ASCII dump and send it to TS.


The counts you requested:

COUNT(DISTINCT TRANSTECH)    10
COUNT(DISTINCT MAXADDOWN)    10
COUNT(DISTINCT MAXADUP)      11
COUNT(DISTINCT HOCONUM)     108
COUNT(DISTINCT HOCONAME)    108
VinceAngelo
Esri Esteemed Contributor
512 * 1024 / 37 = 14,169 (the maximum number of points in a multi-part feature before a
37-vertex-per-part buffer exceeds SDO_GEOMETRY's limit of 1M ordinates, i.e. 524,288 2-D vertices)

- V
VinceAngelo
Esri Esteemed Contributor
Do HOCONUM and HOCONAME correspond to each other, or are they independent?
OvidioRivero
New Contributor III
Yes, they correspond: one is the identifier and the other is the name.
OvidioRivero
New Contributor III
Vince, I forwarded the table to TS. Thanks again for all your help. I tried the dissolve on the points first and then the buffer on one state. Not only was it successful, it was quite fast. I am doing it now on a copy of our nationwide dataset.

Since we are going into a higher level of detail with this, you can contact me via ESRI TS or email.

My email is:
ovidio.rivero@fcc.gov