19 Replies Latest reply on Feb 25, 2011 5:15 AM by orivero

    Alternative to dissolve.

    orivero
      I have been trying to dissolve a very large polygon feature class.  This dataset is composed of about 3 million polygons that were generated by buffering points.   All attempts in arcmap have been unsuccesful.

      I am trying to see if I can accomplish the same operation in Oracle.  Has anyone in this forum been able to do that?

      My basic query I am trying to use now is very slow but it seems to produce the right number of records when I run it on a small dataset but it is slower than actually running the dissolve tool.  It would seem that dissolving directly on the database should be faster but it is not. This is my currect query.

      select TRANSTECH,MAXADDOWN,MAXADUP,HOCONUM,HOCONAME,SDO_AGGR_UNION(
        MDSYS.SDOAGGRTYPE(c.shape, 0.005))SHAPE
         FROM ADDRESS_EXTRACT_FC_BUFFER_RI c
        group by TRANSTECH,MAXADDOWN,MAXADUP,HOCONUM,HOCONAME;


      Any help will be greatly appreciated,
        • Re: Alternative to dissolve.
          fmuyoba2010
          orivero,
          not sure whether your problem was dissolved but what one can normally do is to add a dummy field (preferrably short integer which will automatically be populated with zeros) and dissolve based on that dummy field. it should work.

          felix
          • Re: Alternative to dissolve.
            orivero
            Thank you for the tip but that is not what I want to do. I want to dissolve based on the five fields in the query above.  If I used a dummy field I would reduce the data to one polygon.

            Thanks,

            Ovidio
            • Re: Alternative to dissolve.
              vangelo-esristaff
              How many vertices were generated for each polygon?

              How many features (min/max/mean) are unioned by the compound key
              (and how many vertices exist in the unioned shapes)?  Is this simply
              a too-many-vertices issue?

              - V
              • Re: Alternative to dissolve.
                orivero
                With the dissolve tool it looks like the number of features is the problem, not the vertices per feature.
                The polygons are 37 vertices each.  They were generated by buffering points by 500 feet.
                I export up to about 200,000 features and I can dissolve them. Between 200,000 and 500,000 it works part of the time beyond 500,000 it runs for 10-17 hours and then it crashes.

                I will check the number of points in the sample output dataset and put them here. It will take me some time because I just deleted my test table.
                • Re: Alternative to dissolve.
                  vangelo-esristaff
                  I was afraid there would be 361 vertices per circle.  Should the circles with similar attributes
                  overlap, or are you making multi-part features? 

                  You can estimate the worst-case vertex count by doing a:

                  CREATE TABLE tmp_union_driver AS
                  SELECT transtech,maxaddown,maxadup,hoconum,hoconame,count(*) num_rows
                  FROM address_extract_fc_buffer_ri
                  GROUP BY transtech,maxaddown,maxadup,hoconum,hoconame
                  ORDER BY transtech,maxaddown,maxadup,hoconum,hoconame;

                  then

                  SELECT min(num_rows)*37 min_verts, avg(num_rows)*37 avg_verts, max(num_rows)*37 max_verts
                  FROM tmp_union_driver;

                  - V
                  • Re: Alternative to dissolve.
                    orivero
                    Vince,

                    I get this from your query

                    MIN_VERTS              AVG_VERTS              MAX_VERTS             
                    ---------------------- ---------------------- ----------------------
                    37                     31228                            121360                

                    With my query I was expecting to get multipart features and later I was going to make them single part.  The user I am doing this for wanted both.

                    121360  is a lot of vertices. You think that dissolve  crashes because the output has too many vertices?  If that is the case; how can I limit that?
                    • Re: Alternative to dissolve.
                      vangelo-esristaff
                      120k vertices is a lot, but it's smaller than Russia in most 1:1m COUNTRIES tables.

                      I wonder if you can't try the processing the other direction, and union the points
                      into multi-part shapes, then buffer the points.

                      - V
                      • Re: Alternative to dissolve.
                        orivero
                        I checked my layer in Oracle. Four polygons, 3 polygons had 37 vertices, 1 had 18344. Before dissolving using my query I had 3336.
                        • Re: Alternative to dissolve.
                          orivero
                          I can try what you suggest.  Let me see if I understood it right.

                          1) Union the points based on the attributes. That would make them multipart.
                          2) Buffer the result?

                          That would be using the tools in ArcGIS not a sql query?
                          • Re: Alternative to dissolve.
                            vangelo-esristaff
                            You'd probably want to try it in both environments.  I started writing some accumulator
                            capabilities into command-line tools, but got distracted by something for a client and
                            never got back to it.

                            It would probably be under 150 lines of code to just write some Python to query the
                            driver table in a loop, query based on the combination of attributes in the big table,
                            then collect the single-part points in a nested loop, then export the attributes with
                            a buffer of the point array.

                            - V
                            • Re: Alternative to dissolve.
                              vangelo-esristaff
                              How many distinct values in each of your aggregation columns? 
                              How many rows in the tmp_union_driver table?

                              I could generate a random dataset for reproduction of this task.  If you file an incident on
                              the failure of dissolve, I can give your TS analyst a couple of tables to use.

                              - V
                              • Re: Alternative to dissolve.
                                orivero
                                Thank you so much for helping us with this!   We do have an open incident about dissolve. The case number is #898667.

                                The numbers below are from our entire dataset.  Previously it was a small subset.  Let me know if this is the information you need.


                                select count(1) from tmp_union_driver;

                                COUNT(1)              
                                ----------------------
                                404                   


                                Total from the NUM_ROWS column is 3339270

                                MIN_VERTS              AVG_VERTS              MAX_VERTS             
                                ---------------------- ---------------------- ----------------------
                                37                     305824.2326732673267326732673267326732673 16070728
                                • Re: Alternative to dissolve.
                                  vangelo-esristaff
                                  OK, so 300k vertices is quite large (as an average) and 16 million vertices (434k *parts*)
                                  is likely to break things (including the 500k 2-D vertex limit for SDO_GEOMETRY).

                                  It would probably be worthwhile to break out how many attribute combinations exceed
                                  num_rows of 14169, since you'll need to handle them delicately.

                                  - V
                                  • Re: Alternative to dissolve.
                                    vangelo-esristaff
                                    If you could run:
                                    select count (distinct transtech) from tmp_union_driver;
                                    select count (distinct maxaddown) from tmp_union_driver;
                                    select count (distinct maxadup) from tmp_union_driver;
                                    select count (distinct hoconum) from tmp_union_driver;
                                    select count (distinct hoconame) from tmp_union_driver;

                                    I could simulate a rough approximation of your dataset (uniform distribution
                                    across all columns)

                                    If you can zip up an ASCII dump of tmp_union_driver and have it added
                                    to the incident, I can generate a random spatial component for the actual
                                    data distribution.

                                    - V
                                    • Re: Alternative to dissolve.
                                      orivero
                                      I am experimenting now with dissolving the points first and then buffering (using ArcGIS).  It worked really well for the initial sample and now I am trying it on a larger dataset.  


                                      Answering your question.

                                      select count(1) from tmp_union_driver where NUM_ROWS > 14169
                                      COUNT(1)              
                                      ----------------------
                                      44  


                                      I knew about about the vertex limit on Oracle but why 14,169 is significant?

                                      I will  create the ascii and send it to TS.


                                      The counts you requested

                                      COUNT(DISTINCTTRANSTECH)
                                      ------------------------
                                      10                      

                                      COUNT(DISTINCTMAXADDOWN)
                                      ------------------------
                                      10                      

                                      COUNT(DISTINCTMAXADUP)
                                      ----------------------
                                      11                    

                                      COUNT(DISTINCTHOCONUM)
                                      ----------------------
                                      108                   

                                      COUNT(DISTINCTHOCONAME)
                                      -----------------------
                                      108
                                      • Re: Alternative to dissolve.
                                        vangelo-esristaff
                                        512 * 1024 / 37 = 14269 (maximum number of multi-part points before a 37-vertex/part
                                        buffer exceeds 1M ordinates)

                                        - V
                                        • Re: Alternative to dissolve.
                                          vangelo-esristaff
                                          Do HOCONUM and HOCONAME correspond to each other, or are they independent?
                                          • Re: Alternative to dissolve.
                                            orivero
                                            Yes, one is an identifier and the other a name.
                                            • Re: Alternative to dissolve.
                                              orivero
                                              Vince I forwarded the table to TS. Thanks again for all yuour help. I  tried the dissolve first on points and then the buffer on one state. Not only was it successful it was quite fast. I am doiung it now on a copy of our nationwide dataset.

                                              Since  we are going into a higher level of detail with this you can contact me via ESRI TS or email.

                                              My email is:
                                              ovidio.rivero@fcc.gov