Grouping linear data into polygons by percentages

08-21-2017 09:12 AM
MaryElliott
New Contributor III

The linear data's values are either A or B.  The percentage of A and B within each polygon should be roughly equivalent across all polygons.  So if polygon 1's linear features are 36% A and 64% B, polygons 2-5 should also have nearly the same ratio of A and B.  Ideas on how to create these polygons efficiently?

8 Replies
DanPatterson_Retired
MVP Emeritus

Show an image of what you expect, or at least the geometry you are working with.

Linear features?  

  • Do you have them? 
  • Do you want to create them?
  • Do they have to be located in any specific way within your polygons?
  • Is this like .... create random points in polygons.... but you want lines?

Any pertinent information would be useful

MaryElliott
New Contributor III

Thanks for the inquiry, Dan. It wasn’t particularly easy to describe concisely.

I have linear data, with the key attributes: A. lane miles (VMT) and B. Jurisdiction.

Goal: to create 4-6 polygons grouping the linear data.

Each resulting polygon should contain line segments where the ratio of Juris A to Juris B is roughly 80/20.

The 80/20 is the ratio of VMT segments in Juris A to Juris B.

Count VMT in Juris A / Total VMT = 79% and Count VMT in Juris B / Total VMT = 21%.

Oh, and the polygons should be compact.

I’ve just been selecting a group of segments by drawing a box around them, and running Statistics on the total, then on the Juris A & B totals, which I convert to pcts. And repeat to adjust to try to get closer to the 80/20 goal. It’s a painful way to do this.
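The select-and-run-Statistics check described above could be scripted so each trial selection takes seconds rather than minutes. What follows is only a minimal sketch in plain Python: the `(jurisdiction, VMT)` pairs are made-up values, and reading them from the actual feature class (for example with a search cursor) is left out because the field names are not known here.

```python
from collections import defaultdict

def vmt_shares(segments):
    """Return each jurisdiction's share of total VMT for (jurisdiction, vmt) pairs."""
    totals = defaultdict(float)
    for juris, vmt in segments:
        totals[juris] += vmt
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {juris: t / grand_total for juris, t in totals.items()}

# Hypothetical selection of segments: (jurisdiction, VMT). Values are invented.
selection = [("A", 120.0), ("A", 200.0), ("B", 50.0), ("A", 75.0), ("B", 55.0)]

shares = vmt_shares(selection)
for juris in sorted(shares):
    print(f"Juris {juris}: {shares[juris]:.0%}")
```

With these invented numbers the selection comes out at 79% A and 21% B. The same function can be called on every candidate polygon's segments to compare them side by side.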

Any help would be appreciated. Thank you!

Mary

DanPatterson_Retired
MVP Emeritus

Double trouble... you want to do some form of cluster analysis which has spatial, distribution AND attribute constraints.  If there are no spatial constraints, the k-means options in some forms of cluster analysis help.  The distribution adds another layer of complexity, which some of them can handle. And finally, they all have to be clustered into regions as compact as possible, which could/will give rise to gaps between the clusters... which is ok.
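To make the k-means idea concrete, here is a rough sketch that clusters segment midpoints into compact groups and then reports the Juris A share of VMT in each. Everything in it is an assumption for illustration: the midpoints, jurisdictions and VMT values are randomly generated, and the plain Lloyd's k-means shown is not any particular tool's implementation. Note that k-means only handles compactness; it does nothing to enforce the ratio, so the printed shares are a check, not a guarantee.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on (x, y) tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (squared distance).
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels

# Synthetic stand-in data: 400 segment midpoints, roughly 80% in Juris A.
rng = random.Random(42)
midpoints = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(400)]
juris = ["A" if rng.random() < 0.8 else "B" for _ in midpoints]
vmt = [rng.uniform(1, 100) for _ in midpoints]

labels = kmeans(midpoints, k=5)
for c in range(5):
    idx = [i for i, lab in enumerate(labels) if lab == c]
    if not idx:
        continue  # skip a zone that ended up with no segments
    total = sum(vmt[i] for i in idx)
    a_total = sum(vmt[i] for i in idx if juris[i] == "A")
    print(f"zone {c}: {len(idx)} segments, A share of VMT = {a_total / total:.0%}")
```

If the shares drift too far from the target, segments near zone boundaries could be reassigned by hand or by a follow-up adjustment step.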

It would be useful to see what you have, perhaps symbolized into class ratios, i.e. 90/10, 80/20, 70/30 + ??/??, with the last class being the absolute "if I am desperate to use" class.  You could then prune out/remove the rest to simplify the analysis and the visual.

So if you have that visual it might suggest a strategy other than the one you are currently using. 

MaryElliott
New Contributor III

My apologies, forgot this part: the number of segments in each polygon should also be roughly equivalent. There is more leeway on the total segment numbers per polygon than on the ratio between Juris A and B VMTs. The ratio is more important than simple counts.

Mary

MaryElliott
New Contributor III

Thank you, Gentlemen, for your responses. And my apologies for the delay in getting back to you. Never rains but it pours . . . . . .

It is a bit difficult to talk in abstract terms sometimes; the full flavor of the problem may be lost.

The data at hand is centerline data. Each road is represented by a single centerline, divided lanes (center physical dividers of some sort) represented by 2 centerlines. This centerline has been divided wherever the number of lanes changes. For each segment thus divided, the VMT (vehicle miles traveled) was computed. The other variable is jurisdiction, referring to which entity maintains the road: County, State, municipality, federal, etc. This variable is used in an either/or way; either County maintained, or not County maintained. Thus the State, city, federal are considered together as a single jurisdiction.

The goal is to create 4 to 6 zones in the County (the universe) wherein the road segments contained in each zone are 20% County maintained and 80% maintained by the rest.

Zones are to be as compact as possible. I hope that helps to clarify the task, should there have been any haziness about it.

One suggestion was to, I believe, use the Grouping Analysis. This would require converting the segments to points or polygons - reading the discussion for this tool seems to restrict applying it to those 2 feature types.

Not sure how connectivity would be maintained with points, and cleanup with polys would take ages.

Having said that, there may well be options or techniques that work around those issues.

How to then work the 20 / 80 ratio into the parameters?

Thank you, Gentlemen.

FelixPertziger
New Contributor III

Whether you succeed with this depends greatly on the relative spatial pattern of classes A and B.

It is hard to recreate your lines, so I generated 1000 random points and randomly selected 20% of them.

To those 20% of the points (shown as blue points) I assigned a weight of 4; the rest (red points) received a weight of 1.

As a next step I computed proximity polygons and continued with the grouping technique described multiple times on GIS Stack Exchange, e.g. here, attempting to create 4 contiguous groups with approximately the same total of weights. After that I computed the ratio of the different classes per group using fractions in Excel:

Experiment results

As one can see from the table inside the picture, I managed to get close enough to the magic ratio, but this is an ideal situation in terms of spatial pattern...

Perhaps you can try the technique and script from the hyperlink. Start by placing evenly distributed points on your lines and estimating their ratio across the area of interest. This will help you to assign weights. Good luck.
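The experiment above can be approximated in a few lines of Python to get a feel for how the weights behave. This is only a sketch under stated assumptions: it reuses the 1000 random points, the 20% selection and the 4/1 weights, but it replaces the proximity-polygon grouping from the linked script with a much simpler weighted median split (first on x, then on y), so it is a stand-in and not that technique.

```python
import random

def split_by_weight(items, axis):
    """Sort items along an axis and cut where cumulative weight reaches half the total."""
    ordered = sorted(items, key=lambda it: it["xy"][axis])
    half = sum(it["w"] for it in ordered) / 2
    running = 0
    for i, it in enumerate(ordered):
        running += it["w"]
        if running >= half:
            return ordered[: i + 1], ordered[i + 1:]
    return ordered, []

rng = random.Random(1)
# 1000 random points, all red with weight 1 to begin with.
points = [{"xy": (rng.random(), rng.random()), "cls": "red", "w": 1} for _ in range(1000)]
# Randomly select 20% of them, mark them blue and give them weight 4.
for it in rng.sample(points, 200):
    it["cls"], it["w"] = "blue", 4

# Two-level split: west/east halves by weight, then each half south/north by weight.
west, east = split_by_weight(points, axis=0)
groups = [*split_by_weight(west, axis=1), *split_by_weight(east, axis=1)]

for n, g in enumerate(groups, start=1):
    weight = sum(it["w"] for it in g)
    blue = sum(1 for it in g if it["cls"] == "blue")
    print(f"group {n}: weight={weight}, points={len(g)}, blue share={blue / len(g):.0%}")
```

Each of the four groups ends up with a total weight close to 400 (a quarter of 1600). How close the blue share stays to 20% depends on the spatial pattern, which is the caveat raised above: a random pattern is the easy case.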
