Help interpreting Point Density results for bike accidents, plus a classification question

Anonymous User · ‎05-03-2013

Original User: Gillyh

Hello,

I have 136 bicycle accidents mapped. I ran the Point Density tool with square miles as the area unit. My results:

0-5
5-14
14-23
23-32
32-41
41-50
50-59
59-68
68-77
77-86
86-231.

How do I interpret these results? For instance, I think of density as 0-5 accidents per square mile as any other density reading but my results go way above 136 (my total number of points), so I must be wrong. How would I interpret the results above in a sentence in relation to the number of bike accidents?

Also for classification, what is 1/2, 1/3, and 1/4 standard deviation? Can someone explain this to me? I have a decent knowledge of how standard deviations work like 1, 2, or 3 standard deviations away from the mean but being in fractions, I don't think this is the same. What is 1/3 standard deviation? What's that mean?

Many thanks for any help.

Anonymous User · ‎05-29-2013

Original User: iemba

There is a new blog post that just came out about this subject matter; you can find it here: http://blogs.esri.com/esri/arcgis/2013/05/28/how-should-i-interpret-the-output-of-density-tools/.

Here are some links to the density tool help: http://resources.arcgis.com/en/help/main/10.1/index.html#//009z0000000w000000
http://resources.arcgis.com/en/help/main/10.1/index.html#/How_Point_Density_works/009z00000013000000...

This is not my strong spot with ArcGIS so I hope the links will help.

Good luck!
-Amber

WilliamHuber · ‎05-29-2013

...I think of density as 0-5 accidents per square mile as any other density reading but my results go way above 136 (my total number of points), so I must be wrong.

You're probably fine. As you point out, these are counts per square mile, not counts themselves. They can be extremely high over very small areas. The integral of the density ought to total 136 (or near enough). For an illustrated explanation of this in one dimension, please see http://stats.stackexchange.com/questions/4220/a-probability-distribution-value-exceeding-1-is-ok/422... (which discusses probability densities, but it's exactly the same idea).
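To see why a density can exceed the total count, here is a minimal Python sketch. It is a simplified stand-in for a point density calculation, not the ArcGIS implementation: it assumes a circular neighborhood, and the function name `point_density`, the 0.1-mile radius, the 0.02-mile cell size, and the simulated accident locations are all illustrative assumptions.

```python
import math
import random

def point_density(points, radius, cell, extent):
    """Per cell: number of points within `radius` of the cell center,
    divided by the area of that circle (points per square mile)."""
    xmin, ymin, xmax, ymax = extent
    ncols = int(round((xmax - xmin) / cell))
    nrows = int(round((ymax - ymin) / cell))
    area = math.pi * radius ** 2
    grid = [[0.0] * ncols for _ in range(nrows)]
    reach = int(math.ceil(radius / cell)) + 1  # cells to scan around each point
    for px, py in points:
        c0 = int((px - xmin) / cell)
        r0 = int((py - ymin) / cell)
        for r in range(max(0, r0 - reach), min(nrows, r0 + reach + 1)):
            cy = ymin + (r + 0.5) * cell
            for c in range(max(0, c0 - reach), min(ncols, c0 + reach + 1)):
                cx = xmin + (c + 0.5) * cell
                if (cx - px) ** 2 + (cy - py) ** 2 <= radius ** 2:
                    grid[r][c] += 1.0 / area
    return grid

# 136 simulated accidents clustered near the middle of a 3 x 3 mile area.
random.seed(0)
points = [(random.gauss(1.5, 0.2), random.gauss(1.5, 0.2)) for _ in range(136)]

cell = 0.02
grid = point_density(points, radius=0.1, cell=cell, extent=(0.0, 0.0, 3.0, 3.0))

peak = max(max(row) for row in grid)
# Approximate the integral: sum of (density x cell area) over all cells.
total = sum(sum(row) for row in grid) * cell ** 2

print(f"peak density: {peak:.0f} per square mile")
print(f"integral of density: {total:.1f} (number of points: {len(points)})")
```

The peak is far above 136 because a handful of points is being divided by a neighborhood of only about 0.03 square miles, while the density summed over the whole area still comes back to roughly the number of points.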

Also for classification, what is 1/2, 1/3, and 1/4 standard deviation?

A standard deviation is just another number exactly like your data. Think of it as a data-centric unit of measurement. So, just as length can be measured in different units (Angstroms, furlongs, rods, meters, parsecs, ...), so can anything else.

For instance, if you have a collection of length measurements and their standard deviation is 1.36 meters, then on a standard deviation scale your "unit" is exactly 1.36 meters long. Ergo, 1/4 SD is equivalent to 1.36/4 = 0.34 meters, 1/2 SD is 0.68 meters, and so on.

It works the same way for counts per unit area as it does for length.
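As a small illustration of treating the SD as a unit, the sketch below repeats the 1.36-meter arithmetic and then re-expresses a few density values in SD units. The helper name `to_sd_units` and the density values are made up for the example, and the use of the population SD is an assumption.

```python
import statistics

def to_sd_units(values):
    """Re-express each value as its distance from the mean, measured in SDs."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption of this sketch
    return [(v - mean) / sd for v in values]

# The worked example from the text: an SD of 1.36 meters.
sd = 1.36
for label, fraction in (("1/4", 1 / 4), ("1/3", 1 / 3), ("1/2", 1 / 2)):
    print(f"{label} SD = {fraction * sd:.2f} meters")

# The same idea applied to densities (hypothetical counts per square mile).
densities = [0.0, 5.0, 14.0, 23.0, 86.0]
print([round(z, 2) for z in to_sd_units(densities)])
```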

Why use the SD as a unit of measurement? Because for many datasets about two-thirds of the values will be within one SD (one unit) of the average, about 95% will be within two SDs of the average, and the vast majority will be within three SDs of the average (the "68-95-99.7" rule). If your dataset does not behave like this, it is exceptional, and that is interesting. So data-centric units of measurement can be useful for quickly diagnosing the behavior of the data (and in the hands of an experienced data analyst, they will suggest ways of re-expressing the data to reveal more information).
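One way to run this diagnostic on your own values is sketched below. The two datasets are simulated purely for illustration (one bell-shaped, one strongly skewed), and `share_within` is a hypothetical helper name.

```python
import random
import statistics

def share_within(values, k):
    """Fraction of values lying within k SDs of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

random.seed(1)
bell = [random.gauss(50.0, 10.0) for _ in range(10_000)]     # roughly normal
skewed = [random.expovariate(1.0) for _ in range(10_000)]    # long right tail

for name, data in (("bell-shaped", bell), ("skewed", skewed)):
    shares = [share_within(data, k) for k in (1, 2, 3)]
    print(name, [f"{s:.1%}" for s in shares])
```

The bell-shaped data should land close to the 68-95-99.7 figures, while the skewed data packs noticeably more than two-thirds of its values within one SD, which is the kind of departure worth noticing.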

As an application, when choosing regular intervals of 1/3 SD for the class breaks, you would guess that no more than ten or so classes above the average (10 × 1/3 = 3 1/3 SDs) and no more than ten or so classes below the average would be needed to display the full range of data. You would also anticipate that most of the data would fall into the middle six to ten of those twenty classes, with the remaining 14 to 10 classes (respectively) devoted to displaying the upper and lower extremes.
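The class-break reasoning can be sketched as follows. This is not how ArcGIS builds its standard deviation classes; it simply places breaks every 1/3 SD on either side of the mean and counts what falls in each class. The function names, the ten-classes-per-side default, and the simulated bell-shaped data are assumptions.

```python
import bisect
import random
import statistics

def sd_class_breaks(values, fraction=1 / 3, classes_per_side=10):
    """Breaks at mean + k * fraction * SD for k = -classes_per_side..classes_per_side."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + k * fraction * sd
            for k in range(-classes_per_side, classes_per_side + 1)]

def class_counts(values, breaks):
    """Count values per class; values outside the outermost breaks are skipped."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        i = bisect.bisect_right(breaks, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

random.seed(2)
data = [random.gauss(40.0, 12.0) for _ in range(10_000)]

breaks = sd_class_breaks(data)        # 21 breaks, so 20 classes
counts = class_counts(data, breaks)

middle_six = sum(counts[7:13]) / len(data)   # classes spanning mean +/- 1 SD
middle_ten = sum(counts[5:15]) / len(data)   # classes spanning mean +/- 5/3 SD
print(f"middle six classes: {middle_six:.1%}")
print(f"middle ten classes: {middle_ten:.1%}")
```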

There's also another rule: for all datasets, it is impossible for more than a quarter of the values to be more than 2 SDs from the average (the arithmetic mean), or for more than a ninth to be more than 3 SDs from the average, or, in general, for more than 1/n² of the data to be more than n SDs from the average. (This is Chebyshev's Inequality.) This provides upper limits on how many data possibly could fall into various classes defined by multiples of an SD.
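A quick check of this bound on a deliberately lopsided dataset is sketched below. The data are invented (95 zeros and 5 values of 100, loosely mimicking a density raster that is mostly empty), `share_beyond` is a hypothetical helper, and the population SD is assumed.

```python
import statistics

def share_beyond(values, n):
    """Fraction of values lying more than n SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption of this sketch
    return sum(abs(v - mean) > n * sd for v in values) / len(values)

# Mostly zeros with a few large values.
lopsided = [0.0] * 95 + [100.0] * 5

for n in (2, 3, 4):
    share = share_beyond(lopsided, n)
    print(f"beyond {n} SDs: {share:.3f}  (limit 1/n^2 = {1 / n**2:.3f})")
```

Even though this dataset looks nothing like a bell curve, the share of values beyond n SDs stays under 1/n² for every n tried.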

Thus, by using various small multiples of the SD to set class breakpoints when symbolizing data, you can have foreknowledge of the possible amounts of data within those classes and thereby anticipate and control the appearance of the map.