How to test for relationships between point data sets

GeorgeWilliams · ‎03-04-2015

I have several point data sets representing: A. high school students who bicycle to school; B. high school students who don't bicycle to school; C. features of the road network that discourage travel by bicycle; and D. features that encourage cycling.

When mapped, the above features appear to be related, i.e., there are more students who bicycle where there are more encouraging and fewer discouraging features, and more students who don't bicycle where there are more discouraging and fewer encouraging features (see attached image).

Legend:

- light blue = students who cycle
- light brown = students who don't cycle
- red/pink = discouraging features
- green = encouraging features

My question is: how do I prove this statistically?

ChrisDonohue__GISP · ‎03-05-2015

My advice would be to consult a statistician if possible, as this can get complex fast.

If you have access to the GeoStatistical Analyst Extension, read through the document linked below before you see the statistician, to get a sense of which statistical processes are available in GIS and what the workflows typically look like. Note that the link is for version 9.0, so it will not exactly match the 10.x releases, but the processes are still the same.

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=7&ved=0CEwQFjAG&url=http%3A%2F%2Fdusk2....

Also, be aware that the most efficient analysis may involve a combination of non-GIS statistical software and GeoStatistical Analyst.
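To give a rough idea of what the non-GIS side could look like, here is a minimal sketch of a permutation test in plain Python. It asks whether cyclists live closer, on average, to the nearest encouraging feature than non-cyclists do. Everything in it is an assumption for illustration: the function names are made up, the coordinates are synthetic, and it expects planar (projected) coordinates exported from the GIS. It also ignores spatial autocorrelation and confounders such as distance to school, so treat it as an exploratory check, not a proof.

```python
import math
import random

def nearest_distance(point, features):
    """Planar distance from one point to its closest feature."""
    return min(math.dist(point, f) for f in features)

def permutation_test(cyclists, non_cyclists, features, n_perm=999, seed=0):
    """One-sided permutation test on mean nearest-feature distance.

    Returns (observed, p_value), where observed is the mean distance for
    non-cyclists minus the mean distance for cyclists. A large positive
    value means cyclists tend to be closer to the features.
    """
    rng = random.Random(seed)
    # Compute each student's nearest-feature distance once.
    d_cyc = [nearest_distance(p, features) for p in cyclists]
    d_non = [nearest_distance(p, features) for p in non_cyclists]
    observed = sum(d_non) / len(d_non) - sum(d_cyc) / len(d_cyc)

    pooled = d_cyc + d_non
    n_cyc = len(d_cyc)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled distances.
        rng.shuffle(pooled)
        perm_cyc, perm_non = pooled[:n_cyc], pooled[n_cyc:]
        diff = sum(perm_non) / len(perm_non) - sum(perm_cyc) / len(perm_cyc)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value

# Synthetic example data (hypothetical, not from the original post).
demo_rng = random.Random(42)
features = [(demo_rng.uniform(0, 10), demo_rng.uniform(0, 10)) for _ in range(15)]
cyclists = [(demo_rng.uniform(0, 10), demo_rng.uniform(0, 10)) for _ in range(40)]
non_cyclists = [(demo_rng.uniform(0, 10), demo_rng.uniform(0, 10)) for _ in range(60)]

observed, p_value = permutation_test(cyclists, non_cyclists, features)
print(f"difference in mean nearest distance: {observed:.3f}, p = {p_value:.3f}")
```

The same function can be run against the discouraging features by swapping the feature list. A small p-value would only indicate an association in this sample; it says nothing about why the association exists.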

Chris Donohue, GISP

DanPatterson_Retired · ‎03-05-2015

Things are not proven statistically. There is a big difference between correlation and causation. Some web links:

undergraduate education - Impressive common misleading interpretations in statistics to make student...

regression - Datasets constructed for a purpose similar to that of Anscombe's quartet - Cross Valida...

and my favorite

Spurious Correlations

You probably have a strong supposition that factors which make cycling more attractive to individuals lead to increased bicycle usage among the population.

For lack of a better grouping term...let's call them the bikers versus the non-bikers. The factors to address are:

- Where is the school?
- How were the groups identified? (Did they self-identify, or were there some criteria to be met?)
- Are they exclusionary groups? (What about the "fair-weather" biker?)
- How far does each group have to travel?
- Did they travel to the school exclusively by their chosen mode of transport?
- etc.
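The travel-distance question, for instance, can be checked before any spatial statistics are run. The sketch below assumes each student record has already been reduced to a distance to school in kilometres and a cycles/doesn't-cycle flag; the function name, the 1 km band width, and the sample records are all hypothetical. It reports the share of cyclists within each distance band, so that bikers and non-bikers are compared at similar travel distances.

```python
from collections import defaultdict

def cycling_rate_by_band(students, band_width_km=1.0):
    """Share of students who cycle, grouped by distance-to-school band.

    students: iterable of (distance_to_school_km, cycles) pairs.
    Returns {band_index: (n_students, share_cycling)}, where band_index 0
    covers [0, band_width_km), band_index 1 the next band, and so on.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [n_students, n_cyclists]
    for distance_km, cycles in students:
        band = int(distance_km // band_width_km)
        counts[band][0] += 1
        counts[band][1] += int(cycles)
    return {band: (n, k / n) for band, (n, k) in sorted(counts.items())}

# Hypothetical records: (distance to school in km, cycles to school?)
sample = [(0.5, True), (0.7, True), (0.9, False), (2.3, False), (2.8, True)]
for band, (n, share) in cycling_rate_by_band(sample).items():
    print(f"band {band}: {n} students, {share:.0%} cycle")
```

If the cycling share changes strongly from band to band, distance alone may explain much of the pattern seen on the map, and any test involving the road features should account for it.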

So before heading down the path of using tools in ArcMap that will give the impression of correlation or association, I would address the basis of your problem and the nature of your data, to avoid having to eat a lot of cheese should you want to get a PhD in engineering. A good book that supports this is

How to Lie with Statistics, Darrell Huff and Irving Geis (ISBN 9780393310726)

and one of my favorites

How to Lie with Maps, Monmonier

Good luck on your data explorations.