I have several point data sets representing: (A) high school students who bicycle to school; (B) high school students who don't bicycle to school; (C) features of the road network that discourage travel by bicycle; and (D) features that encourage cycling.
When mapped, these features appear to be related, i.e., there are more students who bicycle where there are more encouraging and fewer discouraging features, and more students who don't bicycle where there are more discouraging and fewer encouraging features (see attached image).
Legend:
- light blue = students who cycle
- light brown = students who don't cycle
- red/pink = discouraging features
- green = encouraging features
My question is how do I prove this statistically?
My advice would be to consult a statistician if possible, as this can get complex fast.
If you have access to the Geostatistical Analyst extension, read through this document (see link below) before you meet with them, to get a sense of which statistical processes are available in GIS and what the typical workflows are. Note that the link is for version 9.0, so it will not exactly match the 10.x releases, but the processes are still the same.
Also, be aware that the most efficient analysis may involve a combination of non-GIS statistical software and Geostatistical Analyst.
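To give a rough idea of what the non-GIS side of such an analysis could look like, here is a minimal Python sketch of a permutation test. It asks whether cyclists live closer, on average, to the nearest encouraging feature than non-cyclists do. Everything in it is an assumption for illustration: the function names, the coordinates (hypothetical projected units, not your data), and the choice of "distance to nearest feature" as the measure. It also treats students as independent, which spatially clustered data may violate, and a small p-value would indicate association only, not cause. A statistician may well recommend something different.

```python
import math
import random
from statistics import mean


def nearest_distance(point, features):
    """Straight-line distance from one point to its closest feature."""
    return min(math.dist(point, f) for f in features)


def permutation_test(cyclists, non_cyclists, features,
                     n_permutations=2000, seed=0):
    """Return (observed difference, one-sided p-value).

    The difference is mean nearest-feature distance of non-cyclists
    minus that of cyclists; a positive value means cyclists are closer.
    """
    d_cyc = [nearest_distance(p, features) for p in cyclists]
    d_non = [nearest_distance(p, features) for p in non_cyclists]
    observed = mean(d_non) - mean(d_cyc)

    pooled = d_cyc + d_non
    n_cyc = len(d_cyc)
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly relabel students as cyclist / non-cyclist.
        rng.shuffle(pooled)
        diff = mean(pooled[n_cyc:]) - mean(pooled[:n_cyc])
        if diff >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_large += 1

    # Add-one correction keeps the estimated p-value above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical coordinates in projected units (e.g. metres).
encouraging = [(0.0, 0.0), (100.0, 0.0)]
cyclists = [(5.0, 5.0), (95.0, 10.0), (10.0, -5.0), (105.0, 5.0)]
non_cyclists = [(50.0, 400.0), (20.0, 350.0), (80.0, 500.0), (60.0, 450.0)]

diff, p_value = permutation_test(cyclists, non_cyclists, encouraging)
print(f"difference in mean distance: {diff:.1f}")
print(f"one-sided p-value: {p_value:.4f}")
```

The same function could be run against the discouraging features by passing that point set instead; with real data you would export the point coordinates from the GIS in a projected coordinate system first.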
Chris Donohue, GISP
Things are not proven statistically, and there is a big difference between correlation and causation. Some web links:
and my favorite:
You probably have a strong supposition that factors which make cycling more attractive to individuals lead to increased bicycle usage in the population.
For lack of a better grouping term...let's call them the bikers versus the non-bikers. The factors to address are:
So before heading down the path of using tools in ArcMap that will give the impression of correlation or association, I would address the basis of your problem and the nature of your data, to avoid having to eat a lot of cheese should you want to get a PhD in engineering. A good book that supports this:
How to Lie with Statistics, Darrell Huff and Irving Geis (ISBN 9780393310726)
and one of my favorites
How to Lie with Maps, Monmonier
Good luck on your data explorations.