Interpreting Jarque-Bera value

05-30-2012 04:57 AM
AlettaDijkstra
New Contributor
I'm a graduate student who is fairly new to spatial statistics. I'm trying to build a model that explains disease prevalence from neighbourhood socio-economic variables, but whenever I put more than one variable in the model, the Jarque-Bera p-value gets very low. I've checked the linearity of my explanatory variables using the scatterplot matrix, and after some tweaking and transformations, all of them seem fairly linear. I've also checked the histograms to see how the variables are distributed. In most cases they are roughly normal; sometimes the distribution is a little off because of a small peak near zero, but to me and my supervisor it doesn't look drastically wrong. I've also fitted the model in GeoDa: the diagnostics for spatial dependence don't suggest that I should run a spatial lag or spatial error model, and there doesn't seem to be any heteroskedasticity.

This makes me wonder how serious it is that the Jarque-Bera test keeps coming out significant. I remember that in my first year, the statistics professor taught us that for linear regression your data would ideally be normally distributed, but that with a larger number of cases (n > 40) this criterion becomes much less important. Would this also be the case for running OLS in ArcGIS or GeoDa, or am I missing something? What else could I do to improve the Jarque-Bera value?
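For readers who want to see what the test actually measures: the Jarque-Bera diagnostic reported with OLS output is computed on the regression residuals, not on the individual variables, and it combines the skewness and kurtosis of those residuals into one statistic. Below is a minimal sketch in plain Python, assuming you have exported the residuals to a list. The function name `jarque_bera` and the simulated samples are illustrative assumptions, not part of ArcGIS or GeoDa, and the p-value uses the asymptotic chi-square distribution with 2 degrees of freedom, so it is only approximate for small samples.

```python
import math
import random


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and
    K the sample kurtosis (3 for a normal distribution).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For a chi-square distribution with 2 degrees of freedom,
    # the upper-tail probability P(X > jb) is exp(-jb / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical residuals: one roughly normal sample, one clearly skewed.
random.seed(0)
normal_like = [random.gauss(0.0, 1.0) for _ in range(500)]
skewed = [random.expovariate(1.0) for _ in range(500)]

for label, sample in (("normal-like", normal_like), ("skewed", skewed)):
    jb, p = jarque_bera(sample)
    print(f"{label}: JB = {jb:.2f}, p = {p:.4g}")
```

Because the statistic is multiplied by n, even a mild departure from normality in the residuals produces a small p-value once the sample is large, so it can help to look at a histogram of the residuals alongside the p-value.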



On another note: if the tests for heteroskedasticity in GeoDa suggest no heteroskedasticity, does this mean I have no reason to run a GWR? I tried a GWR with one of my models, and the adjusted R-squared and Akaike criterion were much better in the GWR....
2 Replies
boonejardot
New Contributor
I have not taken graduate-level statistics, but I do know that if your variables are linear, OLS regression is a good option. Furthermore, if the t-statistic within the OLS regression comes back as significant, running GWR is the next step.

If the t-statistic from the OLS regression is not significant, you cannot justify running GWR. Hope this helps.
AlettaDijkstra
New Contributor
Thank you both for your replies. In the meantime I have changed my prevalence data (from prevalence in the entire population to prevalence in the age group that is most at risk), and now there are combinations of variables that seem to generate a plausible model, passing all criteria 🙂

I am still unsure whether to continue with a GWR analysis. On the one hand, the tests for heteroskedasticity don't suggest nonstationarity; on the other hand, GWR improves the model in parts of the study area. I would also like to test some categorical variables, but I understand these can't be part of a GWR?