I'm a graduate student who is fairly new to spatial statistics. I'm trying to build a model that explains disease prevalence using neighbourhood socio-economic variables, but whenever I put more than one variable in the model, the Jarque-Bera p-value gets very low. I've checked the linearity of my explanatory variables using a scatterplot matrix, and after some tweaking and transformations, all my variables seem fairly linear. I've also checked the histograms to see how the variables are distributed. In most cases they are roughly normal, although sometimes the distribution is a little off because of a small peak near zero, but to me and my supervisor it doesn't seem drastically wrong. I've tried fitting the model in GeoDa, and the diagnostics for spatial dependence don't suggest that I should run a spatial lag or spatial error model. There also doesn't seem to be any heteroskedasticity.
This makes me wonder how bad it is that the Jarque-Bera test keeps coming out significant. I remember that in my first year, the statistics professor taught us that for linear regression your data would ideally be normally distributed, but that with a larger number of cases (n > 40) this criterion becomes much less important. Would this also be the case for running OLS in ArcGIS or GeoDa, or am I missing something? What else could I do to improve the Jarque-Bera value?
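To make sure I understand what the test is measuring: as far as I can tell, the Jarque-Bera statistic is computed from the skewness and kurtosis of the regression residuals, not of the variables themselves. Below is a minimal sketch of that calculation in plain Python. The function name `jarque_bera` and the `residuals` list are my own illustration, not GeoDa's or ArcGIS's implementation, and the p-value uses the usual asymptotic chi-square approximation with 2 degrees of freedom.

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals.

    Hypothetical helper for illustration only; not taken from GeoDa or ArcGIS.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of the residuals (population form, divided by n)
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom is exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Made-up residuals, purely to show the call
jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p)
```

If this is right, then for the same amount of skewness and excess kurtosis the statistic grows in proportion to n, which may be part of why the test is so easily significant on my data.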
On another note: if the tests for heteroskedasticity in GeoDa suggest no heteroskedasticity, does this mean I have no reason to run a GWR? I tried a GWR with one of my models, and the adjusted R-squared and Akaike criterion were much better in the GWR.