I think you have a couple of options. The first thing you’ll want to do is clarify the question you’re trying to answer. Is it this: which census tracts, with at least N of “our health plan members”, are part of a statistically significant cluster of high prevalence (a hot spot)? If that’s your question, you’ll want to begin by removing all census tracts with fewer than N members from your analysis. If a tract’s denominator (its number of members) is at least your threshold (N) and its numerator (its number of cases) is 0, the prevalence rate is zero, which is accurate and valid.
Keep in mind that hot spot analysis looks at each tract within the context of neighboring tracts. If many tracts won’t have neighbors because the surrounding tracts don’t have at least N members, then ask yourself whether you’re really looking for clusters of high prevalence after all. Are you trying to determine which tracts have higher than expected prevalence (if so, just map prevalence, but also see the disparity index suggestion below), or do you want to know where tracts with high prevalence cluster spatially (and where that clustering is statistically significant)? Hot spot analysis will show you statistically significant regions of high prevalence.
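If it helps to see the filtering step concretely, here is a minimal Python sketch of dropping tracts below the member threshold and computing prevalence for the rest. The tract IDs, the counts, and the threshold of 20 are made up for illustration; substitute your own data and your own N.

```python
# Hypothetical input: tract ID -> (cases, members). IDs and counts are invented.
tracts = {
    "tract_A": (3, 120),
    "tract_B": (0, 45),   # enough members, zero cases: a valid rate of 0
    "tract_C": (1, 2),    # too few members: dropped
    "tract_D": (0, 0),    # no members: dropped
}

MIN_MEMBERS = 20  # stand-in for N; choose your own threshold


def prevalence_for_eligible_tracts(tracts, min_members):
    """Return prevalence (cases / members) for tracts with at least min_members."""
    if min_members < 1:
        raise ValueError("min_members must be at least 1 to avoid dividing by zero")
    rates = {}
    for tract_id, (cases, members) in tracts.items():
        if members < min_members:
            continue  # removed from the analysis entirely, not set to zero
        rates[tract_id] = cases / members
    return rates


rates = prevalence_for_eligible_tracts(tracts, MIN_MEMBERS)
for tract_id, rate in rates.items():
    print(f"{tract_id}: {rate:.3f}")
```

Note that tract_B stays in with a rate of zero, while tract_C and tract_D are removed rather than reported as zeros.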
I’m thinking that mixing tracts WITH members and tracts with NO members will complicate your analysis. For example, you wouldn’t know for sure whether a cold spot was cold because of clustering of low prevalence or because of clustering of low membership (a cluster of zeros because there aren’t N members). You could, however, aggregate census tracts so that all your polygons (tracts or groups of tracts) have at least N members. Here is a case study that provides a workflow that might help you do that: https://desktop.arcgis.com/en/analytics/case-studies/linguistic-diversity-1-intro.htm
I’m thinking the best solution, however, might be to compute disparity indices. The disparity indices would identify where the disease is not distributed “fairly”/evenly relative to health plan membership. You could then run hot spot analysis on the disparity indices if you choose to. Computing the disparity indices addresses 2 problems with rates: division by zero, and the small numbers problem (a tract has 2 people, one gets the disease, so the rate is 50%, yikes!). Running hot spot analysis on the disparity indices addresses a third problem: the artificial nature of tract boundaries in relation to disease cases. [These three issues with rates are discussed here: https://desktop.arcgis.com/en/analytics/case-studies/locating-a-new-retirement-community.htm. That case study refers to the disparity index as “Level of Service”, but it’s the same thing. In this other Learn lesson, the disparity index is used to see how equitably trees are distributed across race/ethnicity and susceptible populations: https://learn.arcgis.com/en/projects/shade-equity-determine-tree-planting-locations-with-suitability... ]
Oh, and if you decide to go the disparity index route, you can use all your tracts, even those with 0 or only a few members.
Basically, the disparity index expects a census tract with 2% of all your health plan members to be associated with 2% of all the cases. The formula is this:
For each tract compute: (Ci / All Cases) – (Mi / All Members)
Where Ci is the number of cases in the tract, and Mi is the number of members in the tract.
All Cases is the sum of cases for all tracts. All Members is the sum of members for all tracts.
A positive result means the proportion of cases is higher than the proportion of members (so a higher-than-expected rate/prevalence). A negative result is a lower-than-expected proportion of cases. When the case proportion matches the member proportion (the expectation), the result is zero.
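To make the formula concrete, here is a small Python sketch that computes the index for a handful of tracts. The tract names and counts are invented, and the function name is just something I picked for illustration. Notice that a tract with zero members causes no trouble, because the only divisions are by the overall totals.

```python
# Hypothetical counts per tract; names and numbers are invented.
members = {"A": 50, "B": 30, "C": 20, "D": 0}
cases = {"A": 1, "B": 3, "C": 0, "D": 0}


def disparity_indices(cases, members):
    """Compute (Ci / All Cases) - (Mi / All Members) for every tract."""
    all_cases = sum(cases.values())
    all_members = sum(members.values())
    if all_cases == 0 or all_members == 0:
        raise ValueError("need at least one case and one member overall")
    return {
        tract: cases.get(tract, 0) / all_cases - members[tract] / all_members
        for tract in members
    }


indices = disparity_indices(cases, members)
for tract, value in indices.items():
    print(f"{tract}: {value:+.2f}")
```

With these numbers, tract B holds 30% of the members but 75% of the cases, so its index is positive (+0.45). Tract A holds 50% of the members but only 25% of the cases, so its index is negative (-0.25). The indices always sum to zero across all tracts, since each of the two proportions sums to 1.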
When you run hot spot analysis on the indices, you’ll see hot spots in locations where positive indices cluster and cold spots where negative indices cluster.
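For intuition only, here is a rough Python sketch of the Getis-Ord Gi* statistic, which is the statistic behind the Hot Spot Analysis tool. It assumes simple binary weights (each tract plus its listed neighbors count equally), and the tract names, index values, and neighbor lists are all invented. The real tool offers other ways to define neighbors and can correct for multiple testing, so treat this as an illustration of the idea rather than a replacement for running the tool.

```python
import math

# Hypothetical disparity indices for six tracts arranged in a row: A-B-C-D-E-F.
values = {"A": 0.30, "B": 0.25, "C": 0.05, "D": -0.10, "E": -0.20, "F": -0.30}
# Hypothetical adjacency: each tract's neighbors (the tract itself is added below).
neighbors = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}


def gi_star_scores(values, neighbors):
    """Return a Gi* z-score per tract using binary weights that include the tract."""
    n = len(values)
    mean = sum(values.values()) / n
    variance = sum(v * v for v in values.values()) / n - mean ** 2
    s = math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error
    scores = {}
    for tract in values:
        group = set(neighbors.get(tract, ())) | {tract}
        w_sum = len(group)  # with binary weights, sum of w equals sum of w squared
        local_sum = sum(values[j] for j in group)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        # A zero denominator means no variation, or a neighborhood covering everything.
        scores[tract] = (local_sum - mean * w_sum) / denom if denom > 0 else 0.0
    return scores


scores = gi_star_scores(values, neighbors)
for tract, z in scores.items():
    print(f"{tract}: z = {z:+.2f}")
```

Tracts at the positive end of the row (where positive indices sit next to each other) get positive z-scores, and tracts at the negative end get negative ones, which is the hot spot / cold spot pattern described above.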
I hope this helps, or at least gives you some ideas for other options.
Best wishes!
Lauren