Mean Center vs. Median Center

08-15-2016 12:36 AM
FaisalBasudan1
New Contributor

I have a points dataset. I want to identify the cluster area first, and then, find the center of this cluster area.

I was thinking of using "K-Means", but I'm working with a single cluster, so I don't believe this method fits my case.

Now I'm thinking of using either the "Mean Center" or the "Median Center" tool after identifying the cluster area with "Optimized Hot Spot Analysis". However, I'm not sure whether to choose the mean or the median. I'm dealing only with locations, with no weights required, which means that I just want to find the center location of the points.

Which of the two methods is more appropriate and accurate for my case? And why?

* I read a lot about them, but I couldn't find the answer for my case.

* In the attachment, you can see the dataset and the cluster area marked by the red circle.

Thank you in advance


Accepted Solutions
DanPatterson_Retired
MVP Emeritus

The mean center is the arithmetic average of the coordinates, so it will be influenced to a greater degree than the median by that outlier to the right/east. The median can be calculated in several ways, but typically it is the middle value of the ranked X and Y coordinates; hence, outliers have less impact on the value. If you are looking to get your measure of centrality inside the ellipse you have identified, there is no guarantee that you will get it with either measure. In cases like this where you have outliers, one can use a trimmed mean, which looks at 95% of the data points with 2.5% trimmed off each extreme of the sorted lists of X and Y. This is not implemented in ArcMap.
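To make the difference concrete, here is a minimal sketch in plain Python (not an ArcMap tool) of the three measures described above. It assumes the coordinate-wise definition of the median mentioned in this reply; other definitions exist. The function names and the sample points are made up for illustration only.

```python
from statistics import mean, median

def mean_center(points):
    """Arithmetic average of the X and Y coordinates."""
    xs, ys = zip(*points)
    return (mean(xs), mean(ys))

def median_center(points):
    """Middle value of the ranked X and ranked Y coordinates (coordinate-wise median)."""
    xs, ys = zip(*points)
    return (median(xs), median(ys))

def trimmed_mean_center(points, trim=0.025):
    """Mean of each axis after dropping the `trim` fraction from both ends of the sorted values."""
    def trimmed(values):
        ordered = sorted(values)
        k = int(len(ordered) * trim)  # number of values dropped at each end
        kept = ordered[k:len(ordered) - k] if k else ordered
        return mean(kept)
    xs, ys = zip(*points)
    return (trimmed(xs), trimmed(ys))

# Hypothetical data: a tight 3 x 3 cluster around (10, 10) plus one outlier far to the east.
pts = [(9, 9), (10, 10), (11, 11), (10, 9), (9, 11),
       (10, 11), (11, 10), (9, 10), (11, 9), (100, 10)]

print("mean center:        ", mean_center(pts))
print("median center:      ", median_center(pts))
# With only 10 points, a 2.5% trim removes nothing, so 10% per end is used here.
print("trimmed mean center:", trimmed_mean_center(pts, trim=0.1))
```

With this sample, the single outlier pulls the mean center to x = 19, well outside the cluster, while the median stays at x = 10 and the trimmed mean lands very close to it.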

Other alternatives, although less commonly employed, would be to produce a Delaunay triangulation (TIN) and determine the areas of the triangles. A sorted list of their areas would identify areas from which you could select the points as candidates for the mean or median, after trimming the triangle list so that 90% or so of the area is represented. You can do the same with successive removals of convex hulls: determine the convex hull, remove the points on the hull, and recalculate until you are left with a certain percentage of the area (perhaps 90% or even 50%).
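The convex hull removal idea can be sketched roughly as follows. This is one possible reading of the procedure, written in plain Python rather than with any ArcMap tool: the hull is computed with Andrew's monotone chain, and layers are stripped until the hull of the remaining points covers no more than a chosen fraction of the original hull area. The function names, the `keep_fraction` parameter, and the sample points are assumptions for illustration.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order.
    Points lying on a hull edge (collinear) are not treated as hull vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(hull):
    """Polygon area by the shoelace formula."""
    n = len(hull)
    if n < 3:
        return 0.0
    s = sum(hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
            for i in range(n))
    return abs(s) / 2.0

def peel(points, keep_fraction=0.5):
    """Remove hull layers until the remaining hull area is at most
    keep_fraction of the original hull area (or no interior points are left)."""
    remaining = list(set(points))
    target = keep_fraction * hull_area(convex_hull(remaining))
    while len(remaining) > 3:
        hull = convex_hull(remaining)
        if hull_area(hull) <= target:
            break
        on_hull = set(hull)
        inner = [p for p in remaining if p not in on_hull]
        if not inner:
            break
        remaining = inner
    return remaining

# Hypothetical data: a 3 x 3 cluster around (10, 10) plus one outlier far to the east.
pts = [(9, 9), (10, 10), (11, 11), (10, 9), (9, 11),
       (10, 11), (11, 10), (9, 10), (11, 9), (100, 10)]
core = peel(pts, keep_fraction=0.5)
print(sorted(core))
```

The points that survive the peeling can then be passed to whichever mean or median calculation you prefer. In this sample, one pass removes the outlier together with the four corner points of the cluster.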

Centrality has no 'accurate' measure, only 'best'... so in short, go with the median, or if you want to trim, use a trimmed median.

Other options are possible but more esoteric.

3 Replies
FC_Basson
MVP Regular Contributor

The Directional Distribution (Standard Deviational Ellipse) tool, documented in the ArcGIS for Desktop help, might help eliminate the outliers. It also provides the centroid coordinates of the ellipse that describes the distribution.

DanPatterson_Retired
MVP Emeritus

Yes, that is one of the possibilities, although it is affected by outliers as well; it is a point-based approach. Reducing convex hulls would produce results similar to the SDE (standard deviational ellipse), but would be area-based rather than based on distance to the center. As indicated, there is no 'ONE' measure of centrality. For instance, you could construct a Spanning Tree http://www.arcgis.com/home/item.html?id=6ce9db93533345e49350d30a07fc913a and find the middle point of the tree, which would represent the place which minimizes the connectedness of all points.
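As a rough illustration of the spanning tree idea, the sketch below builds a Euclidean minimum spanning tree with Prim's algorithm and then picks the node nearest the midpoint of the tree's longest path. This is only one interpretation of "the middle point of the tree", and it is not the linked ArcGIS item; all function names and the sample points are hypothetical.

```python
from math import dist

def minimum_spanning_tree(points):
    """Prim's algorithm, O(n^2). Returns {index: [(neighbor_index, edge_length), ...]}."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [float("inf")] * n  # shortest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj

def farthest(adj, start):
    """Walk the tree from start; return (farthest node, distances, parents)."""
    distances, parents, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in distances:
                distances[v] = distances[u] + w
                parents[v] = u
                stack.append(v)
    return max(distances, key=distances.get), distances, parents

def tree_middle_point(points):
    """Node closest to the midpoint of the longest path through the spanning tree."""
    adj = minimum_spanning_tree(points)
    a, _, _ = farthest(adj, 0)        # one end of the longest path
    b, d, parents = farthest(adj, a)  # the other end, with distances measured from a
    half = d[b] / 2.0
    node, middle = b, b
    while node is not None:           # walk back from b toward a
        if abs(d[node] - half) < abs(d[middle] - half):
            middle = node
        node = parents[node]
    return points[middle]

# Hypothetical data: five points along a line, so the tree is a simple chain.
line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
print(tree_middle_point(line))
```

For the chain of five points above, the result is the middle point (2, 0). On real data the answer depends on the shape of the tree, so treat it as one more candidate rather than a definitive center.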

Bounding Containers, with the central measure taken in a minimum-area bounding rectangle, or Voronoi/Delaunay could be used if one wanted to peel away the outer layers of a Voronoi or Delaunay triangulation and find the center of mass/area.