Geoprocessing behind reports

03-29-2013 09:22 AM
JulieKanzler
New Contributor II
Hi all,

I have checked the API reference and the Concepts reference for the Community Analyst API, but I can't see anything that describes the geoprocessing routine that generates the reports. In order to use the API, I need to know what assumptions are being made. For example, in this sample:

http://help.arcgis.com/en/communityanalyst/apis/flex/samples/index.html?sample=SummaryReports

are the values merely a summary of the census geographies that fall under the selected polygon, or is a clip taking place? If the latter, is there an assumption that population is regular across the entire census polygon, or is population apportioned based on other variables, e.g., land use.
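
To make the distinction concrete, here is a rough sketch of the two interpretations I can imagine, using made-up numbers and plain Python (nothing from the API itself):

```python
# Hypothetical tract data: total population and the fraction of each tract's
# area that falls inside the selected study-area polygon.
tracts = [
    {"id": "Tract A", "population": 4000, "overlap_fraction": 1.00},
    {"id": "Tract B", "population": 6000, "overlap_fraction": 0.25},
]

# Interpretation 1: simple summary -- every tract that touches the study area
# contributes its full value, with no clipping.
simple_summary = sum(t["population"] for t in tracts)

# Interpretation 2: clip plus area weighting -- assume population is uniform
# within each tract and apportion by the overlapping area fraction.
area_weighted = sum(t["population"] * t["overlap_fraction"] for t in tracts)

print(simple_summary)  # 10000
print(area_weighted)   # 4000 + 1500 = 5500.0
```

Knowing which of these (or something more sophisticated) the SummaryReportsTask actually does would tell me how to caveat the output.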

I suspect I'm just missing a key reference that documents how BA works - can someone please point me in the right direction???

TIA,
Julie
13 Replies
TonyHowser
Esri Contributor
Hi Julie. 

This is a very good question and I can see that you are doing your due diligence.  I am happy to say that the overall discussion revolves around one of Esri's distinctive competencies with respect to spatial analysis and data apportionment.

At a high level, the reports leverage dasymetric interpolation to provide accurate estimates of population and households, along with their associated demographics, lifestyle and consumer characteristics, spending, market potential, and so on.

Our underlying methodology goes well beyond what a simple polygon intersection can provide, because we support truly variable study areas that may cover only part of a standard geography or administrative boundary such as a ZIP code, county, Census tract, Census block group, Japanese prefecture, Russian oblast, or UK postal code (e.g., what if a single study area crosses multiple ZIP codes and covers only part of each?). On top of this, we also account for the variability of population and household distributions within these administrative geographies, because households and populations are not uniformly distributed across space. The urban core of a metropolitan area may have far greater population density than its outlying suburbs even though both lie in the same county or jurisdiction. So if my custom study area covers 50% of a county, I can't simply say that 50% of the county's population lies in the study area: that assumes a uniform distribution of population. Regrettably, many third-party solutions make this assumption, which generally produces less accurate results.
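
A toy example of the error the uniform assumption introduces, using exaggerated, made-up numbers and plain Python rather than our actual apportionment engine:

```python
# A custom study area covers 50% of a county's land area, but the county's
# population is concentrated in an urban core that lies outside that area.
county_population = 100_000

# Naive estimate: assume population is spread uniformly across the county.
uniform_estimate = county_population * 0.50

# Estimate built from smaller units (e.g., census blocks) whose populations
# are known: (block population, block falls inside the study area?)
blocks = [
    (40_000, False),  # dense urban-core blocks, outside the study area
    (35_000, False),
    (15_000, True),   # lower-density blocks inside the study area
    (10_000, True),
]
block_based_estimate = sum(pop for pop, inside in blocks if inside)

print(uniform_estimate)      # 50000.0
print(block_based_estimate)  # 25000
```

The numbers are contrived, but they show the direction and scale of the error that a pure area-weighting approach can introduce.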

These represent some of the enhancements in our underlying data apportionment "engine" which enable us to quickly and accurately provide estimates and projections for ad hoc or arbitrary study areas such as drive time polygons, manually digitized/drawn polygons, and custom regions or areas.

Hope this was helpful and sheds light on why we believe our content and analysis add real value to your use cases and workflows.

Thanks,
Tony
JulieKanzler
New Contributor II
Tony,

Thanks for the response! It sounds like you have a defensible approach, but I still would like to know a bit more. Every analytical result has caveats and pitfalls, and it's difficult to really know what decisions can be made based on data that isn't qualified by a clear understanding of the algorithm that produced it. I know that dasymetric mapping comes in many flavors, especially depending on the inputs you select for your interpolation. Do you have a whitepaper that summarizes the techniques you're using? Perhaps, at least some peer reviewed articles referencing the key components of your algorithm(s)?

Thanks again!
Julie
Jason_RobinsonRobinson
Esri Regular Contributor
Julie,

You may find the link below to the Business Analyst 10.1 Desktop online help useful, as it details how the software summarizes data. There are several methods that let you decide how data is summarized spatially. I believe BAO uses a method similar to the Hybrid approach described there, but I find the section comparing block apportionment and cascading centroid the most informative.

http://resources.arcgis.com/en/help/main/10.1/#/Data_tab/000z000000v7000000/
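
As a very rough sketch of the distinction as I read that help topic, using shapely and invented geometries rather than the actual Business Analyst code:

```python
# Requires shapely (pip install shapely); geometries and values are invented.
from shapely.geometry import Point, Polygon

study_area = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])

# One hypothetical tract with a known total, plus population-weighted block
# points inside it (the kind of detail block apportionment relies on).
tract_total_households = 1_000
tract_centroid = Point(4.5, 2.0)   # the tract's own centroid falls outside
block_points = [                   # (block point location, population weight)
    (Point(3.5, 2.0), 600),        # inside the study area
    (Point(5.5, 2.0), 400),        # outside the study area
]

# Centroid-style assignment: all or nothing, based on the tract centroid.
centroid_estimate = tract_total_households if study_area.contains(tract_centroid) else 0

# Block-apportionment-style: split the tract total by the share of block-point
# weight falling inside the study area.
weight_inside = sum(w for pt, w in block_points if study_area.contains(pt))
weight_total = sum(w for _, w in block_points)
block_estimate = tract_total_households * weight_inside / weight_total

print(centroid_estimate)  # 0
print(block_estimate)     # 600.0
```

My understanding is that the Hybrid method combines ideas like these, which is why I find that section of the help the most useful starting point.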

Regards,
Jason R.
JulieKanzler
New Contributor II
Hi Jason,

Thanks! I can see how the ArcGIS Desktop help could provide some clues here, because I'm guessing that everything is accessing the same server resources. But I'm not sure how to reconcile your input with what Tony provided. I was mainly asking about the SummaryReportsTask in the API, but I'm also interested in learning more about the various ways that data can be pulled. How do the API, BAO, and the Extension relate to one another?

Thanks again,
Julie
TonyHowser
Esri Contributor
Hi Julie,

All Esri Business Analyst and Community Analyst products, including the Desktop, Server, Online/Web, and API offerings, rely on the same data apportionment methodology through common business components "under the hood." As such, the Desktop documentation that Jason provided should be a good reference for you.

Additionally, I wanted to call out an independent data accuracy study which compared our projections and estimates (which are computed with the underlying methodology we have been discussing in this thread) with those of major competitors:

Thanks,
Tony
RobertSwett
New Contributor II
I am writing a fact sheet to make folks at my university aware of the availability of Community Analyst. I would like to be able to give them a definitive answer as to how data is apportioned for non-standard areas (polygons). The answers in this thread point to the methods outlined for Business Analyst Desktop (block apportionment, cascading centroid, and hybrid) in the ArcGIS Desktop 10.x Help. However, I do not see a definitive answer in this thread as to whether these are the exact same methods used in Community Analyst. Can anyone suggest where I could get that answer? Thanks
Jason_RobinsonRobinson
Esri Regular Contributor
Bob,

Community Analyst uses the Hybrid method. Reports such as the local version of the Demographic and Income Profile run in Business Analyst Desktop should be exactly the same as the corresponding report selected from Online Reports. Making sure the two products' report values line up has definitely been a development focus over the past couple of years. I have attached a PDF that goes into more detail on summarizations in general, which might be helpful.

Regards,
Jason R.
RobertSwett
New Contributor II
Thanks much Jason. Very helpful.

I have another question. The following Esri web page seems to indicate that Community Analyst Standard Plus currently provides access to 12,764 variables: http://www.esri.com/software/arcgis/community-analyst/variables.

Is that right?

Also, when I sum up the quantities that appear alongside the names of each dataset (or data category) on the left of the page, the total is 30,595. If these quantities indicate the number of variables in each dataset (or data category), am I right in assuming that some variables are being "double counted", i.e., that whoever created the list classified particular variables under more than one data category?

Thanks
Jason_RobinsonRobinson
Esri Regular Contributor
Bob,

There is definitely double counting going on in that category list. Expand Employments/Jobs/Labor as well as Transportation and you will see that the 2012 Labor Force datasets appear in both.
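
A quick way to see the arithmetic, with made-up variable lists rather than the real catalog:

```python
# If the same variable is listed under more than one category, summing the
# per-category counts overstates the number of unique variables.
categories = {
    "Employments/Jobs/Labor": ["2012 Labor Force", "2012 Unemployment"],
    "Transportation": ["2012 Labor Force", "Commute Time"],
}

per_category_total = sum(len(names) for names in categories.values())
unique_total = len({name for names in categories.values() for name in names})

print(per_category_total)  # 4
print(unique_total)        # 3
```

So the 12,764 figure and the 30,595 sum of the category counts can both be correct at the same time.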

Regards,
Jason