POST
Hi Jamal,

Thank you for posting your question, and I hope you're doing well. The error is a bit nuanced: while multicollinearity is typically associated with multiple variables, the error can also occur with a single variable that has low variation within a feature's neighborhood.

Please forgive me if you're already aware, but GWR works with the concept of neighborhoods: within the neighborhood built around each feature, we need variation in the explanatory and dependent variables to be able to fit a local regression model. When the variables within that neighborhood have no variation, you can run into this error, even when using a single variable.

Here's a simple thing you might try to check whether this is the case: create a map of your explanatory variable and look at the smallest neighborhood size you are running the tool with (30 neighbors, by default). Is it possible that a neighborhood is being created that has no variation in your explanatory variable? Hint: if you're on Pro 3.2, you can also use the Neighborhood Explorer to check this.

There are a few things you can do to try to proceed with this single variable:

1. Increase the starting neighborhood size. Larger neighborhoods have a better chance of including the variation needed for the local models. To increase the starting neighborhood size, set the Neighborhood Selection Method to "User defined" and test with increasing sizes.

2. Use the Gaussian kernel. The Gaussian kernel essentially makes every feature a neighbor of every other feature, increasing the neighborhood size while diminishing the weight of distant neighbors. This may help, because the model then uses all the data and allows the full variation in your variable to contribute to each local model.

Even with these steps, please be aware that GWR really shines when local variation is present, and the fact that you're running into this error may be pointing to data problems that should be corrected.
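The neighborhood-variation check described above can be sketched roughly like this (plain Python with hypothetical toy data; the function name and data are illustrative, not part of any ArcGIS tool):

```python
# A minimal sketch of the check described above: for each point, find its
# k nearest neighbors and flag neighborhoods where the explanatory
# variable has no variation at all.
import math

def zero_variation_neighborhoods(points, values, k):
    """Return indices of points whose k nearest neighbors (plus the point
    itself) all share the same explanatory-variable value."""
    flagged = []
    for i, (xi, yi) in enumerate(points):
        # Sort all points by distance to point i (point i itself is first)
        order = sorted(range(len(points)),
                       key=lambda j: math.hypot(points[j][0] - xi,
                                                points[j][1] - yi))
        neighborhood = order[:k + 1]
        if len({values[j] for j in neighborhood}) == 1:
            flagged.append(i)
    return flagged

# Toy data: a cluster of identical values on the left, varied values on the right
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
values = [5, 5, 5, 1, 2, 3]
print(zero_variation_neighborhoods(points, values, k=2))  # -> [0, 1, 2]
```

If any indices come back flagged, the corresponding local models would have no variation to work with, which matches the error described above.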
It's not guaranteed that this is the case, but please consider it if you proceed with that single variable. Hope this helps, and thanks again for your question, Jamal.

Alberto

PS: Just realized that Eric already answered your question more concisely!
02-08-2024 04:13 PM
POST
Great, thanks Kelly. When you have a moment, could you please try the following with your notebook and then attempt to run Density-based Clustering with the OPTICS method again to see if the problem is resolved?

1. Open your notebook item page, where you saw the notebook runtime dropdown.
2. Ensure the notebook is not open in any browser tabs.
3. Change the runtime to ArcGIS Notebook Python 3 Advanced - 7.0.
4. Open the notebook and attempt to run Density-based Clustering with OPTICS.

Please let me know if you have a moment to run this and whether the tool runs as expected in the new runtime. Thank you for your time,

Alberto
04-17-2023 09:56 AM
POST
Hi Kelly,

Thanks for letting us know. Can you please confirm the following two things?

1. In your notebook, please run the following command in a cell and post the result: arcpy.GetInstallInfo()
2. In your notebook item's "Info" section, can you please post a screenshot of what you see in the settings at the bottom for the notebook runtime?
04-14-2023 02:18 PM
POST
Hi Josh,

Thanks for the question. I think your math looks solid, and it would be good for us to take a closer look and replicate the results you're seeing. The zipped file has only a single .shp file; by any chance, would you be able to zip the shapefile once more with all the files it requires (the .shx, .dbf, .prj, and any other sidecar files)? Alternatively, you can share it as a feature class in a zipped file geodatabase if that works.

Alberto
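Zipping a shapefile together with all of its sidecar files can be done with a few lines of standard-library Python; this is a small sketch with hypothetical file paths, not an official workflow:

```python
# Zip a .shp file plus every sidecar file (.shx, .dbf, .prj, ...) that
# shares its base name, so the shapefile arrives complete.
import glob
import os
import zipfile

def zip_shapefile(shp_path, zip_path):
    """Zip the .shp file together with every sidecar sharing its base name."""
    base, _ = os.path.splitext(shp_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in glob.glob(base + ".*"):
            if os.path.abspath(part) == os.path.abspath(zip_path):
                continue  # don't add the output zip to itself
            zf.write(part, arcname=os.path.basename(part))

# zip_shapefile("parcels.shp", "parcels.zip")  # hypothetical paths
```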
02-14-2023 05:39 PM
POST
@BrianHilton and @DominicLee - Thanks again for reporting the problem. We've created a fix for the issue and it will be included with the ArcGIS Online 10.2 release, which is expected in late June or early July. We apologize for the inconvenience the issue created. In the meantime, would you have access to ArcGIS Pro? The Density-based Clustering with the OPTICS method works in ArcGIS Pro, and while I understand that this may not solve the immediate need to run the tool from a notebook in AGOL, I'm hoping that you can still complete your analysis on a different part of ArcGIS in the meantime.
05-04-2022 10:52 AM
POST
@BrianHilton - Thanks for reporting this issue. We've reproduced it and are investigating. We'll keep you posted on a resolution.
04-28-2022 01:00 PM
IDEA
Hi @CMV_Erik, thank you for the idea and for your comment on the Data Engineering view. It's helpful to hear about your experience with it! We really appreciate you taking the time to write and submit your idea, and we've discussed and considered it at length.

We've designed the Data Engineering view to be highly interactive as an exploratory tool that works in conjunction with other parts of ArcGIS Pro, such as charts, maps, and scenes. Whether you update the layer symbology using the Fields panel, create a full histogram from a chart preview, or make a selection on the Nulls cell to see where missing data is located on the map, these features and workflows all depend on a layer being accessible on a map or scene, and therefore on the layer being accessible in the Contents pane.

If you're mainly interested in statistics rather than the map, layer, or interactive charts, we have also added a new tool called Field Statistics to Table, which provides all the statistics that the Data Engineering view offers. The resulting table can be viewed directly from the Catalog pane, without needing to add it to the Contents pane.

We honestly appreciate the idea, and we want to continue understanding your workflows and the steps you take when accessing data prior to loading it into the Contents pane for a map or scene. Please feel free to add any parts of your workflow that you'd like to share, and we will be sure to read and consider them as we continue designing and developing new capabilities in ArcGIS Pro.
03-24-2022 11:37 AM
POST
Hi Rachel,

My name is Alberto Nieto, and I'm a product engineer on the spatial statistics team that worked on the tool. Thank you for posting the problem, and apologies that the tools are not working properly in this case.

The problems in the two tools are both in the section of the tool that attempts to establish a SAS session using the saspy Python package. We are likely making some assumptions about the ability of the package to create a temporary configuration file in a scratch location, and depending on your installation of SAS, this may be running into problems that we did not catch during development.

Would it be possible to schedule a brief call with you to ask a few details about your SAS installation and to manually test whether this temporary configuration file is being created properly by the tool in your case? The call would help us identify the source of the problem so that we can implement a fix as soon as possible. Once we identify a fix, I will post back in this thread for the community to see.

If you agree, please reach out to me directly at anieto@esri.com and we can coordinate a call. Thank you again for posting and helping us identify these problems. Looking forward to hearing from you.

Alberto Nieto
08-20-2021 12:12 PM
POST
Thank you. We are investigating a change to the ArcGIS API for Python's to_featureclass function that might have resulted in an issue with this workflow: null rows in the spatially enabled dataframe are not being processed. An interim workaround is to drop rows with null values before exporting:

out_2016_fc = data_2016_df.dropna().spatial.to_featureclass(os.path.join(fgdb, out_2016_fc_name))
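As a minimal illustration of what dropna() does in this workaround (plain pandas with toy data; the column names are made up for the example):

```python
# dropna() removes every row containing at least one null value, so the
# rows passed on to to_featureclass are all fully populated.
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", None, "D"],
    "value": [1.0, None, 3.0, 4.0],
})
clean = df.dropna()  # keeps only the rows with no nulls
print(len(df), len(clean))  # -> 4 2
```

Note that this drops the null rows from the output entirely, so it is a workaround rather than a fix; rows you want to keep would need their nulls filled first.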
06-28-2021 12:16 PM
POST
Hi HamzaM,

Sorry to hear about the problem. I was able to reproduce the error on my side, and we are working on a solution. To help us confirm some details, could you please let me know:

- What version of ArcGIS Pro are you using?
- What version of the ArcGIS API for Python do you have installed in ArcGIS Pro? (To check, select the "Project" menu at the top-left corner of ArcGIS Pro, choose the Python option, select Installed Packages, then select arcgis.)
- Did you install the ArcGIS API for Python separately from ArcGIS Pro?
06-28-2021 09:50 AM
POST
Hi Justin, thank you for your question! I may repeat some things you already know, so please bear with me, but I find it helps to cover the whole goal of the analysis before answering your question.

To begin, let's recall the goal of using OLS: to create a linear formula representing the relationships between a dependent variable and one or more explanatory variables. Most things we want to predict will not have an exact linear relationship, so any linear formula we fit will be off from the observations. The differences between the linear formula's predictions and the observations are the residuals.

This is not necessarily bad. Most models should be generalized so that they capture a general trend, or "signal", in the data, without bending the formula to every single observation. In fact, because a linear formula corresponds to a straight line, the formula will inevitably be off somewhere. What matters is that these residuals be random; in other words, we want the error to look and feel as random as possible so that our linear formula isn't biased toward a specific type of error. [The original post included a chart showing a rough example of a linear model biased to ignore smaller homes.]

When we think of residuals in geography, the same idea applies: we want the residuals to be randomly distributed across the study area. If you see a cluster of low or high residuals somewhere in your study area, that suggests your model is missing an important characteristic of that area and is over-predicting (for low residuals) or under-predicting (for high residuals) there.

When you run a hot spot analysis of OLS residuals, you are testing the hypothesis that residual values are randomly distributed across your study area; that is, that there is no underlying process driving the clustering of significantly high or significantly low residual values. When the hot spot analysis finds hot spots (high residual values), it suggests that the model under-predicted in that neighborhood and that the neighborhood's average residuals are significantly different from those of the entire study area. The same applies to cold spots, except your model is over-predicting in those cases.

There are several potential causes, and it's a bit tricky to cover them all in a short conversation, but a common one is that your model is missing some key characteristic of that neighborhood. For example, if our home value prediction model has clustered high residuals (i.e., hot spots of OLS residuals) for homes near the beach, then we likely did not include enough information about expensive homes near the beach in our training dataset. We could do a bit of work to test whether using the distance to the beach and a wider sample of homes with varying prices helps the residuals become more spatially random. This would give the model a better chance of capturing those patterns and hopefully lead to better predictions for those homes as well.

A useful article about this can be found here, where the fourth check, "Is my model biased?", covers a few additional considerations. Please let us know if this helps, and feel free to post screenshots or data for your model so we can help with any other questions!
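To make the residual idea concrete, here is a small sketch (plain Python, toy numbers; not the OLS tool itself) that fits a one-variable linear model and computes the residuals as observed minus predicted:

```python
# Closed-form simple linear regression on toy data, followed by the
# residuals (observed - predicted) that a hot spot analysis would examine.
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
a, b = ols_fit(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# OLS residuals always sum to (essentially) zero overall; the question the
# hot spot analysis asks is whether they cluster high or low in one place.
```

If the positive residuals all came from one neighborhood (say, homes near the beach), that is exactly the clustering the hot spot analysis would flag.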
11-10-2020 07:30 AM
POST
Hi Ranjeet, Thank you. The tool requires ArcGIS Pro 2.5. Is it possible for you to upgrade?
04-05-2020 12:42 PM
POST
Hi Ranjeet, Thank you for posting. Can you please confirm your version of ArcGIS Pro?
04-05-2020 08:09 AM
BLOG
This is exactly what we were looking for! Thank you for documenting it through the blog post as well.
04-30-2018 07:11 AM
BLOG
Jupyter Notebook is a powerful tool that allows Python users to create and share documents containing live code, visualizations, explanatory text, and equations. The term "notebook" is very apt, since the tool allows you to write snippets of self-contained executable code (called "cells"), annotate each procedure, and even visualize the data you are working with at any step of the way.

Why should I use a Jupyter Notebook?

Jupyter Notebooks have gained tremendous popularity in the Python data science community in recent years for a variety of reasons. As a GIS user, I have personally found Jupyter Notebooks to be extremely useful for the following three reasons:

1. Prototyping of Python Workflows

Jupyter Notebooks are extremely useful when you do not have a defined final process and are still in the prototyping phase of a scripted workflow. This is mainly thanks to code being written in independent cells, each of which can execute independently of the rest of the code. This lets a Python user quickly test a specific step in a sequential workflow without re-executing code from the beginning of the script. Many Integrated Development Environments (IDEs) let you do this in various ways, but I've found Jupyter Notebook's concept of a "code cell" to be the most intuitive approach for prototyping logic and sequential code.

2. Visualizing Pandas Dataframes

Pandas (the Python Data Analysis Library) provides high-performance, easy-to-use data structures that allow you to work with large amounts of data extremely fast. The core data object is the DataFrame, essentially an in-memory table that supports powerful indexing operations. Jupyter Notebook allows you to visualize these tables at any point in your notebook. This is extremely useful because you can view the state of your data (and the effect of every action your code performs on it) as each step of your logic executes.
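As a quick sketch of that workflow (pandas with made-up toy data), you can build a DataFrame in one cell, apply an array-wide operation, and inspect the result immediately, with no row-by-row loop:

```python
# Build a small table, apply a vectorized operation, and query it with
# indexing instead of iterating over every record.
import pandas as pd

parcels = pd.DataFrame({
    "parcel_id": [101, 102, 103],
    "area_sqft": [4000, 8500, 12000],
})
# Array-wide operation: computed for every row at once
parcels["area_acres"] = parcels["area_sqft"] / 43560.0
# Indexed query instead of scanning record by record
large = parcels[parcels["area_sqft"] > 5000]
print(large["parcel_id"].tolist())  # -> [102, 103]
```

In a notebook, simply ending a cell with `parcels` renders the whole table inline, which is the visualization step described above.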
This capability reinforces the use of Jupyter Notebook in a prototyping workflow when you are trying to confirm that your workflow is doing what it needs to do at each step of the way. Showing a DataFrame in a code cell can go a long way toward understanding how your code operates.

So why are pandas DataFrames such a big deal? As a GIS user, your first foray into Python and GIS data management typically uses some mix of arcpy's "CalculateField", "SearchCursors", and "UpdateCursors". Most examples teach you to use these operations, and they are all completely functional, but they suffer from the same process-intensive issue: they all need to iterate over every record of your data to perform a data management operation. In other words: imagine that you are the director of a movie in production, and you find out that to change the lighting in one scene, you need to watch the movie from the very beginning... for every change. This would take forever! Operating on a pandas DataFrame solves this with powerful indexing that allows efficient querying and array-wide operations: you essentially find the specific scene of the movie that you need to fix and skip straight to it.

Once my GIS data analysis workflows started integrating pandas DataFrames into heavy data operations, I saw dramatic improvements in performance. Visualizing these DataFrames and seeing the effects of my code on each dataset became a crucial component of working efficiently.

3. Integration with ArcGIS

The newest (and most exciting) reason is the integration of Jupyter Notebooks with the ArcGIS platform. My two main production tools had long been the ArcGIS platform and Jupyter Notebook. When Esri announced that the ArcGIS API for Python would provide support for geographic visualizations, organization administration, and even access to the most powerful analytical capabilities of the platform within Jupyter Notebooks, I literally could not stop smiling.
Seeing this for the first time made me pump my fist in the air. The ArcGIS API for Python makes each Jupyter Notebook an extension of your distributed GIS. Among several other capabilities, you can:

- Set up a notebook that connects to your Portal, provides detailed reports on each user's content, groups, and statistics, and performs backups of all the content in a Portal by user group. Free yourself from administration tasks to explore and analyze.
- Create integrated maps and data operations that are connected to code cells in your notebook. All the prototyping benefits mentioned above become part of your spatial analysis workflow.
- Leverage GeoAnalytics tools and other geoprocessing operations on data workflows that you are already working with in your Jupyter Notebook. The most powerful new tools are already incorporated into the API.

Even with all these benefits, coming up to speed with Jupyter Notebooks as a GIS user can be a daunting task. Stay tuned for a few tips on how to navigate and operate Jupyter Notebooks...
06-13-2017 08:31 AM