On the fly transform of data

931
9
05-19-2023 02:29 PM
Status: Open
StefanieWenker
Occasional Contributor

It would be great to be able to link to a data source and transform the data before you bring it into ArcGIS Pro. Although the tools for data engineering in Pro are amazing you often still need to export or change your source data when you want to make changes. I would like to do the same operations, but without having to create a duplicate copy. To keep this duplicate copy up to date I need to set a scheduled task to do this same operation every time I want my final table refreshed. Similarly if I want to import an excel file into ArcGIS Pro I first need to make sure the column names are all correct and my data types are all correct. If I got the option to transform all that when I first bring the excel file into Pro it would save a lot of time - and it would also avoid me having to make a special copy of my file "for GIS" (I have many excel files in my folders that have _forGIS on them.

The Transform capability of Excel and PowerBI is something that I really like. I link straight into a database and can create complex queries, pivots, merging queries rename columns etc on a raw data source and everytime I open up PowerBI/Excel  it runs the query and creates the table that I want to see at the end. 

 

 

9 Comments
TanuHoque

@StefanieWenker 

I might have misunderstood your workflow.

Are you saying that this?

  • you actually have your data stored in enterprise databases.
  • they are simply linked from your excel worksheets via some complex sql queries?
  • and you want to have similar capabilities in ArcGIS Pro so that you wouldn't need to make duplicate copy of your data in your enterprise database.

please let me know if I got it completely wrong.

Thank

 

StefanieWenker

@TanuHoque  

thanks for the quick reply. I was mainly talking about data not specifically managed in an Esri database, but it does apply to that as well.

Say you have a feature class in your enterprise database and for whatever you are doing you need to transform the data, but you don't have permissions to make changes to the source data. Many of the geoprocessing tools that I would use result in copies of the dataset.

Here are a few examples for which the solution now is to make a copy of the dataset (meaning it is no longer connected to the enterprise datasource):

  • We have a water main dataset that has the diameter as a text field - if I want to use that field in my analysis, I need to make a copy of my dataset and do a calculate field on a new field to turn it into a number field.
  • Say there is a typo in the field that I want to use for my labeling. I don't know who maintains the data source so if I could do a replace without having to change the source data I could fix it for my map.
  • Creating a category field or grouping several values together. We have a signs dataset and there is a domain with 100 different values in it, but I am really interested in the sign category (i.e. streetname sign vs, traffic sign vs. parking signs). 

Off course all of these things wouldn't be necessary if your data was perfect to begin with, but it rarely is (especially when it is not managed by Esri databases/infrastructure), and even if it is Esri data - updating enterprise data values for data that you are not responsible for is hard, and updating enterprise schema even more so when the data is widely used.

In ArcGIS Online webmap you can sort of do this where you create additional attributes that aren't necessarily in the original datasource using arcade epxressions, but I am talking beyond just adding fields and replacing values.

For any tabular data source you are adding (including sql query to non-gis database, excel file) if you could transform (manipulate) it when you are adding it into Pro without having to save a copy of the data source that would be a huge benefit! Every time you would open Pro - Pro would look at the data source, apply your transformations and show you the final dataset you need to see and work with. This could also greatly improve query layers and joining data on attributes (currently limited to using 1 field only).

Like I said PowerBI and Excel have this function when you load data through Power Query.

In my case I am trying to bring in Cityworks Inspection data and to be able to use it in ArcGIS Pro I need to transpose/pivot it, but my options right now are:

1. Do this in Excel using Power Query (and create another place where this data is stored) - at least this one is sort of a live link into the actual sql database.

2. Bring the excel file into ArcGIS Pro as is and use the pivot geoprocessing tool - would create a copy of the data

3. Set up a view with the query in the sql database, but it computationally expensive when you try to pivot 32000 records. Excel can do this in a few seconds only....

 

TanuHoque

thanks @StefanieWenker for the details.

I can't say much about pivot/transpose and its performance issue as I don't think I have worked with that.

As per the other ones, I think we should be able to do that using Query Layers. Please let me know if you tried before but not worked for you

https://pro.arcgis.com/en/pro-app/latest/help/mapping/layer-properties/what-is-a-query-layer-.htm

StefanieWenker

Thanks for the suggestion. I will try them out. Haven't done much with them before. Hopefully they have the functions that I would need to do some of these things.

TanuHoque

@StefanieWenker 

pls let me know how it goes. In general, any sql including database specific functions etc., that can be used inside a subquery can be used to define a query layer. All this layer does is it pushes the entire sql query that you provided, to the underlying database by wrapping that in side a subquery...

e.g. if you used a query like this:

select job_title, sum(income) as total_income from employees group by job_title

it will be pushed to the database like this

select * from (select job_title, sum(income) as total_income from employees group by job_title) a

Bud
by

Would another way of looking at this idea be something like this?

"Add pivot/unpivot functionality to Make Aggregation Query Layer (Data Management)"


Related: 

What kind of database is your underlying CityWorks data stored in? Oracle, SQL Server, etc?

StefanieWenker

I was just in an ArcGIS Insights session at the Esri UC and their Data Engineering tool is somewhat what I am thinking, but it looks like you still need to make a copy of your final data to be used in an Insights report.

I would like to be able to string operations together on a dataset and for them to be applied when I open Pro automatically. This is the same behaviour as Power Query in Power BI https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-query-overview.

Pretty much it is an interface that builds the query for you when you add operations to your source dataset. Not only does it allow you to transform one dataset, but you can also combine queries --> ie. union or join tables without having to permanently change your source data or make a copy of the data. 

I think query layers have most of this capability, but may be harder to use. The operation I was trying to do - pivot, is simple in PowerBI and Excel, but doing it in SQL has proven to be a lot harder.

StefanieWenker

@Bud 

Thanks for the suggestions. Our Cityworks data is stored in SQL Server database.

Yes your idea follows along the lines of what I am thinking. It would be great to be able to add virtual attributes as part of the transform as well. I have upvoted your idea.

Bud
by