I would dig into some Python. We have a similar situation with our cadastral data, in that a lot of other tables on another server feed into our hosted polygons layer, and updating them regularly is important for our users to see the latest ownership, assessments, etc. Our update nightly via Pyton and the task scheduler, and we then use view layers to create more focused layers for particular uses.
The key tools in this process are the ArcGIS Python API, pandas and the spatially enabled dataframe. The latter is really just ESRI's own custom extension of pandas itself.
The steps, in plain English:
- Query other server(s) into a non-spatial table using pandas, specifically pandas.read_sql.
- Query the existing hosted feature layer using arcgis.features.FeatureLayer
- Only bring in the primary key and geometry, not the other fields
- Join the two dataframes together using DataFrame.merge
- Apply the merged table to the existing hosted feature layer using FeatureLayer.edit_features
And here it is in some simplified code:
import os
import sys
import pandas as pd
from arcgis import GIS, GeoAccessor
# log in to portal
gis = GIS('portal-url', 'user', 'password')
# get hosted layer; not actually using arcgis.features.FeatureLyaer directly in this case
lyr = gis.content.get('itemID').layers[layer-index]
# SQL DB connection string
constr = 'mssql+pymssql://user:pass@server-location/database'
# Read standalone SQL file to query variable
with open(os.path.join(sys.path[0], 'query.sql'), mode='r') as q:
sql_query = q.read()
## If you are simply copying an entire table, there's no need to establish a query string.
## read_sql can take either a query string or a table name as its parameter
# Query SQL into dataframe
df = pd.read_sql(sql_query, constr)
# Pull hosted feature layer into spatial dataframe, only pulling primary key(s) needed for joining
sdf = lyr.query(out_fields=['primaryID'], as_df=True)
# Merge dataframes
merged = sdf.merge(df, how='inner', on='primaryID')
## If your primary keys have different names, you can use left_on= and right_on= to specify the matching keys from the two dataframes.
## You may need to reshape the data here, depending on the needs of your output.
# Update hosted feature layer
## We do this in 100-row chunks. API docs suggest limiting the number of features being added/edited in a single go.
## Also includes some feedback messages to let you know if the features are updating properly
## For automated scripts, you can have these messages go to an external log rather than simply printing to the console
n = 0
err = False
while n < len(merged):
fs = merged.iloc[n:n+100].spatial.to_featureset()
updt = lyr.edit_features(updates=fs)
msg = updt['updateResults'][0]
print(f"Rows {n:4} - {n+100:4} : {msg['success']}")
if 'error' in msg:
print(msg['error']['description'])
err = True
if err:
print('Some or all updates did not apply successfully.')
else:
if len(merged) > 0:
print(f'{len(merged)} features updated.')
You can extend this even further with logging / notifications, or using pandas' compare function to selectively update only those features that have been updated in the SQL database. We have nearly 60k rows in a table we update nightly, but only a few hundred update on any given day, so being able to isolate the update to actual changes is a big plus.
- Josh Carlson
Kendall County GIS