Polling feature services for "Incremental Updates" (Updated August 2023)

08-20-2015 03:25 PM
RJSunderman
Esri Regular Contributor

I've configured a 'Poll an ArcGIS Server for Features' input to 'Get Incremental Updates'. Is there a way to prevent the input from polling all of the features in a feature class when GeoEvent Server is restarted?


The short answer is: No. When GeoEvent Server's services are restarted (or the server on which GeoEvent Server is running is rebooted), inputs which poll an ArcGIS Server map/feature service for feature records lose a key value, maintained in memory, which the input uses to determine which feature records are new or recently edited. Following a system restart the input must retrieve a complete feature record set from the source map/feature service in order to iterate through the data records, find the greatest object identifier or date/time value, and cache this value for use when making the next query. This means that feature records ingested, adapted, and processed previously will be processed a second time.
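
To make the behavior concrete, here is a minimal sketch in Python. It is a toy model, not GeoEvent Server's actual implementation: the class name IncrementalPoller, the last_key attribute, and the sample records are all hypothetical, and it assumes polling is keyed on OBJECTID.

```python
class IncrementalPoller:
    """Toy model of an input that polls incrementally using an in-memory key."""

    def __init__(self):
        # Held in memory only (assumption): a restart creates a new object with no key.
        self.last_key = None

    def poll(self, records):
        """Return the records treated as new, then cache the greatest OBJECTID seen."""
        if self.last_key is None:
            # No cached key: the complete feature record set is retrieved.
            new_records = list(records)
        else:
            new_records = [r for r in records if r["OBJECTID"] > self.last_key]
        if records:
            self.last_key = max(r["OBJECTID"] for r in records)
        return new_records


features = [{"OBJECTID": oid} for oid in (1, 2, 3)]

poller = IncrementalPoller()
first_poll = poller.poll(features)       # all three records
features.append({"OBJECTID": 4})
second_poll = poller.poll(features)      # only the record with OBJECTID 4

poller = IncrementalPoller()             # simulated restart: the cached key is gone
after_restart = poller.poll(features)    # all four records, three already processed
```

The last call returns every record because the replacement object starts with no cached key, which is the duplicate processing described above.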

This key value can also be lost simply by editing the input's configuration. Suppose that an existing Poll an ArcGIS Server for Features input were modified, or deleted and replaced with a new input. These actions may also cause the in-memory key used to poll for feature records incrementally to be lost, resulting in previously processed feature records being ingested, adapted, and processed a second time.

When is this likely to be a problem?

The capability to 'Get Incremental Updates' is unique to the Poll an ArcGIS Server for Features inbound connector. Do not confuse this input's incremental polling capability with, for example, the 'Receive New Data Only' parameter available when configuring a Poll an External Website for JSON input. The latter polls an external server's web services (vs. an ArcGIS Server's map/feature services) and relies on the external web service to include a specific HTTP property (Last-Modified) in its response header. (More information on this is available in comments in the thread Re: Receive RSS Inbound Connector.)

The issue we are exploring here deals only with the Poll an ArcGIS Server for Features input, or potentially a custom input you develop using the GeoEvent Server SDK which uses the FeatureService transport to poll an ArcGIS Server map/feature service. The input's unique ability to poll only for newly added or recently updated feature records can be useful when you do not want to ingest, adapt, and process feature records which have been processed previously and you do not want to have to delete previously processed feature records from the data source. This capability is limited, however. The ability to poll incrementally is not resilient when it comes to service restart or server machine reboot.

Not all real-time event record processing solutions you configure will exhibit a problem if a server machine is rebooted or edits to an input obliterate a key value cached in memory. Solution architects must recognize, though, that when an input loses its cached key all available feature records from a feature service must be polled and processed a second time. This duplicative event record processing might not be a problem for a solution configured to update feature records using data from processed event records. Such a solution is simply going to update the target feature records with data already held in those feature records’ attributes.

A solution which sends e-mail notifications, on the other hand, is different. Suppose a GeoEvent Server input is configured to poll only newly added (or recently updated) feature records and the server machine is rebooted. The input will ingest and adapt the feature service’s complete feature record set a second time, and re-processing those dozens (or hundreds, or thousands) of feature records will generate a duplicate e-mail notification for every data record that is re-processed. Sending duplicate e-mail notifications every time a server machine is rebooted is obviously not ideal.

Are system reboots the only time polling incrementally is likely to be a problem?

No. The Poll an ArcGIS Server for Features input is also vulnerable when using object identifiers to determine which feature records have been recently added. A solution configured to poll only newly added feature records based on a database feature record’s object identifier can fail when ArcGIS Server invokes a mitigation strategy intended to support concurrent editing.

In order to support multiple concurrent editors, ArcGIS Server assigns each editor a different block of object identifiers. One editor might create feature records with object identifiers in the range 1, 2, 3, ... 100. A second editor will be assigned a different range, allowing feature records to be created with object identifiers 401, 402, 403 (etc.). A third concurrent editor will be allowed to create feature records with object identifiers 801, 802, 803. This avoids a race condition in which each editor asks what the next available OBJECTID is and then proceeds to create a feature record with an identifier that another editor may be using concurrently.

If you configure a Poll an ArcGIS Server for Features input to poll incrementally based on OBJECTID, and the value ‘803’ from the third contributor is cached as the key to use when determining newly added feature records, it is possible that GeoEvent Server’s input will never poll feature records created by the first or second editors whose assigned object identifier range(s) are less than the cached key.
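
The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function poll_incrementally and the in-memory table are hypothetical stand-ins for the input and the feature service, and the identifiers are the example values used in this post.

```python
def poll_incrementally(records, cached_key):
    """Return records whose OBJECTID exceeds the cached key, plus the updated key."""
    new_records = [r for r in records if r["OBJECTID"] > cached_key]
    new_key = max([cached_key] + [r["OBJECTID"] for r in records])
    return new_records, new_key


# The third editor saves first, so identifiers 801-803 reach the table first.
table = [{"OBJECTID": oid, "editor": "third"} for oid in (801, 802, 803)]
first_poll, key = poll_incrementally(table, cached_key=0)

# The first and second editors then create records from their lower identifier blocks.
table.append({"OBJECTID": 1, "editor": "first"})
table.append({"OBJECTID": 401, "editor": "second"})
second_poll, key = poll_incrementally(table, cached_key=key)
```

Here second_poll comes back empty: both new records carry identifiers below the cached key of 803, so a poll based on OBJECTID alone never picks them up.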

Recommended Approach

When a solution needs to be more resilient to system restart, or to operate independently of feature record object identifier values, the recommended approach is not to configure a Poll an ArcGIS Server for Features input to conduct incremental polling. Instead, write data into the feature records being polled which marks individual feature records as having been processed.

This way, regardless of if or when GeoEvent Server’s services are restarted, its server machine is rebooted, or edits are made to feature records by concurrent editors, the history of which feature records have been processed is stored in the geodatabase.

What you want to do, essentially, is add an attribute field to your map/feature service’s schema named something like hasBeenProcessed. Configure the feature service to write a default value ‘0’ into this field when new feature records are created. Then, as part of a GeoEvent Service, configure your event record processing to overwrite the hasBeenProcessed attribute field’s value with a ‘1’ to mark it as a feature record you do not want a GeoEvent Server input to include in a future poll. To exclude the marked feature records, change the input’s Query Definition from its default 1=1 to hasBeenProcessed < 1. If you ever do find that you want to re-process a feature record, perhaps because some important attributes of the data record have been updated or changed, just make sure the feature editing workflow returns the hasBeenProcessed attribute to its initial/default value ‘0’ and GeoEvent Server’s input will automatically include that feature record in its next poll.
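
For readers who want to see what this looks like at the REST level, below is a rough Python sketch of the two requests involved: a query filtered on hasBeenProcessed < 1 and an edit that sets the flag to 1. In GeoEvent Server itself you would configure the input's Query Definition and an output that updates features rather than write a script. The layer URL and the helper function names are hypothetical, the sketch assumes the object identifier field is named OBJECTID, and it only builds the requests without sending them.

```python
import json
from urllib.parse import urlencode

# Hypothetical layer URL, used for illustration only.
LAYER_URL = "https://example.com/arcgis/rest/services/Sample/FeatureServer/0"


def build_query_url(layer_url, where="hasBeenProcessed < 1"):
    """Build a feature layer query URL that returns only unprocessed records."""
    params = {"where": where, "outFields": "*", "f": "json"}
    return f"{layer_url}/query?{urlencode(params)}"


def build_mark_processed_form(object_ids, oid_field="OBJECTID"):
    """Build form data for the layer's applyEdits operation that sets the flag to 1."""
    updates = [
        {"attributes": {oid_field: oid, "hasBeenProcessed": 1}}
        for oid in object_ids
    ]
    return {"updates": json.dumps(updates), "f": "json"}


query_url = build_query_url(LAYER_URL)
form = build_mark_processed_form([12, 13])
```

Because the processed/unprocessed state lives in the feature records themselves, the same query returns the same answer before and after a restart, regardless of the order in which object identifiers were assigned.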

- - -
If you have other approaches you have developed to deal with this particular behavior, your comments are welcome. Please also consider information and user comments in these other threads:


As always, I hope this information helps.
- RJ
