Data Reviewer Speed

04-01-2015 02:28 PM
AnthonyBarron
New Contributor III

This is a multi-part question, and I am hoping someone can point me in the right direction. I am working with Data Reviewer 10.2 and have a batch job consisting of Table to Table, Geometry on Geometry, Duplicate Geometry, and Execute SQL checks. The batch job checks for errors across two point feature classes, one with ~375,000 points and the other with ~58,000. My concern is the amount of time required to run the batch job: my current test has been running for 8 hours with only 209,000 points validated.

My questions are as follows:

1) How can I expedite the batch file?

2) If I run the batch file against "Changed Features Only" will it speed up the results? If so, how does Data Reviewer determine which features have been "Changed"?


11 Replies
KumarBhardwaj (Accepted Solution)
Occasional Contributor II

Hi Anthony,

We have made some performance improvements in the latest version of Data Reviewer (10.3) for some of the checks you are using in your batch job. However, if you are validating data in a versioned geodatabase, you can also choose to run a check only on features that have been edited. When you check the Changed Features Only check box and run the check, features in the current version are compared to the parent version to identify which features have changed.
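
If you run batch jobs from a script, the same option is exposed on the Execute Reviewer Batch Job geoprocessing tool. A minimal sketch, assuming the 10.x tool signature as I remember it (paths and session name are placeholders; please verify the parameter keywords against the tool help):

    import arcpy
    arcpy.CheckOutExtension("datareviewer")

    # Validate only the features edited in the production workspace version;
    # the tool compares the version to its parent, as described above.
    arcpy.ExecuteReviewerBatchJob_Reviewer(
        r"C:\Reviewer\reviewer_workspace.gdb",             # Reviewer workspace (placeholder)
        "Session 1",                                       # Reviewer session (placeholder)
        r"C:\Reviewer\checks.rbj",                         # batch job file (placeholder)
        production_workspace=r"C:\Connections\edits.sde",  # versioned workspace to validate
        changed_features="CHANGED_FEATURES")               # instead of ALL_FEATURES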

Thanks,

Kumar

AnthonyBarron
New Contributor III

Kumar,

Thank you so much for the input; that's exactly what I needed. Unfortunately, we are not operating against a versioned geodatabase. However, we do have a "Date Last Modified" attribute, so within the SQL parameters I created a statement that selects only the records in feature class 1 modified in the last 24 hours and compares them against all records in feature class 2. The batch file will be run every 24 hours. This is, of course, a huge improvement in speed. Many thanks!
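
For anyone who finds this later, the filter boils down to a where clause along these lines; the field name and path are made up, and the date syntax shown is for a file geodatabase (Oracle and SQL Server each have their own):

    import arcpy
    import datetime

    fc = r"C:\Data\Addresses.gdb\AddressPoints"  # hypothetical feature class
    cutoff = datetime.datetime.now() - datetime.timedelta(hours=24)

    # File geodatabase date literal for "modified in the last 24 hours".
    where = "DATE_LAST_MODIFIED >= date '{0}'".format(
        cutoff.strftime("%Y-%m-%d %H:%M:%S"))

    arcpy.MakeFeatureLayer_management(fc, "recent_edits", where)
    print(arcpy.GetCount_management("recent_edits").getOutput(0))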

MichelleJohnson
Esri Contributor

Hi Anthony.

I see you mentioned Duplicate Geometry as one of the checks you are using, and it looks like you are validating address points. With ~375,000 features, the Duplicate Geometry check will take a long time to run. Try using the Geometry on Geometry check to look for duplicate points instead; I converted all of my Duplicate Geometry checks that validate point features to Geometry on Geometry checks, and they run much faster. All you need to do is set feature class 1 and feature class 2 and use the Intersects spatial relation. Were you comparing fields in the Duplicate Geometry check too? You can do that in the Geometry on Geometry check as well.
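
If it helps to see what that amounts to conceptually, here is a rough stand-alone sketch (not the Reviewer check itself, and the path and rounding tolerance are made up) that flags coincident points by grouping features on rounded coordinates:

    import collections
    import arcpy

    fc = r"C:\Data\Addresses.gdb\AddressPoints"  # hypothetical feature class
    seen = collections.defaultdict(list)

    with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@XY"]) as cursor:
        for oid, (x, y) in cursor:
            # Round so near-coincident points compare equal; choose a
            # tolerance that matches your data's XY resolution.
            seen[(round(x, 3), round(y, 3))].append(oid)

    duplicates = {xy: oids for xy, oids in seen.items() if len(oids) > 1}
    print("{0} duplicate locations".format(len(duplicates)))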

Also, when configuring Data Reviewer checks that use two different feature classes, use the smaller feature class for feature class 1 and the larger feature class for feature class 2 if you can. This may not pertain to your domain, but an example would be finding overlapping service lines on mains: use mains as feature class 1 and service lines as feature class 2, since there are many more features in the service line feature class than in the main line feature class.

And finally, regarding the Table to Table Attribute check: see whether the fields you are using to link/compare the records are indexed. Indexing those fields can help with performance.
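
The index can be added through the feature class properties or with a one-line geoprocessing call, for example (table and field names are hypothetical):

    import arcpy

    # Index the field used to link records in the Table to Table Attribute check.
    arcpy.AddIndex_management(r"C:\Data\Addresses.gdb\AddressPoints",
                              ["FACILITY_ID"], "FacilityID_IDX")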

Hope this helps.

michellej.

AnthonyBarron
New Contributor III

Michelle, thank you so much for your input. I followed your recommendations, and with a few other modifications my batch file now runs in minutes. Thank you!

GennadyMogilevich
New Contributor III

We are experiencing similar issues, with the Data Reviewer SQL and Table checks taking a very long time to complete. We are running the checks on a versioned 10.3.1 SDE (Oracle 11g) database and are selecting to run them only on features changed between the child and parent versions. One feature class has about 80K point features, but the others are more like 12K and 30K. I've tried running the checks individually and in batch mode, but even individually each one seems to take longer than I would expect (probably 30 to 40 minutes each).

Is this normal, or is there something wrong with either my Data Reviewer check configuration or the database itself? We run the typical compress, analyze, and rebuild-statistics maintenance on a regular basis, but that doesn't seem to improve how quickly the Data Reviewer checks run. One notable thing to mention is that the Version Differences check under the main Data Reviewer menu usually runs rather quickly, so I'm at a loss as to why a Table to Table check run on only the changes within the database would take 20x longer to complete.

Any guidance or suggestions are welcome.

Thank you in advance,

Henry M.

MichelleJohnson
Esri Contributor

Hello Gennady. The Table to Table Attribute check is known to take a while to run, and it will take longer on an enterprise geodatabase than on a file geodatabase. You are already following good practice by regularly compressing, analyzing, and rebuilding statistics on the enterprise geodatabase. Another thing to look at is how the enterprise geodatabase was set up with regard to where the tablespaces were created: are all the data tables in one tablespace on one disk? For large databases, you may want to consider spreading multiple tablespaces across multiple disks to help with read/write performance. In addition, make sure the fields you are using in the check are indexed.

Regarding Changed Features Only, there is a threshold past which this option stops being efficient. I believe it depends on the extent over which you are using it: if it is a large area and there are not many edits, it will take longer to identify which features were edited than simply to run the check on the current extent. So that is something to keep in mind.

Good Luck,

michellej.

SteveSalas
Occasional Contributor

We've been making our editors run scripted Data Reviewer against their SDE versions prior to posting for over a year now, and have frustrated enough users that we are investigating a process to export the changed records in a version out to file GDB just to run Data Reviewer there. Our test version summary:

    100 edited records in TEST1_POLY (of 68125 total records)
    100 edited records in TEST2_POLY (of 83631)
    100 edited records in TEST3_POLY (of 54762)

There are about 15 checks in the RBJ for each feature class; most are SQL checks.

~ 30 minutes to run Data Reviewer against the user's version (changed records only)

~ 2 minutes to export the changed data to file GDB and run Data Reviewer against it... and more than half of those 2 minutes was the export itself.

Needless to say, that's enough evidence that we're going to throw some more resources at this prototype. We do recognize that some checks will not work against the exported file GDB data (such as validating unique values across the entire feature class), so we might need two separate RBJ files: one run against the database, another against the exported file GDB data. There will also need to be a step to link the file GDB results back to the SDE ObjectIDs for the final report, but I can't imagine that will be significant. There are other issues to sort out as well.
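
The prototype is essentially the following sketch; the paths and session name are placeholders, the where clause stands in for however we pull the changed records out of the version, and the Execute Reviewer Batch Job parameters should be verified against the tool help for your release:

    import arcpy
    arcpy.CheckOutExtension("datareviewer")

    sde_fc = r"C:\Connections\version.sde\GIS.TEST1_POLY"  # placeholder
    fgdb = r"C:\Scratch\review.gdb"                        # placeholder
    where = "LAST_EDITED_DATE >= date '2015-04-01'"        # stand-in for the changed-record filter

    # 1. Export just the changed records to a local file geodatabase.
    arcpy.FeatureClassToFeatureClass_conversion(sde_fc, fgdb, "TEST1_POLY", where)

    # 2. Run the file GDB flavor of the RBJ against the export.
    arcpy.ExecuteReviewerBatchJob_Reviewer(
        r"C:\Scratch\reviewer_workspace.gdb",  # Reviewer workspace (placeholder)
        "Session 1",                           # Reviewer session (placeholder)
        r"C:\RBJ\fgdb_checks.rbj")             # RBJ whose checks point at the file GDB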

If anyone has attempted something similar or has a different idea for working around the abysmally slow Data Reviewer-in-SDE processing, we'd love to hear about it...

-Steve

MichelleJohnson
Esri Contributor

There have been reports that using the Changed Features Only option actually slows down the performance of a batch job/data check. It depends on the extent and the number of features it has to assess to determine what has changed, and based on the record totals you are reporting, I think this is the reason. I would recommend running it on the current extent, or possibly on the full database, without the Changed Features Only option. If you are running a script to execute your batch job, you can pass in an analysis area to focus the validation on that extent only.
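
If you go the scripted route, passing the analysis area looks roughly like this; paths are placeholders, and the parameter name is from my memory of the tool's signature, so check the Execute Reviewer Batch Job help for your version:

    import arcpy
    arcpy.CheckOutExtension("datareviewer")

    # Limit validation to an area-of-interest polygon instead of relying
    # on the Changed Features Only option.
    arcpy.ExecuteReviewerBatchJob_Reviewer(
        r"C:\Reviewer\reviewer_workspace.gdb",         # Reviewer workspace (placeholder)
        "Session 1",                                   # Reviewer session (placeholder)
        r"C:\RBJ\checks.rbj",                          # batch job file (placeholder)
        analysis_area=r"C:\Scratch\aoi.gdb\EditArea")  # polygon(s) defining the area to validate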

michellej.

SteveSalas
Occasional Contributor

We are scripted (part of a larger ArcObjects version management system), so another option might be to run the review on the extent of the changed features and then filter the results so that only errors on the new and modified records are reported back to the user. Given our workflow and the spatial patterns of our data updates, I cannot hold the editor responsible for everything within the extent when evaluating the version for pass/fail: too much substandard legacy data that will never be updated is intermingled in the spatial extent they would have edited.
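
Computing that extent could look something like the sketch below (in Python rather than ArcObjects, with placeholder paths, and a where clause standing in for however the changed records are identified):

    import arcpy

    fc = r"C:\Connections\version.sde\GIS.TEST1_POLY"  # placeholder
    where = "LAST_EDITED_DATE >= date '2015-04-01'"    # stand-in for the changed-record filter

    # Union the envelopes of the edited features into one overall extent.
    xmin = ymin = float("inf")
    xmax = ymax = float("-inf")
    with arcpy.da.SearchCursor(fc, ["SHAPE@"], where) as cursor:
        for (shape,) in cursor:
            ext = shape.extent
            xmin, ymin = min(xmin, ext.XMin), min(ymin, ext.YMin)
            xmax, ymax = max(xmax, ext.XMax), max(ymax, ext.YMax)

    # Build a polygon from the extent to hand to the batch job as its analysis area.
    sr = arcpy.Describe(fc).spatialReference
    aoi = arcpy.Polygon(arcpy.Array([arcpy.Point(xmin, ymin), arcpy.Point(xmin, ymax),
                                     arcpy.Point(xmax, ymax), arcpy.Point(xmax, ymin)]), sr)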
