CalculateField extremely slow

09-01-2010 06:08 AM
KoenVolleberg
New Contributor
Hi all,
I've got a Python script (sample below) that appends all points from a list of Route Event Layers into one in_memory feature class. Before the append, I need to calculate some fields using CalculateField.
However, CalculateField is extremely slow (roughly 5 min for 5000 records).

I also tried an UpdateCursor, but performance was just as poor.
Does anyone have a suggestion for speeding up this process in ArcGIS 9.3?

Thanks,
Koen


# Target in_memory feature class that collects the points from all layers
allPoints = "in_memory/allPoints"
gp.CreateFeatureClass("in_memory","allPoints","POINT")
gp.AddField(allPoints,"ORIGID","LONG")
gp.AddField(allPoints,"ALLELEV","DOUBLE",15,6)
gp.AddField(allPoints,"ORIGFC","TEXT",50)
i = 1
for item in elevationPointList:
    desc = gp.Describe(item)
    oidFieldName = desc.OidFieldName
    # Check which of the three helper fields already exist on this layer
    flds = gp.ListFields(item)
    origIDExist = False
    allHExist = False
    origFcExist = False
    for fld in flds:
        if fld.Name == "ORIGID":
            origIDExist = True
        if fld.Name == "ALLELEV":
            allHExist = True
        if fld.Name == "ORIGFC":
            origFcExist = True
    if origIDExist == False:
        gp.AddField(item,"ORIGID","LONG")
    if allHExist == False:
        gp.AddField(item,"ALLELEV","DOUBLE",15,6)
    if origFcExist == False:
        gp.AddField(item,"ORIGFC","TEXT",50)
    # Log a timestamp before and after the (slow) CalculateField calls
    writeLog(item + "0 " + dt.now().ctime())
    gp.CalculateField(item,"ORIGID",'[' + oidFieldName + ']')
    if item.count("_events") > 0:
        gp.CalculateField(item,"ORIGFC",'\"' + item.replace("lyr_","") + '\"')
        gp.CalculateField(item,"ALLELEV",'[TOPELEV]')
    else:
        gp.CalculateField(item,"ORIGFC",'\"' + getBaseName(item) + '\"')
        gp.CalculateField(item,"ALLELEV",'[ELEV]')
    writeLog(item + " " + dt.now().ctime())

    # Append to the collection, then remove the helper fields again
    gp.Append_Management(item,allPoints,"NO_TEST")
    gp.DeleteField_Management(item,"%s;%s;%s" %("ORIGID","ALLELEV","ORIGFC"))
7 Replies
KoenVolleberg
New Contributor
I found out that the slowdown is also related to the fact that the data on which CalculateField runs is stored in a File Geodatabase. The File Geodatabase seems to be extremely slow.
I copied the data into an in_memory dataset before calculating, which improved performance from 1 min to 2 sec! The only problem is that copying the features changes the IDs, and I need the original IDs to make notifications.

(I saw 5 reasons to go for FGDB, but one reason not to is the terrible performance.)
Koen
ChrisSnyder
Regular Contributor III
An FGDB should have excellent performance. In my experience, an in_memory table is only about 20-30% faster than an FGDB or Shapefile (not 5x). This may sound silly, but in your script I noticed that you were specifying precision and scale parameters for the ALLELEV field:

gp.AddField(allPoints,"ALLELEV","DOUBLE",15,6)

The FGDB format doesn't honor precision and scale (it should just ignore them), but try leaving them off and see what happens in FGDB format.

Also, maybe out of habit rather than necessity, I always enter the gp tool parameters as text. I would write your line:

gp.AddField(allPoints,"ORIGFC","TEXT",50)

as

gp.AddField(allPoints,"ORIGFC","TEXT","50")

It would be useful to know what specific tool is causing the holdup. You can retrieve the tool messages (only after the tool completes) using the gp.getmessages() command. For example:

gp.AddField(allPoints,"ORIGFC","TEXT","50"); print gp.getmessages()
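To narrow down which call is slow, each tool call can also be timed individually. The following is only a minimal sketch: timed_call is a hypothetical helper name, not part of the geoprocessor, and it simply wraps any callable with a wall-clock timer.

```python
import time

def timed_call(label, tool, *args):
    """Run tool(*args) and report how long it took.

    'timed_call' is a hypothetical helper, not a geoprocessor method.
    Returns a (result, elapsed_seconds) tuple.
    """
    start = time.time()
    result = tool(*args)
    elapsed = time.time() - start
    print("%s took %.2f s" % (label, elapsed))
    return result, elapsed

# Assumed usage inside the script from the first post (gp, item and
# oidFieldName come from that script):
#   timed_call("CalculateField ORIGID", gp.CalculateField,
#              item, "ORIGID", "[" + oidFieldName + "]")
#   print(gp.getmessages())
```

Timing each CalculateField and AddField call separately shows whether one tool dominates the run time, rather than only the total between two log entries.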
RichardFairhurst
MVP Honored Contributor
I agree with Chris that FGDBs have excellent performance and that what you are describing is unusual (for me, calculations on 120,000 records in an unjoined table normally complete in under 2 minutes on my FGDBs). Is your FGDB on a local drive or a network drive? If it is on a network, many things other than the FGDB itself could be causing the performance degradation. You need to describe in more detail how you set up your FGDB, verify that no system limitations are affecting its performance, and test a few alternative setups before concluding that FGDBs are at fault and fundamentally flawed for use with the Field Calculator.
KoenVolleberg
New Contributor
Hi Richard, Chris and others,
Sorry about my somewhat negative posts, but I was frustrated that things did not work as they should. I hope that with all your suggestions the problem can be solved.

I'm running an XP machine (2 cores at 2.8 GHz, 2 GB memory, XP SP3, ArcGIS 9.3 SP1), with my FGDB stored on my local disk, so network problems are not a factor.

My FGDB contains 7 feature datasets (FDs), each holding around 5 feature classes (FCs); this structure must be maintained. The routes are stored in one of the FDs, and the route event tables sit directly in the FGDB root.

My script converts these event tables to event layers (MakeRouteEventLayers) before the part that I have shown does its job on the event layer. Could this cause the slowdown?
I tried to find out which tool causes the slowdown: it's the set of CalculateField commands, not the AddField commands, which do their job (almost) instantly.

I hope someone has some more suggestions. In the meantime, I'll experiment with other options. If I find something that solves it, I'll post the solution here.
Thanks, Koen
ChrisSnyder
Regular Contributor III
What sort of object is 'item' in the line:

for item in elevationPointList:

Is it a feature class, a cursor row, or something else?

Also, could you repost your code using the code tags (#) so that the indentation is preserved?
RichardFairhurst
MVP Honored Contributor
My script converts these event tables to event layers (MakeRouteEventLayers) before the part that I have shown does its job on the event layer. Could this cause the slowdown?


I also use linear referencing (LR) event layers and find that they slow down calculations regardless of which database is used to store them. LR event layers want to refresh with each calculation, and this is a relatively slow process. They seem to treat every field as though it affects the positional data of the LR event, and they seem to perform some internal check to reverify the event geometry. I also find that LR event layers do not like joins used in conjunction with calculations and throw errors. So LR event layers are a bit testy.

Normally I try to calculate values that do not depend on the LR event geometry directly in the underlying table (use the table view creation tool in ModelBuilder to access the table for calculations in your script). The main problem with this approach is that any previously created LR event layers may not immediately reflect the calculated values, because the calculation does not always trigger a refresh of the LR event layer in memory (which can be good for performance but not so good for immediate feedback on the symbology or labeling of your map). If I want to calculate a field that will be immediately reflected on the map, I usually have to calculate it directly on the LR event layer and take the performance hit, or else destroy the LR event layer and recreate it after I have calculated the value in the underlying table. The second approach is often faster. At the very least, test a calculation done directly on your underlying table, followed by creating your LR event layer, to compare the speed of the two approaches.
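The "calculate on the table first, then create the event layer" approach could look roughly like the sketch below. It is not the code from this thread: the function name, the view name and the argument names are assumptions for illustration, and it assumes the fields to calculate already exist on the event table. The gp object is passed in, so the function only relies on the MakeTableView_management, CalculateField_management and MakeRouteEventLayer_lr tools.

```python
def calculate_then_make_event_layer(gp, route_fc, route_id_field, event_table,
                                    event_props, out_layer, calculations):
    """Calculate fields on the event table, then build the LR event layer.

    Hypothetical helper. 'calculations' is a list of
    (field_name, expression) pairs; the fields must already exist.
    """
    view = "event_table_view"  # hypothetical table view name
    # Work on a table view of the underlying event table, not on an
    # existing LR event layer.
    gp.MakeTableView_management(event_table, view)
    for field_name, expression in calculations:
        gp.CalculateField_management(view, field_name, expression)
    # Create the event layer only after the values are in the table.
    gp.MakeRouteEventLayer_lr(route_fc, route_id_field, event_table,
                              event_props, out_layer)
    return out_layer

# Assumed usage (all dataset and field names are placeholders):
#   calculate_then_make_event_layer(
#       gp, "routes", "ROUTE_ID", "elev_events", "RID POINT MEAS",
#       "lyr_elev_events", [("ALLELEV", "[TOPELEV]")])
```

Because the event layer is created after the calculations, no layer refresh is triggered while the values are being written.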
KoenVolleberg
New Contributor
Hi all,
As Richard stated, my LR event layers are definitely slowing down the process. Thanks for the suggestion!
I performed the calculations directly on the tables instead of on the event layers, which dramatically improved performance. This also opened my eyes to (smaller) performance issues in my other scripts that process LR event layers.

Cheers, Koen