Joining Large files (900,000 records)

04-03-2017 11:52 AM
BrianRizzo
New Contributor II

A join operation takes about 12 hours to complete in Pro (900,000 records with about 10 variables). The same operation in Desktop takes about 6 hours. Is this normal? If so, what does that say about handling big data, given that a dataset of this size is not really representative of big data?

Platform: recent i7, 16 GB RAM, SSD.

14 Replies
DanPatterson_Retired
MVP Emeritus

Attribute join or spatial join?

JoshuaBixby
MVP Esteemed Contributor

In addition to Dan's question, can you provide some more information on where the data is stored (file geodatabase, enterprise geodatabase, etc.) and how exactly you are implementing the join (using a geoprocessing tool, a dialog box in the UI, etc.)?

BrianRizzo
New Contributor II

It is a point feature class stored in a file GDB. I am attempting to join an attribute from a CSV.

DanPatterson_Retired
MVP Emeritus

I am wondering if the CSV is the issue. Have you tried converting it to a geodatabase table and then joining?

BrianRizzo
New Contributor II

While the join itself works and seems to happen very quickly, saving the join is the real problem. Converting the CSV file to a GDB table doesn't seem to provide any gains.

JoshuaBixby
MVP Esteemed Contributor

Can you elaborate on "saving the join is the real problem"? Are you referring to exporting the data after it has been joined?

BrianRizzo
New Contributor II

Yes.

JoshuaBixby
MVP Esteemed Contributor

What is your workflow for exporting the data after the join? Are you using a tool like Feature Class to Feature Class or the Export Data dialog box from the GUI? Where are you exporting the data to: a file geodatabase, a shapefile, or an enterprise geodatabase? If it is a file geodatabase or shapefile, is the output location on a local or network drive?

KoryKramer
Esri Community Moderator

Brian Rizzo, Joshua's question about what format you're exporting to is important. IMHO, though, the most important question hasn't been asked: is this a 1:1 or a 1:M join? If it is 1:1, that blows my theory. But if it is 1:M, since you're joining to a file gdb feature class, you will still only see the 900,000 records in the attribute table, yet depending on the number of tabular records joined to each feature, you could be generating a table of millions (perhaps many, many millions) of records. These ARE stored in the file gdb even though you can't see them. If you export to a shapefile, each record would actually show up as an output feature (but depending on the number of records, you might exceed the shapefile storage limit and crash or freeze).

Anyway, let us know about the cardinality of the join and even send some data if you want to dig further.
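
If it helps, here is a minimal sketch of one way to check the cardinality directly from the CSV before exporting anything. It uses only the Python standard library, not arcpy, and the field name `site_id` and the sample rows are made up for illustration; substitute your actual join field.

```python
import csv
import io
from collections import Counter


def join_cardinality(csv_file, key_field):
    """Count how often each join key occurs in the table being joined."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key_field] for row in reader)
    # Keys that occur more than once mean several table rows match one feature.
    repeated = {key: n for key, n in counts.items() if n > 1}
    return {
        "rows": sum(counts.values()),
        "distinct_keys": len(counts),
        "repeated_keys": len(repeated),
        "max_rows_per_key": max(counts.values(), default=0),
        "kind": "1:M" if repeated else "1:1",
    }


# Tiny in-memory sample standing in for the real CSV (hypothetical data).
sample = io.StringIO("site_id,value\nA,1\nA,2\nB,3\n")
summary = join_cardinality(sample, "site_id")
print(summary)
```

For the real file, you would pass an open file handle instead of the sample, e.g. `with open(path, newline="") as f: join_cardinality(f, "your_join_field")`. If `rows` is much larger than `distinct_keys`, the export may be writing far more than 900,000 records, which would be consistent with the long run times.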

The help topic "About joining and relating tables" (Help | ArcGIS Desktop) describes what I'm talking about: "However, it is possible to create a join under these circumstances. When you create a join in such a case, there are differences between how tools and other layer-specific settings work depending on the data source. If you are using geodatabase data to create the join, all matching records are returned. If you are using nondatabase data, like shapefiles or dBASE tables, to create the join, only the first matching record is returned.

This means that if you have created a 1:M or M:M join with geodatabase data and you generate a report, you see multiple records in the report, one for each corresponding match. The multiple matches are also seen when using a join field while symbolizing a joined layer, labeling, identifying features, generating a graph, and using either the Find or Hyperlink tool. If you are using the joined layer as input to a geoprocessing tool or in an export operation, the multiple matching records are used.

Caution:

In all cases of 1:M joins, only the first matching record is joined and displayed in the layer's attribute table."

Maybe this helps?
