Performance at a Price: File Geodatabases and SQL Support

2246
1
09-27-2016 04:28 PM
Labels (1)
JoshuaBixby
MVP Esteemed Contributor
5 1 2,246

Between starting a new position and enjoying summer, it has been a while since I last blogged.  Although I do plan on finishing the Iterable Cursor series, I ran square into a limitation, and subsequent bug, of the file geodatabase that I thought worth sharing.  First, the limitation....

When the file geodatabase (FGDB) was introduced back in 2006, it was touted as an enhanced, high-performance alternative to the personal geodatabase (PGDB).  It is true that large data sets and large collections of data sets pose challenges for the personal geodatabase.  For one, Microsoft Access data files (*.mdb;*.accdb) are limited to 2 GB in size, which wasn't a whole lot in 2006 but is practically nothing today.  In addition to the file size limit, personal geodatabase performance starts to degrade around 500 MB of total data (see Types of geodatabases).  Another likely factor, but one Esri doesn't address, is the deprecation of the Jet database engine in the mid-2000s.

Taking a stroll down memory lane, or Esri Blogs as it may be, one finds Five reasons why you should be using the File Geodatabase:

Size

The database size is only limited by the available disk space. By default, individual tables and feature classes can be up to 1 TB. With the use of configuration keywords this can be expanded to 256 TB.

Versatility

Works on many different operating systems including Windows and UNIX (Solaris and Linux)

Speed

Provides excellent performance and scalability. For example, to support individual datasets containing well over 300 million features and datasets that can scale beyond 500 GB per file with very fast performance....

Edit Model

The File Geodatabase uses an edit model similar to shapefiles, supporting one editor and multiple readers.  Each standalone feature class, table and feature dataset can be edited by different editors simultaneously but can only have one editor performing edits on them at any given time....

Compression

File Geodatabases also allow users to compress feature classes and tables to a read-only format to reduce storage requirements even further. This reduces the Geodatabase’s overall foot-print on disk without reducing the performance.

I can't argue with any of the five reasons proselytized in the blog post, I think each is accurate and a good reason for Esri to work on creating a new (at the time), file system-based geodatabase format.  But as some of the eight year-old comments in the blog point out, the file geodatabase wasn't perfect then and it still has its faults today. 

For some, the biggest fault of the file geodatabase is proprietorship.  This isn't really a change from personal geodatabases, since Access/Jet is also proprietary, but it does represent a missed opportunity.  After several years Esri did finally release the file geodatabase API (File Geodatabase API details), but a specification has never been released.  Releasing an API is quite a bit less open than releasing a specification.  Additionally, the FGDB API only implements a portion of the functionality within the file geodatabase.

While Esri was taking steps to improve speed, size, and other areas with the file geodatabase, they were making compromises along the way.  The same Access/Jet backend that presents some challenges with large data sets in a personal geodatabase also provides rich SQL support when working with data in and out of ArcGIS software.  The decision to walk away from Access/Jet and create a new file system-based geodatabase format meant Esri had to roll its own SQL support.  Unfortunately, it was lacking right after release and not much has changed in the decade since.

When it comes to learning about SQL support in the file geodatabase, there are a few different places to look:

In looking through the various documentation/links above, there is a fair amount of overlap in content.  That said, there are important statements that only exist in one place or another, and knowing that can be important when troubleshooting errors or trying to understand spurious results from data stored in file geodatabases.

It isn't my intent to shine a light on every limitation of the file geodatabase, SQL support or otherwise.  For starters, it would take a fair amount of effort and space to lift up the rug and write on everything underneath it.  Just as the personal geodatabase has its limitations, so does the file geodatabase, and it just happens that SQL support is a big limitation of the latter.  For power users or developers, it is important to understand that limitation because one's experience, and possibly results, may vary when switching between personal or enterprise geodatabases and file geodatabases. 

1 Comment
About the Author
I am currently a Geospatial Systems Engineer within the Geospatial Branch of the Forest Service's Chief Information Office (CIO). The Geospatial Branch of the CIO is responsible for managing the geospatial platform (ArcGIS Desktop, ArcGIS Enterprise, ArcGIS Online) for thousands of users across the Forest Service. My position is hosted on the Superior National Forest. The Superior NF comprises 3 million acres in northeastern MN and includes the million-acre Boundary Waters Canoe Area Wilderness (BWCAW).