Finding All Geodatabases Inside Main Folder and Subfolders

07-23-2010 06:41 AM
JeremyKoontz
New Contributor
Good morning everyone,

I am trying to figure out how to build a list of full paths to every geodatabase inside a user-given folder.  I would like it to look down into any number of subfolders to find geodatabases (i.e. if the main folder is "D:\Data\", I would like it to find a geodatabase that is located in "D:\Data\this\is\my\deep\subfolder\path\something.gdb").  I currently have it working to pick up all personal geodatabases by using regular expression matching, but the file geodatabases are giving me a lot of trouble.  Here is the code that I have so far:

for root, dirname, filenames in os.walk ( folder ):
    for file in filenames:
        if ( re.match ( "([A-Za-z0-9_]*).gdb", file ) ):
            geodatabases.append ( os.path.join ( root, file ) )
        elif ( re.match ( "([A-Za-z0-9_]*).mdb", file ) ):
            geodatabases.append ( os.path.join ( root, file ) )


What it's doing is finding all of the files that are inside the file geodatabase folder (which are conveniently named *.gdbindex and such, and so are caught by my regular expression) and putting those in the list.  I just want it to figure out that the folder is a file geodatabase and put the path of the geodatabase in the list.

Thank you for your time!

Jeremy
11 Replies
LoganPugh
Occasional Contributor III
The first problem is that the regexp doesn't match a full path: re.match only matches at the start of the string, so your pattern would have to account for everything from the beginning. Use re.search instead if you don't want to tweak your regexp.

Also, since file GDBs are actually folders, you can look for them in your root variable within the main loop.

e.g.

for root, dirname, filenames in os.walk(folder):
    if (re.search("([A-Za-z0-9_]*).gdb", root)):
        geodatabases.append(root)
    else:
        for file in filenames:
            if (re.match("([A-Za-z0-9_]*).mdb", file)):
                geodatabases.append(os.path.join(root,file))
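
To illustrate the difference between re.match and re.search described above, here is a minimal sketch. It uses the pattern from the thread unchanged and the example path from the question; the inner file name "a00000001.gdbindexes" is only an assumed example of what sits inside a file geodatabase folder.

```python
import re

# Pattern from the thread, unchanged (note the dot is not escaped here).
pattern = r"([A-Za-z0-9_]*).gdb"

# Example path from the question; a full path starts with a drive letter.
path = r"D:\Data\this\is\my\deep\subfolder\path\something.gdb"

# re.match only tries the pattern at the very start of the string,
# so it fails on a full path that begins with "D:\".
at_start = re.match(pattern, path)

# re.search scans the whole string and finds the folder name at the end.
anywhere = re.search(pattern, path)

# On a bare file name inside the geodatabase, re.match does succeed,
# which is why the original loop collected those inner files.
inner_file = re.match(pattern, "a00000001.gdbindexes")

print(at_start)            # None
print(anywhere.group(0))   # something.gdb
print(inner_file is not None)
```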
JeremyKoontz
New Contributor
Logan,
Thank you for your help!  With a minor modification to your code, I was able to get it working!  The issue with it was that by using search and just appending the root variable to my list, it was matching all of the files inside the file geodatabase, which have the extensions .gdbtables, .gdbindexes, etc.  It was picking all of those files up in the search and was appending the same geodatabase to the list multiple times.  I just did a simple check to see if the database had already been added.  It's not a huge deal now since it works correctly, but is there some regular expression to allow it to match ONLY .gdb and not .gdb******?  I can't seem to figure that one out, but it works now, so it's quite alright if no one knows it!

Here is the modified code that will search through a parent folder and find all personal and file geodatabases and append them to a list:

for root, dirname, filenames in os.walk ( folder ):
    for file in filenames:
        if ( re.search ( "([A-Za-z0-9_]*).gdb", root ) ):
            if ( root not in geodatabases ):
                geodatabases.append ( root )
        elif ( re.match ( "([A-Za-z0-9_]*).mdb", file ) ):
            geodatabases.append ( os.path.join ( root, file ) )


Thanks again Logan, you're a life saver!

Jeremy
LoganPugh
Occasional Contributor III
Hmm, the code I posted works for me. The key is to test root in the main loop rather than testing each file in the filenames loop, because file geodatabases are folders, not files.

In response to your other question, you can use the $ character to match the end of the string, meaning the last character in the match must be at the end of the string. So to make my example code a little more robust (for example if someone had renamed test.gdb to test.gdb.backup, the test.gdb.backup would not be matched):

for root, dirname, filenames in os.walk(folder):
    if (re.search("([A-Za-z0-9_]*).gdb$", root)):
        geodatabases.append(root)
    else:
        for file in filenames:
            if (re.match("([A-Za-z0-9_]*).mdb", file)):
                geodatabases.append(os.path.join(root,file))
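
As a quick check of what the $ anchor changes, here is a small sketch. The three candidate paths are made up for illustration; only the pattern comes from the code above.

```python
import re

# Pattern from the post above, anchored to the end of the string with $.
anchored = r"([A-Za-z0-9_]*).gdb$"

# Hypothetical paths, chosen to show what is and is not matched.
candidates = [
    r"D:\Data\test.gdb",
    r"D:\Data\test.gdb.backup",
    r"D:\Data\test.gdb\a00000001.gdbindexes",
]

# Only a path that ends in "gdb" (preceded by one character) survives.
matches = [p for p in candidates if re.search(anchored, p)]
print(matches)
```

Only the first path should end up in the list, since the other two have text after "gdb".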
JeremyKoontz
New Contributor
Logan,
I'm sorry, you were completely right!  I didn't notice that you changed the loop like you did, I thought the only changes were the "file -> root" and "re.match -> re.search".  Using the code that you posted worked perfectly as well, and the knowledge of the $ will be useful too!

Thank you very much for your time and knowledge!  I appreciate it!

Jeremy
LoganPugh
Occasional Contributor III
Glad to help!

One thing I forgot to mention: "." is a special character in regular expressions; it matches any character except a newline. To match a literal dot, as in a file extension, you should escape the . with a backslash (\.)

A description of Python's regular expression syntax is here: http://docs.python.org/release/2.5.1/lib/re-syntax.html
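
A short sketch of why the escape matters. The file name "parcels_mdb" is a made-up example of a name with no extension at all; the $ anchor is added here so that both patterns must consume the whole name.

```python
import re

# Unescaped dot: "." stands for any single character.
unescaped = re.compile(r"([A-Za-z0-9_]*).mdb$")
# Escaped dot: "\." matches only a literal period.
escaped = re.compile(r"([A-Za-z0-9_]*)\.mdb$")

no_extension = "parcels_mdb"   # hypothetical name, not a real .mdb file
real_mdb = "parcels.mdb"

# The unescaped pattern lets "_" stand in for the dot, so it matches both.
loose_hits = [n for n in (no_extension, real_mdb) if unescaped.match(n)]
# The escaped pattern accepts only the name with a real ".mdb" extension.
strict_hits = [n for n in (no_extension, real_mdb) if escaped.match(n)]

print(loose_hits)
print(strict_hits)
```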

And because Access .mdb files will be found by your second regular expression, you should test whether they are valid personal geodatabases or not. However I am not aware of any foolproof way of doing that in Python. I was hoping the Describe properties for a Workspace would be of help but they do not differentiate between normal Access .MDBs and valid personal geodatabases.
ChrisSnyder
Regular Contributor III
To be ESRI-compliant, you should really use pgdbList = gp.listworkspaces("","ACCESS") and fgdbList = gp.listworkspaces("","FILEGDB") as part of the os.walk command, but how about this:

import os
rootDir = r"D:\csny490"
gdbList = []
mdbList = []
for dirPath, dirNames, fileNames in os.walk(rootDir, topdown=True):
   if dirPath.endswith(".gdb") or ".gdb." in dirPath:
      gdbList.append(dirPath)
   for file in fileNames:
      if file.endswith(".mdb"):
         mdbList.append(dirPath + "\\" + file)
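
For anyone who wants this as a reusable function, here is one possible variation on the code above, written for current Python. It is a sketch under a few assumptions that are not in the original: the function name find_geodatabases is made up, extensions are compared case-insensitively, .gdb folders are pruned from the walk so their contents are never visited, and paths are built with os.path.join instead of "\\". Unlike the code above, it does not pick up names such as test.gdb.backup, and it does not report the starting folder itself if that folder is a .gdb.

```python
import os

def find_geodatabases(root_dir):
    """Return (file_gdbs, mdb_files): full paths found under root_dir."""
    file_gdbs = []
    mdb_files = []
    for dir_path, dir_names, file_names in os.walk(root_dir, topdown=True):
        # Folders whose names end in .gdb are treated as file geodatabases.
        gdb_dirs = [d for d in dir_names if d.lower().endswith(".gdb")]
        file_gdbs.extend(os.path.join(dir_path, d) for d in gdb_dirs)
        # Prune in place so os.walk does not descend into those folders.
        dir_names[:] = [d for d in dir_names if d not in gdb_dirs]
        # Any .mdb file is a candidate; it may still be a plain Access database.
        for name in file_names:
            if name.lower().endswith(".mdb"):
                mdb_files.append(os.path.join(dir_path, name))
    return file_gdbs, mdb_files
```

Usage would look like gdbs, mdbs = find_geodatabases(r"D:\Data"). Pruning only works with topdown=True, because that is the mode in which os.walk honors in-place changes to the directory list.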
JeremyKoontz
New Contributor
Logan,

I did know that '.' was a special character in regular expressions, but I wasn't sure how to make it so it would recognize the '.' in a file extension and not the special character!  Thanks for the info!

Chris,

Thank you for letting me know about listWorkspaces, that hopefully is the answer to Logan's problem with making sure that they are actual personal geodatabases and not just access databases!  This is what I came up with by using your code and the listWorkspaces method:

for dirPath, dirNames, fileNames in os.walk ( folder ):
    # Point the geoprocessor at the current folder before listing workspaces
    gp.workspace = dirPath
    geodatabases2 = gp.listWorkspaces ( "*", "Access" )
    if ( len ( geodatabases2 ) > 0 ):
        # extend adds each path individually; append would nest the whole list
        geodatabases.extend ( geodatabases2 )
    geodatabases2 = gp.listWorkspaces ( "*", "FileGDB" )
    if ( len ( geodatabases2 ) > 0 ):
        geodatabases.extend ( geodatabases2 )


It walks through each directory, sets the workspace to the folder it is in, and then checks whether there are any personal or file geodatabases in there.  If there aren't, nothing is appended to the main list; if there are, their paths are added to the main list.  It seems to work perfectly, and by using the ESRI method, it should only find Access databases that are valid to Arc and not ones that were created outside.  I hope this bit of code helps someone in the long run; it has definitely helped me!  Thank you both for your support!

Jeremy
LoganPugh
Occasional Contributor III
From my testing (on 9.3.1 SP2, may be different in 10), gp.ListWorkspaces("*", "Access") still returns normal Access mdb files, not just personal geodatabases. It's also dreadfully slow if you're walking through a fair number of directories.

Using .endswith is a good idea though, it's bound to be a lot faster than using an uncompiled regular expression.
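
If you want to measure the difference on your own machine rather than take it on faith, a rough sketch with timeit follows. The file names are generated just for the test, and the regex here is precompiled; note that Python's re module also caches recently used patterns, so the gap for an "uncompiled" pattern may be smaller than expected. No timings are quoted because they depend on the machine and Python version.

```python
import re
import timeit

# Made-up file names: many geodatabase internals plus one .mdb file.
names = ["a%08d.gdbtable" % i for i in range(1000)] + ["parcels.mdb"]
mdb_pattern = re.compile(r"\.mdb$")

def with_endswith():
    # Plain string method, no regular expression involved.
    return [n for n in names if n.endswith(".mdb")]

def with_regex():
    # Same test expressed as a precompiled regular expression.
    return [n for n in names if mdb_pattern.search(n)]

if __name__ == "__main__":
    print("endswith:", timeit.timeit(with_endswith, number=200))
    print("regex:   ", timeit.timeit(with_regex, number=200))
```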
LoganPugh
Occasional Contributor III
Curious to see if anyone/ESRI knows of a foolproof way in Python to determine whether an MDB file is a geodatabase or a plain old Access database.