Batch merge files with same name in folders, subfolders etc.

11-02-2015 05:48 AM
SiyangTeo
Occasional Contributor

Hi, I am trying to build a script that combs through all shapefiles within a folder and all its subfolders, and merges them together into a new shapefile. I modified it slightly from another script (sorry, I can't remember the source to acknowledge it). However, it raised this error: "ExecuteError: Failed to execute. Parameters are not valid. ERROR 000400: Duplicate inputs are not allowed Failed to execute (Merge)."

The script is as stated below. Can anyone tell what is wrong with it?

Thank you!

import arcpy, os

workspace = r'C:\Users\xxx\Desktop\boundary'
output_folder = r'C:\Users\xxx\Desktop\New folder'

Dict = {}

for root, dirs, files in os.walk(workspace):
    for dir in dirs:
        arcpy.env.workspace = os.path.join(root,dir)
        for fc in arcpy.ListFeatureClasses():
            if not fc in Dict:
                Dict[fc] = []
                Dict[fc].append(os.path.join(root,fc))
            else:
                Dict[fc].append(os.path.join(root,fc))

for key in Dict:
    output = os.path.join(output_folder,key[:-4]) + '_merge'
    arcpy.Merge_management(Dict[key], output)
    print output + " created"


Accepted Solutions
DarrenWiens2
MVP Honored Contributor

The following seems to work for me, merging all similarly named shapefiles (e.g. 'points.shp' merged with 'folder/points.shp' merged with 'folder/folder/points.shp'), which I think is the point (but I could be wrong), rather than only merging those within the same directory. The only real difference was changing 'root,fc' to 'root,dir,fc'. With the 'dir' part recorded in the path, the list no longer holds the same entry repeated; every entry is a unique full path.

import arcpy, os

workspace = r'C:\junk\folder'

Dict = {}

for root, dirs, files in os.walk(workspace):
    for dir in dirs:
        arcpy.env.workspace = os.path.join(root,dir)
        for fc in arcpy.ListFeatureClasses():
            if not fc in Dict:
                Dict[fc] = []
                Dict[fc].append(os.path.join(root,dir,fc))
            else:
                Dict[fc].append(os.path.join(root,dir,fc))

for key in Dict:
    output = os.path.join(workspace,key[:-4]) + '_merge'
    arcpy.Merge_management(Dict[key], output)
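
To see why recording the full path matters without needing ArcGIS installed, here is a minimal standard-library sketch. It builds a throwaway folder tree and groups files by name. The names `group_by_name` and `build_sample_tree` are hypothetical, and plain `os.walk` with a `.shp` filter only stands in for `arcpy.ListFeatureClasses()`. Iterating over `files` rather than `dirs` also picks up shapefiles sitting directly in the top-level workspace, which the `for dir in dirs` loop appears to skip because the workspace itself is never set as `arcpy.env.workspace`.

```python
import os
import tempfile
from collections import defaultdict


def group_by_name(workspace, extension=".shp"):
    """Map each file name to the full paths of every file with that name."""
    groups = defaultdict(list)
    for root, dirs, files in os.walk(workspace):
        for name in files:
            if name.lower().endswith(extension):
                # root is the folder that actually holds the file, so joining
                # it with the file name gives a unique full path.
                groups[name].append(os.path.join(root, name))
    return dict(groups)


def build_sample_tree():
    """Create points.shp at three depths inside a temporary folder."""
    top = tempfile.mkdtemp()
    for parts in [(), ("folder",), ("folder", "folder")]:
        folder = os.path.join(top, *parts)
        if not os.path.isdir(folder):
            os.makedirs(folder)
        open(os.path.join(folder, "points.shp"), "w").close()
    return top


top = build_sample_tree()
groups = group_by_name(top)
for name, paths in groups.items():
    # os.path.splitext drops the extension without assuming it is 4 characters
    output_name = os.path.splitext(name)[0] + "_merge"
    print("%s <- %d inputs" % (output_name, len(paths)))
```

Each list in `groups` could then be handed to `arcpy.Merge_management` in place of `Dict[key]`; that last step is left out here because it needs an ArcGIS installation.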


8 Replies
JoshuaBixby
MVP Esteemed Contributor

A few comments:

  • The error message, in this case, is fairly specific and descriptive.  You are putting one or more datasets as inputs multiple times to the tool.  Since I don't know your folder and file structure, I suggest you add some print statements to find out the files you are adding so you can see where the duplicative inputs are coming from.

  • Python is case sensitive and Dict != dict, but Dict is pretty close to dict.  It is generally not a good idea to shadow built-in names, like dict. Although in this case Dict isn't shadowing the built-in dict, I think it is close enough to avoid using a variable with that name.

  • ArcGIS 10.1 SP1 introduced a Walk function in the ArcPy Data Access module (arcpy.da.Walk).  The ArcPy Walk function is a geospatial aware version of os.walk, and I encourage you to use it instead.
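
The print-statement check suggested in the first point can be sketched in plain Python. `find_duplicates` is a hypothetical helper, and the sample paths are made up to mimic what the original script builds when two sibling folders each hold a `points.shp`: joining only `root` and the file name yields the same path twice.

```python
import os
from collections import Counter


def find_duplicates(paths):
    """Return the paths that occur more than once, after normalising
    them for the current platform."""
    counts = Counter(os.path.normcase(os.path.normpath(p)) for p in paths)
    return sorted(p for p, n in counts.items() if n > 1)


# Hypothetical layout: root holds sub-folders "a" and "b", each with points.shp.
root = os.path.join("boundary", "region1")
sub_folders = ["a", "b"]

# What the original script appends: the sub-folder is left out of the path.
without_dir = [os.path.join(root, "points.shp") for _ in sub_folders]
# What the corrected script appends: one distinct path per sub-folder.
with_dir = [os.path.join(root, d, "points.shp") for d in sub_folders]

print(find_duplicates(without_dir))  # one repeated path is reported
print(find_duplicates(with_dir))     # nothing is repeated
```

Running a check like this on each `Dict[key]` list before calling `arcpy.Merge_management` should show where the duplicate inputs come from.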
SiyangTeo
Occasional Contributor

Thanks for the suggestions!

Especially about arcpy.da.Walk, that will be very helpful for my future scripts.

LukeSturtevant
Occasional Contributor III

I think you'll want a variation of the code found here. I've modified the code, and it worked for my test folder, which has multiple subfolders and shapefiles.

import arcpy, os
workspace = r'C:\Users\xxx\Desktop\boundary'
output_folder =  r'C:\Users\xxx\Desktop\New folder'

Dict = {}
allFolders = [os.path.join(workspace, name) for name in os.listdir(workspace) if os.path.isdir(os.path.join(workspace, name))]

def recursive_list_fcs(workspace, wild_card=None, feature_type=None):
    """Returns a list of all feature classes in a tree.  Returned
    list can be limited by a wildcard, and feature type.
    """
    preexisting_wks = arcpy.env.workspace
    arcpy.env.workspace = workspace

    try:
        for root, dirs, files in os.walk(workspace):
            arcpy.env.workspace = root
            for fc in arcpy.ListFeatureClasses(wild_card, feature_type):           
                Dict[workspace].append(arcpy.Describe(fc).catalogPath)

            # Pick up workspace types that don't have a folder
            #  structure (coverages, file geodatabase do)
            subFolders = set(arcpy.ListWorkspaces()) - \
                         set(arcpy.ListWorkspaces('', 'FILEGDB')) -\
                         set(arcpy.ListWorkspaces('', 'COVERAGE'))

            for subFolder in subFolders:
                arcpy.env.workspace = os.path.join(root, subFolder)
                for fc in arcpy.ListFeatureClasses(wild_card,feature_type):
                    Dict[workspace].append(arcpy.Describe(fc).catalogPath)

            for dataset in arcpy.ListDatasets('', 'FEATURE'):
                for fc in arcpy.ListFeatureClasses(wild_card,feature_type,dataset):
                    Dict[workspace].append(arcpy.Describe(fc).catalogPath)
        if len(Dict[workspace]) > 0:
            arcpy.Merge_management(Dict[workspace], os.path.join(output_folder,folderName+"_Merge.shp"))
            print (folderName+"_Merge.shp" + " created")

    except Exception as err:
        raise err
    finally:
        arcpy.env.workspace = preexisting_wks    

for folder in allFolders:   
    folderName = os.path.basename(folder).split(".")[0]
    print folderName
    Dict[folder]=[]
    recursive_list_fcs(folder,wild_card = None, feature_type = "Point")
DarrenWiens2
MVP Honored Contributor

I believe your most direct change may be to insert the 'dir' in between 'root' and 'fc' (which should result in the full path):

Dict[fc].append(os.path.join(root,dir,fc))
LukeSturtevant
Occasional Contributor III

I definitely agree with Darren, but as Siyang has it set up right now he will produce a dictionary with keys for every feature class name found, and the only item entry will be the same feature class path. This will essentially attempt to merge a single feature class for each dictionary key.

If Siyang's folder structure is set up with a workspace folder with multiple folders containing many shapefiles of the same type then he could modify his code like this:

import arcpy, os

workspace = r'C:\Users\xxx\Desktop\boundary'
output_folder = r'C:\Users\xxx\Desktop\New folder'

Dict = {}

for root, dirs, files in os.walk(workspace,topdown=True):
    for dir in dirs:
        arcpy.env.workspace = os.path.join(root,dir)
        Dict[dir]=[]
        for fc in arcpy.ListFeatureClasses():
            if not fc in Dict[dir]:
                Dict[dir].append(os.path.join(root,dir,fc))
            else:
                Dict[dir].append(os.path.join(root,dir,fc))

for key in Dict:
    output = os.path.join(output_folder,key) + '_merge'
    arcpy.Merge_management(Dict[key], output)
    print output + " created"

The code I previously posted would do the same thing, but it also allows for feature type filtering and also for a folder structure with a top level workspace with multiple folders each containing multiple subfolders with their own feature classes in them. Hope this helps!
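
The two groupings discussed in this thread differ only in the dictionary key. As a rough standard-library illustration (the helper name `group_by_top_folder` and the sample tree are hypothetical, and `os.walk` with a `.shp` filter only stands in for the arcpy listing calls), this sketch keys on the first-level folder under the workspace, so that everything beneath it, at any depth, lands in one list.

```python
import os
import tempfile
from collections import defaultdict


def group_by_top_folder(workspace, extension=".shp"):
    """Map each first-level sub-folder name to every matching file below it."""
    groups = defaultdict(list)
    for root, dirs, files in os.walk(workspace):
        rel = os.path.relpath(root, workspace)
        if rel == os.curdir:
            continue  # files directly in the workspace belong to no sub-folder
        top_folder = rel.split(os.sep)[0]
        for name in files:
            if name.lower().endswith(extension):
                groups[top_folder].append(os.path.join(root, name))
    return dict(groups)


# Build a small throwaway tree to try the grouping on.
workspace = tempfile.mkdtemp()
for rel_path in ["north/a.shp", "north/sub/b.shp", "south/c.shp"]:
    full = os.path.join(workspace, *rel_path.split("/"))
    if not os.path.isdir(os.path.dirname(full)):
        os.makedirs(os.path.dirname(full))
    open(full, "w").close()

merged_inputs = group_by_top_folder(workspace)
for folder in sorted(merged_inputs):
    print("%s_merge <- %d inputs" % (folder, len(merged_inputs[folder])))
```

Keying on the file name instead of the folder gives the other behaviour, where same-named shapefiles from anywhere in the tree are merged together.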

LukeSturtevant
Occasional Contributor III

Oh, okay, I must have misunderstood what Siyang was trying to get at. Darren, your code definitely works if he is searching for shapefiles of the same name throughout the workspace.

SiyangTeo
Occasional Contributor

Wow, thanks!

That explains why I had the "duplicate inputs" error message. There are many shapefiles of the same name residing in different subfolders (or sub-subfolders). I completely missed adding the dir to the os.path.join call, so the full path was never listed. Thanks for pointing out my error.

Works perfectly fine now. Bless you two, Luke and Darren.
