How to compare files in two folders?

09-22-2013 11:29 PM
MaciejMarkowski
New Contributor
I'm a pretty new user of ArcGIS. I want to compare the contents of the CSV files in one folder with the files in another folder. The files in both folders share the same base names, but the names in the second folder have a different ending.

First Folder:

    almukantarat.csv
    almukantarat1.csv

Second Folder:

    almukantarat_A.csv
    almukantarat1_A.csv

I don't know whether this can be done in Python or only in ModelBuilder (using the File Compare tool?). Could you please show me what the script or model should look like? I have ArcGIS Desktop 10.0.
A single file may contain thousands of objects, and each folder may hold a hundred files, so comparing them manually isn't feasible on my side. Thanks a lot.
7 Replies
JakeSkinner
Esri Esteemed Contributor
You can iterate through the first folder and then iterate through the second folder, find the table name that contains the name of the file from the first folder, and then perform the comparison.  Here's an example:

import arcpy, os
from arcpy import env

folder1 = r"C:\temp\python\A"
folder2 = r"C:\temp\python\B"

env.workspace = folder1

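for file in arcpy.ListFiles("*.csv"):
    # base name without the extension, e.g. "almukantarat"
    fileName = file.split(".")[0]
    # switch to the second folder to list its files
    env.workspace = folder2
    for file2 in arcpy.ListFiles("*.csv"):
        # compare with every file in folder2 whose name contains the base name
        if fileName in file2:
            arcpy.FileCompare_management(os.path.join(folder1, file), file2, "ASCII", "CONTINUE_COMPARE", os.path.join(r"C:\Temp\Python", fileName + "_compare.txt"))
    # switch back before moving on to the next file from folder1
    env.workspace = folder1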
MaciejMarkowski
New Contributor
Hi, thank you so much. I will try to implement your suggestion, and of course I will let you know whether it works. Thanks.
MaciejMarkowski
New Contributor
I think that in this example each file from folder1 is compared with all files from folder2. My intention is to compare the first file from folder1 with the first file from folder2, then the second file from folder1 with the second file from folder2.
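A quick way to see this behaviour, as a sketch in plain Python without arcpy: the two helper functions below are made up for illustration, and the "_A" suffix is assumed from the file names in the first post. The substring test from the script pairs "almukantarat.csv" with both files in folder2, while an exact-name test pairs it with only one.

```python
import os


def substring_matches(name1, folder2_names):
    """Mimic the `fileName in file2` test from the script above."""
    root = name1.split(".")[0]
    return [n for n in folder2_names if root in n]


def exact_match(name1, folder2_names, suffix="_A"):
    """Return the folder2 name equal to root + suffix + extension, or None."""
    root, ext = os.path.splitext(name1)
    expected = root + suffix + ext
    return expected if expected in folder2_names else None


folder2_names = ["almukantarat_A.csv", "almukantarat1_A.csv"]

# The substring test also picks up "almukantarat1_A.csv".
print(substring_matches("almukantarat.csv", folder2_names))

# The exact test finds only the intended partner file.
print(exact_match("almukantarat.csv", folder2_names))
```

With only the exact test, each file in folder1 would be compared with at most one file in folder2.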
RDHarles
Occasional Contributor
The big question here is how your .csv files are named in each folder. If they are all named like your example, it wouldn't be hard to loop through the first folder, split the name on the underscore to get the base name, and then use that to grab the compare file in folder 2 for the tool. Is that the case?
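The idea could be sketched roughly like this, assuming every file in folder 2 is named "<base>_<something>.csv". The sketch works on plain lists of file names, so it needs no arcpy, and the function names are hypothetical. Since only the folder 2 names contain an underscore in the example, the split is applied to those names.

```python
import os


def base_name(folder2_file):
    """'almukantarat1_A.csv' -> 'almukantarat1' (text before the last underscore)."""
    root = os.path.splitext(folder2_file)[0]
    return root.rsplit("_", 1)[0]


def pair_files(folder1_files, folder2_files):
    """Map each folder1 file to the folder2 file that has the same base name."""
    by_base = {base_name(f): f for f in folder2_files}
    pairs = {}
    for f in folder1_files:
        root = os.path.splitext(f)[0]
        if root in by_base:
            pairs[f] = by_base[root]
    return pairs


pairs = pair_files(
    ["almukantarat.csv", "almukantarat1.csv"],
    ["almukantarat_A.csv", "almukantarat1_A.csv"],
)
for name1 in sorted(pairs):
    print("{0} <-> {1}".format(name1, pairs[name1]))
```

Each resulting pair could then be handed to the compare tool. Files in folder 1 without a partner are simply left out of the result.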
MaciejMarkowski
New Contributor
I tried this script and everything seems to work. Thank you. I have one more question: why does the line "fileName = file.split(".")[0]" end with "[0]"? What does it change?
Maybe it's not a smart question, but as I mentioned, I've just started with Python.
curtvprice
MVP Esteemed Contributor
Jake used the string .split() method to extract the base name of the file (without the extension).

A very cool thing about Python is you can try this yourself interactively at the prompt to figure out how it works:

>>> "filename.txt".split(".")
['filename', 'txt']
>>> "filename.txt".split(".")[0]
'filename'


An alternative way to do this is to use the os.path.splitext() function, which gives correct results even if the file name contains an extra ".", for example "file.name.txt":
>>> import os
>>> "file.name.txt".split(".")
['file', 'name', 'txt']
>>> os.path.splitext("filename.txt")
('filename', '.txt')
>>> os.path.splitext("file.name.txt")
('file.name', '.txt')


Here's my version of the script, which takes more advantage of the os module and loops more the way you were asking. Unlike Jake's script, however, any deviation from your file naming setup will cause the script to fail. Jake's method is more forgiving. (My version isn't really better, just different.)

Another difference: I'm writing the results to folder1 instead of TEMP.

import arcpy, os
from arcpy import env

folder1 = r"C:\temp\python\A"
folder2 = r"C:\temp\python\B"

env.workspace = folder1

for file in arcpy.ListFiles("*.csv"):
    # calculate the name of the matching file in folder2
    # "file_name.csv" -> "file_name_A.csv"
    rootname = os.path.splitext(file)[0]
    file2name = "{0}_A.csv".format(rootname)
    file2path = os.path.join(folder2, file2name)
    # calculate the compare (output) file name, written to folder1
    # "file_name.csv" -> "file_name_cmp.txt"
    compare_results = os.path.join(folder1, "{0}_cmp.txt".format(rootname))
    # compare the files
    arcpy.FileCompare_management(file, file2path,
         "ASCII", "CONTINUE_COMPARE", compare_results)
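To make the one-to-one loop tolerate a missing partner file instead of failing, an existence check could be added before the comparison. Below is a minimal sketch of that idea. It uses the standard-library filecmp module in place of the File Compare tool so that it runs without arcpy, which means it only reports whether the two files are identical, not where they differ. The function name and the "_A" default are assumptions for illustration.

```python
import filecmp
import os


def compare_folders(folder1, folder2, suffix="_A"):
    """Return {csv name: True/False/None}; None means no partner file in folder2."""
    results = {}
    for name in sorted(os.listdir(folder1)):
        root, ext = os.path.splitext(name)
        if ext.lower() != ".csv":
            continue
        partner = os.path.join(folder2, root + suffix + ext)
        if not os.path.exists(partner):
            # naming deviation: record it and move on instead of failing
            results[name] = None
            continue
        # shallow=False compares file contents, not just os.stat() signatures
        results[name] = filecmp.cmp(os.path.join(folder1, name), partner,
                                    shallow=False)
    return results
```

Calling compare_folders(r"C:\temp\python\A", r"C:\temp\python\B") would return one entry per CSV file in the first folder. The same os.path.exists() check could be placed in front of the arcpy.FileCompare_management call in the script above.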
MaciejMarkowski
New Contributor
Once again, thank you. Yes, it's true that this works too. I'm thinking about one more thing.
If I have the file:
almukantarat_B.csv
and use the script:
>>> for file in arcpy.ListFiles("*.csv"):
...  rootname = file.split("_")[1]
...  rootname2 = rootname.split(".")[0]

then I probably have 3 parts:
- almukantarat (rootname[1])
- B (rootname2[0])
- csv (rootname2[1])
So how can I join those parts to get the name "almukantarat_B.csv"? I think I can use those 3 parts and the join command, but could you explain what the script should look like?
I'm asking because it would give me more possibilities and make the script more universal for the case I described in my first post.
Then I could compare files between the two folders without caring about how the files in folder2 are named. The file name structure could be: *_*.csv.
By the way, I'm very positively surprised by this forum and by how helpful the people here are. :o
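One possible way to split such a name into three parts and join them again, as a sketch in plain Python (the function names are hypothetical). Note that for "almukantarat_B.csv", file.split("_")[1] returns "B.csv" and the second split then returns "B", so the first part is lost unless the whole result of the split is kept.

```python
import os


def split_name(filename):
    """'almukantarat_B.csv' -> ('almukantarat', 'B', '.csv')."""
    root, ext = os.path.splitext(filename)
    # rpartition splits at the LAST underscore; sep is "" if there is none
    base, sep, tag = root.rpartition("_")
    if not sep:
        return root, "", ext
    return base, tag, ext


def join_name(base, tag, ext):
    """Inverse of split_name: rebuild the file name from its parts."""
    if tag:
        return "{0}_{1}{2}".format(base, tag, ext)
    return base + ext


parts = split_name("almukantarat_B.csv")
print(parts)
print(join_name(*parts))
# swap the middle part to build the name of a partner file
print(join_name(parts[0], "A", parts[2]))
```

Splitting at the last underscore keeps base names that themselves contain underscores in one piece.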