Python script - Create attachment match table

07-12-2023 08:00 AM
BarryG
by
Occasional Contributor

We are trying to develop a Python script that creates a match table for attaching files to a feature class (Tax_parcels), but every modification so far gives the same result: all files are attached to the second record only.

The GPIN field holds the values to look for in the attachment file names, in the format "????-??-????". We need this pattern to filter the file names, because they include additional characters and are therefore not an exact match.

The sample data and the Python script are attached (the script has been given a .txt extension so that it could be attached to the post).


Accepted Solutions
JohannesLindner
MVP Frequent Contributor

To insert code: [screenshots JohannesLindner_0-1689266678674.png and JohannesLindner_1-1689266693900.png]

This is your script, so that I can call out line numbers:

import glob
import arcpy
import os
import re

arcpy.env.overwriteOutput = True

# Set the directory to search
directory = r"G:\GIS_Projects\ArcGIS_Pro\Photo_attachments\Test"

# Set the pattern to search for
pattern = "????-??-????"

# Find all files that match the pattern
files = glob.glob(directory + "\\" + pattern)

# Print the list of matching files
print(files)

# Set the workspace and output location
workspace = r"C:\TEMP\Test_gd.gdb\Tax_parcels"
output_table = r"C:\TEMP\Test_gd.gdb\tblMatch_6"

# Set the character to search for
search_character = pattern

# Create an empty list to store attachment paths
attachment_paths = []

# Walk through the workspace directory and find attachments
for root, dirs, files in os.walk(workspace):
    for file in files:
            attachment_path = os.path.join(root, file)
            attachment_paths.append(attachment_path)

# Call the GenerateAttachmentMatchTable function with the appropriate input parameters
attachment_folder = workspace # The folder containing the attachments
attachment_key_field = "OBJECTID" # The field in the input table that corresponds to the attachment key field
file_extension = "*.jpg; *.pdf" # The file extensions of the attachments to match
relative_path = "RELATIVE" # The path type of the attachment relative to the workspace

arcpy.GenerateAttachmentMatchTable_management(workspace, directory, output_table, attachment_key_field, file_extension, relative_path)

# Open an insert cursor to populate the table with match results
with arcpy.da.InsertCursor(output_table, ["AttachmentPath", "MatchCount"]) as cursor:
    # Loop through the attachment paths and count the occurrences of the search character
    for attachment_path in attachment_paths:
        with open(attachment_path, "r") as attachment_file:
            content = attachment_file.read()
            match_count = content.count(search_character)
            cursor.insertRow([attachment_path, match_count])
            
        filename = os.path.basename(attachment_path)

 

Honestly, I'm amazed that your script runs at all.

Line 15: glob() shouldn't find anything here, because your files don't match the pattern "????-??-????". They match the pattern "????-??-????*" (note the asterisk wildcard at the end). The whole operation is useless anyway (except maybe for information purposes), because you overwrite the variable files in line 31.
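
To make the glob point concrete, here is a minimal sketch. The file names and the temporary folder are made up for illustration; they only assume that each name starts with a GPIN and continues with extra characters.

```python
import glob
import os
import tempfile

# Hypothetical file names: a GPIN followed by extra characters and an extension.
sample_names = ["1234-56-7890_front.jpg", "1234-56-7890_deed.pdf", "notes.txt"]

with tempfile.TemporaryDirectory() as folder:
    for name in sample_names:
        open(os.path.join(folder, name), "w").close()

    # Without a trailing asterisk, the whole file name must be exactly
    # 12 characters long, so nothing matches.
    exact = glob.glob(os.path.join(folder, "????-??-????"))

    # With the asterisk, anything may follow the GPIN part.
    prefixed = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(folder, "????-??-????*"))
    )

print(exact)     # no matches
print(prefixed)  # the two files whose names start with a GPIN
```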

Line 21: You set the variable workspace to your feature class, but later on you use it as if it were the folder that contains the images.

Line 31: You're trying to walk through a feature class, not the directory.

Lines 33/34: Now you're just appending every file in the folder, without checking it against your pattern.

Line 37: Nope, still not the correct variable.

Line 38: Why OBJECTID? Your key field is GPIN!

Lines 45 to end: I have no idea what you're trying to do here. You open an InsertCursor on the generated Attachment Match Table, using fields that don't exist in it. Then you loop through the paths you found earlier and read their content as text (but they're jpg or pdf!). And then I assume you want to search for your pattern in the content of these files, but you actually search for the literal string "????-??-????".
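
If the goal is to find out which GPIN a file belongs to, the file name is the place to look, not the file content. A rough sketch, assuming the GPIN is always at the start of the file name (gpin_from_path and the sample paths are hypothetical names for illustration):

```python
import os
import re

# "????-??-????" is glob syntax. As a regular expression, the closest
# equivalent is 4, 2 and 4 arbitrary characters separated by hyphens.
GPIN_RE = re.compile(r".{4}-.{2}-.{4}")


def gpin_from_path(path):
    """Return the GPIN-like prefix of the file name, or None if there is none."""
    match = GPIN_RE.match(os.path.basename(path))
    return match.group(0) if match else None


print(gpin_from_path("photos/1234-56-7890_front.jpg"))
print(gpin_from_path("photos/notes.txt"))

# Counting the literal pattern string, as the original script does,
# finds nothing, because the question marks are not treated as wildcards.
print("1234-56-7890_front.jpg".count("????-??-????"))
```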

 

 

This all suggests some confusion about what the Generate Attachment Match Table tool does. The tool creates a table that matches table rows to files based on the files' names and a corresponding field in the table. The output is meant to be used for Add Attachments, to add the matched files as attachments to those rows. After that, the output table can be deleted; it doesn't serve any further purpose.

 

In previous ArcGIS versions, the file name and field value had to match exactly, or there would be no match. Also, the tool could only do a 1:1 match: if multiple files belonged to the same row, you could only match one of them. If you work in an ArcGIS Pro version earlier than 3.1, I have a script here that does what I think you want to do. Just replace your paths in lines 1-4 and use GPIN as key_field in line 5.
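
For readers without access to that script, the idea of a manual one-to-many match can be sketched in plain Python. This is not the linked script, only an illustration: build_match_rows, the key list and the file paths are all made up, and it assumes each file name starts with the GPIN it belongs to.

```python
import os


def build_match_rows(key_values, file_paths):
    """Pair every file with the key its name starts with (one key, many files)."""
    rows = []
    for path in sorted(file_paths):
        # Compare against the file name without folder and extension.
        stem = os.path.splitext(os.path.basename(path))[0]
        for key in key_values:
            if stem.startswith(key):
                rows.append((key, path))
                break  # a file belongs to at most one key
    return rows


# Hypothetical GPIN values and attachment files.
keys = ["1234-56-7890", "2222-33-4444"]
files = [
    "pics/1234-56-7890_front.jpg",
    "pics/1234-56-7890_deed.pdf",
    "pics/9999-99-9999.jpg",  # no parcel with this GPIN, so it is skipped
]

for row in build_match_rows(keys, files):
    print(row)
```

In a real workflow the keys would presumably be read from the GPIN field (for example with arcpy.da.SearchCursor) and the rows written to a table that Add Attachments can use.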

 

ArcGIS Pro 3.1 added the "Match Pattern" parameter, which makes the tool much more flexible. Just set it to "Any" or "Prefix" and you should get all your matches correctly:
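
In a script, the same setting could look roughly like the sketch below. Treat it as an assumption to check against the tool reference for your version: I pass the arguments by position (dataset, folder, output table, key field, file filter, path type, match pattern) and assume the keyword value for "Prefix" is "PREFIX". The wrapper name generate_match_table is made up.

```python
def generate_match_table(in_dataset, in_folder, out_table,
                         key_field="GPIN", file_filter="*.jpg; *.pdf",
                         path_type="RELATIVE", match_pattern="PREFIX"):
    """Run Generate Attachment Match Table with a non-exact match pattern."""
    import arcpy  # imported here so the module loads without ArcGIS Pro

    # Assumed positional order: dataset, folder, output table, key field,
    # file filter, path type, match pattern (the last needs ArcGIS Pro 3.1+).
    return arcpy.management.GenerateAttachmentMatchTable(
        in_dataset, in_folder, out_table, key_field,
        file_filter, path_type, match_pattern)


# Example call with the paths from the original script:
# generate_match_table(
#     r"C:\TEMP\Test_gd.gdb\Tax_parcels",
#     r"G:\GIS_Projects\ArcGIS_Pro\Photo_attachments\Test",
#     r"C:\TEMP\Test_gd.gdb\tblMatch_6",
# )
```

Note that the first argument is the feature class and the second is the folder with the files, and that the key field is GPIN rather than OBJECTID.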

[screenshots JohannesLindner_2-1689268969655.png and JohannesLindner_3-1689268988779.png]

 


Have a great day!
Johannes


2 Replies
BarryG
by
Occasional Contributor

Thank you, Johannes. We could not adapt the script to work properly, but we took your advice and used the new "Match Pattern" parameter of the Generate Attachment Match Table geoprocessing tool in ArcGIS Pro 3.1. I hadn't realized this option had been added.
