Downloading attachments from a URL in text field

02-09-2024 12:20 PM
leahmaps
Occasional Contributor II

Hi!

TL;DR: I need to download images from URLs stored in text fields of a feature class and sort them by their related address, and I am not sure how to run this process.

I am not sure if this is the right community to post this in, but I am starting here since I hope to accomplish this using Pro if possible.

I have a feature class with multiple text fields that each hold a URL link to an image. I am hoping to download these and store them for record-keeping purposes. These are NOT in a related table, just fields in the feature class. I would need to group these images with the right information (in this case, an address).

The steps I would need to do are:

- Get images from the text field to download.

- Sort the images into folders on my hard drive (hopefully like Street > house number > images, with the field name as the image name).

Has anyone done something like this before, where the attachments were not actually attachments but just links through URLs? I know this Esri article exists, but it looks like that is for images added through attachments and not through a URL link.

Thanks in advance for any help!

7 Replies
ChristopherCounsell
MVP Regular Contributor

Use Python and an HTTP request library. Go through the table, get the attribute data, request the file, rename it based on the table attributes, and sort it into the folders.

You could give some prompts and a sample dataset to ChatGPT and it'll help generate the script for you.

MErikReedAugusta
Occasional Contributor III

Well this is excitingly-timed.  I was just discussing this operation with a teammate at work this week.  Disclaimer: I haven't had a chance to go through said teammate's code, yet, to peek under the hood.  I also haven't used urllib, yet, so the below code is an untested first-pass from me going through the documentation tonight.

Caveat emptor!

Also, it's possible I misunderstood something in your post about how these photos will be filed away.  The below code assumes there's an "address" field to create folders from and that all the photos in the URLs have proper filenames & file extensions.

If it doesn't work as intended, let me know & I'll try to take a second look at this next week when I have datasets to R&D with and/or when I've had a chance to compare notes with that teammate.

 

import pathlib
import urllib.parse, urllib.request  # plain "import urllib" does not load these submodules

# Warning: The lowercase "R" that you see before these strings is important!
#          Without it, any backslashes in your path will break things.
fc = r'Path\to\your\FeatureClass\here'
outDir = r'Path\where\you\want\the\photos\saved'

# This list should contain at minimum: The URL field and whichever field(s)
# you want to use to create subdirectories (if any).
# This also assumes the filename in the URL is fine; if not, add any fields
# you need for that new filename.  I didn't write that bit of code below,
# though.
fieldLst = ['URLField',
            'Address',
           ]

# Search Cursor will iterate through your attribute table,
# reading the fields we specified in fieldLst, above
with arcpy.da.SearchCursor(fc, fieldLst) as cursor:
  for url, address in cursor:

    # First, we figure out the name of the subfolder from the address,
    # then we create that subfolder if it doesn't already exist.
    addressDir = pathlib.Path(outDir) / address
    pathlib.Path(addressDir).mkdir(parents=True, exist_ok=True)

    # Next, let's get the filename of the photo from that URL
    # Also, lots of urls need percent-encoded characters that will cause
    # us problems, like a space becoming %20.  unquote fixes that for us.
    parsedURL = urllib.parse.urlparse(url)
    photoName = urllib.parse.unquote(pathlib.Path(parsedURL.path).name)

    # Then, we need the path to where that photo is going to be saved
    photoPath = addressDir / photoName

    # Finally, we write that photo to file
    with open(photoPath, 'wb') as img:
      # This downloads the photo from the URL so we can write it
      with urllib.request.urlopen(url) as imgURL:
        # And this actually CREATES the photo file
        img.write(imgURL.read())

    # Because I'm generally paranoid & fastidious, let's clean up all those
    # variables we created along the way.  Cursors have pulled weird tricks
    # on me as they loop through, if I'm being careless.
    del photoPath, photoName, parsedURL, addressDir

 

ChristopherCounsell
MVP Regular Contributor

Nice, this is what I was thinking but glad you have one ready to go!

 

leahmaps
Occasional Contributor II

Hey! Super excited that this would work out. Attaching two little pieces of information that I think could help determine if this could work. 

1) Here is what information from our dataset looks like (this is from AGOL, but we have this in Pro, too)

leahmaps_0-1707747584458.png

This would work as anticipated because there is an address (street) and house number, correct? There is also a full address field.

2) An X'd out version of a URL we are working with. This should also work because there is a .jpg at the end, correct?
https://mydoforms.XXXXXXX.com/imageViewer?blobKey=ag9zfm15ZG9mb3Jtcy1ocmRyFwsSCmJsb2Jfc3RvcmUYgIDY77KJggoM&blobName=XXXXXXXXXXXXXX$$10042018122407$$Published$$1756$$P1$$1.jpg 

Fingers crossed that this is a solution to the problem! I assume this could be run in ArcGIS Pro Python window as long as all the information is filled out? If you get to test it, I would love to hear about how it ran!

 

Thank you so much.

MErikReedAugusta
Occasional Contributor III

If you add "import arcpy" to the top of that script, it could even be run outside ArcGIS.  (I forgot that line when I wrote that code over the weekend)

 

Schema & Fields

Given your schema, I would recommend using the "Address" field that's at the furthest-left.  It's possible you might have some characters in there that won't work as a folder name, though.  You'll want to look into "validating folder names" in Python, and insert that step between Lines 25 & 26.

That's another step I didn't bother with for the proof-of-concept code above but would generally be a best-practice.
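As a rough sketch of that validation step, something like the following could work. The helper name safe_folder_name and the 'unknown' fallback are made up for illustration, and the character list only covers the characters Windows rejects in names; it does not handle reserved names such as CON or NUL.

```python
import re

# Characters Windows does not allow in file or folder names.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def safe_folder_name(text, replacement='_'):
    """Return a version of text that is usable as a single folder name."""
    cleaned = _INVALID_CHARS.sub(replacement, str(text))
    # Windows also rejects names that end in a space or a period.
    cleaned = cleaned.strip().rstrip('. ')
    # Fall back to a placeholder (hypothetical) if nothing usable is left.
    return cleaned or 'unknown'

print(safe_folder_name('123 Main St. #4/B'))  # prints 123 Main St. #4_B
```

In the script above, this would probably mean building the folder with pathlib.Path(outDir) / safe_folder_name(address) instead of using the raw address.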

Hyperlink & Photo

I think that will work.  I can't remember if those dollar signs will cause any trouble and/or if the unquote method (Line 32) will take care of them.  I'd say run it and see what errors you get.
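One thing to watch for with a URL shaped like the sample: the .jpg name sits in the blobName query parameter, not in the URL path, so urlparse(url).path is just /imageViewer and every photo would come out named imageViewer. A possible workaround, sketched below, reads the name from the query string first and falls back to the path. The function name photo_name_from_url and the example.com URL are placeholders, and the blobName key is taken from the sample URL. Dollar signs pass through untouched; unquote only changes percent-encoded sequences.

```python
import pathlib
import urllib.parse

def photo_name_from_url(url, query_key='blobName'):
    """Pick a filename from the query string if present, else from the path."""
    parsed = urllib.parse.urlparse(url)
    # parse_qs already percent-decodes the values it returns.
    query = urllib.parse.parse_qs(parsed.query)
    if query_key in query:
        return query[query_key][0]
    # Fall back to the last segment of the URL path.
    return urllib.parse.unquote(pathlib.PurePosixPath(parsed.path).name)

# Made-up URL with the same shape as the sample in the question.
example = 'https://example.com/imageViewer?blobKey=abc123&blobName=site$$P1$$1.jpg'
print(photo_name_from_url(example))  # prints site$$P1$$1.jpg
```

If this fits your URLs, it would replace the photoName line (Line 32) in the original script.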

 

Unfortunately, I probably won't have a chance to do any testing on this for a few days; gonna be a much busier week than I planned, this week.

leahmaps
Occasional Contributor II

Hi!

Just a few follow-up questions. I am pretty new to Python, so I want to make sure I am understanding certain parts of this script right.

1) In this part:

fieldLst = ['URLField',
            'Address',
           ]

do you need to reference the field names or the aliases?

2) Semi-follow-up to question 1: as you can see, I have MANY fields with image URLs. Can I do all pictures for a specific address at one time so they will all be sorted into the same folder and then named with the field alias?

I.e., here is a better picture of our dataset, with tons of URLs with images (there are more URL fields than what is pictured).

leahmaps_0-1707771590328.png

I also attached a sample of how I was hoping to develop the folder structure.

leahmaps_1-1707771658377.png

 

Essentially, would I have to run the script multiple times, once for each field with a URL? Or can this script (somehow) be utilized so that each record tied to an address downloads all related URL fields/pictures at once?

 

Sorry for so many questions! I will be happy to test things out and report on how they worked out, I just need to understand more of how I can get there 🙂

MErikReedAugusta
Occasional Contributor III

Somehow I never got a notification for this post, so I apologize for the late reply!  I'm assuming you've solved it since then, but in case you haven't, and for anyone else who stumbles on this later via Google search:

Field Name vs Field Alias

What you want in that list are the field names, because that's what SearchCursor is expecting.
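If you can only see aliases in the attribute table, one way to look up the real names is arcpy.ListFields, which returns Field objects that carry both a name and an aliasName property. The sketch below uses stand-in objects so it runs without arcpy; the helper name alias_to_name and the field names and aliases in the demo are invented for illustration.

```python
from types import SimpleNamespace

def alias_to_name(fields):
    """Map each field alias to the real field name a cursor expects."""
    return {field.aliasName: field.name for field in fields}

# Stand-in objects so the sketch runs without arcpy; in ArcGIS Pro you would
# pass arcpy.ListFields(fc) instead of this hypothetical list.
demo_fields = [
    SimpleNamespace(name='PreInstallPhoto', aliasName='Pre-Install Photo'),
    SimpleNamespace(name='Address', aliasName='Full Address'),
]

for alias, name in alias_to_name(demo_fields).items():
    print(f'{alias!r} is stored as {name!r}')
```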

 

Multiple Photo Fields:

This has a few components that would likely need changing, but for the most part, it'll just work.  The original could also just be run once for every photo field, if you have the time.  But automation is more fun.  Since I can only see Field Aliases in your screenshot, I'm going to just give them placeholder names for demonstration purposes.

Let's say the field names for those fields in the screenshot are PreInstallPhoto, NewNumberPhoto, NewRadioPhoto, PostInstallPhoto, PostInstallRadioPhoto, and GallonPhoto:

First, add the new values to fieldLst in Lines 14–15:

 

fieldLst = ['Address',
            'PreInstallPhoto',
            'NewNumberPhoto',
            'NewRadioPhoto',
            'PostInstallPhoto',
            'PostInstallRadioPhoto',
            'GallonPhoto'
           ]

 

 

 Next, you need to make sure the SearchCursor can unpack them all in the same order.  (Note: I moved Address to be first just to keep the photos together.  What order you choose doesn't matter, so long as it's the same in both of these places.)  The part you need to change would be Line 21:

 

for address, preInstall, newNumber, newRadio, postInstall, postRadio, gallon in cursor:

 

 

In your example, you've broken the address up into subfolders.  I'm not going to go into that here for space reasons, but it can definitely be done.  You'll need to add some code around Line 22 that reads the address and breaks it up into House Number & the rest of the address.  This can be a little tricky if your data isn't consistent.  Your screenshot shows a space between them.  If that's always the first space, then you can just break it there.  If not, things rapidly get complicated.
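For the simple case where the house number always comes before the first space, a minimal sketch might look like this. The helper name split_address is made up, and the sample address is invented; addresses with unit numbers, fractions, or a missing house number would need extra handling.

```python
def split_address(address):
    """Split '123 Main St' into ('123', 'Main St') at the first space."""
    # partition returns (before, separator, after); after is '' if no space.
    number, _, street = address.strip().partition(' ')
    if not street:
        # No space found: treat the whole value as the street.
        return '', number
    return number, street.strip()

print(split_address('1756 Oak Hollow Rd'))  # prints ('1756', 'Oak Hollow Rd')
```

To get a Street > house number layout, the folder could then be built as pathlib.Path(outDir) / street / number.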

Beyond that, the folder part doesn't change, since the photos are all getting stored together.  But we'll need to handle each of the filenames separately.  There are a few ways to handle this, but the simplest is probably to pull this part of the script out to a separate function.  Then you don't have to write it a bunch of times.  This function needs to go above anything that wants to call it.  I recommend inserting it between the library imports and your tool inputs.  This means inserting this at around Line 3 of the original script.

Also, since you mention that some photo fields might be NULL, I added a bit at the top.  Line 3 below checks if there's an actual NULL, and just bails out of the function if there is.

Since it's a string field, I also added a quick check at Lines 5–6 that's looking for a "fake NULL".  This would be a field where there's just an empty string or a bunch of spaces, but no actual URL.  That's a problem you can easily run into, if your dataset is set up to not allow NULLs—and sometimes even if you do allow them!

Lastly, since we added these "bail out" parts by returning something, the whole function should probably also return something upon successful completion.  I just added a return None, since we don't really need it to return anything.

 

def ExportPhoto(folder, url):
    # First, make sure you actually have a URL:
    if url is None:
        return None
    elif isinstance(url, str):
        if url == ' ' * len(url):
            return None

    # Next, let's get the filename of the photo from that URL
    # Also, lots of urls need percent-encoded characters that will cause
    # us problems, like a space becoming %20.  unquote fixes that for us.
    parsedURL = urllib.parse.urlparse(url)
    photoName = urllib.parse.unquote(pathlib.Path(parsedURL.path).name)

    # Then, we need the path to where that photo is going to be saved
    photoPath = folder / photoName

    # Finally, we write that photo to file ('wb' = write binary)
    with open(photoPath, 'wb') as img:
        # This downloads the photo from the URL so we can write it
        with urllib.request.urlopen(url) as imgURL:
            # And this actually CREATES the photo file
            img.write(imgURL.read())

    # Since part of the function returns a value, ALL of the function should,
    # for consistency and user-friendliness.
    return None

 

 

Since we're handling this in a separate function, now, you should remove the code from Lines 28–42 of the original code.  Instead, we'll need to call that function for each of your photos.  This goes in where you just removed the other code, so starting at Line 28.

 

ExportPhoto(addressDir, preInstall)
ExportPhoto(addressDir, newNumber)
ExportPhoto(addressDir, newRadio)
ExportPhoto(addressDir, postInstall)
ExportPhoto(addressDir, postRadio)
ExportPhoto(addressDir, gallon)

 

 

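Lastly, your cleanup on Line 47 should match what now exists inside the loop: addressDir plus the variables you unpacked on Line 21.  The other names (photoPath, photoName, parsedURL) now live inside ExportPhoto, so trying to delete them in the loop would raise a NameError.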
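If the list of photo fields keeps growing, an alternative to naming every variable is to pair the field names with the row values and loop over them. That also makes it easy to name each file after its field, as asked in the original question. The sketch below only plans the downloads and does not fetch anything; the function name planned_downloads, the 'photos' folder, the example.com URL, and the default .jpg extension are all assumptions for illustration.

```python
import pathlib

def planned_downloads(folder, field_names, row, extension='.jpg'):
    """Pair each non-empty URL in a row with a target path named after its field."""
    plans = []
    for field, url in zip(field_names, row):
        # Skip real NULLs and "fake NULLs" (empty or all-space strings).
        if url is None or not str(url).strip():
            continue
        # Name the file after the field, e.g. PreInstallPhoto.jpg.
        plans.append((url, pathlib.Path(folder) / f'{field}{extension}'))
    return plans

# Hypothetical field names and row values standing in for one cursor row.
photo_fields = ['PreInstallPhoto', 'NewNumberPhoto', 'GallonPhoto']
row = ('https://example.com/a.jpg', None, '   ')

for url, target in planned_downloads('photos', photo_fields, row):
    print(url, '->', target)
```

Inside the cursor loop, the row could be unpacked as "for address, *urls in cursor:" with fieldLst set to ['Address'] + photo_fields, and each (url, target) pair then handed to a download step that writes to target rather than deriving the name from the URL.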