Attachment download fails [FileNotFoundError] when name is local path

BijanTaheri · ‎10-16-2023

I'm using the clone_items() function, which works fine for most operations.

For items that have attachments, the clone_items() function (I'm assuming) runs the download function from the AttachmentManager as part of the cloning process.

The attachment download fails if the name of the attachment is a path referring to a local image on the machine of the person who uploaded it. For the most part the attachments are named "Fotos1.jpg", "Fotos2.jpg" and so on.

If the photo is named "C:\\Users\\XXX\\Desktop\\XXX\\XXX.JPG' I get the error:

FileNotFoundError: [Errno 2] No such file or directory: "C:\\Users\\XXX\\Desktop\\XXX\\XXX.JPG'

Attachment info:

source_items.layers[0].attachments.get_list(oid=145)

[{'id': 61,
'globalId': 'eea8bc6b-5d97-4410-b10e-a8bc96c22788',
'parentGlobalId': '73e0b288-2a5d-4e75-bf75-1b672eff790a',
'name': "C:\\Users\\XXX\\Desktop\\XXX\\XXX.JPG',
'contentType': 'image/jpeg',
'size': 6387442,
'keywords': '',
'exifInfo': None}]

source_items.layers[0].attachments.download(oid=145, save_path=r"C:\LOCAL\PATH")

FileNotFoundError: [Errno 2] No such file or directory: "C:\\Users\\XXX\\Desktop\\XXX\\XXX.JPG'

There are a lot of features with these types of filenames spread across a lot of (hosted) feature services, so I can't just change the names of the files.

It is in ArcGIS Online.

Does anybody have an idea of why it happens, a workaround or anything else?

EarlMedina · ‎10-16-2023

Hello,

I don't believe I've seen this problem before, but as you noted the fix that comes to mind is downloading all the attachments, re-uploading them, reviewing to make sure the filenames appear correctly, and deleting the originals. It seems the file path is not being parsed correctly because the attachment name is itself a path.

This can be done programmatically to expedite the process. You would do something like this:

1. Use AttachmentManager.search with paging and an `attachment_where` clause to get all attachments with bad filenames.
2. Download all attachments with bad filenames using AttachmentManager.download.
3. Rename as needed locally.
4. Upload each attachment to the correct feature using AttachmentManager.add.
5. Review and make sure the uploads were successful.
6. Delete the originals.
7. Re-attempt the clone.
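
The "find the bad filenames" and "rename as needed" steps can be tried out without touching the service. What follows is a minimal sketch, not part of the ArcGIS API: `looks_like_path` and `plan_renames` are hypothetical helper names, the attachment with id 60 is made-up sample data, and the input is assumed to be a list of dicts with `id` and `name` keys like the one get_list() returns.

```python
import ntpath  # Windows path rules, importable on any operating system


def looks_like_path(name):
    """Return True if an attachment name contains a directory part."""
    # ntpath treats both "\\" and "/" as separators, whatever OS runs this.
    return ntpath.basename(name) != name


def plan_renames(att_list):
    """Map attachment id -> cleaned file name for names that look like paths."""
    return {
        att["id"]: ntpath.basename(att["name"])
        for att in att_list
        if looks_like_path(att["name"])
    }


# Sample records shaped like the get_list() output quoted in this thread.
# The entry with id 60 is invented for illustration.
attachments = [
    {"id": 60, "name": "Fotos1.jpg"},
    {"id": 61, "name": r"C:\Users\XXX\Desktop\XXX\XXX.JPG"},
]

renames = plan_renames(attachments)
print(renames)
```

Only the attachments in the resulting mapping would need the download, rename, and re-upload treatment; the others can be left alone.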

Of course, even this approach is a bit laborious. I would probably only attempt it on the smallest of the Feature Layers to start.

BijanTaheri · ‎10-17-2023

@EarlMedina Thank you for your reply.

This was my thought process as well, but the problem is that the attachments can't be downloaded, because the download function looks for the attachment at the bad filename/local path instead of its actual location.

Using the UI I can see the attachment, and by replicating the feature service to a GDB using the REST API I can also see the attachments. It is only when I use the Python API that it fails.

I also haven't seen this issue before, and it definitely isn't a regular issue. But I think the AttachmentManager download function is handling the filenames wrong.
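
A possible explanation, not confirmed against the API source: if the download code builds its target with something like os.path.join(save_path, name), an attachment name that is itself an absolute Windows path replaces save_path entirely. The sketch below uses ntpath (the Windows flavor of os.path) so it behaves the same on any OS; the paths are the placeholders from this thread, and the join call is an assumption about what happens internally.

```python
import ntpath  # Windows path rules, importable on any operating system

save_path = r"C:\LOCAL\PATH"
good_name = "Fotos1.jpg"
bad_name = r"C:\Users\XXX\Desktop\XXX\XXX.JPG"

# A plain file name is appended to the download folder as expected.
good_target = ntpath.join(save_path, good_name)

# An absolute path as the second argument discards save_path, so the
# target becomes the uploader's local path, which does not exist here.
bad_target = ntpath.join(save_path, bad_name)

print(good_target)
print(bad_target)
```

That would match the FileNotFoundError, which names the uploader's path and not the save_path that was passed in.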

I attached the error log, if it is any help.

EarlMedina · ‎10-17-2023

Hey @BijanTaheri, I was able to simulate your problem on my end with a bit of work. It's my opinion that you can't resolve the download issue in the API without modifying the source. Consequently, I would suggest that you open a case with Technical Support to log an enhancement to the API to add an additional check for invalid paths/illegal characters with respect to the download function (which relies on the stream_response_to_file function).

The change is simple, but I have to state that modifying the source voids any official support. I understand, however, that this is a recovery operation for you at this point. Therefore, if you want to try the download workaround I'd suggest creating a separate python environment to experiment in.

Basically, you want to add an additional step that cleans the filename. For cases where a file could have the same name, you'll get an error so you also have to introduce some uniqueness. Start by opening (as admin) the file C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages\requests_toolbelt\downloadutils\stream.py.

Go to the stream_response_to_file function. Look for the else statement and update to appear like so:

    else:
        filename = get_download_file_path(response, path)
        from datetime import datetime
        # Timestamp suffix keeps repeated attachment names unique
        stamp = datetime.now().strftime("%H%M%S%f")
        # Keep only the last path component of the attachment name
        parsed_filename_w_ext = filename.split("\\")[-1]
        # splitext copes with names that contain extra dots or no extension
        name, ext = os.path.splitext(parsed_filename_w_ext)
        new_name = f"{name}_{stamp}{ext}"
        # Set this to whatever save_path would have been
        dir_path_to_download_to = r"C:\Users\yourUser\directory"
        filename = os.path.join(dir_path_to_download_to, new_name)

        if os.path.exists(filename):
            raise exc.StreamingError("File already exists: %s" % filename)
        fd = open(filename, 'wb')

Be sure to set the value of "dir_path_to_download_to" to whatever the value of "save_path" would have been. Save as admin. Retry the download function. Change source back to original when done.

BijanTaheri · ‎10-18-2023

@EarlMedina Thank you very much for this workaround. I tried it and it worked great.

I also looked a bit further down the path you started and believe I found a more generic workaround that doesn't involve hardcoding a path in the source files.

I modified the AttachmentManager.download function in the managers.py file (C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages\arcgis\features\managers.py)

I added att_name = os.path.basename(att_name) to the code (line 14 in the example below, line 595 in the managers.py file). This makes sure that if a file path was stored as the attachment name, only the filename from the path is used. I tested it on a few different datasets, and it works great.

It even works in relation to the clone_items() function and uses the new basename in the target as attachment filename.

if not return_all:
    oid = oid[0]
    paths = []
    for att in attachment_id:
        att_path = "{}/{}/attachments/{}".format(self._layer.url, oid, att)
        att_list = self.get_list(int(oid))

        # get attachment file name
        desired_att = [att2 for att2 in att_list if att2["id"] == int(att)]
        if len(desired_att) == 0:  # bad attachment id
            raise RuntimeError
        else:
            att_name = desired_att[0]["name"]
            att_name = os.path.basename(att_name)

        if not save_path:
            save_path = tempfile.gettempdir()
        if not os.path.isdir(save_path):
            os.makedirs(save_path)
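
One caveat worth checking, offered as an assumption and not as a tested result on the service: os.path.basename follows the rules of the operating system the script runs on. On Windows (as with ArcGIS Pro) a backslash is a separator, but on Linux or macOS it is not, so the same line would leave the name unchanged there. This sketch compares the two standard-library flavors directly, using the placeholder name from this thread.

```python
import ntpath      # what os.path resolves to on Windows
import posixpath   # what os.path resolves to on Linux and macOS

att_name = r"C:\Users\XXX\Desktop\XXX\XXX.JPG"

# Windows rules: backslash is a separator, so only the file name remains.
windows_result = ntpath.basename(att_name)

# POSIX rules: the name contains no "/", so it comes back unchanged.
posix_result = posixpath.basename(att_name)

print(windows_result)
print(posix_result)
```

If the patched code might ever run outside Windows, calling ntpath.basename explicitly would give the same result on every platform.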

I will write an enhancement request to the technical support team this week.

Thanks again for the help 🙂

EarlMedina · ‎10-18-2023

Nice! Very glad to hear this worked out and good catch! I didn't do much digging around so having that info on where the best place to fix is will help a lot.