arcpy.mapping.ListLayers causes Python script to suddenly exit without any errors

09-28-2018 01:48 PM
VincentLantaca1
New Contributor III

I'm trying to write a program that collects the data sources of all layers in a .mxd map document. I am using Python 2.7.10 with ArcGIS 10.4.1. On some .mxds it seems to work just fine, while on others the call to ListLayers on line 16 simply causes the Python script to suddenly exit. It doesn't say why it stopped and it doesn't give an error. I have tried wrapping the call to ListLayers in a try/except statement, and it doesn't throw an exception either.

I have added print statements to figure out where the program stops, and it always stops right after the call to ListLayers (the call never seems to return). Is there a bug in ListLayers, or am I using it wrong?

import arcpy, os, datetime


root_directory = r"M:\path_to_mxd\document.mxd"

data_sources = {} # mapping from resource names to an array of paths to resources with that name
resource_locations = [] # a running list of all the drive letters and unc locations used in resources

def store_data_sources(mxd_path):
    global data_sources, resource_locations

    print("signal 0: before MapDocument")
    mxd = arcpy.mapping.MapDocument(mxd_path)
    print("signal 1: returned from MapDocument")
    if mxd:
        layers = arcpy.mapping.ListLayers(mxd)
        print("signal 2: returned from ListLayers")
        if layers:
            for lyr in layers:
                print("signal 3: looping through layers")
                if lyr.supports("dataSource"):
                    print("signal 4: supports dataSource")
                    data_path = lyr.dataSource
                    # keep track of resource drive letters and unc locations
                    splitdrive = os.path.splitdrive(data_path)
                    if splitdrive[0] != "" and splitdrive[0] not in resource_locations:
                        resource_locations.append(splitdrive[0])
                    # keep a list of all resource paths in the mxd
                    if lyr.name in data_sources:
                        if data_path in data_sources[lyr.name]:
                            continue
                        else:
                            data_sources[lyr.name].append(data_path)
                    else:
                        data_sources[lyr.name] = [data_path]
            del layers
        del mxd

def dump_str_to_arr_hash(hashmap):
    for key in hashmap:
        print("\t" + key)
        array = hashmap[key]

        for string in array:
            print("\t\t" + string)

def str_to_arr_hash_count_items(hashmap):
    # total number of paths recorded across all keys
    return sum(len(hashmap[key]) for key in hashmap)
            
start_time = datetime.datetime.now()
print(start_time)
print("")

print("Exploring path: '" + root_directory + "'")
arcpy.env.workspace = root_directory
try:
    store_data_sources(root_directory)
except Exception as e:
    print("An exception has occurred: " + repr(e))
print("")

num_sources = str_to_arr_hash_count_items(data_sources) # number of unique data sources found

print("Number of data sources found: " + str(num_sources))
print("")

# print the drives that are used by resources
print("Drive letters and network paths used by resources:")
for loc in resource_locations:
    print("\t" + loc)
print("")

print("Paths to resources:")
dump_str_to_arr_hash(data_sources)

print("")
end_time = datetime.datetime.now()
print(end_time)
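
For what it's worth, one idea I may try next is to isolate the call in a child process and check its exit code, since a hard crash in native code would kill the interpreter without ever reaching an except block. A rough sketch, where worker.py stands for a hypothetical one-argument script that just opens the MXD and calls ListLayers:

# Hypothetical isolation check: run the risky call in a child interpreter so
# a hard crash shows up as an abnormal exit code instead of silently killing
# this script. worker.py is assumed to open sys.argv[1] and call ListLayers.
import subprocess, sys

ret = subprocess.call([sys.executable, "worker.py", r"M:\path_to_mxd\document.mxd"])
if ret != 0:
    print("worker exited abnormally with code " + str(ret))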
5 Replies
JoshuaBixby
MVP Esteemed Contributor

I suggest you open a case with Esri Support. I have a data source mining script I created years back, and it would crash on certain MXDs. Typically it is something in the MXD that isn't quite right: not corrupt enough to stop it from opening in ArcMap, but corrupt enough to cause ArcPy functions to fail. Spending time to determine why or how exactly an MXD got partially corrupted became a fool's errand, so I re-wrote my cataloging script to use multiprocessing so that a crashed worker process wouldn't stop the whole exercise.

Good luck; there are countless ways that users can mess up MXDs.
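
The stripped-down shape of the idea is roughly this (made-up names, not my actual script):

# Sketch of the isolation idea: each MXD is handled by a separate pool
# worker, so a hard crash in arcpy only takes down that worker process,
# not the whole run.
import multiprocessing

def list_sources(mxd_path):
    import arcpy  # imported inside the worker so each process gets its own
    mxd = arcpy.mapping.MapDocument(mxd_path)
    sources = [lyr.dataSource for lyr in arcpy.mapping.ListLayers(mxd)
               if lyr.supports("dataSource")]
    del mxd
    return mxd_path, sources

if __name__ == "__main__":
    mxd_paths = []  # fill with the paths of the MXDs to catalog
    pool = multiprocessing.Pool(processes=4)
    jobs = [(p, pool.apply_async(list_sources, (p,))) for p in mxd_paths]
    for path, job in jobs:
        try:
            _, sources = job.get(timeout=600)  # don't wait forever on one MXD
            print(path + ": " + str(len(sources)) + " data sources")
        except Exception:
            print("WARNING: worker failed or timed out on " + path)
    pool.terminate()  # kill any worker that is still wedged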

VincentLantaca1
New Contributor III

Do you remember what version of arcpy you were using for that data source mining script? And were you using arcpy.mapping.MapDocument() and arcpy.mapping.ListLayers() in the worker processes?

JoshuaBixby
MVP Esteemed Contributor

The last time I ran the script in our data center on a large volume of MXDs, I am fairly certain it was ArcGIS 10.3.1. Yes, each worker process was given a string path to an MXD and used MapDocument and ListLayers to enumerate the layers.

I think MapDocument has some kind of memory-management issue that shows itself after a process has opened a lot of MXDs. What is a lot? It seemed to vary depending on the content of the MXDs. Sometimes a single worker process could make it through hundreds before running into an issue, sometimes fewer than 25. I knew it wasn't MXD corruption alone, because if I re-submitted the MXD that was being processed when the process hung or crashed, it would work just fine. Speaking of corruption, that was also a problem. When processing hundreds of thousands of MXDs from thousands of users over the years, it surprised even me what some end users could do in an MXD. How do you think an MXD with 1,200 layers would perform?

I tried numerous tactics to deal with these issues. I recycled the worker process pool after so many MXDs, which mostly addressed the first issue, but it didn't address the second. I eventually settled on a time-out for each worker process: if a worker process took more than so many minutes, I would kill it and have the pool spool up a new one. At the end of the script, I would have a list of MXDs that didn't get processed the first time, and I would re-run them to determine which were valid and which were corrupt.
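
The recycling tactic, for what it's worth, is just a knob on the pool; a minimal sketch (the numbers were tuned by trial and error):

# Pool recycling: maxtasksperchild retires each worker after it has handled
# that many MXDs and spools up a fresh process, which sidesteps the memory
# creep from opening many MapDocuments in one long-lived process.
import multiprocessing

pool = multiprocessing.Pool(processes=4, maxtasksperchild=20)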

IraKoroleva1
New Contributor

Hello Joshua, it seems that I'm working on a very similar process and facing very similar issues, so my last idea is to add a time-out to the script. While walking through a list of MXDs, my script is able to skip an MXD and keep running if it hits an error, but if an MXD is just taking hours to load, it will keep trying forever. Can you please share an example of the time-out code? Are you using the multiprocessing module for that?

JoshuaBixby
MVP Esteemed Contributor

I ended up creating two lists: one for the actual multiprocessing pool, and a second with metadata I created and populated about the jobs I put into the pool. One of the items I tracked in the metadata list was the start time of when the job was submitted to the pool. As I looped over the pool to retrieve results, I would check each job that was still running, look at its start time, and calculate the elapsed time. If a running job passed a designated elapsed time, I would issue a terminate command. I also used the metadata list, which tracked the worker PIDs, to check whether an existing process had crashed; if a process had crashed, I would print a warning.
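
In skeleton form it looked something like this (a stripped-down sketch with made-up names, and the process handle folded into the metadata record; not the production script):

# Two-list bookkeeping collapsed into one record per job: each running MXD
# gets a (process, path, start_time) tuple, and a sweep loop reaps finished
# workers, warns on crashes, and terminates anything past the time limit.
import datetime
import multiprocessing
import time

TIMEOUT = datetime.timedelta(minutes=10)
MAX_WORKERS = 4

def process_mxd(mxd_path):
    import arcpy  # hypothetical worker: enumerate the layers in one MXD
    mxd = arcpy.mapping.MapDocument(mxd_path)
    for lyr in arcpy.mapping.ListLayers(mxd):
        if lyr.supports("dataSource"):
            print(mxd_path + " -> " + lyr.dataSource)
    del mxd

def run_all(mxd_paths):
    pending = list(mxd_paths)
    running = []    # (process, mxd_path, start_time) records
    leftovers = []  # MXDs that crashed or timed out, for a second pass
    while pending or running:
        # top up the set of running workers
        while pending and len(running) < MAX_WORKERS:
            path = pending.pop()
            proc = multiprocessing.Process(target=process_mxd, args=(path,))
            proc.start()
            running.append((proc, path, datetime.datetime.now()))
        # sweep: reap finished workers, kill any past the elapsed-time limit
        still_running = []
        for proc, path, started in running:
            if not proc.is_alive():
                proc.join()
                if proc.exitcode != 0:  # the worker crashed
                    print("WARNING: " + path + " exited with code " + str(proc.exitcode))
                    leftovers.append(path)
            elif datetime.datetime.now() - started > TIMEOUT:
                proc.terminate()  # hung worker: kill it and move on
                proc.join()
                leftovers.append(path)
            else:
                still_running.append((proc, path, started))
        running = still_running
        time.sleep(1)
    return leftovers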
