Network Analyst in Python Loop becomes exponentially slower

09-25-2023 11:12 AM
FrankCardone1
New Contributor II

Hi Everyone,

I have an ArcGIS Python Toolbox that loops through a Location-Allocation problem to analyze changes in allocation based on our growth forecasts. When I run it, the first iteration of the loop takes 10 minutes, but every subsequent iteration takes much longer than the previous one. By the 10th iteration, a single pass takes multiple hours. Any idea what is going wrong?

Code:

 

import arcpy

import os
import pandas as pd
import numpy as np
import arcgis
from arcgis.features import GeoAccessor, GeoSeriesAccessor



def location_allocation_tool(nd_layer, travel_mode, geodatabase, demand, scc_layer, line_barriers, search_cutoff, site_count=20):
    if 'yr0' in demand:
        year_tag = 'Capacity_Current'
    else:
        year_tag = demand.split("_")[-1]   
    print(year_tag)
    demand_lyr = 'demand_lyr'
    capacity_column = [x.name for x in arcpy.ListFields(scc_layer) if year_tag in x.name][0]
    arcpy.AddMessage("begin optimization for {}...".format(demand))
    arcpy.MakeFeatureLayer_management(demand, demand_lyr)
    out_cluster = "cluster_" + demand
    arcpy.AddJoin_management(demand_lyr, "OBJECTID", out_cluster, "SOURCE_ID")
    cluster_expression = [x.name for x in arcpy.ListFields(demand_lyr) if "CLUSTER" in x.name][0] + " = -1"
    arcpy.SelectLayerByAttribute_management(demand_lyr, "NEW_SELECTION", cluster_expression)
    mean_centers = "meancenter_" + demand
    mean_centers_lyr = mean_centers + "_lyr"
    arcpy.MakeFeatureLayer_management(mean_centers, mean_centers_lyr, arcpy.AddFieldDelimiters(mean_centers, "CLUSTER_ID") + " >= 0") # clustered stores

    loc_alloc = arcpy.nax.LocationAllocation(nd_layer)
    loc_alloc.accumulateAttributeNames = ['miles']
    loc_alloc.travelMode = travel_mode
    loc_alloc.travelDirection = arcpy.nax.TravelDirection.FromFacility
    loc_alloc.problemType = arcpy.nax.LocationAllocationProblemType.MaximizeCapacitatedCoverage
    loc_alloc.facilityCount = site_count
    loc_alloc.defaultImpedanceCutoff = search_cutoff
    loc_alloc.lineShapeType = arcpy.nax.LineShapeType.StraightLine
    loc_alloc.distanceUnits = arcpy.nax.DistanceUnits.Miles
    loc_alloc.timeUnits = arcpy.nax.TimeUnits.Minutes

    field_mappings_facilities = loc_alloc.fieldMappings(arcpy.nax.LocationAllocationInputDataType.Facilities)
    field_mappings_facilities["Name"].mappedFieldName = "scc_number"
    field_mappings_facilities["Capacity"].mappedFieldName = capacity_column
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.Facilities, scc_layer, field_mappings_facilities, append=False)

    #load non clustered stores
    field_mappings_demand = loc_alloc.fieldMappings(arcpy.nax.LocationAllocationInputDataType.DemandPoints, use_location_fields=True)
    field_mappings_demand["Name"].mappedFieldName = demand + ".store_number"
    field_mappings_demand["Weight"].mappedFieldName = demand + ".avg_weekly_net_weight_dough"
    for field in ['SourceID', 'SourceOID', 'PosAlong', 'SideOfEdge', 'CurbApproach', 'TimeCutoff', 'DistanceCutoff', 'Bearing', 'BearingTol', 'NavLatency', 'Cutoff']:
        field_mappings_demand[field].mappedFieldName = demand + ".{}".format(field)
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.DemandPoints, demand_lyr, field_mappings_demand, append=False)

    #load clustered stores
    loc_alloc.setLocateSettingsOverrides(arcpy.nax.LocationAllocationInputDataType.DemandPoints, allow_auto_relocate=True, search_tolerance=16000)
    field_mappings_clusters = loc_alloc.fieldMappings(arcpy.nax.LocationAllocationInputDataType.DemandPoints, use_location_fields=True)
    field_mappings_clusters["Name"].mappedFieldName = "CLUSTER_ID"
    field_mappings_clusters["Weight"].mappedFieldName = "SUM_" + demand + "_avg_weekly_net_weight_dough"
    for field in ['SourceID', 'SourceOID', 'PosAlong', 'SideOfEdge', 'CurbApproach', 'TimeCutoff', 'DistanceCutoff', 'Bearing', 'BearingTol', 'NavLatency', 'Cutoff']:
        field_mappings_clusters[field].mappedFieldName = demand + ".{}".format(field)
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.DemandPoints, mean_centers_lyr, field_mappings_clusters)
    
    #load_line_barriers
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.LineBarriers, line_barriers)

    result = loc_alloc.solve()     

    if result.solveSucceeded:
        arcpy.AddMessage("{} complete".format(demand))
        result.export(arcpy.nax.LocationAllocationOutputDataType.Lines, os.path.join(geodatabase, "line_results_{}".format(demand)))
        line_fields = result.fieldNames(arcpy.nax.LocationAllocationOutputDataType.Lines)
        line_data = [row for row in result.searchCursor(arcpy.nax.LocationAllocationOutputDataType.Lines, line_fields)]
        lines_df = pd.DataFrame(line_data, columns=line_fields)

        demand_df = pd.DataFrame.spatial.from_featureclass(demand)
        clusters_df = pd.DataFrame.spatial.from_featureclass(out_cluster)
        store_assignments_df = pd.merge(demand_df[['OBJECTID', 'store_number', 'avg_weekly_net_weight_dough', 'build_year']], 
                    clusters_df[['SOURCE_ID', 'CLUSTER_ID']], 
                    left_on='OBJECTID', 
                    right_on='SOURCE_ID', 
                    how='left')
        
        store_assignments_df['DEMAND_ID'] = np.where(store_assignments_df['CLUSTER_ID'] == -1, 
                                                        store_assignments_df['store_number'], 
                                                        store_assignments_df['CLUSTER_ID']).astype(int)

        lines_df[['ORIGIN', 'DESTINATION']] = lines_df["Name"].str.split(" - ", expand=True).astype(int)
        store_assignments_df = pd.merge(store_assignments_df[['store_number', 'avg_weekly_net_weight_dough', 'build_year',  'DEMAND_ID']],
                                        lines_df[['ORIGIN', 'DESTINATION']],
                                        how='left',
                                        left_on='DEMAND_ID',
                                        right_on='DESTINATION')


        #result_dict[demand]['lines'] = lines_df
        print(result.solverMessages(arcpy.nax.MessageSeverity.All))

       
    else:
        arcpy.AddMessage("Solve failed")
        arcpy.AddMessage(result.solverMessages(arcpy.nax.MessageSeverity.All))
        store_assignments_df =  pd.DataFrame(columns= ['store_number', 'avg_weekly_net_weight_dough', 'build_year',  'DEMAND_ID', 'ORIGIN', 'DESTINATION'])
        #unchosen_facilities = []
    arcpy.management.RemoveJoin(demand_lyr)
    arcpy.Delete_management(demand_lyr)
    arcpy.Delete_management(mean_centers_lyr)
    del result, loc_alloc, field_mappings_facilities, field_mappings_demand, field_mappings_clusters

    return store_assignments_df

def optimize_network(network_dataset, workspace, scc_locations, future_demand_list, line_barriers='line_barriers', search_cutoff_miles=800, site_count=20):
    arcpy.env.workspace = workspace
    nd_layer_name = "Routing_ND"
    scc_lyr = 'scc_lyr'
    line_lyr = "line_lyr"
    # demand_lyr = 'demand_lyr'
    arcpy.MakeFeatureLayer_management(scc_locations, scc_lyr)
    arcpy.MakeFeatureLayer_management(line_barriers, line_lyr)
    arcpy.nax.MakeNetworkDatasetLayer(network_dataset, nd_layer_name)
    nd_travel_modes = arcpy.nax.GetTravelModes(nd_layer_name)
    travel_mode = arcpy.nax.TravelMode(nd_travel_modes['Truck Distance'])
    params = travel_mode.attributeParameters
    params[('Driving a Truck', 'Restriction Usage')] = 5
    travel_mode.attributeParameters = params
    travel_mode.useHierarchy = True

    #future_demand_list = arcpy.ListFeatureClasses("Stores*", feature_type='Point', feature_dataset="Store_Data")
    arcpy.AddMessage("beginning model run for {} years...".format(len(future_demand_list)))

    result_dict = {}

    for demand in future_demand_list:
        result_dict[demand] = location_allocation_tool(nd_layer=nd_layer_name,
                                                       travel_mode=travel_mode,
                                                       geodatabase=workspace,
                                                       demand=demand,
                                                       scc_layer=scc_lyr,
                                                       line_barriers=line_lyr,
                                                       search_cutoff=search_cutoff_miles,
                                                       site_count=site_count)
        #result_dict[demand] = solve_result
    arcpy.Delete_management(nd_layer_name)
    arcpy.Delete_management(scc_lyr)
    arcpy.Delete_management(line_lyr)   
    return result_dict

 

 

And here is a simplified version if that's too much (just an example of my workflow):

 

 

## simplified code that doesn't run; it just shows a pared-down version of the workflow
arcpy.env.workspace = workspace
nd_layer_name = "Routing_ND"
scc_lyr = 'scc_lyr'
line_lyr = "line_lyr"
demand_lyr = 'demand_lyr'
arcpy.MakeFeatureLayer_management(scc_locations, scc_lyr)
arcpy.MakeFeatureLayer_management(line_barriers, line_lyr)
arcpy.nax.MakeNetworkDatasetLayer(network_dataset, nd_layer_name)
nd_travel_modes = arcpy.nax.GetTravelModes(nd_layer_name)
travel_mode = arcpy.nax.TravelMode(nd_travel_modes['Truck Distance'])
params = travel_mode.attributeParameters
params[('Driving a Truck', 'Restriction Usage')] = 5
travel_mode.attributeParameters = params
travel_mode.useHierarchy = True

result_dict = {}

# loop through multiple years of demand
for demand in future_demand_list:
    
    arcpy.AddMessage("begin optimization for {}...".format(demand))
    arcpy.MakeFeatureLayer_management(demand, demand_lyr)

    loc_alloc = arcpy.nax.LocationAllocation(nd_layer_name)
    loc_alloc.accumulateAttributeNames = ['miles']
    loc_alloc.travelMode = travel_mode
    loc_alloc.travelDirection = arcpy.nax.TravelDirection.FromFacility
    loc_alloc.problemType = arcpy.nax.LocationAllocationProblemType.MaximizeCapacitatedCoverage
    loc_alloc.facilityCount = 20
    loc_alloc.defaultImpedanceCutoff = 800
    loc_alloc.lineShapeType = arcpy.nax.LineShapeType.StraightLine
    loc_alloc.distanceUnits = arcpy.nax.DistanceUnits.Miles
    loc_alloc.timeUnits = arcpy.nax.TimeUnits.Minutes

    field_mappings_facilities = loc_alloc.fieldMappings(arcpy.nax.LocationAllocationInputDataType.Facilities)
    field_mappings_facilities["Name"].mappedFieldName = "scc_number"
    field_mappings_facilities["Capacity"].mappedFieldName = "capacity_column"
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.Facilities, scc_lyr, field_mappings_facilities, append=False)

    field_mappings_demand = loc_alloc.fieldMappings(arcpy.nax.LocationAllocationInputDataType.DemandPoints, use_location_fields=True)
    field_mappings_demand["Name"].mappedFieldName = demand + ".store_number"
    field_mappings_demand["Weight"].mappedFieldName = demand + ".avg_weekly_net_weight_dough"
    for field in ['SourceID', 'SourceOID', 'PosAlong', 'SideOfEdge', 'CurbApproach', 'TimeCutoff', 'DistanceCutoff', 'Bearing', 'BearingTol', 'NavLatency', 'Cutoff']:
        field_mappings_demand[field].mappedFieldName = demand + ".{}".format(field)
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.DemandPoints, demand_lyr, field_mappings_demand, append=False)


    #load_line_barriers
    loc_alloc.load(arcpy.nax.LocationAllocationInputDataType.LineBarriers, line_barriers)

    result = loc_alloc.solve()     

    if result.solveSucceeded:
        arcpy.AddMessage("{} complete".format(demand))
        result.export(arcpy.nax.LocationAllocationOutputDataType.Lines, os.path.join(workspace, "line_results_{}".format(demand)))
        demand_df = pd.DataFrame.spatial.from_featureclass(demand)
        result_dict[demand] = demand_df
        print(result.solverMessages(arcpy.nax.MessageSeverity.All))

       
    else:
        arcpy.AddMessage("Solve failed")
        arcpy.AddMessage(result.solverMessages(arcpy.nax.MessageSeverity.All))
        store_assignments_df =  pd.DataFrame(columns= ['store_number', 'avg_weekly_net_weight_dough', 'build_year',  'DEMAND_ID', 'ORIGIN', 'DESTINATION'])
        #unchosen_facilities = []
    # arcpy.management.RemoveJoin(demand_lyr)  # no join is added in this pared-down version
    arcpy.Delete_management(demand_lyr)
    del result, loc_alloc, field_mappings_facilities, field_mappings_demand

 

 

 

12 Replies
MelindaMorang
Esri Regular Contributor

I just tried to reproduce the progressive slowdown using tutorial data and a very simplified workflow, and the slowdown didn't happen.  So, let's try to dig into your more complex workflow a little deeper.

First thought: Is it possible that your machine is progressively running out of memory as you store more and more stuff in that results dictionary?  I don't know how large your store_assignments_df spatial dataframe is with each iteration, but if it's really big, maybe your machine is just using more and more memory.  If you run your code but don't return and store the results, does it still progressively slow down?
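
Something along these lines would at least confirm whether the process memory keeps climbing between iterations. This is just a rough sketch against the loop in your code, and it assumes the psutil package is available in your Pro Python environment (if it isn't, you can add it with conda):

import psutil

proc = psutil.Process()
for demand in future_demand_list:
    rss_mb = proc.memory_info().rss / 1024 ** 2
    arcpy.AddMessage("Process memory before {}: {:.0f} MB".format(demand, rss_mb))
    result_dict[demand] = location_allocation_tool(nd_layer=nd_layer_name,
                                                   travel_mode=travel_mode,
                                                   geodatabase=workspace,
                                                   demand=demand,
                                                   scc_layer=scc_lyr,
                                                   line_barriers=line_lyr,
                                                   search_cutoff=search_cutoff_miles,
                                                   site_count=site_count)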

If that doesn't help, would it be possible for you to share your data and full tool code with me along with specific instructions on how to run it?  We can definitely try to debug the problem on our end, but since my attempt at a simplified script didn't reproduce the problem, I think we'd need to have all your stuff.  If you don't want to share it publicly, you can send me a private message through Esri Community (and please leave a comment here letting me know you did so, since the Esri Community platform doesn't send notifications when you get a private message).

FrankCardone1
New Contributor II

Hi, thanks for your help! I'm fairly confident that it is a memory issue. I think my first course of action will be to remove that piece and just write results to disk (or not keep them at all, as a test), like you suggest. If that doesn't work, I'll reach out to you via PM and let you know here as well.

FrankCardone1
New Contributor II

Removing the dictionary didn't speed things up. It still took almost 8 hours to run. I direct-messaged you some code and sample data.

Thanks,

Frank

MelindaMorang
Esri Regular Contributor

Okay, sorry to hear it wasn't that simple.

Before I spend a ton of time running your code, have you tried running it without creating those output spatial dataframes at all? Because dataframes are stored in memory, my suspicion is that the problem has something to do with them. If removing that piece eliminates the problem, we can rule out Network Analyst as the culprit and focus on the spatial dataframes to see why they are eating/leaking memory.
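
If you want an intermediate test before ripping that piece out entirely, you could also try explicitly dropping each dataframe and forcing a garbage-collection pass between iterations. A minimal sketch, reusing the names from your loop:

import gc

for demand in future_demand_list:
    store_assignments_df = location_allocation_tool(nd_layer=nd_layer_name,
                                                    travel_mode=travel_mode,
                                                    geodatabase=workspace,
                                                    demand=demand,
                                                    scc_layer=scc_lyr,
                                                    line_barriers=line_lyr,
                                                    search_cutoff=search_cutoff_miles,
                                                    site_count=site_count)
    # write the result to disk here if you need to keep it, instead of
    # accumulating it in result_dict
    del store_assignments_df
    gc.collect()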

FrankCardone1
New Contributor II

OK, I removed the spatial dataframes. In the code, I replaced most of what was under the "if result.solveSucceeded" condition; now it looks like this:

 

    if result.solveSucceeded:
        result.export(arcpy.nax.LocationAllocationOutputDataType.Lines, os.path.join(geodatabase, "line_results_{}".format(demand)))
        result.export(arcpy.nax.LocationAllocationOutputDataType.DemandPoints, os.path.join(geodatabase, "store_results_{}".format(demand)))

 

I also removed the return value, so the function only exports the NA result layers. To speed things up, I only included two years in the loop. The first iteration took 5 minutes as expected, but the second iteration still took about 45 minutes.
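
In case it helps narrow things down further, here is roughly how I could add per-stage timing inside the function (just a sketch using time.perf_counter, with the variable names from the code above) to see whether it is the load, the solve, or the export that balloons:

import time

def timed(label, func, *args, **kwargs):
    # small helper: run a call and report how long it took
    start = time.perf_counter()
    out = func(*args, **kwargs)
    arcpy.AddMessage("{} took {:.1f} s".format(label, time.perf_counter() - start))
    return out

# for example, inside location_allocation_tool:
# timed("load facilities", loc_alloc.load,
#       arcpy.nax.LocationAllocationInputDataType.Facilities,
#       scc_layer, field_mappings_facilities, append=False)
# result = timed("solve", loc_alloc.solve)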

 

Also for context, I am running this on a brand-new developer laptop with 64 GB of RAM.

MelindaMorang
Esri Regular Contributor

Okay, I'll give it a try on my end.

MelindaMorang
Esri Regular Contributor

So far not reproducible for me.  Could you please tell me the following information:

  1. What version of ArcGIS Pro are you using?
  2. What type of network dataset are you using? (file geodatabase, mobile geodatabase/MMPK, StreetMap Premium (what type and vintage), etc.)
FrankCardone1
New Contributor II

1. ArcGIS Pro 3.1.3 (the same issue occurs with all versions of 3.x; I have had this problem for months now)

2. My network dataset is in a file geodatabase. The underlying data is from HERE, and I built the network myself. Maybe that's the difference, as I assume you're using SMP? But then why would the first iteration run quickly?

MelindaMorang
Esri Regular Contributor

Thanks. For the network, I was mostly interested in whether it was a mobile gdb or a licensed, compressed fgdb (it sounds like it is not), since those can behave differently in some circumstances.

One further question (sorry!): How are you running this script when you experience the slowdown? Are you running it in standalone Python, in Pro via a custom script tool or Python toolbox tool, in a notebook, or some other way? I wonder if the problem only occurs when it is run in a certain way, which would indicate an issue with memory clean-up in whatever the running mechanism is.
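
If it does turn out to be interpreter-level memory that never gets released between iterations, one blunt way to test (and work around) it would be to run each year's solve in its own Python process, so that everything is discarded when that process exits. A rough sketch only, assuming optimize_network lives in an importable module (it has to be importable for pickling on Windows) and that calling it with a single-item demand list is acceptable:

import multiprocessing

if __name__ == "__main__":
    results = {}
    for demand in future_demand_list:
        # one fresh worker process per year; its memory is released when the pool closes
        with multiprocessing.Pool(processes=1) as pool:
            results[demand] = pool.apply(
                optimize_network,
                (network_dataset, workspace, scc_locations, [demand]))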
