Hello all,
I am trying to get some basic code that uses the multiprocessing module to work in an ArcGIS Pro 3.1 toolbox, but it just won't play nice. The code:
import time
import arcpy
import multiprocessing as mp

def work_log(work_data):
    arcpy.AddMessage(" Process %s waiting %s seconds" % (work_data[0], work_data[1]))
    time.sleep(int(work_data[1]))
    arcpy.AddMessage(" Process %s Finished." % work_data[0])

def pool_handler(work):
    p = mp.Pool(2)
    p.map(work_log, work)

if __name__ == '__main__':
    work = (["A", 5], ["B", 2], ["C", 1], ["D", 3])
    pool_handler(work)
If I run this from an .atbx or .pyt toolbox I get the same error:
PicklingError: Can't pickle <function work_log at 0x00000134B287D430>: attribute lookup work_log on __main__ failed
I also get two new instances of Pro opening automatically if I run it from a .pyt toolbox, which I assume is due to the lack of a __main__ guard.
Does anyone have any advice on how to get something like this running in a Pro toolbox (preferably .pyt)?
I came across a similar question here https://community.esri.com/t5/python-questions/using-multiprocessing/m-p/1280108 but it has never been answered.
Thanks!
The multiprocessing module has a lot of caveats due to how Pro manages its Python environment. This link has a lot of advice and some sample scripts; it might be enough to get you started.
Not sure if it's possible within a .pyt due to the nature of the geoprocessing environment loading the .pyt classes and the __main__ requirement. As a script tool, this would be the script named MultiProc.py, referenced (not embedded) in the script tool. The first parameter is a value table: string, integer.
The messages won't display from the worker processes until they are done, so you can put them in the result dictionary and print them all at the end.
import time
import arcpy
import multiprocessing as mp
import os
import sys

# Run workers under pythonw.exe so new ArcGIS Pro instances are not launched
mp.set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe'))

def work_log(work_data):
    res_dict = {'process': f"Process {work_data[0]} waiting {work_data[1]} seconds", 'result': ''}
    time.sleep(work_data[1])
    res_dict['result'] = f"Process {work_data[0]} Finished."
    return res_dict

if __name__ == "__main__":
    # Import this script as a module so the worker can be pickled by reference
    import MultiProc

    work_table = arcpy.GetParameterAsText(0)

    # Create a value table with 2 columns
    value_table = arcpy.ValueTable(2)

    # Set the values of the table with the contents of the first argument
    if work_table:
        value_table.loadFromString(work_table)
    else:
        for i in [["A", 5], ["B", 2], ["C", 1], ["D", 3]]:
            value_table.addRow(i)

    pairs = []
    # Loop through the rows of the value table
    for i in range(0, value_table.rowCount):
        pairs.append([value_table.getValue(i, 0), int(value_table.getValue(i, 1))])
    arcpy.AddMessage(pairs)

    with mp.Pool(2) as pool:
        jobs = [pool.apply_async(MultiProc.work_log, (pair, )) for pair in pairs]
        res = [j.get() for j in jobs]

    for r in res:
        arcpy.AddMessage(f"{r['process']} : {r['result']}")
Thanks StaticK. It's a shame Esri hasn't streamlined some of this via arcpy, especially considering how powerful it would be in a Python toolbox.