Split comma separated values to multiple rows

995
21
Jump to solution
03-15-2024 05:59 AM
Ed_
by MVP Regular Contributor
MVP Regular Contributor

Based on the sample data below I am trying to split each comma separated value in `MUNICIPALI` column to a new as shown in desired output. However, I am getting an error that `Projects_City` cannot open, it does not exist, but then I thought line 6 and onwards will create it. How can I fix this issue? On a side note, the comma separated values are a lot more than the sample data. For, example, one of them rows have 18 or something comma separated values so, will line 10 work for that? 

Code:

 

 

 

 

import arcpy
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
out_df = r"O:\MyProject.gdb\Projects_City_Updated" # Doesn't exist, expect it to be created in line 6
fldlst = ["OBJECTID", "PRJID", "MUNICIPALI"]
with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

 

 

 

 

 

Error:

 

 

 

 

Traceback (most recent call last):
  File "o:\path\SplitCommas2Row.py", line 6, in <module>
    with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
RuntimeError: cannot open 'O:\MyProject.gdb\Projects_City_Updated'

 

 

 

 

 

Update

After creating the table, please see updated code below, I am getting the following error, how can I fix it?

 

Code:

 

 

 

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

 

 

 

 

Error:

 

 

 

Traceback (most recent call last):
  File "o:\SplitCommas2Row.py", line 22, in <module>
    for attrbPart in row[3].split(","):
IndexError: tuple index out of range

 

 

 

Sample Data:

OBJECTIDPRJTIDMUNICIPALIZIPCODE
11A1
22A, B1,2
33A, B, C1,2,3
44A, B, C, D1,2,3,4
55A1
66C3

 

Desired output:

OBJECTIDPRJTIDMUNICIPALIZIPCODE
11A1
22A1
32B2
43A1
53B2
63C3
74A1
84B2
94C3
104D4
115A1
126C3
Question | Analyze | Visualize
Tags (1)
0 Kudos
2 Solutions

Accepted Solutions
MErikReedAugusta
Occasional Contributor III

Okay, so a few things:

  1. As a general rule, don't nest Cursors in each other.  The efficiency generally takes a nosedive.
    • I think with an InsertCursor it's less of an issue, but it's still a better practice to separate them.  If this had been an UpdateCursor, you would have run through the entire table once for every row on the table.  (Meaning for your sample of 6 records, it would've made 36 passes through the table)
    • I would first run that SearchCursor and write the necessary data to a dictionary or something similar, and then run the InsertCursor to do your operations.
  2. In line 22, you're assuming your tuple has 4 entries.  In line 23, you reference 3.  Your field list and your sample data indicate you only have 2 entries to work with in the tuple.
    • SearchCursor returns a tuple of the same length as your field list parameter, which lists the values in the same order as those fields they came from.
  3. In your sample data, you have items delimited by comma and space (", "), but in line 22, you assume they're delimited by comma only (",").  I'm assuming this is just a formatting thing in how you typed up the post here, but be careful of differences like that.  Line 27 of my code below assumes there are spaces after the commas.

 

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI"]

munis = {}

with arcpy.da.SearchCursor(in_df, fldlst) as sc:
    for prj, muni in sc:
        # NOTE: This assumes the values in PRJTID are unique.  If any repeat
        #  then you'll need to check the munis dictionary to see if you've
        #  seen this particular prjtid before, and append your muni results 
        #  to the existing record, if you have one.
        munis[prj] = muni.split(', ')

with arcpy.da.InsertCursor(out_df, fldlst) as ic:
    for prj in munis:
        for muni in munis[prj]:
            ic.insertRow((prj, muni))

 

 

View solution in original post

JoshuaBixby
MVP Esteemed Contributor

@Ed_, the updated code was close to working, you just need to use Python zip() function and change 2 lines:

 

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for z in zip(*(map(str.strip, str.split(i, ",")) for i in row[1:])):
                insertCursor.insertRow(row[0:1] + z)

 

View solution in original post

21 Replies
BobBooth1
Esri Contributor

Try creating the table first and see if it works.

0 Kudos
Ed_
by MVP Regular Contributor
MVP Regular Contributor

@BobBooth1 thank you for the quick response, after creating the table, I am getting a tuple error now, please see the updated post, thank you.

Question | Analyze | Visualize
0 Kudos
BobBooth1
Esri Contributor

I think you want "row[2]" here:

for attrbPart in row[3].split(","):

 

MErikReedAugusta
Occasional Contributor III

Okay, so a few things:

  1. As a general rule, don't nest Cursors in each other.  The efficiency generally takes a nosedive.
    • I think with an InsertCursor it's less of an issue, but it's still a better practice to separate them.  If this had been an UpdateCursor, you would have run through the entire table once for every row on the table.  (Meaning for your sample of 6 records, it would've made 36 passes through the table)
    • I would first run that SearchCursor and write the necessary data to a dictionary or something similar, and then run the InsertCursor to do your operations.
  2. In line 22, you're assuming your tuple has 4 entries.  In line 23, you reference 3.  Your field list and your sample data indicate you only have 2 entries to work with in the tuple.
    • SearchCursor returns a tuple of the same length as your field list parameter, which lists the values in the same order as those fields they came from.
  3. In your sample data, you have items delimited by comma and space (", "), but in line 22, you assume they're delimited by comma only (",").  I'm assuming this is just a formatting thing in how you typed up the post here, but be careful of differences like that.  Line 27 of my code below assumes there are spaces after the commas.

 

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI"]

munis = {}

with arcpy.da.SearchCursor(in_df, fldlst) as sc:
    for prj, muni in sc:
        # NOTE: This assumes the values in PRJTID are unique.  If any repeat
        #  then you'll need to check the munis dictionary to see if you've
        #  seen this particular prjtid before, and append your muni results 
        #  to the existing record, if you have one.
        munis[prj] = muni.split(', ')

with arcpy.da.InsertCursor(out_df, fldlst) as ic:
    for prj in munis:
        for muni in munis[prj]:
            ic.insertRow((prj, muni))

 

 

Ed_
by MVP Regular Contributor
MVP Regular Contributor

Worked like charm, thank you so much 😊

Question | Analyze | Visualize
0 Kudos
Ed_
by MVP Regular Contributor
MVP Regular Contributor

@MErikReedAugusta can you please look at the modified sample data. It seems like I would also have to split the `ZIPCODE` column as well. How can I split both `MUNICIPALI` and `ZIPCODE` at the same time? Thank you

Question | Analyze | Visualize
0 Kudos
MErikReedAugusta
Occasional Contributor III

That depends on how those two fields should interact & combine.

Your sample data seems to be missing some potential combinations for PRJ 2, for example:

PRJMUNZIP
2A, B1, 2

 

I would expect the following:

PRJMUNZIP
2A1
2B1
2A2
2B2

(Your sample data is missing the second & fourth rows.)

munis = {}
# Search Cursor here; Fields [PRJTID, MUNICPALI, ZIPCODE]:
for prj, mun, zip in cursor:
    munvals = mun.split(', ')
    zipvals = zip.split(', ')

    joinvals = []

    for munval in munvals:
        for zipval in zipvals:
            joinval = f'{munval}|{zipval}'
            joinvals.append(joinval)
            del joinval

    munis[prj] = joinvals
    del joinvals, munvals, zipvals

 

Alternately,

I notice your sample data also has the same number of items for MUN & ZIP, though.  While it's not going to always be true in the real world, if your sample data guarantees a 1:1 match of municipality to ZIP, then you could do the merging that way, before you run the split.  If that's the case, you can replace the for loops at Line 9-13 with the snippet below.

Note: Because this assumes a 1:1 match, if you have 5 values in one column and 4 in the other, it'll just ignore that 5th value entirely.

index = 0
while index < len(munvals) and index < len(zipvals):
    joinval = f'{munvals[index]}|{zipvals[index]}'
    joinvals.append(joinval)
    del joinval

 

Ed_
by MVP Regular Contributor
MVP Regular Contributor

@MErikReedAugusta  Thank you so much 😊 For the new snippet I am getting warning message that `cursor` is not defined. Do I need to define `cursor` earlier in the code?

Question | Analyze | Visualize
0 Kudos
Ed_
by MVP Regular Contributor
MVP Regular Contributor

@MErikReedAugusta so I tried adding the cursor using the `with` statement and now I am getting the following error. Also, I don't if indented the `while` loop correctly or not. 

while index < len(munvals) and index < len(zipvals):
NameError: name 'munvals' is not defined

 

Code:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

munis = {}
# Search Cursor here; Fields [PRJTID, MUNICPALI, ZIPCODE]:
with arcpy.da.SearchCursor(in_df, fldlst) as cursor:
    for prj, mun, zip in cursor:
        munvals = mun.split(',')
        zipvals = zip.split(',')

        joinvals = []

    index = 0
    while index < len(munvals) and index < len(zipvals):
        joinval = f'{munvals[index]}|{zipvals[index]}'
        joinvals.append(joinval)
        del joinval

        munis[prj] = joinvals
        del joinvals, munvals, zipvals
Question | Analyze | Visualize
0 Kudos