Remove prefixes

2Quiker · ‎01-08-2024

I am trying to remove prefixes from street names in a table. I have the following code, but it removes more than just the street prefix. In my code I match E followed by a space, "E ". I need to strip the prefixes so I can sort and then delete duplicates. How can I remove just the prefixes?

with arcpy.da.UpdateCursor(table1,'STREET') as cursor:  
    for row in cursor:  
        #print row[0]  
        if row[0].startswith("S "):  
            #print "Deleting S"  
            row [0] = row[0].lstrip('S ')
        elif row[0].startswith("E "):  
            #print "Deleting E"  
            row [0] = row[0].lstrip('E ')
        elif row[0].startswith("W "):  
            #print "Deleting W"  
            row [0] = row[0].lstrip('W ')
        elif row[0].startswith("N "):  
            #print "Deleting N"  
            row [0] = row[0].lstrip('N ')
        cursor.updateRow(row) 
del cursor

After the code runs, I get:

Before code     After code
E Explorer      xplorer
E Expedition    xpedition
E Executive     xecutive
E Exchange      xchange

MErikReedAugusta · ‎01-08-2024

lstrip is treating "E " as being ["E", " "] and stripping all instances of those two characters from the left of the string, until it reaches a character other than those two.
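
To see the difference concretely, here is a small sketch that needs no arcpy. The sample name comes from the question; str.removeprefix only exists on Python 3.9 or later, so treat that last part as an assumption about your environment.

```python
# lstrip() takes a SET of characters to strip, not a prefix string.
name = "E Explorer"

# Strips every leading 'E' or ' ' until some other character is reached.
stripped = name.lstrip("E ")
print(stripped)

# Slicing removes exactly the two characters of the matched prefix.
sliced = name[2:] if name.startswith("E ") else name
print(sliced)

# Python 3.9+ only (assumption about your Python version):
# removeprefix() removes the exact prefix once, or returns the string unchanged.
removed = name.removeprefix("E ")
print(removed)
```

The slicing form works on any Python version, which matters if the script runs under an older interpreter.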

Try this:

if (row[0].startswith('E ')
    or row[0].startswith('W ')
    or row[0].startswith('N ')
    or row[0].startswith('S ')
   ):
    # print('Deleting Cardinal')
    row[0] = row[0][2:]
    cursor.updateRow(row)  # N.B. if you indent this INSIDE the if statement,
                           # then it won't update any value it doesn't have
                           # to, which can be handy if you track edits, since
                           # you didn't technically have anything to change
                           # on the ones without a cardinal direction

### The ones below this are optional, if you have full-word cardinals on your prefixes.
elif (row[0].startswith('East ')
      or row[0].startswith('West ')
     ):
    # print('Deleting Cardinal')
    row[0] = row[0][4:]
elif (row[0].startswith('North ')
      or row[0].startswith('South ')
     ):
    # print('Deleting Cardinal')
    row[0] = row[0][5:]

Since you're always looking for a 2-character* substring at the beginning, you know that if you found it, you only need the remainder of the string after it. [2:] tells it to go to index 2 (the character after your space), and then just give you the rest of the string from there.

* The elif statements for East/West and North/South look for longer substrings, so the slice indices in those branches also change, assuming you need them.

EDIT: Unrelated sidenote, it is surprisingly a pain to edit code block statements for typos on this forum. Apologies to anyone who read it before I caught them.

MErikReedAugusta · ‎01-08-2024

I'm too lazy to fix the typos in the codeblock right now, but it just occurred to me that the indices for the full-word ones are wrong, because I forgot to count the space.

East/West branch (line 19 in the original post's code block):

    row[0] = row[0][5:]

North/South branch (line 24 in the original post's code block):

    row[0] = row[0][6:]
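
A quick way to double-check those indices, offered as a sketch rather than as part of the original answer: the slice start is just the length of the prefix including its trailing space, so len() can do the counting. The street names below are made up for illustration.

```python
# Each slice index equals the prefix length, trailing space included.
for prefix in ("E ", "East ", "North "):
    print(repr(prefix), len(prefix))

# Hypothetical sample names, not taken from the thread.
east = "East Main St"
north = "North Main St"

# Slicing by len(prefix) avoids counting characters by hand.
east_trimmed = east[len("East "):]
north_trimmed = north[len("North "):]
print(east_trimmed, "|", north_trimmed)
```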

2Quiker · ‎01-08-2024

Thanks for the reply. I was coming back to my post to add code that worked for me.

# List of common street prefixes to be removed
prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']

# Update the street names in the table
with arcpy.da.UpdateCursor(table, "STREET") as cursor:
    for row in cursor:
        street_name = row[0]

        # Remove the first matching prefix from the street name
        for prefix in prefixes:
            if street_name.startswith(prefix):
                row[0] = street_name[len(prefix):].strip()
                cursor.updateRow(row)
                break

Your code did work, though. Thanks again for the reply!

MErikReedAugusta · ‎01-08-2024

Efficient! I like it.

Just for fun, I tried to see if I could condense this further. Here it is!

prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']
with arcpy.da.UpdateCursor(table, 'STREET') as cursor:
    for row in cursor:
        row[0] = min([row[0][len(prefix):] if row[0].startswith(prefix) else row[0] for prefix in prefixes], key=len)
        cursor.updateRow(row)

First, a list comprehension builds a list with one entry per prefix: the street name with that prefix removed if it matches, or the unchanged street name if it doesn't. Then, it turns out the min() function accepts a key argument. If you give it the built-in len, it returns the shortest entry from that list as a single item.

Since, by definition, every item in our generated list is either the original street name or that street name minus a prefix, the shortest possible will always* be the street name minus any applicable prefix.

*An important caveat here: "N South 5th Street" would return as "South 5th Street", which may or may not be desired. But then, that's a problem of all of the code in this thread.
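
For anyone who wants to try the min()/key=len idea without a cursor, here is a sketch on plain strings. The function name shortest_candidate and the "Main Street" sample are mine, purely for illustration.

```python
PREFIXES = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']

def shortest_candidate(street_name):
    """Return the street name with at most one leading prefix removed."""
    # One candidate per prefix: stripped if the prefix matches, unchanged otherwise.
    candidates = [
        street_name[len(prefix):] if street_name.startswith(prefix) else street_name
        for prefix in PREFIXES
    ]
    # min() with key=len returns the shortest candidate.
    return min(candidates, key=len)

print(shortest_candidate("E Explorer"))
print(shortest_candidate("N South 5th Street"))
print(shortest_candidate("Main Street"))  # hypothetical name with no prefix
```

Only one prefix is ever removed per call, which is why "N South 5th Street" keeps its "South".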

Also, just to be cheeky, here it is even more condensed. I saved a whole 3 lines! But I haven't tested it. And while I think the logic and syntax are all sound, it's the height of absurdity, anyway. Please don't do this to whoever has to read your code behind you. 😛

with arcpy.da.UpdateCursor(table, 'STREET') as cursor:
    [cursor.updateRow([min([row[0][len(prefix):] if row[0].startswith(prefix) else row[0] for prefix in ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']], key=len)]) for row in cursor]

JoshuaBixby · ‎01-08-2024

As fun as code golf can be, list comprehensions were proposed and accepted "to create lists" (PEP 202 – List Comprehensions | peps.python.org). I think many would argue that using a list comprehension to perform a mapping function isn't very Pythonic.

JoshuaBixby · ‎01-08-2024

Another approach would be to use regular expressions. Although regular expressions might be overkill for this specific situation, they are much more flexible for handling more complex cases:

import re

with arcpy.da.UpdateCursor(table1, 'STREET') as cursor:
    for row in cursor:
        # Strip one or more leading directional prefixes, each followed by a space
        row[0] = re.sub(r"^((?:N|S|E|W|North|South|East|West) )+", "", row[0])
        cursor.updateRow(row)

The above will result in "N South 5th Street" becoming "5th Street". If you don't want double prefixes to be removed, just take the "+" out of the regular expression pattern.
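
The pattern can be tried on plain strings before touching the table. This is a sketch: the names REPEATED, SINGLE and strip_prefixes are mine, and "Eastgate Road" is a made-up name to show that a word merely starting with a direction is left alone.

```python
import re

# Same pattern as above; the trailing "+" removes repeated prefixes.
REPEATED = re.compile(r"^((?:N|S|E|W|North|South|East|West) )+")
# Without the "+", only the first prefix is removed.
SINGLE = re.compile(r"^((?:N|S|E|W|North|South|East|West) )")

def strip_prefixes(name, pattern=REPEATED):
    """Remove leading directional prefixes matched by the given pattern."""
    return pattern.sub("", name)

print(strip_prefixes("N South 5th Street"))
print(strip_prefixes("N South 5th Street", SINGLE))
print(strip_prefixes("E Explorer"))
print(strip_prefixes("Eastgate Road"))  # hypothetical name, no space after "East"
```

Because every alternative must be followed by a space, names like "Explorer" or "Eastgate" are never truncated.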