Remove prefixes

01-08-2024 08:57 AM
2Quiker
Occasional Contributor II

I am trying to remove prefixes from a table. I have the following code, but it removes more than just the street prefix. In my code I have E followed by a space ("E "). I need to strip the prefixes so I can sort and then delete duplicates. How can I remove just the prefixes?

with arcpy.da.UpdateCursor(table1,'STREET') as cursor:  
    for row in cursor:  
        #print row[0]  
        if row[0].startswith("S "):  
            #print "Deleting S"  
            row [0] = row[0].lstrip('S ')
        elif row[0].startswith("E "):  
            #print "Deleting E"  
            row [0] = row[0].lstrip('E ')
        elif row[0].startswith("W "):  
            #print "Deleting W"  
            row [0] = row[0].lstrip('W ')
        elif row[0].startswith("N "):  
            #print "Deleting N"  
            row [0] = row[0].lstrip('N ')
        cursor.updateRow(row) 
del cursor

After the code runs I get:

Before code      After code
E Explorer       xplorer
E Expedition     xpedition
E Executive      xecutive
E Exchange       xchange
MErikReedAugusta
Occasional Contributor III

lstrip is treating "E " as being ["E", " "] and stripping all instances of those two characters from the left of the string, until it reaches a character other than those two.
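The difference is easy to see in a plain Python session without arcpy. This is only a small sketch; the street name is a made-up sample value:

```python
# str.lstrip(chars) treats its argument as a set of characters to remove,
# not as a literal prefix.
name = "E Exchange"  # sample value, for illustration only

stripped = name.lstrip("E ")  # removes every leading 'E' or ' ' character
sliced = name[2:]             # drops exactly the first two characters

print(stripped)  # xchange
print(sliced)    # Exchange
```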

Try this:

 

if (row[0].startswith('E ')
    or row[0].startswith('W ')
    or row[0].startswith('N ')
    or row[0].startswith('S ')
   ):
    # print('Deleting Cardinal')
    row[0] = row[0][2:]
    cursor.updateRow(row)  # N.B. if you indent this INSIDE the if statement,
                           # then it won't update any value it doesn't have
                           # to which can be handy if you track edits, since
                           # you didn't technically have anything to change
                           # on the ones without a cardinal direction

### The ones below this are optional, if you have full-word cardinals on your prefixes.
elif (row[0].startswith('East ')
      or row[0].startswith('West ')
     ):
    # print('Deleting Cardinal')
    row[0] = row[0][4:]
elif (row[0].startswith('North ')
      or row[0].startswith('South ')
     ):
    # print('Deleting Cardinal')
    row[0] = row[0][5:]

 

Since you're always looking for a 2-character* substring at the beginning, once you've found it you only need the remainder of the string after it.  [2:] tells Python to start at index 2 (the character after your space) and give you the rest of the string from there.

* The elif statements at line 15 & line 20 look for longer substrings ('East '/'West ' and 'North '/'South '), so the slice indices at lines 19 & 24 change as well, assuming you need them.

 

EDIT: Unrelated sidenote, it is surprisingly a pain to edit code block statements for typos on this forum.  Apologies to anyone who read it before I caught them.


MErikReedAugusta
Occasional Contributor III

I'm too lazy to fix the typos in the codeblock right now, but it just occurred to me that the indices for the full-word ones are wrong, because I forgot to count the space.

Line 19:

    row[0] = row[0][5:]

Line 24:

    row[0] = row[0][6:]
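One way to avoid miscounting is to let Python compute the slice index from the prefix itself. A minimal sketch, using a made-up street name and hypothetical variable names:

```python
name = "North Main St"  # sample value, for illustration only
prefix = "North "       # six characters, counting the trailing space

off_by_one = name[5:]            # forgets the space, so it stays in the result
from_length = name[len(prefix):] # index derived from the prefix length

print(repr(off_by_one))   # ' Main St'
print(repr(from_length))  # 'Main St'
```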

 


2Quiker
Occasional Contributor II

Thanks for the reply. I was coming back to my post to add code that worked for me.

 

import arcpy

# `table` is the path to the table being edited, defined elsewhere in the script.
# Common street prefixes to remove (each includes its trailing space)
prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']

# Update the street names in the table
with arcpy.da.UpdateCursor(table, "STREET") as cursor:
    for row in cursor:
        street_name = row[0]

        # Remove the first matching prefix from the street name
        for prefix in prefixes:
            if street_name.startswith(prefix):
                row[0] = street_name[len(prefix):].strip()
                cursor.updateRow(row)
                break

 

Your code did work tho. Again thanks for the reply!

MErikReedAugusta
Occasional Contributor III

Efficient!  I like it.

Just for fun, I tried to see if I could condense this further.  Here it is!

prefixes = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']
with arcpy.da.UpdateCursor(table, 'STREET') as cursor:
    for row in cursor:
        row[0] = min([row[0][len(prefix):] if row[0].startswith(prefix) else row[0] for prefix in prefixes], key=len)
        cursor.updateRow(row)

 

First, a list comprehension creates a list of your street name with every prefix attempted to be removed from it.  Then, it turns out the min() function accepts a key argument.  If you give it the built-in len, it gives you the shortest entry from that list back as a single item.

Since, by definition, every item in our generated list is either the original street name or that street name minus a prefix, the shortest possible will always* be the street name minus any applicable prefix.

*An important caveat here: "N South 5th Street" would return as "South 5th Street", which may or may not be desired.  But then, that's a problem of all of the code in this thread.
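The min()/key=len idea can be tried on plain strings without a cursor. This is a sketch only; `strip_prefix_min` is a hypothetical helper name and the street names are made-up samples:

```python
PREFIXES = ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']

def strip_prefix_min(name, prefixes=PREFIXES):
    """Return the shortest of: the name itself, or the name minus one prefix."""
    # One candidate per prefix: sliced if the prefix matches, unchanged otherwise.
    candidates = [name[len(p):] if name.startswith(p) else name for p in prefixes]
    return min(candidates, key=len)

print(strip_prefix_min("E Exchange"))          # Exchange
print(strip_prefix_min("N South 5th Street"))  # South 5th Street (only one prefix removed)
print(strip_prefix_min("Elm Street"))          # Elm Street (no prefix, unchanged)
```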

 

Also, just to be cheeky, here it is even more condensed.  I saved a whole 3 lines!  But I haven't tested it.  And  while I think the logic and syntax are all sound, it's the height of absurdity, anyway.  Please don't do this to whoever has to read your code behind you.  😛

 

with arcpy.da.UpdateCursor(table, 'STREET') as cursor:
    [cursor.updateRow([min([row[0][len(prefix):] if row[0].startswith(prefix) else row[0] for prefix in ['N ', 'S ', 'E ', 'W ', 'North ', 'South ', 'East ', 'West ']], key=len)]) for row in cursor]

 

JoshuaBixby
MVP Esteemed Contributor

As fun as code golf can be, list comprehensions were proposed and accepted "to create lists" (PEP 202 – List Comprehensions | peps.python.org).  I think many would argue that using a list comprehension to perform a mapping function isn't very Pythonic.

JoshuaBixby
MVP Esteemed Contributor

Another approach would be to use regular expressions.  Although regular expressions might be a bit overkill for this specific situation, they are much more flexible to handle more complex situations:

import re

import arcpy

# The "with" block releases the cursor on exit, so no "del cursor" is needed.
with arcpy.da.UpdateCursor(table1, 'STREET') as cursor:
    for row in cursor:
        row[0] = re.sub(r"^((?:N|S|E|W|North|South|East|West) )+", "", row[0])
        cursor.updateRow(row)

The above will result in "N South 5th Street" becoming "5th Street".  If you don't want double prefixes to be removed, just take the "+" out of the regular expression pattern.
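The effect of the "+" can be checked on a plain string outside of arcpy. A minimal sketch; the constant names are hypothetical and the street name is the sample from this thread:

```python
import re

# Same pattern as above, with and without the trailing "+".
REPEATED = re.compile(r"^((?:N|S|E|W|North|South|East|West) )+")
SINGLE = re.compile(r"^((?:N|S|E|W|North|South|East|West) )")

name = "N South 5th Street"

all_removed = REPEATED.sub("", name)  # strips every leading prefix
first_removed = SINGLE.sub("", name)  # strips only the first prefix

print(all_removed)    # 5th Street
print(first_removed)  # South 5th Street
```

Because each alternative must be followed by a space, a name such as "Elm Street" is left unchanged by either pattern.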