20 Replies Latest reply on Nov 6, 2014 5:55 AM by ord5206

    End-of-line (EOL) Problem

    jacobne
      Came across a weird EOL Error with Arcpy's CalculateField_management() function today and I was wondering if anyone out there could point me towards a work around?

      I previewed one of my tables in Catalog and at first glance the offending cells seemed fine.  However, when I copy/pasted one over to a text file a lot more text became visible. My current theory is that the Enter key is being hit when my customers edit cells in MS Excel (my typical source data) and invisible newline characters are being carried into my geodatabase tables.  Tricky thing is I can't see them in Excel or Arc, so I'm not sure how to strip or replace the backslashes via Python.

      Any advice would be much appreciated,
      - Nick
        • Re: End-of-line (EOL) Problem
          mdenil
          Try dumping the table to a text file, if you think there is stuff going on in there you cannot see.

          There is an 're' module in python that handles regular expressions. That is the tool set you want for weeding out pesky newlines ('\n').

          It may be easier to weed them in the table or in the text dump (which could then be re-imported to a table).
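
A minimal sketch of the regular-expression approach suggested above, in plain Python without arcpy. The function name `remove_newlines` and the sample strings are illustrative assumptions, not something from the thread.

```python
import re

# Matches any run of carriage returns and/or line feeds.
EOL_PATTERN = re.compile(r"[\r\n]+")

def remove_newlines(text, replacement=" "):
    """Replace each run of CR/LF characters with `replacement`."""
    return EOL_PATTERN.sub(replacement, text)

address = "568 N Courier Ct\r\nSuite 2"
single_line = remove_newlines(address)
no_gap = remove_newlines(address, "")
```

Replacing with a space rather than an empty string keeps words on adjacent lines from running together. This would typically be applied to the text dump or inside a cursor loop.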
          • Re: End-of-line (EOL) Problem
            jacobne
            Thanks for the quick response!  I copied a cell over to Notepad++ and it turns out each line is preceded by a carriage return and line feed (CR LF).  Ordinarily, I would just clean up the data by hand, but in this case I need a script to filter this sort of thing.  Have you ever seen this done via field calculator by chance?  I've been experimenting with regular expressions, but no luck so far.  I keep getting "EOL while scanning string literal" related errors.  One of my many attempts below:


            #ESRI Codeblock
            codeblock="""def trimNewline(val):
                import re
                newVal = re.sub('(?m)[\r\n]',"",val)
                return newVal"""
            
            #Expression parameter
            expression = "trimNewline(str(!FIELD!))"
            
            # CalculateField_management(in_table,field,expression,{expression_type},{code_block})
            arcpy.CalculateField_management(mytable,"FIELDNAME",expression,"PYTHON",codeblock)
            
            • Re: End-of-line (EOL) Problem
              curtvprice
              There's a nice python string method that trims whitespace around strings:

              >>> x
              '\r\nhere is my real text\n'
              >>> x.strip()
              'here is my real text'


              So one approach you could use would be:

              arcpy.CalculateField_management(mytable,"FIELDNAME","!FIELDNAME!.strip()","PYTHON")


              One could also use VBScript:
              arcpy.CalculateField_management(mytable,"FIELDNAME","Trim([FIELDNAME])")
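
One caveat worth illustrating in plain Python: `strip()` only removes whitespace at the two ends of a string, so line breaks in the middle of a value survive it. The sample text below is made up for illustration.

```python
text = "\r\nline one\r\nline two\n"

# strip() trims leading and trailing whitespace only.
trimmed = text.strip()

# Removing embedded breaks needs replace(); chr(13) is CR and chr(10) is LF.
cleaned = text.replace(chr(13), "").replace(chr(10), " ").strip()
```

Here `trimmed` still contains the CR LF between the two lines, while `cleaned` is a single line.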
              • Re: End-of-line (EOL) Problem
                jacobne
                -------
                Update
                -------

                Just a quick update to anyone that may come across this post down the road (and big thank you to mdenil and curtvprice).  The short-term conclusion I've come to is that it's not possible to feed hidden newline characters ('\n') into the Field Calculator.  Also, because carriage returns and newlines are special, some functions can't access them like I had anticipated, such as strip(). 

                Workaround was to convert the problem strings to hexadecimal and swap out values with an Update Cursor. See example below:

                rows = arcpy.UpdateCursor(fc)
                for row in rows:
                    if len(row.NAME) >= 255:
                        hexString = str(row.NAME).encode("hex")
                        if "0a" in hexString: # "0a" is hex equivalent of '\n'
                            hexString = hexString.replace("0a","")
                        # write back only for rows that were converted above
                        row.NAME = hexString.decode("hex")
                        rows.updateRow(row)
                


                Definitely not ideal, but seems reliable so far.  Another interesting thing is that my problem carriage returns magically disappeared somewhere earlier in my script, when converting an Excel worksheet to an in-memory feature class.  This might suggest another arcpy function stripped them out, but not sure which one.
                  • Re: End-of-line (EOL) Problem
                    ord5206

                    This is the solution that worked for me, though I removed the len(row.NAME) >= 255 requirement.

                     

                    Here's the working example:

                     

                    rows = arcpy.UpdateCursor("Assets\Welds")
                    for row in rows:
                        hexString = str(row.REMARKS).encode("hex")
                        if "0a" in hexString:
                            hexString = hexString.replace("0a","")
                        row.REMARKS = hexString.decode("hex")
                        rows.updateRow(row)
                    

                     

                    Replace "Assets\Welds" with the appropriate fc name and replace row.REMARKS with row.(insert field name here)

                     

                    You may or may not need to run:

                    import arcpy
                    import string
                    
                    

                    I just wanted to make this idiotproof because I struggled for a bit (I copied and pasted your example expecting it to work).  I'm still somewhat new to (and still learning) Python so I'm sure I'm not the only one who will forget to change the appropriate variables to their ArcGIS data.

                      • Re: End-of-line (EOL) Problem
                        curtvprice

                        This method seems a little dangerous, as the string "0a" could occur in your hex string by accident. For example, the two bytes "b0 ac" are written as "b0ac", which contains "0a" across the byte boundary and would be an invalid match.
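
The false match can be reproduced in plain Python. This is a sketch using `binascii` rather than the Python 2 `"hex"` codec from the earlier posts; the function names are hypothetical.

```python
import binascii

def strip_lf_via_hex(data):
    """Naive approach: delete every "0a" found in the hex text of `data`."""
    hex_text = binascii.hexlify(data).decode("ascii")
    return binascii.unhexlify(hex_text.replace("0a", ""))

def strip_lf_bytes(data):
    """Safer approach: remove the LF byte itself, never touching hex text."""
    return data.replace(b"\n", b"")

sample = b"\xb0\xac"              # hex text "b0ac"; contains no newline
corrupted = strip_lf_via_hex(sample)
untouched = strip_lf_bytes(sample)
```

The hex version turns the two bytes into a single different byte even though the input had no newline, while the byte-level replace leaves the input alone.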

                         

                        Here's a function implementation of your approach, not using hex codes. You could paste this function into the ArcMap python window and use it there. Note I'm using chr(10) and chr(13) for "\n" and "\r" so this function can also be used inside the Calculate Value tool in modelbuilder... as the usual use of "\n" breaks the geoprocessing messaging string representation in the code...

                         

                        I'm also using arcpy.da.UpdateCursor because it is very fast compared to the 10.0 flavor.... and the "with" construct helps you out by closing the cursor even if the code fails -- avoiding the possibility of a nasty hanging file lock!

                         

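                        import arcpy
                        def strip_newlines(tbl, field, eolchar=""):
                          # Replace LF (chr(10)) and CR (chr(13)) in a text field with eolchar
                          with arcpy.da.UpdateCursor(tbl, [field]) as rows:
                            for row in rows:
                              if row[0] is None:  # NULL values have no replace() method
                                continue
                              row[0] = row[0].replace(chr(10), eolchar).replace(chr(13), eolchar)
                              rows.updateRow(row)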
                        
                        
                        
                    • Re: End-of-line (EOL) Problem
                      wroe
                      This is a useful thread.  I've run into related problems in how to find and correct them (VB vs Python). When performing Python field calculations on fields containing \n, \x, etc, I get the same errors as Nick described--but I don't get them when using VB (because \n et al aren't special in VB?). So a few points:

                      1) I didn't even realize string fields could support carriage returns! Apparently this is new as of 9.2? For example, my address string field can contain:

                      568 N Courier Ct
                      and I can type more here
                      and here if I use "ctrl+enter"
                      or if the incoming table
                      had \n or \r for line breaks

                      ... but when I visually inspect the cell, all I see is "568 N Courier Ct" since it's on the first line. It is not visually apparent without starting an edit session, highlighting text, and dragging and/or arrowing. This is a real hazard when importing from text files or Excel; this could really drive somebody nuts when geocoding imported addresses.

                      2) Besides \n, I've also run into problems with \x in the fields as well. I guess any of the reserved characters would be problematic. ArcGIS SQL is a little helpful for spotting these, in that

                      SELECT * FROM street_features WHERE
                      "street" LIKE '%
                      %'

                      finds all instances of carriage returns in the "street" field ... but I have no idea how to write a similar query to find instances of \x, \b, etc.

                      3) Nick's hexadecimal conversion seems to work for \n but, again, how can we apply it to other special characters? Or is it just easier to use VB instead of Python for field calculations?
                      • Re: End-of-line (EOL) Problem
                        jacobne
                        Cool idea using SQL to search for carriage returns.  Later on, I came to the conclusion that I shouldn't bother stripping out the carriage returns, or any of the other random problems with the data for that matter.  At some point I needed to shift the responsibility of data maintenance over to the end user of whatever model or script I was writing.  So, in the end I think I included some exception handling to specifically watch for them.

                        As for the never ending programming language of preference debate, does it really matter?  Who honestly writes more than a handful of scripts or models per year anyways?
                        • Re: End-of-line (EOL) Problem
                          curtvprice
                          Or is it just easier to use VB instead of Python for field calculations?


                          I believe (and I think many would agree) that Python has far superior string manipulation capabilities.

                          Here are some pretty good Python methods for stripping non printables:

                          http://stackoverflow.com/questions/92438/stripping-non-printable-characters-from-a-string-in-python
                          http://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python
                          • Re: End-of-line (EOL) Problem
                            janvanlinge

                            3) Nick's hexadecimal conversion seems to work for \n but, again, how can we apply it to other special characters? Or is it just easier to use VB instead of Python for field calculations?


                            I ran into the same problem when using python to add hyperlinks. As hyperlinks contain "\" characters it sometimes happened that a "\n" was in the hyperlink. I solved it by passing in the hyperlink as a raw string instead of a normal string:

                            hyperlink = "c:\somehyperlink\name_of_file"
                            arcpy.CalculateField_management(TableToEdit, "HYPERLINK_FIELD", r"r'" + hyperlink + r"'", "PYTHON")
                            • Re: End-of-line (EOL) Problem
                              curtvprice
                              I ran into the same problem when using python to add hyperlinks. As hyperlinks contain "\" characters it sometimes happened that a "\n" was in the hyperlink. I solved it by passing in the hyperlink as a raw string instead of a normal string:

                              hyperlink = "c:\somehyperlink\name_of_file"
                              arcpy.CalculateField_management(TableToEdit, "HYPERLINK_FIELD", r"r'" + hyperlink + r"'", "PYTHON")


                              The above code embeds a newline (\n) in the string:

                              >>> print "c:\somehyperlink\name_of_file"
                              c:\somehyperlink
                              ame_of_file


                              I think this may work better:

                              >>> print 'r"{0}"'.format(r"c:\somehyperlink\name_of_file")
                              r"c:\somehyperlink\name_of_file"



                              hyperlink = r"c:\somehyperlink\name_of_file"
                              arcpy.CalculateField_management(TableToEdit, "HYPERLINK_FIELD", 'r"{0}"'.format(hyperlink), "PYTHON")
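
A small check of the raw-string idea in plain Python, without arcpy. The path is hypothetical, and `eval` stands in here for whatever evaluates the expression; that is an assumption for illustration, not a description of how Calculate Field works internally.

```python
# In a normal literal, "\n" inside a path becomes a real newline character.
plain = "c:\new_links\name_of_file"

# A raw literal keeps every backslash as written.
raw = r"c:\new_links\name_of_file"

# Build an expression that is itself a raw string literal wrapping the path.
expression = 'r"{0}"'.format(raw)

# Evaluating the expression gives back the original path unchanged.
round_tripped = eval(expression)
```

Note that a raw string literal cannot end in a single backslash, so a path with a trailing backslash would need separate handling.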


                              I thought I'd add one more thing to this post: how to specify newlines in the Calculate Field tool in ModelBuilder. The interactive tool dialog parser converts "\n" to real newlines in the code box, which doesn't work, so the workaround is to use chr(10). I've used this as a quick and dirty way to have model builder print a message:

                              Expression: msg()

                              def msg():
                                # text = "\n\nThis is\na message to you.\n"  # does not work
                                text = "{0}{0}This is{0}a message to you.{0}".format(chr(10))
                                return text
                              • Re: End-of-line (EOL) Problem
                                mnakleh
                                I do not believe that this problem has been sufficiently addressed.

                                Using the Field Calculator box, with text containing a CR-LF (characters 13 and 10), I have tested all of the following:

                                • Using a replace ('\r\n', '')

                                • Using strip

                                • Using filter

                                • Using a comprehension that breaks the string apart and checks individual characters


                                Even using a Codeblock, these tactics did not seem to work.

                                However, if I go from the console and set up something like:
                                rows = arcpy.UpdateCursor(fc)
                                for row in rows:
                                    if '\r\n' in row.TextString:
                                         row.setValue('TextString', row.TextString.replace('\r\n', ' '))
                                         rows.updateRow(row)
                                    del row
                                del rows

                                It works exactly as one would expect. But I would love to know more about why this doesn't seem to work from the Field Calculator window.
                                • Re: End-of-line (EOL) Problem
                                  iggz
                                  Yeah, still not working.
                                  • Re: End-of-line (EOL) Problem
                                    curtvprice

                                    However, if I go from the console and set up something like:
                                    rows = arcpy.UpdateCursor(fc)
                                    for row in rows:
                                        if '\r\n' in row.TextString:
                                             row.setValue('TextString', row.TextString.replace('\r\n', ' '))
                                             rows.updateRow(row)
                                    del row, rows
                                    

                                    It works exactly as one would expect. But I would love to know more about why this doesn't seem to work from the Field Calculator window.


                                    The problem is that you cannot use Python escape codes like "\r" in the Field Calculator code block or the Calculate Value code block. I'm assuming this has something to do with the parsing of python arguments into string representation in the arcpy/gp messaging framework.

                                    If you need to access escape characters, use the chr() function instead.

                                    This will probably work fine:

                                    rows = arcpy.UpdateCursor(fc)
                                    for row in rows:
                                        newline = chr(13) + chr(10)
                                        if newline in row.TextString:
                                             row.setValue('TextString', row.TextString.replace(newline, ' '))
                                             rows.updateRow(row)
                                    del row, rows
                                    
                                    • Re: End-of-line (EOL) Problem
                                      ianbroad
                                      Thanks Curtis, I'll give that a shot.
                                      • Re: End-of-line (EOL) Problem
                                        mnakleh
                                        Hello Curtis,

                                        I think I was too vague: the cursor example I provided works fine. However, NOTHING I have tried in the Field Calculator box worked. Even if I replace references to '\r\n' with chr(13) + chr(10), it still doesn't work.

                                        One of the responses on GIS StackExchange describing the exact same problem recommends the same thing as you, and those who tried it seemed to have just as little luck as I did.

                                        I set up a quick test just to make sure that I was isolating the issue:

                                        1. In a new shapefile, I add 2 text fields (TEXTFIELD, NEWTEXTF)

                                        2. I create a single feature

                                        3. I type the following text in Notepad: "This is a[ENTER]test" (where [ENTER] represents pressing the Enter button)

                                        4. I copy-paste this text (which is on two lines) into the feature's TEXTFIELD value

                                        5. I then run the following in FieldCalculator: NEWTEXTF = !TEXTFIELD!.upper()



                                        This generates the following error message:
                                        Executing: CalculateField test NEWTEXTF !TEXTFIELD!.upper() PYTHON_9.3 #
                                        Start Time: Thu Jul 18 12:35:16 2013
                                        ERROR 000539: Error running expression: "This is a
                                        test".upper() <type 'exceptions.SyntaxError'>: EOL while scanning string literal (<string>, line 1)
                                        Failed to execute (CalculateField).
                                        Failed at Thu Jul 18 12:35:16 2013 (Elapsed Time: 0,00 seconds)


                                        Any attempts to replace the newlines, using either escape sequences or chr() calls, result in the same error.
                                        It looks as if the CalculateField is passing along the newlines unescaped, which breaks the interpreter.
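
The failure can be modeled in plain Python. The sketch below assumes, based only on the error message above, that the field value is pasted into the expression as a double-quoted literal before evaluation; the `substitute` helper is hypothetical and is not the actual ArcGIS implementation.

```python
def substitute(expression, field_name, value, escape=False):
    """Mimic textual substitution of !FIELD! with a quoted field value."""
    literal = repr(value) if escape else '"' + value + '"'
    return expression.replace("!" + field_name + "!", literal)

value = "This is a\ntest"  # field value with an embedded line break

# Unescaped substitution leaves a raw newline inside the string literal.
naive = substitute("!TEXTFIELD!.upper()", "TEXTFIELD", value)
try:
    compile(naive, "<string>", "eval")
    naive_compiles = True
except SyntaxError:
    naive_compiles = False

# Escaping the value first (here with repr) yields a valid expression.
escaped = substitute("!TEXTFIELD!.upper()", "TEXTFIELD", value, escape=True)
result = eval(escaped)
```

Under this model, no function applied to the field in the expression can help, because the syntax error happens before any of the expression runs.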

                                        So, a couple of questions come to mind:

                                        1. Do you get the same behaviour as me for the basic case of !TEXTFIELD!.upper()?

                                        2. If yes, does this mean that ALL CalculateField calls that use the Python interpreter need to have their input sanitized to remove newlines? Or that we should just switch to Cursors in all cases to avoid any errors or difficulties?

                                        3. Could you paste the actual working code you used to get the example working properly?



                                        If you'd prefer, we can correspond directly by e-mail too, so I can send you the samples I have.

                                        Thanks so much!
                                        • Re: End-of-line (EOL) Problem
                                          ksed
                                          Dang it!

                                          Why does the field calculator for Python reject standard strings with \t , \n, etc. characters? The only one it seems to accept is \r.

                                          e.g. f.write(!textfield1!+', '+!textfield2!+'\r') works
                                          but f.write(!textfield1!+',\t'+!textfield2!+'\t') doesn't work

                                          This bug severely limits the possibilities for writing output from a table to a file or an email.

                                          What is the point of castrating Python's string operators?
                                          • Re: End-of-line (EOL) Problem
                                            curtvprice
                                            Could you paste the actual working code you used to get the example working properly?


                                            import os
                                            import arcpy
                                            
                                            tbl = arcpy.CreateScratchName("","","table","in_memory")
                                            arcpy.CreateTable_management("in_memory",os.path.basename(tbl))
                                            arcpy.AddField_management(tbl,"TESTFIELD","TEXT")
                                            Rows = arcpy.InsertCursor(tbl)
                                            Row = Rows.newRow()
                                            Rows.insertRow(Row)
                                            del Row, Rows
                                            arcpy.CalculateField_management(tbl,"TESTFIELD","chr(10) + chr(13)","PYTHON_9.3")
                                            print arcpy.GetMessages()
                                            Rows = arcpy.SearchCursor(tbl)
                                            Row = Rows.next()
                                            print "Field value: ",repr(Row.TESTFIELD)
                                            del Row, Rows
                                            


                                            Results:

                                            Executing: CalculateField in_memory\xx0 TESTFIELD "chr(10) + chr(13)" PYTHON_9.3 #
                                            Start Time: Mon Aug 05 10:49:33 2013
                                            Succeeded at Mon Aug 05 10:49:33 2013 (Elapsed Time: 0.00 seconds)
                                            Field value:  u'\n\r'
                                            


                                            Also this worked fine for me:

                                            [Attachment not preserved]

                                            I think the problem you're running into is using the value of the field !TESTFIELD! when the field contains newlines. The tool substitutes the field's value into the expression, and the geoprocessing messaging and Python interpreter can't deal with the resulting raw newlines.

                                            I think the method you found is a good approach, that is, using Python with cursors (using the Calculate Value tool if in ModelBuilder) instead of using Calculate Field.  I don't see another way around this, which is a design issue with the way Calculate Field accesses field values, and its connection with the geoprocessing message environment in how things are passed to Python.

                                            Seems to me this is a good enhancement request for Calculate Field, i.e. have non printables converted to escape codes as part of the !FIELDNAME! -> value substitution process.
                                            • Re: End-of-line (EOL) Problem
                                              curtvprice
                                              Why does the field calculator for Python reject standard strings with \t , \n, etc. characters?

                                              What is the point of castrating Python's string operators?


                                              This is a limitation of the ArcGIS geoprocessing tool setup, which has to pass all tool parameters in string representation. The expression and code block are Calculate Field tool parameters. (String reps are easily used to pass tool parameters across the web, XML, etc.)

                                              The fix is to use the chr() function instead of escape codes in your expression or Calculate Field code block.
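
A sketch of a code block written with chr() instead of escape codes, checked in plain Python. The function name `join_lines` and the sample values are assumptions for illustration, and `exec` only stands in for the tool running the code block.

```python
# The code block text contains no backslash escapes at all.
code_block = """def join_lines(first, second):
    eol = chr(13) + chr(10)  # CR LF, built without escape codes
    return first + eol + second"""

# Run the code block the way a plain interpreter would, then call it.
namespace = {}
exec(code_block, namespace)
joined = namespace["join_lines"]("568 N Courier Ct", "Suite 2")
```

As reported earlier in the thread, this helps when the expression or code block needs to produce or name a newline; it does not help when the field value substituted into the expression already contains one.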