Re: Dynamic segmentation with respect to a control factor Additional Requirements

05-08-2015 08:06 AM
ChukwuemekaEzeiruaku
New Contributor II

I was able to figure out the error from earlier, but I have one more request.

If I have other fields in the table that I also want to show in the out_table, for example, see the tables below:

Input Table

Data          UniqueID  Item ID  Begin  End
Observation1  555       1000     0      1
              555       1001     0      5
              555       1002     1      7
              555       1003     4      5
              555       1004     3      5
Observation2  600       2001     0      3
              600       2002     2      6
              600       2003     3      5

Output Table

Data          UniqueID  Begin  End  Item ID
Observation1  555       0      1    1001
              555       1      3    1001; 1002
              555       3      4    1001; 1002; 1004
              555       4      5    1001; 1002; 1003; 1004
              555       5      7    1002
Observation2  600       0      2    2001; 2002
              600       2      3    2001; 2002
              600       3      5    2002; 2003
              600       5      6    2003

Is there a way to differentiate between records that fall within a certain range (with the same UniqueID), as shown in the tables above, and also carry the other fields for that range (with the same UniqueID) into the output table?

I am really grateful for your help thus far, guys. Thanks.

ChukwuemekaEzeiruaku
New Contributor II

I know my request never ends, but we (Blake T and I) are very close to getting a solution for this, and I appreciate all the help I've gotten so far... Thanks guys

@Blake T: I have a few questions/requests.

The following code has been very useful to me, but I observed a case in the logic that wasn't considered and has resulted in some weird output.

Python code:

def main():
    import arcpy
    import os

    # Local variables
    sourcegdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    table_in = os.path.join(sourcegdb, "SegmentSequence_input")

    # Create output table
    arcpy.CreateTable_management(sourcegdb, "SegmentSequence_output")
    table_out = os.path.join(sourcegdb, "SegmentSequence_output")
    arcpy.AddField_management(table_out, "UniqueID", "LONG")
    arcpy.AddField_management(table_out, "Begin", "DOUBLE")
    arcpy.AddField_management(table_out, "End", "DOUBLE")
    arcpy.AddField_management(table_out, "SegmentItems", "TEXT", "", "", 255)
    arcpy.AddField_management(table_out, "MaxDate", "DATE")

    # Identify observation groups
    sqlprefix = "DISTINCT UniqueID"
    sqlpostfix = "ORDER BY UniqueID"
    observations = tuple(uid[0]
        for uid in arcpy.da.SearchCursor(
            table_in,
            ["UniqueID"],
            sql_clause=(sqlprefix, sqlpostfix)
        )
    )

    fields_in = [
        "ItemID",
        "Begin_Station_Num_m",
        "End_Station_Num_m",
        "TestDate"
    ]
    fields_out = [
        "UniqueID",
        "Begin",
        "End",
        "SegmentItems",
        "MaxDate"
    ]
    with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
        for obsv in observations:
            # Read table into dictionary with rows as item: (begin, end, date)
            where_clause = "UniqueID = {}".format(obsv)
            itemsdict = {r[0]: (r[1], r[2], r[3])
                for r in arcpy.da.SearchCursor(
                    table_in,
                    fields_in,
                    where_clause
                )
            }

            # Identify segments
            allsegments = [s[0] for s in itemsdict.values()] + [s[1] for s in itemsdict.values()]
            segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
            del allsegments

            # Identify items and date in each segment
            for i in range(len(segments) - 1):
                begin = segments[i]
                end = segments[i + 1]
                seg_itemsdict = {k: v[2]
                    for k, v in itemsdict.items()
                    if v[0] <= begin and v[1] >= end
                }

                # Write segment items to output table
                itemstext = str(seg_itemsdict.keys()).strip('[]')
                itemsdates = [d for d in seg_itemsdict.values() if d is not None]
                ## Do not attempt to find max date if there
                ## are no dates for the items in the segment.
                if len(itemsdates) > 0:
                    itemsdates_max = max(itemsdates)
                else:
                    itemsdates_max = None
                row = (obsv, begin, end, itemstext, itemsdates_max)
                i_cursor.insertRow(row)


if __name__ == '__main__':
    main()

From the input table I observed the following occurrence (see input table below).

UniqueID  ItemID  Begin_Station  End_Station  TestScore  TestDate
104400    2583    84648.45       93416.32     974        4/27/1968
104400    2584    93416.32       109218.7     974        4/26/1968
104400    2585    109218.7       109494.2     974        5/1/1968
105000    2585    0              117.958      974        5/1/1968
105000    2586    117.958        22900.84     974        8/24/1968
105000    2587    10764.01       10903.61     974        2/1/2003
105000    2587    12647.37       12749.78     974        2/1/2003
105100    2586    22900.84       28677.72     974        8/24/1968
105100    2588    28677.72       39774.27     974        8/26/1968
105100    2589    39774.27       48264.47     974        8/28/1968

From the input table above, I observed that there can be more than one row with the same ItemID within a UniqueID (see the two rows for ItemID 2587). This has resulted in weird output where one of the duplicated ItemID ranges is omitted from the segmentation and is not captured in the final output (see the output table below for UniqueID 105000).

UniqueID  Begin     End       SegmentItems  TestDate
105000    0         117.958   2585          5/1/1968
105000    117.958   12647.37  2586          8/24/1968
105000    12647.37  12749.78  2586, 2587    2/1/2003
105000    12749.78  22900.84  2586          8/24/1968

And the following row was omitted from the segmentation, I think because of the repeated ItemID within a UniqueID (see the omitted row):

UniqueID  ItemID  Begin     End       TestScore  TestDate
105000    2587    10764.01  10903.61  974        2/1/2003
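The omission can be reproduced with plain Python, independent of arcpy: a dict keeps only one value per key, so when rows are stored as {ItemID: (begin, end, date)}, a second 2587 row silently replaces the first (values below are taken from the tables above).

```python
# Rows for UniqueID 105000, as (ItemID, (begin, end, date)) pairs.
rows = [
    (2585, (0.0, 117.958, "5/1/1968")),
    (2586, (117.958, 22900.84, "8/24/1968")),
    (2587, (10764.01, 10903.61, "2/1/2003")),
    (2587, (12647.37, 12749.78, "2/1/2003")),  # duplicate ItemID
]

# Building a dict keyed on ItemID drops the earlier duplicate.
itemsdict = {item_id: values for item_id, values in rows}

print(len(itemsdict))   # 3 entries survive out of 4 rows
print(itemsdict[2587])  # only the last 2587 range remains
```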

Again thank you and I hope I didn't get you confused even more

BlakeTerhune
MVP Regular Contributor

I found that the problem was in creating the itemsdict dictionary. A dictionary has to have unique keys, so the duplicate values in your ItemID field were getting overwritten when they were put into the dictionary. To solve the problem, I used a named tuple to store the input data rows instead of a dictionary. Please test again and let me know how it goes.
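As a minimal illustration of the fix (sample values only, not the real cursor data): a list of named tuples keeps every row, including the duplicated ItemID.

```python
from collections import namedtuple

Row = namedtuple('Row', 'iID begin end tDate')

raw_rows = [
    (2587, 10764.01, 10903.61, "2/1/2003"),
    (2587, 12647.37, 12749.78, "2/1/2003"),  # duplicate ItemID is kept
]
table_in_rows = [Row._make(r) for r in raw_rows]

print(len(table_in_rows))      # both rows survive
print(table_in_rows[0].begin)  # fields are accessed by name
```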

def main():
    import arcpy
    import os
    from collections import namedtuple

    # Local variables
    sourcegdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    table_in = os.path.join(sourcegdb, "SegmentSequence_Input_20150516")
    table_out_name = "SegmentSequence_Output_20150516"

    # Create output table
    arcpy.CreateTable_management(sourcegdb, table_out_name)
    table_out = os.path.join(sourcegdb, table_out_name)
    arcpy.AddField_management(table_out, "UniqueID", "LONG")
    arcpy.AddField_management(table_out, "Begin", "DOUBLE")
    arcpy.AddField_management(table_out, "End", "DOUBLE")
    arcpy.AddField_management(table_out, "SegmentItems", "TEXT", "", "", 255)
    arcpy.AddField_management(table_out, "MaxDate", "DATE")

    # Identify observation groups
    sqlprefix = "DISTINCT UniqueID"
    sqlpostfix = "ORDER BY UniqueID"
    observations = tuple(
        uid[0] for uid in arcpy.da.SearchCursor(
            table_in,
            ["UniqueID"],
            sql_clause=(sqlprefix, sqlpostfix)
        )
    )

    fields_in = [
        "ItemID",
        "Begin_Station",
        "End_Station",
        "TestDate"
    ]
    row_tpl = namedtuple('row_tpl', 'iID, begin, end, tDate')
    fields_out = [
        "UniqueID",
        "Begin",
        "End",
        "SegmentItems",
        "MaxDate"
    ]
    with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
        for obsv in observations:
            # Read table into dictionary with rows as item: (begin, end, date)
            where_clause = "UniqueID = {}".format(obsv)
            print("Processing {}".format(where_clause))
            with arcpy.da.SearchCursor(table_in, fields_in, where_clause) as s_cursor:
                # Use a list, not map(): in Python 3 map() is a lazy iterator,
                # so it would be empty by the time the rows are re-read below.
                table_in_rows = [row_tpl._make(r) for r in s_cursor]

            # Identify segments
            allsegments = [row.begin for row in table_in_rows] + [row.end for row in table_in_rows]
            segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
            del allsegments

            # Identify items and date in each segment
            for i in range(len(segments)-1):
                begin = segments[i]
                end = segments[i + 1]
                seg_itemsdict = {
                    row.iID: row.tDate for row in table_in_rows
                    if row.begin <= begin and row.end >= end
                }

                # Write segment items to output table
                itemstext = str(seg_itemsdict.keys()).strip('[]')
                itemsdates = [i for i in seg_itemsdict.values() if i is not None]
                ## Do not attempt to find max date if there
                ## are no dates for the items in the segment.
                if len(itemsdates) > 0:
                    itemsdates_max = max(itemsdates)
                else:
                    itemsdates_max = None
                row = (obsv, begin, end, itemstext, itemsdates_max)
                i_cursor.insertRow(row)

    print("Done. Output table at {}".format(table_out))


if __name__ == '__main__':
    main()
BlakeTerhune
MVP Regular Contributor
# Identify observation groups
sqlprefix = "DISTINCT UniqueID"
sqlpostfix = "ORDER BY UniqueID"
observations = tuple(
    uid[0] for uid in arcpy.da.SearchCursor(
        table_in,
        ["UniqueID"],
        sql_clause=(sqlprefix, sqlpostfix)
    )
)

This section uses the SQL prefix and postfix options in the sql_clause of an arcpy.da.SearchCursor() to bring back only distinct (no duplicate) values in the UniqueID field, ordered ascending. That search cursor sits inside a generator expression, which is shorthand for building a sequence of values; wrapping it in tuple() materializes the whole result at once. You can do the same kind of thing with lists and dictionaries (list and dict comprehensions). In this case, I'm putting all of the values returned by the search cursor (which, remember, are distinct) into a tuple so I don't have to have two search cursors open on the same table.
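The same distinct-and-ordered result can be sketched without arcpy (the cursor rows below are made-up one-element tuples, which is the shape arcpy.da.SearchCursor yields):

```python
# Simulated cursor output: one-element row tuples, with duplicates.
cursor_rows = [(105000,), (104400,), (105100,), (105000,)]

# The generator expression unpacks each row; set() and sorted() stand in
# for the DISTINCT / ORDER BY work the database does in the real script.
observations = tuple(sorted(set(uid[0] for uid in cursor_rows)))

print(observations)  # (104400, 105000, 105100)
```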

fields_in = [
    "ItemID",
    "Begin_Station",
    "End_Station",
    "TestDate"
]
row_tpl = namedtuple('row_tpl', 'iID, begin, end, tDate')
fields_out = [
    "UniqueID",
    "Begin",
    "End",
    "SegmentItems",
    "MaxDate"
]
with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
    for obsv in observations:

Here I'm defining the field names as a list for the input and output tables. I also create the named tuple object and define the value names in there. The field name lists are used for the arcpy.da cursors. At the end, I open an insert cursor on the output table, which will be used to insert rows as they are processed. Before processing the segments and items, I need to do it only for a particular UniqueID (or "Observation"). That last line there is iterating the distinct observations and doing whatever code is inside for each one.
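A quick sketch of the named tuple being defined here (the field values are illustrative, borrowed from the sample tables):

```python
from collections import namedtuple

# Same definition as in the script: a tuple type with named fields.
row_tpl = namedtuple('row_tpl', 'iID, begin, end, tDate')

r = row_tpl(2585, 0.0, 117.958, "5/1/1968")
print(r.begin)  # named access is self-documenting
print(r[1])     # positional access still works like a plain tuple
```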

# Read table into dictionary with rows as item: (begin, end, date)
where_clause = "UniqueID = {}".format(obsv)
print("Processing {}".format(where_clause))
with arcpy.da.SearchCursor(table_in, fields_in, where_clause) as s_cursor:
    table_in_rows = [row_tpl._make(r) for r in s_cursor]

Here is where I populate the named tuples with all of the rows returned from the input table. I start by defining the where clause for the search cursor so it only brings back the rows for the current observation (UniqueID). Now that I think about it, this also probably could have used a generator expression.
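One caveat worth noting (my observation, not part of the original post): in Python 3, map() returns a lazy iterator, so the rows would vanish after the first pass, and the later code iterates them twice. Materializing them as a list avoids that. A sketch with made-up rows:

```python
from collections import namedtuple

row_tpl = namedtuple('row_tpl', 'iID begin end tDate')
raw = [(2585, 0.0, 117.958, "5/1/1968"),
       (2586, 117.958, 22900.84, "8/24/1968")]

lazy = map(row_tpl._make, raw)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # Python 3: nothing left on the second pass
print(len(first_pass), len(second_pass))  # 2 0

# A list can be iterated any number of times.
rows = [row_tpl._make(r) for r in raw]
print(len(list(rows)), len(list(rows)))   # 2 2
```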

# Identify segments
allsegments = [row.begin for row in table_in_rows] + [row.end for row in table_in_rows]
segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
del allsegments

This section first combines all begin and end segment values into a single list; you can see the concatenation of the two lists with a simple +. However, this list contains duplicates and can be in any order. The duplicates are eliminated with the set() built-in, and the values are ordered ascending by sorted(). Then the distinct, sorted list is converted to a tuple to preserve the order (which, in hindsight, might not be necessary; I did it as a precaution but never tested without it). I also delete the allsegments variable to free up memory, since the data is no longer needed; it was just a scratch pad for creating the distinct segments tuple.
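With the begin/end stations from the 555 example earlier in the thread, the steps look like this:

```python
# Begin and end stations for the 555 observation (from the example table).
begins = [0, 0, 1, 4, 3]
ends = [1, 5, 7, 5, 5]

allsegments = begins + ends                 # plain list concatenation
segments = tuple(sorted(set(allsegments)))  # dedupe, then order ascending

print(segments)  # (0, 1, 3, 4, 5, 7)
```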

# Identify items and date in each segment
for i in range(len(segments)-1):
    begin = segments[i]
    end = segments[i + 1]
    seg_itemsdict = {
        row.iID: row.tDate for row in table_in_rows
        if row.begin <= begin and row.end >= end
    }

Most everything up until now has just been gathering and manipulating the data. This is where the real "logic" is that identifies all the ItemID values that fall within a segment range. It also records the date for each item that will be compared to find the max in the next step. It's a little hard to describe with text, so let me know if you want something specific clarified here.
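The containment test can be seen on a subset of the 555 example: an item belongs to a segment when the item's range starts at or before the segment's begin and ends at or after the segment's end.

```python
# ItemID -> (begin, end) for UniqueID 555 (subset of the example table).
items = {1001: (0, 5), 1002: (1, 7), 1003: (4, 5), 1004: (3, 5)}
segments = (0, 1, 3, 4, 5, 7)

results = {}
for i in range(len(segments) - 1):
    begin, end = segments[i], segments[i + 1]
    # Keep items whose range fully covers the segment [begin, end].
    results[(begin, end)] = sorted(
        k for k, (b, e) in items.items() if b <= begin and e >= end
    )

for seg, found in sorted(results.items()):
    print(seg, found)
```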

# Write segment items to output table
itemstext = str(seg_itemsdict.keys()).strip('[]')
itemsdates = [i for i in seg_itemsdict.values() if i is not None]
## Do not attempt to find max date if there
## are no dates for the items in the segment.
if len(itemsdates) > 0:
    itemsdates_max = max(itemsdates)
else:
    itemsdates_max = None
row = (obsv, begin, end, itemstext, itemsdates_max)
i_cursor.insertRow(row)

Finally, this is where the data is actually written to the output table. It starts by formatting the list of ItemIDs as plain text so it looks nice in the table. Then a list comprehension puts all of the non-None date values for the items in the segment into a list. That list is checked for length to see if there are any dates at all; if there are, the maximum is found, and that is what gets written to the output table. The last line calls the insert cursor to write the row.
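A standalone sketch of this step with sample values. Note one assumption of mine: in Python 3, str(dict.keys()) prints as "dict_keys([...])", so joining the keys explicitly is safer than strip('[]'); the real script also gets datetime objects from the cursor, whereas plain strings are used here for illustration.

```python
# Items found for one segment: ItemID -> test date (None = no date).
seg_itemsdict = {2586: "8/24/1968", 2587: None}

# Format the ItemIDs as readable text for the SegmentItems field.
itemstext = ", ".join(str(k) for k in sorted(seg_itemsdict))
print(itemstext)  # 2586, 2587

# Drop None before max(); a segment with no dates gets None.
itemsdates = [d for d in seg_itemsdict.values() if d is not None]
itemsdates_max = max(itemsdates) if itemsdates else None
print(itemsdates_max)  # 8/24/1968
```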

ChukwuemekaEzeiruaku
New Contributor II

I've looked at a few things online and in books but have yet to find good material. If you wouldn't mind, could you recommend something to help me understand this section of the script in particular (see below):

    # Identify observation groups
    sqlprefix = "DISTINCT UniqueID"
    sqlpostfix = "ORDER BY UniqueID"
    observations = tuple(uid[0]
        for uid in arcpy.da.SearchCursor(
            table_in,
            ["UniqueID"],
            sql_clause=(sqlprefix, sqlpostfix)
        )
    )

    fields_in = [
        "ItemID",
        "Begin_Station_Num_m",
        "End_Station_Num_m",
        "TestDate"
    ]
    fields_out = [
        "UniqueID",
        "Begin",
        "End",
        "SegmentItems",
        "MaxDate"
    ]
    with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
        for obsv in observations:
            # Read table into dictionary with rows as item: (begin, end, date)
            where_clause = "UniqueID = {}".format(obsv)
            itemsdict = {r[0]: (r[1], r[2], r[3])
                for r in arcpy.da.SearchCursor(
                    table_in,
                    fields_in,
                    where_clause
                )
            }

            # Identify segments
            allsegments = [s[0] for s in itemsdict.values()] + [s[1] for s in itemsdict.values()]
            segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
            del allsegments

            # Identify items and date in each segment
            for i in range(len(segments) - 1):
                begin = segments[i]
                end = segments[i + 1]
                seg_itemsdict = {k: v[2]
                    for k, v in itemsdict.items()
                    if v[0] <= begin and v[1] >= end
                }

ChukwuemekaEzeiruaku
New Contributor II

Thanks a lot Blake Helm

ChukwuemekaEzeiruaku
New Contributor II

Hi Blake T, hope you're doing great. Just wanted to ask if you have any knowledge about mobile app creation?

BlakeTerhune
MVP Regular Contributor

I don't have any experience coding native mobile apps. There is a GeoNet community space for native app developers with subspaces for each mobile operating system (iOS, Android, etc).
