I was able to figure out the error from earlier, but I have one more request. If I have other fields in the input table, I want to also show them in the output table. For example, see the tables below:
Input Table:

| Data         | UniqueID | Item ID | Begin | End |
| Observation1 | 555      | 1000    | 0     | 1   |
|              | 555      | 1001    | 0     | 5   |
|              | 555      | 1002    | 1     | 7   |
|              | 555      | 1003    | 4     | 5   |
|              | 555      | 1004    | 3     | 5   |
| Observation2 | 600      | 2001    | 0     | 3   |
|              | 600      | 2002    | 2     | 6   |
|              | 600      | 2003    | 3     | 5   |
Output Table:

| Data         | UniqueID | Begin | End | Item ID                |
| Observation1 | 555      | 0     | 1   | 1001                   |
|              | 555      | 1     | 3   | 1001; 1002             |
|              | 555      | 3     | 4   | 1001; 1002; 1004       |
|              | 555      | 4     | 5   | 1001; 1002; 1003; 1004 |
|              | 555      | 5     | 7   | 1002                   |
| Observation2 | 600      | 0     | 2   | 2001; 2002             |
|              | 600      | 2     | 3   | 2001; 2002             |
|              | 600      | 3     | 5   | 2002; 2003             |
|              | 600      | 5     | 6   | 2003                   |
Is there a way to differentiate between records that fall within a certain range (within a UniqueID), as shown in the tables above, and also carry the other fields for that range (with the same UniqueID) into the output table?
I am really grateful for your help thus far, guys. Thanks.
I know my requests never end, but we (Blake T and I) are very close to getting a solution for this, and I appreciate every bit of help I've gotten so far. Thanks, guys.
@Blake T: I have a few questions/requests. The following code has been very useful to me, but I observed a case that wasn't considered, and it has resulted in some weird output.
Python - Code
From the input table I observed the following occurrence (see input table below).
| UniqueID | ItemID | Begin_Station | End_Station | TestScore | TestDate  |
| 104400   | 2583   | 84648.45      | 93416.32    | 974       | 4/27/1968 |
| 104400   | 2584   | 93416.32      | 109218.7    | 974       | 4/26/1968 |
| 104400   | 2585   | 109218.7      | 109494.2    | 974       | 5/1/1968  |
| 105000   | 2585   | 0             | 117.958     | 974       | 5/1/1968  |
| 105000   | 2586   | 117.958       | 22900.84    | 974       | 8/24/1968 |
| 105000   | 2587   | 10764.01      | 10903.61    | 974       | 2/1/2003  |
| 105000   | 2587   | 12647.37      | 12749.78    | 974       | 2/1/2003  |
| 105100   | 2586   | 22900.84      | 28677.72    | 974       | 8/24/1968 |
| 105100   | 2588   | 28677.72      | 39774.27    | 974       | 8/26/1968 |
| 105100   | 2589   | 39774.27      | 48264.47    | 974       | 8/28/1968 |
From the input table above, I observed that there can be more than one instance of a certain ItemID within a UniqueID (note the two rows with ItemID 2587 under UniqueID 105000). This has resulted in weird output where one of the ItemID ranges is omitted from the segmentation and is not captured in the final output (see the output table below for UniqueID 105000).
| UniqueID | Begin    | End      | SegmentItems | TestDate  |
| 105000   | 0        | 117.958  | 2585         | 5/1/1968  |
| 105000   | 117.958  | 12647.37 | 2586         | 8/24/1968 |
| 105000   | 12647.37 | 12749.78 | 2586, 2587   | 2/1/2003  |
| 105000   | 12749.78 | 22900.84 | 2586         | 8/24/1968 |
And the following row was omitted from the segmentation, I think because of the repeated ItemID within a UniqueID (see the omitted row below):
| UniqueID | ItemID | Begin    | End      | TestScore | TestDate |
| 105000   | 2587   | 10764.01 | 10903.61 | 974       | 2/1/2003 |
Again, thank you, and I hope I didn't confuse you even more.
I found the problem was in creating the itemsdict dictionary. A dictionary has to have unique keys so the duplicate values in your ItemID field were getting overwritten when put into the dictionary. To solve the problem, I used a named tuple to store the input data rows instead of a dictionary. Please test again and let me know how it goes.
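To see why keying rows by ItemID loses data, here is a minimal, arcpy-free sketch (values taken from the sample table; the names are illustrative):

```python
from collections import namedtuple

# Sample rows for UniqueID 105000: (ItemID, Begin, End)
rows = [
    (2586, 117.958, 22900.84),
    (2587, 10764.01, 10903.61),  # first 2587 range ...
    (2587, 12647.37, 12749.78),  # ... is overwritten by this one below
]

# Dictionary keys must be unique, so the duplicate ItemID silently
# replaces the earlier entry -- one range disappears.
itemsdict = {r[0]: (r[1], r[2]) for r in rows}
print(len(itemsdict))   # 2 entries: one 2587 row was lost

# Storing each row as its own named tuple keeps every range.
Row = namedtuple('Row', 'iID begin end')
table_rows = [Row._make(r) for r in rows]
print(len(table_rows))  # 3 entries: nothing lost
```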
def main():
    import arcpy
    import os
    from collections import namedtuple

    # Local variables
    sourcegdb = r"N:\TechTemp\BlakeT\Work\TEMP.gdb"
    table_in = os.path.join(sourcegdb, "SegmentSequence_Input_20150516")
    table_out_name = "SegmentSequence_Output_20150516"

    # Create output table
    arcpy.CreateTable_management(sourcegdb, table_out_name)
    table_out = os.path.join(sourcegdb, table_out_name)
    arcpy.AddField_management(table_out, "UniqueID", "LONG")
    arcpy.AddField_management(table_out, "Begin", "DOUBLE")
    arcpy.AddField_management(table_out, "End", "DOUBLE")
    arcpy.AddField_management(table_out, "SegmentItems", "TEXT", "", "", 255)
    arcpy.AddField_management(table_out, "MaxDate", "DATE")

    # Identify observation groups
    sqlprefix = "DISTINCT UniqueID"
    sqlpostfix = "ORDER BY UniqueID"
    observations = tuple(uid[0]
                         for uid in arcpy.da.SearchCursor(
                             table_in,
                             ["UniqueID"],
                             sql_clause=(sqlprefix, sqlpostfix)
                         )
                         )

    fields_in = [
        "ItemID",
        "Begin_Station",
        "End_Station",
        "TestDate"
    ]
    row_tpl = namedtuple('row_tpl', 'iID, begin, end, tDate')
    fields_out = [
        "UniqueID",
        "Begin",
        "End",
        "SegmentItems",
        "MaxDate"
    ]

    with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
        for obsv in observations:
            # Read table rows into named tuples as (item, begin, end, date)
            where_clause = "UniqueID = {}".format(obsv)
            print("Processing {}".format(where_clause))
            with arcpy.da.SearchCursor(table_in, fields_in, where_clause) as s_cursor:
                table_in_rows = map(row_tpl._make, s_cursor)

            # Identify segments
            allsegments = [row.begin for row in table_in_rows] + \
                          [row.end for row in table_in_rows]
            segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
            del allsegments

            # Identify items and date in each segment
            for i in range(len(segments) - 1):
                begin = segments[i]
                end = segments[i + 1]
                seg_itemsdict = {row.iID: row.tDate
                                 for row in table_in_rows
                                 if row.begin <= begin and row.end >= end}

                # Write segment items to output table
                itemstext = str(seg_itemsdict.keys()).strip('[]')
                itemsdates = [i for i in seg_itemsdict.values() if i is not None]
                ## Do not attempt to find max date if there
                ## are no dates for the items in the segment.
                if len(itemsdates) > 0:
                    itemsdates_max = max(itemsdates)
                else:
                    itemsdates_max = None
                row = (obsv, begin, end, itemstext, itemsdates_max)
                i_cursor.insertRow(row)

    print("Done. Output table at {}".format(table_out))

if __name__ == '__main__':
    main()
# Identify observation groups
sqlprefix = "DISTINCT UniqueID"
sqlpostfix = "ORDER BY UniqueID"
observations = tuple(uid[0]
                     for uid in arcpy.da.SearchCursor(
                         table_in,
                         ["UniqueID"],
                         sql_clause=(sqlprefix, sqlpostfix)
                     )
                     )
This section uses the SQL prefix and postfix options in the sql_clause of an arcpy.da.SearchCursor() to bring back only distinct (no duplicate) values in the UniqueID field and order them in ascending order. That search cursor sits inside something called a generator expression, which is shorthand for producing a stream of values; wrapping it in tuple() collects them into a fully populated tuple. The same pattern builds lists and dictionaries (as comprehensions). In this case, I'm putting all of the values returned by the search cursor (which, remember, are only distinct) into a tuple so I don't have to have two search cursors open on the same table.
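Here is a small, arcpy-free sketch of the same pattern (the sample rows are invented, and sorted(set(...)) stands in for what DISTINCT and ORDER BY do on the database side):

```python
# Rows as a cursor would yield them: one-element tuples of UniqueID.
raw_rows = [(105000,), (104400,), (105000,), (105100,)]

# Generator expression feeding tuple(): collect distinct, ordered IDs.
observations = tuple(sorted(set(uid[0] for uid in raw_rows)))
print(observations)  # -> (104400, 105000, 105100)

# The same expression shape builds lists and dicts too:
as_list = [uid[0] for uid in raw_rows]       # keeps duplicates and order
as_dict = {uid[0]: None for uid in raw_rows}  # distinct keys
```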
fields_in = [
    "ItemID",
    "Begin_Station",
    "End_Station",
    "TestDate"
]
row_tpl = namedtuple('row_tpl', 'iID, begin, end, tDate')
fields_out = [
    "UniqueID",
    "Begin",
    "End",
    "SegmentItems",
    "MaxDate"
]
with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
    for obsv in observations:
Here I'm defining the field names as lists for the input and output tables; these lists are used by the arcpy.da cursors. I also create the named tuple object and define the value names there. At the end, I open an insert cursor on the output table, which will be used to insert rows as they are processed. Before processing the segments and items, I need to restrict things to a particular UniqueID (or "Observation"), so that last line iterates the distinct observations and runs whatever code is inside for each one.
# Read table rows into named tuples as (item, begin, end, date)
where_clause = "UniqueID = {}".format(obsv)
print("Processing {}".format(where_clause))
with arcpy.da.SearchCursor(table_in, fields_in, where_clause) as s_cursor:
    table_in_rows = map(row_tpl._make, s_cursor)
Here is where I populate the named tuples with all of the rows returned from the input table. I start by defining the where clause for the search cursor so it only brings back the rows for the current observation (UniqueID). Now that I think about it, this probably could have used a generator expression as well.
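As a standalone sketch (sample rows invented), this is the _make pattern. One caveat: in Python 3, map() returns a one-shot iterator, so you would want list() around it if the rows are iterated more than once, as they are later in the script; in Python 2 (where ArcPy ran at the time), map() already returns a list.

```python
from collections import namedtuple

row_tpl = namedtuple('row_tpl', 'iID, begin, end, tDate')

# Rows as a search cursor would yield them (sample values).
cursor_rows = [
    (2586, 117.958, 22900.84, '8/24/1968'),
    (2587, 10764.01, 10903.61, '2/1/2003'),
]

# _make builds one named tuple per row; list() makes it reusable.
table_in_rows = list(map(row_tpl._make, cursor_rows))
print(table_in_rows[0].begin)  # -> 117.958
print(table_in_rows[1].iID)    # -> 2587
```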
# Identify segments
allsegments = [row.begin for row in table_in_rows] + \
              [row.end for row in table_in_rows]
segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
del allsegments
This section first combines all begin and end segment values into a single list; you can see the concatenation of the two lists with a simple +. However, this list contains duplicates and can be in any order. The duplicates are eliminated using the set() method. They are ordered ascending by the sorted() function. Then, the distinct, sorted list is converted to a tuple to preserve the order (which, in hindsight, might not be necessary. I did it as a precaution but never tested it without). I also delete the allsegments variable to free up memory and because the data is no longer needed; it was just a scratch-pad to create the distinct segments tuple.
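The combine-dedupe-sort step can be sketched without arcpy (sample numbers, not the real data):

```python
# Begin and end station values pulled from the rows of one observation.
begins = [0, 117.958, 10764.01, 12647.37]
ends = [117.958, 22900.84, 10903.61, 12749.78]

# Concatenate with +; duplicates (117.958 here) are still present.
allsegments = begins + ends

# set() drops duplicates, sorted() orders ascending, tuple() freezes it.
segments = tuple(sorted(set(allsegments)))
print(segments)
# -> (0, 117.958, 10764.01, 10903.61, 12647.37, 12749.78, 22900.84)
```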
# Identify items and date in each segment
for i in range(len(segments) - 1):
    begin = segments[i]
    end = segments[i + 1]
    seg_itemsdict = {row.iID: row.tDate
                     for row in table_in_rows
                     if row.begin <= begin and row.end >= end}
Most everything up until now has just been gathering and manipulating the data. This is where the real "logic" is that identifies all the ItemID values that fall within a segment range. It also records the date for each item that will be compared to find the max in the next step. It's a little hard to describe with text, so let me know if you want something specific clarified here.
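To make the logic concrete, here is an arcpy-free sketch using a few of the sample rows for UniqueID 105000 (values simplified, dates left as strings):

```python
from collections import namedtuple

Row = namedtuple('Row', 'iID begin end tDate')

rows = [
    Row(2585, 0.0, 117.958, '5/1/1968'),
    Row(2586, 117.958, 22900.84, '8/24/1968'),
    Row(2587, 12647.37, 12749.78, '2/1/2003'),
]

# An item belongs to a segment when the item's own range fully covers
# the segment: row.begin <= begin and row.end >= end.
begin, end = 12647.37, 12749.78
seg_itemsdict = {r.iID: r.tDate for r in rows
                 if r.begin <= begin and r.end >= end}
print(sorted(seg_itemsdict))  # -> [2586, 2587]
```

Item 2585 is excluded because its range ends at 117.958, well before this segment starts; 2586 spans the whole segment and 2587 matches it exactly, so both are kept.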
# Write segment items to output table
itemstext = str(seg_itemsdict.keys()).strip('[]')
itemsdates = [i for i in seg_itemsdict.values() if i is not None]
## Do not attempt to find max date if there
## are no dates for the items in the segment.
if len(itemsdates) > 0:
    itemsdates_max = max(itemsdates)
else:
    itemsdates_max = None
row = (obsv, begin, end, itemstext, itemsdates_max)
i_cursor.insertRow(row)
Finally, this is where the data is actually written to the output table. It starts by formatting the list of ItemIDs as plain text so it looks nice in the table. Then I use a list comprehension to gather all of the date values for the items in the segment, skipping any that are None. That list is then checked for length to see whether there are any dates at all; if there are, max() finds the latest one, which is what gets written to the output table. The last line is where we call the insert cursor to write the row.
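A standalone sketch of this step (sample dates invented; the str.join formatting shown here is an alternative to str(keys()).strip('[]') that behaves the same across Python versions):

```python
from datetime import date

# Item -> date mapping for one segment; None models a row with no TestDate.
seg_itemsdict = {2586: date(1968, 8, 24), 2587: date(2003, 2, 1), 2588: None}

# Format the keys as plain text for the SegmentItems field.
itemstext = ', '.join(str(k) for k in sorted(seg_itemsdict))
print(itemstext)  # -> 2586, 2587, 2588

# Filter out None first: max() on an empty sequence raises ValueError.
itemsdates = [d for d in seg_itemsdict.values() if d is not None]
itemsdates_max = max(itemsdates) if itemsdates else None
print(itemsdates_max)  # -> 2003-02-01
```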
I've looked at a few things online and in books but have yet to find good material. If you wouldn't mind, could you recommend something to help me understand this section of the script in particular (see below):
# Identify observation groups
sqlprefix = "DISTINCT UniqueID"
sqlpostfix = "ORDER BY UniqueID"
observations = tuple(uid[0]
                     for uid in arcpy.da.SearchCursor(
                         table_in,
                         ["UniqueID"],
                         sql_clause=(sqlprefix, sqlpostfix)
                     )
                     )

fields_in = [
    "ItemID",
    "Begin_Station_Num_m",
    "End_Station_Num_m",
    "TestDate"
]
fields_out = [
    "UniqueID",
    "Begin",
    "End",
    "SegmentItems",
    "MaxDate"
]

with arcpy.da.InsertCursor(table_out, fields_out) as i_cursor:
    for obsv in observations:
        # Read table into dictionary with rows as item: (begin, end, date)
        where_clause = "UniqueID = {}".format(obsv)
        itemsdict = {r[0]: (r[1], r[2], r[3])
                     for r in arcpy.da.SearchCursor(
                         table_in,
                         fields_in,
                         where_clause
                     )
                     }

        # Identify segments
        allsegments = [s[0] for s in itemsdict.values()] + \
                      [s[1] for s in itemsdict.values()]
        segments = tuple(sorted(set(allsegments)))  ## creates only unique segments
        del allsegments

        # Identify items and date in each segment
        for i in range(len(segments) - 1):
            begin = segments[i]
            end = segments[i + 1]
            seg_itemsdict = {k: v[2]
                             for k, v in itemsdict.items()
                             if v[0] <= begin and v[1] >= end
                             }
Thanks a lot, Blake Helm.
Hi Blake T, hope you're doing great. I just want to ask if you have any knowledge of mobile app creation?
I don't have any experience coding native mobile apps. There is a GeoNet community space for native app developers with subspaces for each mobile operating system (iOS, Android, etc).