Enhancing a Network Dataset to Include From- and To-Nodes

SBattersby · ‎02-06-2024

Snap Tracks is a tool in ArcGIS GeoAnalytics Engine that allows you to “snap” input track points to line segments. For instance, you can take time-enabled GPS points and shift them so that they align with the road segment(s) that they follow. This is a great way to clean up messy input point data and create coherent network paths.

However, in order to use tools like Snap Tracks, you need a network dataset with from- and to-nodes defined for all of the linestrings. Not all network datasets have these attributes included. We need the from- and to-nodes to allow GeoAnalytics Engine to identify how the segments connect to form a traversable network. These from- and to-nodes are located at the end vertices of lines and are used to indicate the directionality and connectivity of the line segments. The start of the line is the "from-node" and the end of the line is the "to-node".

While from- and to-nodes represent the directionality of the line segments, that does not mean that routing on the segments (e.g., driving direction) always runs from the from-node toward the to-node. The from- and to-nodes simply provide a way to reference and attribute the direction of travel on a segment (e.g., both directions, one way from the from-node to the to-node, or one way from the to-node to the from-node).

Description of how from- and to-nodes can represent directionality
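
To make this concrete, here is a minimal plain-Python sketch (no GeoAnalytics Engine required). The Segment class and the direction codes "both", "from_to", and "to_from" are hypothetical names used only for illustration; real datasets encode travel direction in their own way.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    coords: list      # ordered (x, y) vertices of the linestring
    direction: str    # hypothetical codes: "both", "from_to", "to_from"

    @property
    def from_node(self):
        # the from-node is the first vertex of the line
        return self.coords[0]

    @property
    def to_node(self):
        # the to-node is the last vertex of the line
        return self.coords[-1]

    def allowed_travel(self):
        """Return the (origin, destination) pairs that travel is allowed along."""
        forward = (self.from_node, self.to_node)
        backward = (self.to_node, self.from_node)
        options = {
            "both": [forward, backward],
            "from_to": [forward],
            "to_from": [backward],
        }
        return options[self.direction]


# A one-way segment whose travel direction runs against the digitized direction
one_way = Segment(coords=[(0, 0), (1, 0), (2, 1)], direction="to_from")
```

The from- and to-nodes stay fixed by the geometry; only the direction attribute says which way travel is allowed.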

Since not all line network datasets come with these attributes, you may need to calculate them yourself. In this blog post, we’ll walk through a way to calculate and add these values to your network dataset.

Data requirements

There are a few things that you need in place to create and use the from- and to-nodes in your network dataset. First, you need to start with a network dataset with linestrings.

Additionally, the individual segments in the network dataset should have nodes at valid intersections, and only there. For instance, if a street segment crosses over or under a freeway segment, there should not be a node in either segment where they cross, since that is not a valid intersection.

Road segments without intersections (and no nodes)

Road segments with valid intersection points and corresponding nodes
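
Since the from- and to-nodes sit at the end vertices, two lines that merely cross do not share a node. The sketch below illustrates that idea in plain Python with made-up coordinates; shares_end_node is a hypothetical helper, not a GeoAnalytics Engine function.

```python
def shares_end_node(seg_a, seg_b):
    """True when the two segments have at least one end vertex in common."""
    ends_a = {seg_a[0], seg_a[-1]}
    ends_b = {seg_b[0], seg_b[-1]}
    return bool(ends_a & ends_b)


# An overpass: the lines cross at (1, 1), but neither line ends there
street = [(0, 1), (2, 1)]
freeway = [(1, 0), (1, 2)]

# A valid intersection: both streets are split at (1, 1), so each segment ends there
main_st_west = [(0, 1), (1, 1)]
cross_st_south = [(1, 0), (1, 1)]

overpass_connected = shares_end_node(street, freeway)
intersection_connected = shares_end_node(main_st_west, cross_st_south)
```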

General workflow

Before we dive into the how-to, here is a general overview of the workflow to set context:

Load a network dataset (without from- and to-nodes defined). In this workbook, we will use a public dataset from the Cuyahoga County Open Data Portal. Specifically, we will use the Addressing Sites Streets dataset.
Using this dataset, we will create a new dataset with only the nodes for the start and end of each segment.
Then we connect the node ID from each to the street network using spatial joins so that each segment has the ID for its from- and to-node.
Finally, we’ll use the Snap Tracks tool with the newly enriched dataset to confirm that it works as expected.
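
Before moving to Spark, the node-building and joining steps can be sketched on a toy network in plain Python. The segment IDs and coordinates below are made up, and the dictionary lookups stand in for the spatial joins used later in this post.

```python
# Toy network: segment ID -> ordered (x, y) vertices (made-up values)
segments = {
    "A": [(0, 0), (1, 0)],
    "B": [(1, 0), (2, 0)],
    "C": [(1, 0), (1, 1)],
}

# Collect the start and end vertex of every segment
end_vertices = []
for coords in segments.values():
    end_vertices.extend([coords[0], coords[-1]])

# Keep one copy of each location and give it a unique ID
node_ids = {}
for vertex in end_vertices:
    if vertex not in node_ids:
        node_ids[vertex] = len(node_ids)

# Attach the from- and to-node IDs to each segment
enriched = {
    seg_id: {"FNODE": node_ids[coords[0]], "TNODE": node_ids[coords[-1]]}
    for seg_id, coords in segments.items()
}
```

Segments A, B, and C all touch the vertex (1, 0), so they end up referencing the same node ID, which is what makes the network traversable.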

Access and explore the network

As noted above, for this example, we are going to use a street network from the Cuyahoga County Open Data Portal. While we are using a street network, the network dataset doesn’t have to be streets – it just has to be a connected network of something (railway lines, pipes, electrical systems, etc.).

Using the feature service URL, we can load the data into GeoAnalytics Engine directly:

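# assumes an authorized GeoAnalytics Engine session with an active SparkSession named spark
from geoanalytics.sql import functions as ST
from pyspark.sql import functions as F

URL = "https://gis.cuyahogacounty.us/server/rest/services/CUYAHOGA_BASE/ADDRESSING_SITES_STREETS/FeatureServer/1"

# transform the roads to 4326 to match the point dataset we'll use for Snap Tracks
df_roads = spark.read.format("feature-service").load(URL) \
    .withColumn("Shape", ST.transform("Shape", 4326))
df_roads.count()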

This reads in the network data and provides a count of features: 50,504 segments in the dataset.

The dataset is rich in attributes, but doesn’t include details on the from- and to-nodes for each segment, and we need those for Snap Tracks.

Street segment attributes

In addition to looking at the attributes, we can also quickly plot the data to get a sense of the extent and distribution:

Plot of the streets in the dataset

Create nodes for each of the segments

To identify the ID for the unique nodes in the street network, we need to pull out the first and last coordinate in each linestring geometry. This can be done using the ST_StartPoint and ST_EndPoint functions.

You can do this by creating two separate data frames (one with start points, and one with end points). Then you will union them together to create a single data frame with all of the points:

start_points = df_roads.select("SEGMENT_ID", ST.start_point("Shape").alias("NODE"))
end_points = df_roads.select("SEGMENT_ID", ST.end_point("Shape").alias("NODE"))

# union together
all_points = start_points.union(end_points)

Or, you can create a single data frame with both the start and end points at the same time:

# make a dataframe with all start (from) and end (to) nodes
all_points = df_roads.select(
    "SEGMENT_ID",
    F.explode(F.array(ST.start_point("Shape"), ST.end_point("Shape"))).alias("NODE"))

Our new data frame only has two fields: SEGMENT_ID and NODE. It also has a lot of duplicates. Here is a portion of the data frame with five nodes in the same location:

Coordinates for overlapping node locations

If we look at a plot, we can see five line segments all meeting at the same location:

Five overlapping nodes at the end of line segments

In the end, however, we only want one copy of each unique node so that we have a single value to reference on each segment for the from- and to-nodes. In the image above, instead of five red points stacked on top of one another, we simply need one node representing that location for every line segment that it connects.
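
As a small plain-Python illustration (made-up coordinates, not the Cuyahoga data), counting the copies of each end vertex shows how many segments meet at a location:

```python
from collections import Counter

# End vertices of five hypothetical segments that all meet at (5, 5)
end_vertices = [
    (5, 5), (0, 5),
    (5, 5), (10, 5),
    (5, 5), (5, 0),
    (5, 5), (5, 10),
    (5, 5), (9, 9),
]

# Count how many times each location appears in the list
copies_per_location = Counter(end_vertices)
busiest_location, copies = copies_per_location.most_common(1)[0]
```

Here the ten end vertices collapse to six unique locations, and the shared location appears five times, once per segment.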

We can drop the duplicates using dropDuplicates:

# drop duplicate nodes so we have a clean list of unique nodes
no_dupes = all_points.dropDuplicates(["NODE"]).select("SEGMENT_ID", "NODE")
no_dupes.sort("SEGMENT_ID").show(10, False)

This reduces the number of points from over 100,000 to approximately 36,000 unique points.

The last step is to add a unique ID to each of the unique nodes. This can be done with the monotonically_increasing_id function, which generates IDs that are unique but not necessarily consecutive:

# add a unique ID to each node and drop SEGMENT_ID
# SEGMENT_ID came from the input road network; drop it so there is no conflict when we join the nodes back to the road network
no_dupes = no_dupes.withColumn("ID", F.monotonically_increasing_id()).drop("SEGMENT_ID")

Resulting in:

Table with duplicate nodes removed

Connect the nodes to their line segments

Now, we need to match up these nodes and their unique IDs with our line segments. We know the location of the first and last vertex in each segment, but we don't yet know which node ID in our list of nodes corresponds to each of those locations.

We'll do the following steps to add the node IDs for the from- and to-nodes (FNODE and TNODE) to our original network dataset:

Find the FNODE: Join the nodes dataset to our roads dataset where the starting point for a line segment is equal to a node in the nodes dataset (ST.equals(ST.start_point("Shape"), "NODE"))
Find the TNODE: Join the nodes dataset to our roads dataset where the ending point for a line segment is equal to a node in the nodes dataset (ST.equals(ST.end_point("Shape"), "NODE"))

We’ll start by joining in the from-node (the “start point”) of each segment:

roads_fnode = df_roads.join(no_dupes, ST.equals(ST.start_point("Shape"), "NODE"))\
.select("SEGMENT_ID", F.col("NODE").alias("FNODE_geom"), F.col("ID").alias("FNODE"), "Shape")

Then we’ll join in the to-node (the “end point”) of each segment:

# join in the end node (to-node) for each segment: find the node located at exactly the same spot as the segment's end point
roads_f_t_node = roads_fnode.join(no_dupes, ST.equals(ST.end_point("Shape"), "NODE"))\
    .select(F.col("SEGMENT_ID").alias("SEGMENT_ID_Node"), 
            "FNODE_geom", "FNODE",  
            F.col("NODE").alias("TNODE_geom"), 
            F.col("ID").alias("TNODE"))

And, finally, we join the nodes back to the network. We can do this using the SEGMENT_ID:

# join from and to node details back to roads via SEGMENT_ID
df_roads_nodes = df_roads.join(roads_f_t_node, F.col("SEGMENT_ID") == F.col("SEGMENT_ID_Node"))

We now have a single data frame with from- and to-nodes to allow us to use the Snap Tracks tool:

Table showing streets with From and To node ids

Test Snap Tracks with our enriched roads dataset

The last step is to confirm that our new dataset works with Snap Tracks. We’ll use a test dataset of GPS points and snap them to our road segments. For the example below, we are using a 10-meter search distance and the new FNODE and TNODE (from- and to-node) fields that we generated for our street network.

from geoanalytics.tools import SnapTracks

# Snap tracks using the trip_id as the track identifier
# df_gps_all is the test dataset of time-enabled GPS points
gps_snap_tracks = SnapTracks() \
    .setTrackFields("trip_id") \
    .setSearchDistance(search_distance=10, search_distance_unit="Meters") \
    .setDistanceMethod(distance_method="Geodesic") \
    .setConnectivityFields(from_node="FNODE", to_node="TNODE") \
    .setAppendFields("STR_NAME", "SEGMENT_ID") \
    .setOutputMode(output_mode="AllPoints") \
    .run(df_gps_all, df_roads_nodes)

The result is that our original GPS points (red) are “snapped” over to align with the road segment (grey), as seen in the image on the right. (Blue lines have been drawn between the original point and the snapped point for reference.)

Nodes snapped to road centerlines
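
For intuition about what "snapping" with a search distance means, here is a simplified planar sketch in plain Python. It is not how Snap Tracks is implemented: the tool also takes time and network connectivity into account, and we used geodesic distances above. The function snap_to_segment is a hypothetical helper that only projects a point onto a single straight segment.

```python
import math


def snap_to_segment(point, seg_start, seg_end, search_distance):
    """Project point onto the segment and return the snapped location,
    or None when the segment is farther away than search_distance."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: both ends are the same point
    else:
        # fraction along the segment of the closest point, clamped to the ends
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    snapped = (ax + t * dx, ay + t * dy)
    if math.dist(point, snapped) > search_distance:
        return None
    return snapped


# A point 4 units from the road is snapped; a point 40 units away is not
near = snap_to_segment((2.0, 4.0), (0.0, 0.0), (8.0, 0.0), search_distance=10)
far = snap_to_segment((2.0, 40.0), (0.0, 0.0), (8.0, 0.0), search_distance=10)
```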

Conclusion

If you work with GPS data collected along routes, you probably know that it can be messy as a result of GPS drift. Tools like Snap Tracks help clean up that data by matching GPS points to line segments. To do this, you need a network dataset with from- and to-nodes defined. In this post we explored how to add that information to network datasets.

While augmenting a network with from- and to-nodes is necessary for using the Snap Tracks tool, it’s also useful for any analysis based on connectivity within a network. The from- and to-nodes give a linestring directionality, which is important for establishing and using topological relationships.
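
As one example of a connectivity-based analysis, the sketch below uses from- and to-node IDs to find every node reachable from a starting node. It is plain Python with made-up segment records, and it treats every segment as two-way, which is an assumption made only for this illustration.

```python
from collections import defaultdict, deque

# Hypothetical enriched segments: (SEGMENT_ID, FNODE, TNODE)
segments = [
    ("A", 0, 1),
    ("B", 1, 2),
    ("C", 1, 3),
    ("D", 7, 8),  # not connected to the others
]

# Build a node-to-node adjacency map, treating every segment as two-way
neighbors = defaultdict(set)
for _, fnode, tnode in segments:
    neighbors[fnode].add(tnode)
    neighbors[tnode].add(fnode)


def reachable_nodes(start):
    """Breadth-first search over node IDs, starting from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


component = reachable_nodes(0)
```

Nodes 7 and 8 never show up in the result because segment D shares no node with the rest of the network.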

Hopefully, this post has been helpful in working with your data and enriching it with attributes necessary to fuel your analytics!

We’d love to know how you can use this in your analysis work, or if you have questions about other GeoAnalytics Engine tools and functions! Please feel free to provide feedback or ask questions in the comment section below.