Temporal Join with GeoAnalytics Desktop Tool "Join Features" produces wrong results (missing data) when dataset is larger

09-19-2022 03:56 PM
Adam1
by
New Contributor II

I am trying to join a car-data feature class (point, speed, ...) to data from changing digital traffic signs on a motorway (e.g. changing speed limits). The motorway data is in 1-minute intervals (instant time field) and the car data has "random" instant time stamps. I want to join the "matching times" via "Near after 59 seconds" and a matching cross-section field in both datasets.

I have now noticed that when the car dataset is larger, the results are no longer correct, even when the "extra data" is not used in the join operation at all. The following pictures illustrate this (they are also attached in case the quality is too poor to read):

[Image: expected correct result]
[Image: unwanted wrong result]

4c439408-0eb3-4197-bcf9-ba1e235fce60.png

If I am not missing something, I just did exactly the same in both cases, just with different datasets.

I am on Windows 10 and use ArcGIS Pro 3.0.1. Any ideas are greatly appreciated.

Edit: I should add that this is of course just a small snippet of all the data I need to work with. The complete car dataset has around 3,700,000 rows and the traffic cross-section dataset has around 650,000 rows. If I join them together, around 700,000 rows are missing afterwards. But none should be missing: the result should have 3,700,000 rows, each matched to one of the 650,000 rows.


Accepted Solutions
MichaelPark
Esri Contributor

Great question. I don't believe this is related to the size of the dataset. The `NearAfter` relationship is exclusive in that it does not include records that have the exact same timestamp. If your layer's time definition is set to use the "Datetime" field, then the results are expected. 

One issue I see is that the "Datetime" field appears to have dropped the seconds and is truncated to minutes. This will give you trouble when using a near duration of 59 seconds. 

To achieve the results you want, I think you'll need to do the following:

  • Fix the datetime field so that it includes seconds
  • Use the temporal "Near" relation with an extra join condition (in advanced options) that removes everything before. Something like "StartTime($join) >= StartTime($target)" should work.
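The difference between the two relations can be sketched in plain Python. This is only an illustration of the semantics described above, not the tool's implementation: the function names, the sample timestamps, and the inclusive upper bound of the 59-second window are assumptions, and the cars are assumed to be the target layer and the traffic signs the join layer. The direction of the extra comparison depends on which layer is the target, so it may need to be flipped for a given setup.

```python
from datetime import datetime, timedelta

# Assumed near duration from the thread: 59 seconds.
WINDOW = timedelta(seconds=59)

def near_after(target_time, join_time, window=WINDOW):
    """Exclusive 'near after': the target must be strictly later than the
    join record, so identical timestamps never match."""
    delta = target_time - join_time
    return timedelta(0) < delta <= window

def near_with_condition(target_time, join_time, window=WINDOW):
    """Symmetric 'near' plus an extra condition that drops join records
    later than the target, so identical timestamps do match."""
    return abs(target_time - join_time) <= window and join_time <= target_time

# Hypothetical sample data: one sign record and three car records.
sign = datetime(2022, 9, 19, 15, 56, 0)
cars = [
    datetime(2022, 9, 19, 15, 56, 0),   # same timestamp as the sign record
    datetime(2022, 9, 19, 15, 56, 30),  # 30 seconds later
    datetime(2022, 9, 19, 15, 57, 0),   # 60 seconds later, outside the window
]

exclusive = [near_after(c, sign) for c in cars]
inclusive = [near_with_condition(c, sign) for c in cars]

print(exclusive)  # [False, True, False]
print(inclusive)  # [True, True, False]
```

In this sketch the car record that shares its timestamp with the sign record is dropped by the exclusive relation and kept by the second variant, which is the kind of missing row described in the question.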

Hope this helps. 


Adam1
by
New Contributor II

Thank you for your quick answer!

I changed the parameters to "Near" and your Join condition (but for me with a <= instead of >=) and for the data snippet it worked. But I didn't do anything to the datetime field, as I don't really know how I should fix it so it includes seconds. I thought there was no way for it to not include seconds. Or do you mean changing the number format in the field definitions?
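One way to tell whether the seconds are really missing from the stored values, or only hidden by the display format, is to look at the raw values. A minimal sketch, assuming the field values have been read into Python `datetime` objects; the helper name and the sample values are made up for illustration:

```python
from datetime import datetime

def share_with_zero_seconds(timestamps):
    """Return the fraction of timestamps whose seconds (and microseconds) are zero."""
    if not timestamps:
        return 0.0
    zero = sum(1 for t in timestamps if t.second == 0 and t.microsecond == 0)
    return zero / len(timestamps)

# Hypothetical values: every one falls on a full minute.
truncated = [
    datetime(2022, 9, 19, 15, 56, 0),
    datetime(2022, 9, 19, 15, 57, 0),
    datetime(2022, 9, 19, 15, 58, 0),
]

# Hypothetical values: one falls on a full minute, one does not.
mixed = [
    datetime(2022, 9, 19, 15, 56, 12),
    datetime(2022, 9, 19, 15, 56, 0),
]

print(share_with_zero_seconds(truncated))  # 1.0
print(share_with_zero_seconds(mixed))      # 0.5
```

A share of 1.0 in a field that should hold "random" time stamps would suggest the seconds were lost when the data was loaded, whereas a small share would point to a display setting only.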

I will now try that out with the whole dataset and mark your answer as the solution once I am sure it works there as well (it probably will).

Edit: While we are at it: do you happen to know how to make sure all filters are cleared? The Join Features dialog always shows me that my inputs have a filter, but I did not set a definition query, I have no time or visible-extent filters active, and I have no records selected. The number shown for "records to be processed" seems to be the number of all records anyway.

Edit2: I get it now: activating time for the layer sets a time extent, and Join Features interprets that as a filter, even if it covers the whole (calculated) time span and all the data.

(sorry for constantly editing my post)

Thank you again!