Building a Data Driven Organization, Part #5: Model Relationships As Graphs

2072
0
08-24-2021 02:27 PM
BruceHarold
Esri Regular Contributor
1 0 2,072
Spoiler
This blog describes Pro 2.9 (unreleased at writing) functionality - ask your Esri representative about ArcGIS Knowledge!

Some things in information technology seem to be perennial examples of delivering data, not information, and if you're going to be data-driven it's information that you want.  My case in point here is visualizing and analyzing relationships amongst georelational datasets, a key space ArcGIS inhabits and something I have always struggled with beyond the basics.  You can apply joins and relates to map layers but these seem to run out of steam pretty fast in terms of power and usability, like how to visualize cardinality and how to do performant queries.  Plus, if your datasets are from different sources it gets even harder.  I have taken detours into coded approaches but learned they don't scale.

ArcGIS Data Interoperability and ArcGIS Knowledge to the rescue!

How come that pairing?  Well Data Interoperability at Pro 2.8+ includes the Tech Preview version of the Esri Knowledge graph database reader/writer and solves the 'all-source' problem for building and maintaining Knowledge graph databases.  Not only is the reader/writer flexible, it is also fast.  At writing, Knowledge is still under construction and I'm using alpha Pro 2.9 software, but the topic is such a fit for being data-driven I couldn't resist.

Here are some graphics from my ETL work to populate a graph, the workspaces are in the post download.  The graph I'm building is property data, the nodes are centroids of cadastral titles plus other entities for owners and encumbrances (usually leases and mortgages), with relationships like 'owns' and 'encumbers'.

Loading Entities (Nodes)Loading Entities (Nodes)

Loading RelationshipsLoading Relationships

Source Property NodesSource Property Nodes

There are over 10 million entities and over 10 million relationships in the graph.  I simplified my data model a little to ignore some legal details (there is a thing called an estate which allows more complex relationships between titles and owners) to give me with a graph where property title points (the blue dots) have one or more ownership shares over them and ownership shares have zero or more encumbrances.  Title points are obviously spatial, owners and encumbrances are tabular.  Here are some feature counts:

Data ModelData Model

 

You'll see the data load was in two parts, first entities then relationships.  This is because entity relationships are made using automatically generated GlobalID fields, so entities have to be created first, you'll get the idea from the workspaces.  Entity GlobalIDs become relationship origin and destination GlobalIDs.

Graphs live in an Enterprise Portal, I'm using Enterprise 10.9.1/Pro 2.9 as my portal and client.

There are innumerable queries you might make of your graph, this is facilitated interactively using either a thing called a Link Chart or using the Cypher query language.

First a simple link chart.  My data isn't really the type where connections will be freshly discovered interactively via link chart exploration, all relationships are already known, but you can select and add entities to a link chart to investigate your data.  This is my first foray into Knowledge so I'm keeping it simple.  Here is who owns some titles somewhere:

Basic Link ChartBasic Link Chart

 

I didn't use the interactive tools to build the chart entities, I used a Cypher query:

match (ee:Encumbrancee {name:'Her Majesty The Queen'})-[oe:owns_encumbrance]-(e:Encumbrance)-[he:has_encumbrance]-(t:Title {land_district:'Otago'}) return ee,oe,e,he,t limit 5

This found 5 titles in a specific land district encumbered by a single encumbrancee.  I'll leave link charts behind at this point, but they come with tools to populate them and are a great way to explore connections.

There are bigger patterns to discover!  For example where are titles encumbered a lot?

Unencumbered (green) and Encumbered (red) titlesUnencumbered (green) and Encumbered (red) titles

If I was doing this for real I might join demographic variables to my title points before loading them into my graph, which would let me analyze population segments.

There are interesting things to learn without studying demographics.  An advantage of graph databases is they are fast to query for aggregate statistics compared to equivalent SQL statements, for example lets look at the distribution of encumbrancee (financial institution or lessor) market share.

Encumbrance Holdings By InstitutionEncumbrance Holdings By Institution

This result dropped out of my graph in a few seconds using the query you can see in the control:

match (e:Encumbrance) where e.name is not null return e.name, count(*) as book order by book desc

The encumbrance holder data has a long tail, lets say we are interested in the big, commercial lenders who I'll say have 10,000 or more encumbrances and are companies.

Big LendersBig Lenders

That summary is over the entire dataset, you'll notice three institutions are neck and neck in the market, then market share drops off quickly and there are 12 who make my cut.  Is there anything different about my study area?  I made a query to find out:

match (e:Encumbrance)
with e.name as lender , count(*) as book where book > 10000 and lender contains 'Limited'
with collect(lender) as biglenders
match (t:Title {land_district:'Otago'})-[:has_encumbrance]-(e:Encumbrance) where e.name in biglenders
return t, e

Here is the map and a chart:

The Lending LandscapeThe Lending Landscape

 

I might be able to make a case for hot spots where some institutions are doing better than others but its the chart that is interesting, the top four institutions' holdings are not following the national distribution.

We can look into encumbered holdings in a study area:

match (o:Owner)-[:has_owner]-(t:Title {land_district:'Otago'})-[:has_encumbrance]-(e:Encumbrance) where e.name contains 'Limited'
with o.prime_other_names + ' ' + o.prime_surname as owner, o.corporate_name as company, count(*) as holdings
return owner, company,holdings order by holdings desc

 

HoldingsHoldings

 

Or a related query, which non-farm titles are encumbered by the big lenders?

match (e:Encumbrance)
with e.name as lender , count(*) as book where book > 10000 and lender contains 'Limited'
with collect(lender) as biglenders
match (t:Title {land_district:'Otago'})-[:has_encumbrance]-(e:Encumbrance)
where e.name in biglenders
with t,e
match (o:Owner)-[:has_owner]-(t)-[:has_encumbrance]-(e)
where not (o.corporate_name contains 'Farm' or o.corporate_name contains 'Pasture')
return o,t,e

I hasten to add this still caught a lot of farms as the data model doesn't really support land-use classification queries but of course if I have land-use areas I could do an overlay of title points beforehand and do it for real.

 

Non-farm Encumbrances by BankNon-farm Encumbrances by Bank

So while at writing I'm ahead of the required software release I hope this gives you an idea of the art of the possible with ArcGIS Knowledge to make querying complex relationships in big data simple and fast.  This was my first foray into Knowledge and I learned a lot, including the basics of the OpenCypher graph query language you see above.  As I say in the spoiler alert, reach out to your Esri representative about release plans (Pro 2.9) and if you're really keen the Early Adopter Community at Pro 2.8.