Impossible to use dplyr join functions on data frames?

06-13-2017 11:10 AM
IanDavies
New Contributor III

I've just begun to use the R-ArcGIS bridge package arcgisbinding and am running into a problem when I try to join feature class data with the dplyr package. When I join them, the shape attributes in the data frame are dropped and I can only export it as a table, not as a feature class or shapefile.

Here is some toy reproducible code. Below, I'm trying to get the ozone measurement columns from two shapefiles into a single data frame, then export the data frame as a shapefile.

library(dplyr)
library(arcgisbinding)
arc.check_product()

fc <- arc.open(system.file("extdata", "ca_ozone_pts.shp", package="arcgisbinding"))
d <- arc.select(fc, fields=c('FID', 'ozone'))
p <- arc.select(fc, fields=c('FID', 'ozone'))
p$ozone <- p$ozone * 2
p <- left_join(p, d, by="FID")
arc.write(tempfile("ca_new", fileext=".shp"), p)
# original dataframe has shape attributes
str(d)
# new dataframe does not
str(p)

From the arcgisbinding package, p and d above are data frame objects with shape attributes. The problem is that once I use left_join, I lose the spatial attribute data in the joined data frame. Is there a way around this?


Accepted Solutions
ShaunWalbridge
Esri Regular Contributor

Hello Ian,

Thanks for the detailed example of what you're trying to do, very helpful. From what I understand, dplyr expects data frames that are very close to the base R representation, so richer representations such as sp objects or the arcgisbinding data frames with their shape attributes can lose that extra information. I don't know of an immediate solution to bridge this discrepancy, but fortunately there's another way. Michael Sumner has created the spdplyr package, which lets you use some of the functionality of dplyr on sp objects. Here's your script rewritten to use the sp representations:

library(spdplyr)
library(arcgisbinding)
arc.check_product()

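fc <- arc.open(system.file("extdata", "ca_ozone_pts.shp", package="arcgisbinding"))

# read the attributes and convert the result to an sp object
d <- arc.select(fc, fields=c('FID', 'ozone'))
d.sp <- arc.data2sp(d)

p <- arc.select(fc, fields=c('FID', 'ozone'))
p.sp <- arc.data2sp(p)
p.sp$ozone <- p$ozone * 2

# join the sp objects on FID (spdplyr supplies the join for sp objects)
joined <- left_join(p.sp, d.sp, by="FID", copy=TRUE)

# convert back to an arcgisbinding data frame before writing
joined.df <- arc.sp2data(joined)

arc.write(tempfile("ca_ozone_pts_joined", fileext=".shp"), joined.df)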

Let us know if that'll work for your needs, or if you need something different for what you're trying to do. You could also do everything with plain data frames, then join on FID afterwards to bring the results back into a single data source, but this is a nicer approach if it works for you.
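
For reference, the plain data frame route could look roughly like the sketch below. It is only an illustration under assumptions: the small data frame and its "shape" attribute are stand-ins for what arc.select() returns, and ozone_doubled is a made-up column name. The idea is to look rows up by FID with match() and add the new column to the original data frame, rather than building a new data frame with a join, so whatever attributes are already attached to it are left alone.

```r
# Stand-in for an arc.select() result (assumption): a data frame that
# carries its geometry in an attribute called "shape".
d <- data.frame(FID = c(0L, 1L, 2L), ozone = c(0.04, 0.05, 0.06))
attr(d, "shape") <- list(x = c(-120.1, -121.3, -119.8),
                         y = c(36.2, 37.9, 35.4))

# Plain attribute table computed separately, deliberately in a different
# row order. The column name ozone_doubled is hypothetical.
p <- data.frame(FID = c(2L, 0L, 1L),
                ozone_doubled = c(0.12, 0.08, 0.10))

# For each row of d, find the position of the matching FID in p.
idx <- match(d$FID, p$FID)

# Add the looked-up values as a new column on the original data frame.
# Rows of d are not reordered, so they stay aligned with the geometry.
d$ozone_doubled <- p$ozone_doubled[idx]

# Check that the geometry attribute is still attached.
"shape" %in% names(attributes(d))
```

With a real arc.select() result, d would then go to arc.write() as in the script above. It is worth confirming with str(d) that the shape attribute behaves like this stand-in before relying on it.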

Cheers, Shaun

IanDavies
New Contributor III

Excellent! And not very hacky. Thanks Shaun.
