Parquet & Pro

01-09-2024 01:56 PM
Trevor_Hart
Occasional Contributor

Hey All,

I've created a multifile feature connection (MFC) to a directory with some Parquet files in Pro 3.1.3.

The MFC creates fine and I can preview the layer (see below), but when I add the layer to a map I get no geometries, and if I open the attribute table I get an unknown error.

Pro recognizes the geometry column fine according to the layer properties.

I've tried:

- Different projections (original data is not in WGS84)

- Adding an objectid

- Removing the geometry column

- Removing the compression

But I still get the same error and no geometries.

Trevor_Hart_1-1704835641018.png

Trevor_Hart_0-1704835382584.png

 


Accepted Solutions
Trevor_Hart
Occasional Contributor

@DerekGourley finally solved (kind of). Pro seems to have issues with the geometry column.

I exported the data as Parquet minus the geometry column and that worked fine in Pro.

If I export and include WKB or WKT, I get the error/crash behaviour described above.

What I did instead was calculate a new field as GeoJSON and export as Parquet (with no geometry column).

Pro recognizes the GeoJSON field and works as expected: the MFC is happy, the attribute table is happy, and the layer draws in the map.
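For anyone wanting to try the same workaround outside Pro, here is a minimal sketch of the idea, not the exact steps used above. It assumes point data held as plain `x`/`y` values; the field names, the sample rows, and the helper functions are all hypothetical. The Parquet write is an optional step that needs the third-party `pyarrow` package.

```python
import json

def point_to_geojson(x, y):
    """Return a GeoJSON Point geometry as a compact JSON string."""
    geometry = {"type": "Point", "coordinates": [x, y]}
    return json.dumps(geometry, separators=(",", ":"))

def add_geojson_field(rows, x_key="x", y_key="y", out_key="geojson"):
    """Copy each row, dropping the coordinate keys and adding a GeoJSON string field."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (x_key, y_key)}
        new_row[out_key] = point_to_geojson(row[x_key], row[y_key])
        out.append(new_row)
    return out

def write_parquet(rows, path):
    """Optional: write the rows to Parquet (requires pyarrow to be installed)."""
    import pyarrow as pa
    import pyarrow.parquet as pq
    pq.write_table(pa.Table.from_pylist(rows), path)

# Hypothetical sample rows; real data would come from your own export.
rows = [
    {"id": 1, "x": 174.76, "y": -36.85},
    {"id": 2, "x": 174.78, "y": -41.29},
]
converted = add_geojson_field(rows)
```

The resulting rows carry only attribute fields plus one string field of GeoJSON, which matches the "no geometry column" layout that worked in this thread. Lines and polygons would need their own GeoJSON geometry types.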

Do you have any guidance on supported geometry columns?

Trevor_Hart_0-1705374840165.png

Pro suggests that WKT and WKB are supported?

Trevor_Hart_1-1705374936634.png

 

10 Replies
DanPatterson
MVP Esteemed Contributor

related thread which may be of interest

Solved: Using parquet files in an ArcGIS Pro Big Data Conn... - Esri Community


... sort of retired...
Trevor_Hart
Occasional Contributor

Thanks Dan. I have seen that post. I don’t have an issue creating the connection to the data. It just doesn’t draw and throws an unknown error when I try to open the attribute table. Pro recognizes the data (fields, geometry type etc) and can preview it fine though. 

DerekGourley
Esri Contributor

Hi Trevor,

Thanks for the question.

I see that you are using Pro 3.1.3, which doesn’t have support for adding parquet-based Multifile Feature Connection datasets to the map (including opening the attribute table), visualizing them, or using them directly in tools that are not a part of the GeoAnalytics Desktop tools. The following Pro 3.1 documentation has notes on what is supported: https://pro.arcgis.com/en/pro-app/3.1/help/data/big-data-connections/use-big-data-connections.htm#GU...

However, with Pro 3.1 you can use them for your analysis or for visualization by first running a GeoAnalytics Desktop tool such as Copy Dataset From Multifile Feature Connection to save the parquet dataset as a shapefile or to a File Geodatabase, which can then be used to do further analysis or visualization.

The good news is that Pro 3.2 added extended support for parquet-based Multifile Feature Connection datasets so that they can be visualized in maps and used as input for most geoprocessing tools. This is covered as a bullet point in the what’s new in Pro 3.2 documentation: https://pro.arcgis.com/en/pro-app/latest/get-started/whats-new-in-arcgis-pro.htm#GUID-954D7B4B-E653-...


Thanks,
Derek Gourley
GeoAnalytics Product Engineer
Trevor_Hart
Occasional Contributor

Thanks for the reply @DerekGourley. Unfortunately Pro 3.2 is no better, even fully patched.

The MFC doesn't even list the layers...

The connection under 3.1.3 shows the layers:

Trevor_Hart_0-1705007273028.png

Pro 3.2 doesn't list anything when you try to expand the connection. Notice the arrow is not visible even though I have expanded the connection in Pro.

Trevor_Hart_1-1705007331580.png

 

 

DerekGourley
Esri Contributor

Hi Trevor,

To help troubleshoot this issue, would you be able to share what happens when you try the following:

When you refresh the MFC in the Catalog, does the arrow appear?

If the arrow does not appear, could you please try right-clicking on the MFC in the Catalog and then selecting the “Show in file explorer” option, followed by opening the .mfc file in a text editor. With the .mfc file open in a text editor, does the file contain a list of datasets or is it an empty list ([])?
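If reading the file by eye is awkward, the same check can be sketched in a few lines of Python. This assumes the .mfc file is JSON text, which the "empty list ([])" wording suggests but this thread does not confirm. The key names in the sample string (`connection`, `datasets`) are hypothetical and only there to exercise the code; the function itself just reports every list it finds and how many items each holds.

```python
import json

def find_lists(node, path="$"):
    """Yield (path, length) for every list found in parsed JSON."""
    if isinstance(node, list):
        yield path, len(node)
        for i, item in enumerate(node):
            yield from find_lists(item, f"{path}[{i}]")
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_lists(value, f"{path}.{key}")

def summarize_mfc(text):
    """Parse .mfc text (assumed to be JSON) and map each list's path to its length."""
    return dict(find_lists(json.loads(text)))

# Hypothetical content; a real file would be read with open(path, encoding="utf-8").read()
sample = '{"connection": {}, "datasets": []}'
print(summarize_mfc(sample))
```

A length of 0 for the dataset list would correspond to the "empty list" case described above.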

The image you shared has two MFC files. Was one of them re-created using Pro 3.2? I might also suggest trying an MFC file name that doesn’t have a period in it, such as “Test_32” or “Test_3_2”.

If the MFC file was re-created with Pro 3.2, did the messages from the GP tool (or the MFC UI, depending on how it was created) show that datasets were successfully identified? When you originally created the MFC file with Pro 3.1.3, did it show that the datasets were successfully identified?

If the MFC file was not re-created with Pro 3.2, can you try re-creating it and let me know if it lists any datasets that were successfully identified? Would you be able to share any messages or warnings that you see?

 

I also wanted to mention a few other things to try / check just in case:


Thanks,
Derek Gourley
GeoAnalytics Product Engineer
Trevor_Hart
Occasional Contributor

Hi @DerekGourley 

I believe the folder structure is correct and the files are not encrypted. As mentioned, they previewed fine in 3.1.3. The 3.1.3 and 3.2 installs are on different machines. All data is local. The MFCs were recreated in each version of Pro.

For sanity I downloaded the samples from here:

https://www.tablab.app/datasets/sample/parquet

These work fine in Pro 3.2 except for the Iris file which has periods in the column names. If I remove the periods it works fine in Pro.
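A minimal sketch of the rename, assuming the fix is simply to swap periods for underscores before the file reaches Pro. The function name is hypothetical, and the column names are illustrative, in the style of the Iris sample.

```python
def sanitize_column_names(names):
    """Replace periods with underscores, keeping the resulting names unique."""
    seen = set()
    cleaned = []
    for name in names:
        candidate = name.replace(".", "_")
        base, n = candidate, 1
        # If the cleaned name collides with an earlier one, append a counter.
        while candidate in seen:
            n += 1
            candidate = f"{base}_{n}"
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned

# Illustrative column names with periods in them
iris_columns = ["sepal.length", "sepal.width", "petal.length", "petal.width", "variety"]
print(sanitize_column_names(iris_columns))
```

With the third-party `pyarrow` package installed, the cleaned names could then be applied to a table with `table.rename_columns(sanitize_column_names(table.column_names))` before writing it back out.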

Trevor_Hart_1-1705365344537.png

This is what they look like in Pro

Trevor_Hart_0-1705365149828.png

I'm going to retry the other files that are not working.

Trevor_Hart
Occasional Contributor

So if I add one of the Parquet files I am testing to that structure

Trevor_Hart_2-1705366618126.png

And then try to Sync, Pro does this:

Trevor_Hart_3-1705366641914.png

The MFC has updated though

Trevor_Hart_4-1705366766972.png

And the geometry has been recognized

Trevor_Hart_5-1705366791542.png

If I close and open the Pro project again, Pro just crashes. I have submitted an error report.

If I delete the Parquet file but don't sync the MFC, then Pro opens the project fine and the folder is shown in the MFC (but there are no files behind it).

Trevor_Hart_7-1705367261777.png

 

Trevor_Hart_6-1705367148779.png

 

 

DerekGourley
Esri Contributor

Hi Trevor,

Thanks for following up with all of the additional information and for including the screenshots. I'm going to send you a message with my email to see if it would be possible to share this dataset with us and to see if we can learn more about how the parquet data was generated.


Thanks,
Derek Gourley
GeoAnalytics Product Engineer