Populating dataframe with item search results - values missing

02-15-2024 09:46 AM
JillianStanford
Occasional Contributor III

I am stuck on a weird issue.

I am searching for some items, adding them to a pandas dataframe and saving them to a csv.

This is all working except that the size value is only populated for the first item.

JillianStanford_1-1708018480454.png

I can inspect the size property for each item and see that it has a value. In fact, once I access the size property, it shows up in the dataframe.

JillianStanford_2-1708018673704.png

Why does this happen and what is the workaround?

Thanks!

Accepted solution: EarlMedina's second reply below (building the dataframe from an explicit list of dictionaries).

7 Replies
EarlMedina
Esri Regular Contributor

items is a list of objects, and passing a list of objects straight to pd.DataFrame sometimes works and sometimes doesn't. The dictionary representation of the data is more reliable.

There are many ways to fix this, but the most straightforward is probably to do:

df = pd.DataFrame([vars(item) for item in items], columns=["id", "size", "type"])

vars() is just a handy built-in function that returns the __dict__ attribute of an object, here of each item.
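
One possible reading of the behaviour described in this thread, offered as an assumption rather than a description of the arcgis internals: if an attribute is loaded lazily on first access, it is missing from __dict__ until something reads it, so vars() would not see it. The LazyItem class below is a hypothetical stand-in written for illustration, not the real item class.

```python
class LazyItem:
    """Hypothetical stand-in for a search result; NOT the arcgis item class."""

    def __init__(self, item_id, item_type):
        self.id = item_id
        self.type = item_type

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails,
        # i.e. when the attribute is not yet stored on the instance.
        if name == "size":
            value = 1024  # placeholder for a value fetched by a follow-up request
            self.__dict__["size"] = value  # cache it so later lookups are direct
            return value
        raise AttributeError(name)


item = LazyItem("abc123", "Feature Service")
before = sorted(vars(item))  # keys present before size is touched
_ = item.size                # first access triggers the lazy load
after = sorted(vars(item))   # keys present afterwards

print(before)
print(after)
```

Under that assumption, size would appear in vars(item) only after it has been read once, which would match both the original symptom and the empty column reported in the next reply.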

JillianStanford
Occasional Contributor III

Hi,

Thank you so much for your reply.

When I use this method of populating the dataframe, I don't get the size populated at all, not even in the first row.

I will look more at accessing the dictionary, instead of the object.

Thanks!

JillianStanford_0-1708030083316.png

 

EarlMedina
Esri Regular Contributor

I'm not sure why this doesn't work for you. I can reproduce your original issue, and the fix works on my end.

You can also try:

df = pd.DataFrame([{"id": item.id, "size": item.size, "type": item.type} for item in items])

 

This also works for me.

My version of the Python API is 2.3.0, and my version of pandas is 2.0.2.
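
The same idea can be wrapped in a small helper so the field list lives in one place. This is a minimal sketch: rows_from_items is a hypothetical helper name, and the SimpleNamespace objects are stand-ins for real search results. The point is that each attribute is read explicitly with getattr, so the value is requested from the object rather than taken from whatever happens to be stored on it already.

```python
from types import SimpleNamespace


def rows_from_items(items, fields):
    """Build one plain dict per item by reading each attribute explicitly.

    Hypothetical helper for illustration; missing attributes become None.
    """
    return [{field: getattr(item, field, None) for field in fields} for item in items]


# Stand-in objects for illustration only; real search results would go here.
items = [
    SimpleNamespace(id="a1", size=2048, type="Web Map"),
    SimpleNamespace(id="b2", size=512, type="Feature Service"),
]

rows = rows_from_items(items, ["id", "size", "type"])
print(rows)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows) in the same way as the one-line comprehension.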

 

JillianStanford
Occasional Contributor III

That syntax works!

Thank you, really appreciate the help.

JillianStanford_0-1708037468963.png

 

JillianStanford
Occasional Contributor III

@EarlMedina - A follow-up question...

I integrated your syntax for creating the dataframe from query results into my script, and it kind of works, but accessing the size property takes forever.

In this example, I'm querying ArcGIS Online content and dumping it into a dataframe. If I leave out size, it takes 3 seconds. If I include size, it takes 24 minutes.

JillianStanford_0-1708464064562.png

I ran this against an ArcGIS Enterprise org and I finally had to kill the script after 4 hours. It never finished and never generated an error.

Any ideas for a workaround? Thanks!

 

 

EarlMedina
Esri Regular Contributor

Hi @JillianStanford,

It looks like that property isn't there by default. You can try hydrating each item object ahead of time as this seems to set the required size property. So, you would do something like this first:

for item in items:
    item._hydrate()

 

 

JillianStanford
Occasional Contributor III

Hi,

Calling the _hydrate method didn't seem to do the trick. The code block still took 24 minutes.

I inspected the network traffic. I hadn't realized that certain properties aren't returned by a search and that a subsequent request is required to retrieve them. It makes sense to me now why it takes so long, and I can't think of a workaround. Even if I were accessing the REST endpoint directly, there is an "exclude" fields parameter but not an "include" fields parameter.

Thanks for your help!

Jill
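
If each item really does need its own follow-up request, one option that might reduce the wall-clock time is to issue those requests concurrently instead of one after another. The sketch below only illustrates the pattern: fetch_size is a hypothetical stand-in that simulates latency with a sleep, and nothing in this thread confirms that the real item objects are safe to use from multiple threads, so treat it as an idea to test rather than a fix.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_size(item_id):
    """Hypothetical stand-in for the per-item follow-up request."""
    time.sleep(0.05)  # simulate network latency
    return len(item_id) * 100  # placeholder value, not a real size


item_ids = [f"item{n}" for n in range(20)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # pool.map returns results in the same order as item_ids
    sizes = list(pool.map(fetch_size, item_ids))
elapsed = time.perf_counter() - start

rows = [{"id": item_id, "size": size} for item_id, size in zip(item_ids, sizes)]
print(len(rows), f"{elapsed:.2f}s")
```

The number of requests stays the same, so this would not help if the server throttles or serializes them; it only overlaps the time spent waiting.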
