Populating dataframe with item search results - values missing

JillianStanford · ‎02-15-2024

I am stuck on a weird issue.

I am searching for some items, adding them to a pandas dataframe and saving them to a csv.

This is all working except that the size value is only populated for the first item.

I can inspect the size property for each item and see that it has a value. In fact, once I access the size property, it shows up in the dataframe.

Why does this happen and what is the workaround?

Thanks!

EarlMedina · ‎02-15-2024

items is a list of objects, and building a dataframe directly from objects will sometimes work and sometimes not. The dictionary representation of the data is more reliable.

There are many ways to fix this, but the most straightforward is probably to do:

df = pd.DataFrame([vars(item) for item in items], columns=["id", "size", "type"])

vars() is just a handy built-in function that returns an object's __dict__ attribute, so here it gives the attribute dictionary for each item object.
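As a rough illustration of why vars() can miss a value that attribute access finds, here is a minimal sketch. LazyItem is a hypothetical stand-in, not the real item class; it assumes that some properties are only fetched the first time they are accessed, which would match the behaviour described in the question.

```python
import pandas as pd


class LazyItem:
    """Hypothetical stand-in for an item whose 'size' is loaded on demand."""

    def __init__(self, item_id, item_type, remote_size):
        self.id = item_id
        self.type = item_type
        # Simulates a value that stays on the server until it is requested.
        self._remote_size = remote_size

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails,
        # i.e. before 'size' has been cached in __dict__.
        if name == "size":
            self.size = self._remote_size  # cache the value in __dict__
            return self.size
        raise AttributeError(name)


items = [LazyItem("a1", "Feature Service", 1024), LazyItem("b2", "Web Map", 2048)]

# vars() only sees what is already in __dict__, so 'size' comes out empty.
df_vars = pd.DataFrame([vars(item) for item in items], columns=["id", "size", "type"])

# Explicit attribute access triggers the lazy load for every item.
df_attr = pd.DataFrame(
    [{"id": item.id, "size": item.size, "type": item.type} for item in items]
)

print(df_vars)
print(df_attr)
```

In this sketch, size only appears in an item's __dict__ after it has been accessed once, so the vars() approach would only pick it up for items that have already been inspected.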

JillianStanford · ‎02-15-2024

Hi,

Thank you so much for your reply.

When I use this method of populating the dataframe, I don't get the size populated at all, not even in the first row.

I will look more at accessing the dictionary, instead of the object.

Thanks!

EarlMedina · ‎02-15-2024

I'm not sure why this doesn't work for you. I can reproduce your original issue, and the fix works on my end.

You can also try

df = pd.DataFrame([{"id": item.id, "size": item.size, "type": item.type} for item in items])

This also works for me.

My version of the Python API is 2.3.0; my pandas version is 2.0.2.

JillianStanford · ‎02-15-2024

That syntax works!

Thank you, really appreciate the help.

JillianStanford · ‎02-20-2024

@EarlMedina - a follow-up question...

I integrated your syntax for creating the dataframe from query results into my script. It kind of works, but accessing the size property takes forever.

In this example, I'm querying ArcGIS Online content and dumping it into a dataframe. If I leave out size, it takes 3 seconds. If I include size, it takes 24 minutes.

I ran this against an ArcGIS Enterprise org and I finally had to kill the script after 4 hours. It never finished and never generated an error.

Any ideas for a workaround? Thanks!

EarlMedina · ‎02-20-2024

Hi @JillianStanford ,

It looks like that property isn't there by default. You can try hydrating each item object ahead of time as this seems to set the required size property. So, you would do something like this first:

for item in items:
    item._hydrate()

JillianStanford · ‎02-21-2024

Hi

Calling the _hydrate method didn't seem to do the trick. The code block still took 24 minutes.

I inspected the network traffic and realized that certain properties aren't returned by a search; a subsequent request per item is required to retrieve them. It makes sense to me now why it takes so long, and I can't think of a workaround. Even if I were accessing the REST endpoint directly, there is an "exclude" fields parameter but no "include" fields parameter.

Thanks for your help!

Jill
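
If the slowdown really is one extra request per item, one option that might reduce the wall-clock time is to issue those requests concurrently. The sketch below is only an illustration under that assumption: fetch_size is a hypothetical placeholder that simulates the per-item request with a short sleep, and whether the real item objects are safe to use from several threads, or whether the server throttles concurrent requests, is not confirmed anywhere in this thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_size(item_id):
    """Hypothetical stand-in for the per-item request that returns the size."""
    time.sleep(0.05)  # simulated network latency
    return len(item_id) * 100  # made-up value so the example has output


item_ids = [f"item{i}" for i in range(20)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() preserves input order, so sizes line up with item_ids.
    sizes = list(pool.map(fetch_size, item_ids))
elapsed = time.perf_counter() - start

# Rows in the same shape used for the dataframe earlier in the thread.
rows = [{"id": item_id, "size": size} for item_id, size in zip(item_ids, sizes)]
print(f"Fetched {len(rows)} sizes in {elapsed:.2f} s")
```

Because the work is network-bound, threads can overlap the waiting time; the total number of requests stays the same.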