How to Introspect a Describe "Object"

06-22-2017 12:41 PM
ThomasLaxson
New Contributor III

Though the documentation refers to a "Describe object", it appears to be some sort of C class (apparently created via the arcgisscripting.create function) rather than a Python class. The standard Python introspection methods don't turn up much information (see below).

If one wishes to summarize all the known info of a dataset, how might one dynamically identify which properties apply to the current instance--without writing a long series of if statements and/or going through all the potential properties?

For reference: given a variable named "desc" returned from a call to arcpy.Describe, here are the results of some inspection and introspection operations:

>>> type(desc)
<type 'geoprocessing describe data object'>

>>> dir(desc)
[]

>>> desc.__name__
'Describe Object'

>>> desc.__class__
Traceback (most recent call last):  
 File "<interactive input>", line 1, in <module>
AttributeError: DescribeData: Method __class__ does not exist

>>> desc.__repr__
Traceback (most recent call last):  
 File "<interactive input>", line 1, in <module>
AttributeError: DescribeData: Method __repr__ does not exist

>>> isinstance(desc, object)
True

>>> desc.__dict__
Traceback (most recent call last):  
 File "<interactive input>", line 1, in <module>
AttributeError: DescribeData: Method __dict__ does not exist

>>> help(desc)
Help on geoprocessing describe data object object:
Describe Object = class geoprocessing describe data object(object)

>>> import inspect
>>> print(inspect.getmodule(desc))
None

>>> inspect.getclasstree(desc)
Traceback (most recent call last):  
 File "<interactive input>", line 1, in <module>  
 File "C:\Python27\ArcGISx6410.4\Lib\inspect.py", line 726, in getclasstree    
    for c in classes:
TypeError: 'geoprocessing describe data object' object is not iterable

>>> inspect.getmro(desc)
Traceback (most recent call last):  
 File "<interactive input>", line 1, in <module>  
 File "C:\Python27\ArcGISx6410.4\Lib\inspect.py", line 346, in getmro
    _searchbases(cls, result)
 File "C:\Python27\ArcGISx6410.4\Lib\inspect.py", line 337, in _searchbases
    for base in cls.__bases__:
AttributeError: DescribeData: Method __bases__ does not exist
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

I think arcgisscripting was introduced with ArcGIS 9.0 and describe() has been around since the very beginning. In my mind, arcgisscripting wasn't so much a Python package as a Python wrapper for a COM-based package, i.e., arcgisscripting wasn't very Pythonic. The Describe Object has to be one of the least Pythonic objects in the entire ArcPy package, one of the reasons being what you have pointed out about the lack of inspection and introspection.

This raises the question of, "why hasn't Esri made the Describe Object more Pythonic with any of the 9+ major releases since ArcGIS 9.0?"  Unfortunately, I don't have an answer, not even a bad one.  Part of it might have to do with "why fix what isn't broken" and another factor might be the nested hierarchy/inheritance of how the Describe Object works under the hood, but I am just speculating.

Although inspection and introspection of the Describe Object itself are non-existent, the documentation does lay it all out, so it isn't a complete guessing game as to which objects support which property. The following code scrapes the Esri documentation to extract all of the property categories and their properties:

>>> import bs4
>>> import urllib2
>>> 
>>> site = "http://desktop.arcgis.com"
>>> path = "/en/arcmap/latest/analyze/arcpy-functions/describe.htm"
>>> desc_props = []
>>> 
>>> # Collect (category name, relative URL) pairs from the "see also" links
>>> f = urllib2.urlopen(site + path)
>>> soup = bs4.BeautifulSoup(f.read())
>>> seealso = soup.find(class_="seealso bulleted")
>>> seealso_paths = [
...     (a.find(text=True), a['href'])
...     for a in seealso("a", href=True)
... ]
... 
>>> 
>>> # Visit each category page and read the property names from its table
>>> for category, path in seealso_paths:
...     f = urllib2.urlopen(site + path)
...     soup = bs4.BeautifulSoup(f.read())
...     proptbl = soup.find(class_="arcpyclass_proptbl")
...     if proptbl:
...         proptbl.thead.extract()  # drop the header row
...         desc_props.append((
...             category, tuple(
...                 row.td.find(text=True)
...                 for row in proptbl("tr", recursive=False)
...             )
...         ))
... 
>>> # Number of property categories/types
>>> print(len(desc_props))
31
>>> # Total number of properties
>>> print(sum(len(props) for category, props in desc_props))
248

As you can see, there are 31 property categories or types and 248 properties, although no one object supports/has all 248 properties. As of ArcGIS 10.5, the categories and properties from the code above are:

Describe Object Properties:
    baseName
    catalogPath
    children
    childrenExpanded
    dataElementType
    dataType
    extension
    file
    fullPropsRetrieved
    metadataRetrieved
    name
    path

ArcInfo Workstation Item:
    alternateName
    isIndexed
    isPseudo
    isRedefined
    itemType
    numberDecimals
    outputWidth
    startPosition
    width

ArcInfo Workstation Table:
    itemSet

CAD Drawing Dataset Properties:
    is2D
    is3D
    isAutoCAD
    isDGN

Cadastral Fabric Properties:
    bufferDistanceForAdjustment
    compiledAccuracyCategory
    defaultAccuracyCategory
    maximumShiftThreshold
    multiGenerationEditing
    multiLevelReconcile
    pinAdjustmentBoundary
    pinAdjustmentPointsWithinBoundary
    surrogateVersion
    type
    version
    writeAdjustmentVectors

Coverage FeatureClass Properties:
    featureClassType
    hasFAT
    topology

Coverage Properties:
    tolerances

Dataset Properties:
    canVersion
    changeTracked
    datasetType
    DSID
    extent
    isArchived
    isVersioned
    MExtent
    spatialReference
    ZExtent

Editor Tracking Properties:
    editorTrackingEnabled
    creatorFieldName
    createdAtFieldName
    editorFieldName
    editedAtFieldName
    isTimeInUTC

FeatureClass Properties:
    featureType
    hasM
    hasZ
    hasSpatialIndex
    shapeFieldName
    shapeType

GDB FeatureClass Properties:
    areaFieldName
    geometryStorage
    lengthFieldName
    representations

GDB Table Properties:
    aliasName
    defaultSubtypeCode
    extensionProperties
    globalIDFieldName
    hasGlobalID
    modelName
    rasterFieldName
    relationshipClassNames
    subtypeFieldName
    versionedView

Geometric Network Properties:
    featureClassNames
    networkType
    orphanJunctionFeatureClassName

LAS Dataset Properties:
    constraintCount
    fileCount
    hasStatistics
    needsUpdateStatistics
    pointCount
    usesRelativePath

Layer Properties:
    dataElement
    featureClass
    FIDSet
    fieldInfo
    layer
    nameString
    table
    whereClause

Mosaic Dataset Properties:
    allowedCompressionMethods
    allowedFields
    allowedMensurationCapabilities
    allowedMosaicMethods
    applyColorCorrection
    blendWidth
    blendWidthUnits
    cellSizeToleranceFactor
    childrenNames
    clipToBoundary
    clipToFootprint
    defaultCompressionMethod
    defaultMensurationCapability
    defaultMosaicMethod
    defaultProcessingTemplate
    defaultResamplingMethod
    dimensionAttributes
    dimensionNames
    dimensionValues
    endTimeField
    footprintMayContainNoData
    GCSTransforms
    isMultidimensional
    JPEGQuality
    LERCTolerance
    maxDownloadImageCount
    maxDownloadSizeLimit
    maxRastersPerMosaic
    maxRecordsReturned
    maxRequestSizeX
    maxRequestSizeY
    minimumPixelContribution
    mosaicOperator
    multidimensionalInfo
    orderBaseValue
    orderField
    processingTemplates
    rasterMetadataLevel
    referenced
    sortAscending
    startTimeField
    timeValueFormat
    useTime
    variableAttributes
    variableNames
    viewpointSpacingX
    viewpointSpacingY

Network Analyst:
    network
    nameString
    solverName
    impedance
    accumulators
    restrictions
    ignoreInvalidLocations
    uTurns
    useHierarchy
    hierarchyAttribute
    hierarchyLevelCount
    maxValueForHierarchyX
    locatorCount
    locators
    findClosest
    searchTolerance
    excludeRestrictedElements
    solverProperties
    children
    parameterCount
    parameters

Network Dataset Properties:
    attributes
    catalogPath
    defaultTravelModeName
    directions
    edgeSources
    elevationModel
    historicalTrafficData
    isBuildable
    junctionSources
    liveTrafficData
    networkType
    optimizations
    sources
    supportsDirections
    supportsHistoricalTrafficData
    supportsLiveTrafficData
    supportsTurns
    systemJunctionSource
    timeZoneAttributeName
    timeZoneTableName
    trafficSupportType
    turnSources

Prj File Properties:
    spatialReference

Raster Band Properties:
    height
    isInteger
    meanCellHeight
    meanCellWidth
    noDataValue
    pixelType
    primaryField
    tableType
    width

Raster Catalog Properties:
    rasterFieldName

Raster Dataset Properties:
    bandCount
    compressionType
    format
    permanent
    sensorType

RecordSet and FeatureSet Properties:
    json
    pjson

RelationshipClass Properties:
    backwardPathLabel
    cardinality
    classKey
    destinationClassKeys
    destinationClassNames
    forwardPathLabel
    isAttachmentRelationship
    isAttributed
    isComposite
    isReflexive
    keyType
    notification
    originClassNames
    originClassKeys
    relationshipRules

RepresentationClass Properties:
    overrideFieldName
    requireShapeOverride
    ruleIDFieldName

Schematic Diagram Properties:
    diagramClassName

Table Properties:
    hasOID
    OIDFieldName
    fields
    indexes

TableView Properties:
    table
    FIDSet
    fieldInfo
    whereClause
    nameString

Tin Properties:
    fields
    hasEdgeTagValues
    hasNodeTagValues
    hasTriangleTagValues
    isDelaunay
    ZFactor

Topology Properties:
    clusterTolerance
    featureClassNames
    maximumGeneratedErrorCount
    ZClusterTolerance

Workspace Properties:
    connectionProperties
    connectionString
    currentRelease
    domains
    release
    workspaceFactoryProgID
    workspaceType

Instead of trying to work through all of the rules for which properties apply to what kind of object, I have always found just testing all of them is quite fast.  That way, you know exactly which properties apply even if they aren't all documented.

Instead of using a try:except block, just use getattr() with a default value of None.  For a file geodatabase polyline feature class, I get 56 properties that return some kind of value.
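
A minimal sketch of that getattr() approach follows. The names probe_properties and FakeDescribe are hypothetical: FakeDescribe only stands in for the object returned by arcpy.Describe so the snippet runs without ArcGIS, and its values are invented for illustration. Note that three-argument getattr() only suppresses AttributeError, which is what the tracebacks in the question show the Describe object raising.

```python
def probe_properties(desc, names):
    """Return {name: value} for each name that desc answers with a non-None value."""
    found = {}
    for name in names:
        # The default of None replaces a try/except around each lookup.
        value = getattr(desc, name, None)
        if value is not None:
            found[name] = value
    return found


class FakeDescribe(object):
    """Hypothetical stand-in for a Describe object (illustration only)."""
    shapeType = "Polyline"
    hasZ = False
    OIDFieldName = "OBJECTID"


# Candidate names taken from the property lists above.
candidates = ["shapeType", "hasZ", "OIDFieldName", "meanCellHeight", "bandCount"]
props = probe_properties(FakeDescribe(), candidates)
for name in sorted(props):
    print("{0}: {1}".format(name, props[name]))
```

With real data, the same function could presumably be called with arcpy.Describe(fc) as the first argument and the flattened names from desc_props as the second.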


11 Replies
XanderBakker
Esri Esteemed Contributor

I agree that it can be difficult to navigate the properties of the Describe object. The properties are dynamic based on the object it is created from. Hardly recommendable, but you could do something like this:

def main():
    import arcpy
    fc = r'C:\GeoNet\Streets\GeoNet Street Sample.gdb\LebStreetSample'
    desc = arcpy.Describe(fc)

    atts = ['areaFieldName', 'lengthFieldName', 'datasetType', 'shapeFieldName',
            'OIDFieldName', 'meanCellHeight', 'whereClause']
    for att in atts:
        if hasattr(desc, att):
            value = eval('desc.{0}'.format(att))
            if value != '':
                print att, value


if __name__ == '__main__':
    main()
ThomasLaxson
New Contributor III

Yeah, that's what I was afraid of. I'll probably end up doing something similar, but I'll use try/except with getattr, instead of eval.

Maybe I'm missing something, but it seems to me that the factory pattern would make more sense than whatever hidden implementation is going on here. They may very well be using the factory pattern on the back-end C classes (I'm assuming that's what they are). But the combination of inconsistent return types and no introspection makes for grotesquely un-Pythonic code.

JoshuaBixby
MVP Esteemed Contributor

ArcGIS Pro 2.0 includes a new Describe method in the ArcPy Data Access module that returns all describe properties in a Python dictionary.

What's new in ArcGIS Pro 2.0—ArcGIS Pro | ArcGIS Desktop

Python

  • A new arcpy.da.Describe function was added for describing data. It is similar to the arcpy.Describe function but returns its information as a Python dictionary.

I haven't heard whether this will be back-ported to ArcMap.
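
Because the result is a plain dictionary, summarizing it needs no special handling. The sketch below is only an illustration: summarize is a hypothetical helper, and the sample dictionary stands in for the return value of arcpy.da.Describe so the code runs without ArcGIS Pro (the keys are borrowed from the property lists in this thread, the values are made up).

```python
def summarize(props):
    """Return sorted 'key: value' lines for entries that hold a non-empty value."""
    lines = []
    for key in sorted(props):
        value = props[key]
        if value is None or value == "":
            continue  # skip properties that carry no information
        lines.append("{0}: {1}".format(key, value))
    return lines


# Stand-in for: sample = arcpy.da.Describe(fc)
sample = {
    "shapeType": "Polyline",
    "hasZ": False,
    "aliasName": "",
    "OIDFieldName": "OBJECTID",
}
summary = summarize(sample)
print("\n".join(summary))
```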

StefanOffermann
Occasional Contributor II

Is it possible that the new function is slower than the old one? Maybe because the new one is based on dictionaries that are filled right away, while the old one "lazy loads" property data?

DanPatterson_Retired
MVP Emeritus

I have noticed no difference in speed or issues when working with locally stored data and Pro. 

StefanOffermann
Occasional Contributor II

It's not that I have to wait a long time, but check the execution time of both lines when executed separately in the ArcGIS Pro 2.0 Python window:

arcpy.Describe('data')
arcpy.da.Describe('data')

Which one is a bit faster? 🙂 The second one has the "running progress dots", while the first one returns the describe object at once.

JoshuaBixby
MVP Esteemed Contributor

I could see it going either way.  On one hand, enumerating all the properties and populating a dictionary will take more time; but on the other hand, maybe the internal code within the Data Access module is faster.  The DA cursors are much faster than the old/original cursors because the back-end code was optimized.

It is hard to say without some testing; fortunately, we can test:

>>> import arcpy
>>> import timeit
>>>
>>> fc = r'D:\transfer\geodata\Default.gdb\NHDWaterBody'
>>> timeit.timeit(lambda: arcpy.Describe(fc), number=1000)
72.3306900028995
>>> timeit.timeit(lambda: arcpy.da.Describe(fc), number=1000)
205.41314557604585
>>> 

So, it turns out there is a cost to enumerating all of the properties and populating the dictionary.  Since ArcPy Describe lazily evaluates properties, the test above isn't quite apples to apples, and I suspect the results will get closer the more properties you access.
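
The lazy-versus-eager distinction can be illustrated with a small stand-alone sketch. This is not how ArcPy is implemented internally (that is not documented in this thread); LazyRecord, eager_record, and the loader functions are hypothetical names used only to show why creating a lazy object is cheap while building a full dictionary pays for every property up front.

```python
class LazyRecord(object):
    """Evaluates a property only on first access (stand-in for a lazy Describe)."""

    def __init__(self, loaders):
        self._loaders = loaders  # {property name: zero-argument function}
        self.calls = 0           # number of loaders that have actually run

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # i.e. for properties that have not been evaluated yet.
        loaders = self.__dict__.get("_loaders", {})
        if name not in loaders:
            raise AttributeError(name)
        self.calls += 1
        value = loaders[name]()
        setattr(self, name, value)  # cache so later reads skip the loader
        return value


def eager_record(loaders):
    """Evaluate every loader immediately and return a plain dict."""
    return dict((name, load()) for name, load in loaders.items())


# Invented property values, for illustration only.
loaders = {
    "name": lambda: "NHDWaterBody",
    "shapeType": lambda: "Polygon",
    "hasZ": lambda: False,
}

lazy = LazyRecord(loaders)
print(lazy.calls)       # loaders run so far
print(lazy.shapeType)   # triggers exactly one loader
print(lazy.calls)
eager = eager_record(loaders)
print(len(eager))       # every loader has run
```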

Does 0.07 vs 0.21 seconds per call make a difference in your code when instantiating a Describe object? I could see some situations where it would, but I think in most cases the impact is negligible and the added benefits of having the dictionary populated with all the properties far outweigh any performance difference.

DanPatterson_Retired
MVP Emeritus

Like I said... no noticeable speed difference... unless people sip coffee faster than I do.

Plus the dictionary is easier to work with:

d = arcpy.da.Describe(in_fc2)

d.keys()
Out[20]: dict_keys(['datasetType', 'children', 'hasM', 'FIDSet', 'extent', 
'metadataRetrieved', 'name', 'hasGlobalID', 'dataElementType', 'isVersioned', 
'representations', 'catalogPath', 'modelName', 'isCOGOEnabled', 'editorFieldName', 
'areaFieldName', 'createdAtFieldName', 'changeTracked', 'extensionProperties', 
'ZExtent', 'featureType', 'fields', 'fullPropsRetrieved', 'OIDFieldName', 'file', 
'creatorFieldName', 'versionedView', 'indexes', 'childrenExpanded', 'rasterFieldName', 
'canVersion', 'geometryStorage', 'relationshipClassNames', 'lengthFieldName', 
'defaultSubtypeCode', 'hasZ', 'shapeFieldName', 'shapeType', 'aliasName', 'dataType', 
'baseName', 'DSID', 'globalIDFieldName', 'extension', 'hasOID', 'MExtent', 'path', 
'isTimeInUTC', 'spatialReference', 'editorTrackingEnabled', 'editedAtFieldName', 
'subtypeFieldName', 'hasSpatialIndex'])

And timing results are variable, depending on what you are timing and how:

import arcpy

%timeit(arcpy.Describe(in_fc2))
97.6 ms ± 4.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit(arcpy.da.Describe(in_fc2))
249 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)