List Broken Data Source's Path

LauraMiles1 · ‎08-07-2013

Hi everyone, I'm a very amateur Python user. I've managed to get the following, which I pieced together from here and there, working for the most part. I need to create a csv file which has a column showing the path to the data source which is broken. Everything works fine and dandy if I take out the dataSource part; when I add it in, it will run for a while and then fail. It seems to be getting tripped up on some mxd or data source it doesn't like perhaps? I have no clue. Here's my code:

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
f = open('BrokenMXD2003.csv', 'w')
f.write("Type, File Path, Layer, Broken Path" + "\n")
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                lyrList = arcpy.mapping.ListLayers(mxd)
                f.write("MXD, " + fullPath + ", " + brknItem.name)
                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")
                else:
                    f.write("\n")

f.close()

print "Script Completed"

And here's the error I get:

Traceback (most recent call last):
File "X:\Documents\Working Files\Broken Data Sources\ListBrokenMXD.py", line 11, in <module>
    brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\utils.py", line 181, in fn_
    return fn(*args, **kw)
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\mapping.py", line 1465, in ListBrokenDataSources
    result = mixins.MapDocumentMixin(map_document_or_layer).listBrokenDataSources()
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 832, in listBrokenDataSources
    broken_sources = [l for l in self.layers if not l._arc_object.valid]
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 683, in layers
    for frame in reversed(self.dataFrames):
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 695, in dataFrames
    return map(convertArcObjectToPythonObject, self.pageLayout.dataFrames)
AttributeError: 'NoneType' object has no attribute 'dataFrames'

It does run and create several lines of output, but this error appears when it gets near to the end of the mxd's in the directory. I haven't a clue what the problem could be, as I said I'm quite amateur. If anyone can see what it is, I'd be greatly appreciative. Thank you!

RichardFairhurst · ‎08-08-2013

Perhaps I did not express myself clearly - I do not have SDE. I am dealing with a file gdb. When I export the table to Access then I can easily extract what I want but within Arc and python I am totally lost.

Your screwed. fgdb's support an extremely limited set of subqueries that are under the restrictions I mentioned and cannot do what you can do with personal geodatabases and SDE databases, which support full subquery functionality. Fgdb subqueries can only return one value from an entire table and the table providing the subquery value (MAX, SUM, MEAN, etc) must be external to the table where you are making the selection based on the subquery value. Nothing can make the approach your code is attempting work with an fgdb.

TerryHiggins1 · ‎08-08-2013

Thanks for the response - obviously that was not what I wanted to here.

JamesCrandall · ‎08-08-2013

I saw your post earlier and was attempting some possible approaches. I thought about using NumPy to arrive at the max() statistic you are interested in, and it looked promising.

arcpy.da.FeatureClassToNumPyArray

However, it doesn't allow for Date fields! arrggghhh!

Sorry.
j

RichardFairhurst · ‎08-08-2013

Thanks for the response - obviously that was not what I wanted to here.

I know. I have posted an Idea on the ideas page related to the poor handling and inconsistencies of date related summaries and operations in ArcMap here and requested that they improve the ability to get such information. Please vote for it and offer any other comments or suggestions on that idea.

JamesCrandall · ‎08-08-2013

Are you opposed or comfortable using a different library? You can EASILY acheive what you want with Pandas:


import arcpy
from pandas import *

ws = "H:\Documents\ArcGIS\Default.gdb"
arcpy.env.workspace = ws
fc = "MyFC"
fields = ["_currentvalue", "_pastvalue", "_curDate"]
array = []

for field in fields:
    with arcpy.da.SearchCursor(fc, fields) as cursor:
        for row in cursor:
            array.append(row)

df = DataFrame(array, columns = ["_currentvalue", "_pastvalue", "_curDate"])

maxDate = df._curDate.max()

print maxDate  #<--- this gives me the correct max date

I have found many other uses for the pandas library too -- very powerful and a welcomed addition to my codebase!

Good luck,
james

RichardFairhurst · ‎08-08-2013

Here is another post discussing the limitations of subqueries when using fgdb's. The Pandas approach is probably your best option within Python as a work around. I personally want better support in the basic geoprocessing tools for finding Min and Max dates.

MattSayler · ‎08-08-2013

How about using a dictionary

import arcpy
from arcpy import env
env.workspace=r"Z:\GIS_Data\gdb\GIS_Test.gdb"

fc="COA_FH_Inspections"
cursor = arcpy.SearchCursor(fc) #Update cursor is for updating the data in the fc's row, search cursor should suffice in this situation
idlist = []

for row in cursor:
    idlist.append(row.getValue('HydrID'))

print idlist

idunique = set(idlist)

###MY ADDITIONAL CODE###

#create dictionary with structure {<hydrantID>:(<objectid>,<Date>),...}
maxDates = {}
for item in idunique:
    maxDates[item] = (0,"1/1/1900")

features = arcpy.SearchCursor(fc)

for feature in features:
    hydrant = feature.getValue('HydrID')
    testDate = feature.getValue('TestDate')
    objectID = feature.getValue('objectid')
    
    if testDate > maxDates[hydrant][1]:
        maxDates[hydrant] = (objectID, testDate) #if the date is greater, replace the (<objectid>,<date>) tuple with new values

#maxDates can now be used to collect the features
#for illustration purposes:
keepers = []
for key in maxDates:
    hydrant = key
    objectID = key[0]
    print "Hydrant {0} with ObjectID {1} has the most recent inspection date".format(hydrant,objectID)
    keepers.append(objectID)

#Use keepers to identify the rows in the feature class and write them out to a new feature class

Almost certainly will need additional tweaks, but that's the concept.

ChrisSnyder · ‎08-08-2013

This ***untested*** code should give you a list of the unique Ids (I assume you want to use the OBJECTID field), that represent the most recent test date for each occurance of HydrId. You could then load this list of OIDs (latestTestOidList in the code below) into a SQL query.

hydrantDict = {}
searchRows = arcpy.da.SearchCursor(hydrantTable, ["OID@","TestDate","HydrID"])
for searchRow in searchRows:
   oidValue, testDate, hydrantId = searchRow
   if hydrantId not in hydrantDict:
      hydrantDict[hydrantId] = [(testDate, oidValue)]
   else:
      hydrantDict[hydrantId].append((testDate, oidValue))
del searchRow, searchRows
latestTestOidList = [sorted(dict[hydrantId], reverse=True)[0][1] for hydrantId in hydrantDict]

DavidSousa · ‎08-08-2013

Your screwed. fgdb's support an extremely limited set of subqueries that are under the restrictions I mentioned and cannot do what you can do with personal geodatabases and SDE databases, which support full subquery functionality. Fgdb subqueries can only return one value from an entire table and the table providing the subquery value (MAX, SUM, MEAN, etc) must be external to the table where you are making the selection based on the subquery value. Nothing can make the approach your code is attempting work with an fgdb.

This is not correct. A scalar sub-query in FileGDB can reference the same table. I do this all the time, and it was written specifically for this purpose. For example, if my table is ROADS, I can execute the following sub-query in a WHERE clause:

OID = (SELECT MAX(OID) FROM ROADS)

Could you provide some specific examples of the things you have tried that do not work as you would like?

RichardFairhurst · ‎08-08-2013

This is not correct. A scalar sub-query in FileGDB can reference the same table. I do this all the time, and it was written specifically for this purpose. For example, if my table is ROADS, I can execute the following sub-query in a WHERE clause:

OID = (SELECT MAX(OID) FROM ROADS)

Could you provide some specific examples of the things you have tried that do not work as you would like?

David:

Thanks for correcting me and weighing in on this subject. Apparently my problems in attempting subqueries are either than my information is out of date (derived from tests mostly done at 9.3) or that this is related to the poor documentation that confuses syntax from many databases in the SQL Reference, which is what I have always tried to follow. In any case, I can confirm that after following David's examples he provided me in an e-mail that an fgdb can make subquery selections as David has said. Also fgdb subqueries can return Max(Date) values.

The subquery can operate within a selection involving a single table and they can select based on Max date.

So in reality it appears that the problem is that you cannot use "lyr" as the table name in the subquery. You must use the underlying Feature Class name. So for example, if my layer is named CENTERLINE but the underlying feature class of the layer is named CENTERLINE_EXTENDED, this subquery will not work:

"MODIFIED_DATE" = (SELECT MAX("MODIFIED_DATE") FROM CENTERLINE WHERE "STNAME" = 'HIGH VISTA DR')

but it will work if I put the actual feature class name. So this does work:

"MODIFIED_DATE" = (SELECT MAX("MODIFIED_DATE") FROM CENTERLINE_EXTENDED WHERE "STNAME" = 'HIGH VISTA DR')

So you do not need two queries, just one for your problem. However, you need to use the underlying name of the fc variable. So something like this should replace the two queries in the original post (untested and not sure if I have to access a property of the fc to get its name):

arcpy.SelectLayerByAttribute_management("lyr", "subset_Selection",'"HydrID" = ' + i + ' And "TestDate" = (Select Max("TestDate") from ' + fc + ' WHERE "HydrID" = ' + i)