Random sample/select polygons by attributes iteratively python advice

2783
5
05-13-2011 11:42 AM
HollyCopeland
New Contributor
Hi all,

I need to write a script to do the following:

Randomly select polygons (parcels in my case) in a feature class using an attibute with parcel value. It needs to keep selecting parcels randomly until it reaches a total max parcel value. So, if each parcel is valued between 1-100, and I have a max value of 500, it needs to select enough parcels to reach the target value of 500, then I'll do some operations/calculations on these selected parcels. Finally, I need to create a loop to do this 100 or so times.

Is what I propose above doable and fairly straightforward. I'm relatively new at Python programming, so any tips about the logic in the code would be so helpful and appreciated.

Thanks,

Holly
Tags (2)
0 Kudos
5 Replies
KimOllivier
Occasional Contributor III
Yes, Python has good standard modules for random selections, permutations and other sampling techniques.

You need to run a cursor first to get all the details of the parcels into a dictionary/list in memory so that you can accumulate random selections until you get to your target value. This will be very fast.

Then create an SQL query with the selected parcel ids to process those parcels with a cursor.
If you don't want to actually update the featureclass, even this would be unnecessary.

The process to get the parcel details can run once outside your loop.
Then when you run the process 100 times, it will make a selection, run the SQL query to make a layer and then process the layer.

The SQL query would be in the form "parcel_id in (23,43,65,876,...)"
Make sure that the parcel_id has an attribute index defined, it is not by default except for the objectid.
0 Kudos
HollyCopeland
New Contributor
Thank you Kim for your advice. I am thankful to hear that this seems doable to you and straightforward. Do you have any example code or quick specific code hints for using a cursor and accumulating random selections from the attribute until I get to the target value? Thank you -- Holly
0 Kudos
KimOllivier
Occasional Contributor III
Ok, I didn't think I would get away with that brief specification. A sample script takes 3 seconds to iterate 100 times. I have shown one case creating a layer of the random selection. You could put this back into a table of contents.

# randomselection.py
# 19 May 2011
# kimo@ollivier.co.nz
#
# Input: parcel polygon featureclass
#        item to sum (area)
#        threshold (check min, mean, max values for reasonable figure)
# Assumes values always less than threshold or you get a null list

# Output: Total sum and layer with selected parcels
# set up to allow iteration
# uses ArcGIS 10 but could use 9.3

import arcpy
import sys
import os
import random
import datetime

try:
    fc = sys.argv[1]
    item = sys.argv[2]
    threshold = sys.argv[3]
except:
    # debugging and standalone operation
    fc = "c:/project/forum/mobile.gdb/parcel"
    item = "area"
    threshold = 100000.0 # square metres

def getdict(fc,item):
    ''' Make a dictionary
    once for the sampler
    '''
    dParcel = {}
    for row in arcpy.SearchCursor(fc,"featcode = 'parcel'"):
        if item.lower() == 'area':
            dParcel[row.objectid] = row.shape.area
        else :
            dParcel[row.objectid] = row.GetValue(item)
    return dParcel

def getsample(dParcel,threshold):
    '''Given a dict of values
    return a random selection
    less than the threshold.
    
    Shuffles the list and pops off
    the ids until threshold reached
    ie sampling without replacement
    '''
    total = 0
    lstObj = []
    lstKey = dParcel.keys()
    random.shuffle(lstKey)
    while total < threshold:
        nObj = lstKey.pop()
        nVal = dParcel[nObj]
        total = nVal
        if total > threshold:
            break # do not exceed threshold
        else :
            lstObj.append(nObj)
            total += nVal
    return (total,lstObj)

# --------------- main ---------------
print
start = datetime.datetime.now()
print start
#
# make a dictionary
dParcel = getdict(fc,item)
print "records",len(dParcel)

# sample the dictionary
total,lstObj = getsample(dParcel,threshold)

print "Total: ",total
print "No of parcels: ",len(lstObj)
print "objectid: value"
for ob in lstObj:
    print ob,":",dParcel[ob],",  ",
print
# make up an sql query to select the features
sqlQuery = "OBJECTID in (" + ",".join([str(id) for id in lstObj]) + ")"
print sqlQuery
arcpy.env.overwriteOutput = True
arcpy.MakeFeatureLayer_management(fc,"fc_lay",sqlQuery)
lyrName = arcpy.CreateUniqueName("xx.lyr",os.path.dirname(os.path.dirname(fc)))
print lyrName
arcpy.SaveToLayerFile_management("fc_lay",lyrName)

# Or sample the dictionary 100 times
for it in range(100):
    total,lstObj = getsample(dParcel,threshold)
    print total,len(lstObj)
    
# finished
elapsed = datetime.datetime.now() - start
print "\n\nElapsed",elapsed, "Well done!" # positive feedback


2011-05-19 20:40:43.640000
records 7170
Total:  145316.242757
No of parcels:  21
objectid: value
726 : 809.726863177 ,   4530 : 809.419546424 ,   4752 : 1012.70775115 ,   6120 : 809.397052955 ,   3022 : 809.418725814 ,   253 : 986.240771943 ,   4445 : 1132.16238039 ,   7161 : 928.75695316 ,   5747 : 964.13873029 ,   2188 : 864.834770195 ,   4164 : 20.9192428737 ,   187 : 839.573627519 ,   2745 : 811.268061104 ,   2572 : 1002.32513709 ,   1445 : 1381.14243957 ,   2674 : 812.661986759 ,   6925 : 25126.7099296 ,   5479 : 1395.56544711 ,   4791 : 810.812432473 ,   5406 : 848.317697436 ,   6543 : 72658.1213786 ,  
OBJECTID in (726,4530,4752,6120,3022,253,4445,7161,5747,2188,4164,187,2745,2572,1445,2674,6925,5479,4791,5406,6543)
c:/project/forum\xx0.lyr
185921.392647 4
145871.629063 27
154201.63183 1
...
140524.256299 0
...
190822.946912 9
193668.483773 23
334292.358346 16
141967.626185 23
15109126.1898 56
...
413536.709991 2
157130.891363 6


Elapsed 0:00:02.917000 Well done!
0 Kudos
HollyCopeland
New Contributor
Thank you, Kim. This is very helpful!!! I'll post my final code too when I get it finished. Cheers, Holly
0 Kudos
HollyCopeland
New Contributor
This code isn't nearly as elegant as yours, Kim, but perhaps it is useful to someone else, so I'm posting it. Holly

CODE:
# ---------------------------------------------------------------------------
# Select Random Parcels.py
# Created on: 2011-5-18
# Creator: Holly Copeland (hcopeland@tnc.org)
# Requirements: Spatial Analyst Extension
# Description: This script randomly selects data from a feature class and
# stops selecting when it reaches a threshold value set by the user
# It also iterates multiple times and generates summary statistics from a
# separate field at the end
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy, os, sys, random, array
from arcpy.sa import *
from arcpy import env

# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")

arcpy.env.workspace = "D:\\Work\\SageGrouseEaseStudyBackup\\Data\WYParcels_Merge.gdb"
#
#Specify Input Parameters
#
fc = "WYParcelsPrivate_AgValuesMay2011"

scenarioValue = 5000000  #this is the THRESHOLD value
iterationCount = 0
iterationNumber = 5 #number of iterations

sgSumTotal = []
sgMeanTotal = []

while iterationCount < iterationNumber:
    #Set parcel accumulation value to 0
    accumulatorValue = 0
    objectIDList = []

    #Set sage grouse accumulation value to 0
    sagegrouseValue = 0
    sagegrouseList = []
    sagegrouseListSum = []

   
    # Create a search cursor
    #


    # The variables row and rows are initially set to None, so that they
    #  can be deleted in the finally block regardless of where (or if)
    #  script fails.
    #
    row, rows = None, None
   
    #Start while statement to accumulate parcels
    #Value in while statement should be the same as the scenario
   
    while accumulatorValue < scenarioValue:
        objectID = random.randrange(1,27577) #generate random number to select parcel object id
        expression = '"OBJECTID" = ' + str(objectID) 
       
        try:
            rows = arcpy.SearchCursor(fc, expression)  #select the objectid from above

            #Iterate through the rows in the cursor of randomly selected parcels
            for row in rows:
                #assuming you are sampling without replacement
                if row.OBJECTID not in objectIDList:
                    accumulatorValue += row.AdjustedParcelValue
                    sagegrouseValue += row.SageGrouseTest
                    objectIDList.append(row.OBJECTID)
                    sagegrouseList.append(row.SageGrouseTest)
                   
        except:
            arcpy.AddError("Script Bombed")

    #calculate statistics for each iteration
    sgSum = sum(sagegrouseList)
    sgMax = max(sagegrouseList)
    sgMin = min(sagegrouseList)
    sgMean = sum(sagegrouseList)/len(sagegrouseList)
    sgSumTotal.append(sgSum)
    sgMeanTotal.append(sgMean)
    print "Sage Grouse Statistics for Run #" ,iterationCount
    print "No of parcels:" ,len(objectIDList)
    print "Max:" ,sgMax
    print "Min:" ,sgMin
    print "Sum:" ,sgSum
    print "Mean:" ,sgMean


    iterationCount += 1

sgSumTotalFinal = sum(sgSumTotal)
sgMeanTotalFinal = sum(sgMeanTotal)/len(sgSumTotal)

print "Final Sage Grouse Statistics (Sum, Mean):" ,sgSumTotalFinal, sgMeanTotalFinal
0 Kudos