Please Help! Calculate Median Value

8976
11
Jump to solution
10-11-2013 01:06 PM
Chang-HengYang
New Contributor III
Hi all,

Thank you for your time in advance. I tried to update the script (downloaded from Calculate Median Value) from v9.3 to v10.1 and ran the script in the tool. However, V10.1 does not work (v9.3 works). The attached file would show the error message. I also listed my original v10.1 python code. I tried to add the filed_name in arcpy.da.SearchCursor(inputTable, readField). However, it still failed to work. Please let me know if there is any problem with my v10.1 python code.

Thanks,
Hank

v9.3 version python code from Corey Denninger
Script created by Chris Snyder; modified by C. Denninger
# Script resulted from ESRI Forum posts dated 04FEB2010.

# This script is used in a script tool within ArcGIS and calculates
#..the median value of one entire field and posts that single median
#...value to every row in one other entire field.

import sys, arcgisscripting
gp = arcgisscripting.create(9.3)


inputTable = gp.GetParameterAsText(0)
readField = gp.GetParameterAsText(1) #this needs to be a numeric field
writeField = gp.GetParameterAsText(2) #this needs to be a double or float


valueList = []
searchRows = gp.searchcursor(inputTable)
searchRow = searchRows.next()
while searchRow:
   searchRowValue = searchRow.getvalue(readField)
   if searchRowValue == None:
      pass #don't add a null value to the list!
   else:
      valueList.append(searchRowValue)
   searchRow = searchRows.next()
del searchRow
del searchRows


valueList.sort()
listLength = len(valueList)
if listLength == 0:
   print "Every value was null! Exiting script..."; sys.exit()
elif listLength % 2 == 0: #even, so get the mean of the 2 center values
   medianValue = (valueList[listLength / 2] + valueList[listLength / 2 - 1]) / 2.0
else: #odd, so it's easy!
   medianValue = valueList[listLength / 2]
updateRows = gp.updatecursor(inputTable)
updateRow = updateRows.next()
while updateRow:
   updateRow.setvalue(writeField, medianValue)
   updateRows.UpdateRow(updateRow)
   updateRow = updateRows.next()
del updateRow
del updateRows





V10.1 python code
import arcpy, sys, os

inputTable = arcpy.GetParameterAsText(0)
readField = arcpy.GetParameterAsText(1)
writeField = arcpy.GetParameterAsText(2) 

valueList = []
searchRows = arcpy.da.SearchCursor(inputTable)
searchRow = searchRows.next()
while searchRow:
    searchRowValue = searchRow.getValue(readField)
    if searchRowValue == None:
        pass #don't add a null value to the list!
    else:
        valueList.append(searchRowValue)
    searchRow = searchRows.next()
del seachRow
del searchRows

valueList.sort
listLength = len(valueList)
if  listLength == 0:
    print "Every value was null! Exiting script..."; sys.exit()
elif listLength % 2 == 0: #even, so get the mean of the 2 center values
    medianValue = (valueList[listLength / 2] + valueList[listLength / 2 - 1]) / 2.0

else:
    medianValue = valueList[listLength / 2]
updateRows = arcpy.da.UpdateCursor(inputTable)
updateRow = updateRows.next()
while updateRow:
   updateRow.setvalue(writeField, medianValue)
   updateRows.UpdateRow(updateRow)
   updateRow = updateRows.next()
del updateRow
del updateRows
    
Tags (2)
0 Kudos
1 Solution

Accepted Solutions
DouglasSands
Occasional Contributor II
One more thing, you can replace all of this:


valueList.sort
listLength = len(valueList)
if  listLength == 0:
    print "Every value was null! Exiting script..."; sys.exit()
elif listLength % 2 == 0: #even, so get the mean of the 2 center values
    medianValue = (valueList[listLength / 2] + valueList[listLength / 2 - 1]) / 2.0

else:
    medianValue = valueList[listLength / 2]


With:
medianValue = numpy.median(valueList)


By importing numpy at the beginning of your script. The list also does not need to be sorted and you don't have to put in the logic to handle all of the list length issues. If there is an empty list, numpy will return "nan" (not a number). After calculating this, you can use numpy.isnan() to test and exit:
if numpy.isnan(medianValue):
    print "Every value was null! Exiting script..."
    sys.exit()

View solution in original post

11 Replies
DouglasSands
Occasional Contributor II
With arcpy.da, search cursors work a bit differently. You need to explicitly say which fields you want to be in your search cursor (arcpy.da.SearchCursor Help). This must be a list of fields, even though in your case you have just one. You then access the values in the row in the same way that you would pull items from a list.

So for your example, you would want something like this with the search cursor:
with arcpy.da.SearchCursor(inputTable, [readField]) as searchRows:
    for searchRow in searchRows:
        if searchRow[0] != None:
            valueList.append(searchRow[0])
        del searchRow
del searchRows


The update cursor would work similarly (arcpy.da.UpdateCursor Help😞
with arcpy.da.UpdateCursor(inputTable, [writeField]) as updateRows:
    for row in updateRows:
        row[0] = medianValue
        updateRows.updateRow(row)
        del row
del updateRows


Because you are iterating over lists, this should also run faster than the old cursors did.
T__WayneWhitley
Frequent Contributor
Yes, just like the error says, on line 8... in the da.searchcursor, you have to enter a 2nd required parameter for fields.

I'd recommend you try the da cursor, but if you just wanted to make this run for now...there's still the 'regular' cursor, probably for backward compatiblity support, just take out the .da:

searchRows = arcpy.SearchCursor(inputTable)
by Anonymous User
Not applicable
I wrote a tool to do this a while ago, this has always worked for me:

'''
Written By Caleb Mackey
4/17/2013

Calculates Median Statistics
'''

import arcpy, os, sys, traceback

# env settings
arcpy.env.overwriteOutput = True
arcpy.env.qualifiedFieldNames = False

def GetMedian(in_list):
    sorted_list = sorted(in_list)
    median = int(round(len(sorted_list) / 2))
    if len(sorted_list)%2==0:
        med_val = float(sorted_list[median-1]
                        + sorted_list[median]) / 2
    else:
        med_val = sorted_list[median]
    return med_val


def GetMedianValues(source_fc, new_table, case_field, value_field):
    
    ''' Generates a table with Median Values, summarized by case_field. If the
        goal is to get the median for the entire table, use a case field that has
        the same value for all records.

        source_fc - input feature class to compute median statistics for
        new_table - output table
        case_field - similar to dissolve field, computes stats based on unique values in this field
        value_field - field that contains the actual values for statistics; must be numeric
    '''
    
    # Get unique value list for query
    print 'starting cursor'
    with arcpy.da.SearchCursor(source_fc, [case_field]) as rows:
        un_vals = list(set(r[0] for r in rows))

    lyr = arcpy.MakeFeatureLayer_management(source_fc,'source_layer')
    values = {}

    # Get Median UseValue for each station name
    for st in un_vals:
        query = '"{0}" = \'{1}\''.format(case_field, st)
        arcpy.SelectLayerByAttribute_management(lyr, 'NEW_SELECTION', query)
        use_vals = []
        with arcpy.da.SearchCursor(lyr, [value_field]) as rows:
            for row in rows:
                if row[0] != None:
                    use_vals.append(row[0])
        if len(use_vals) > 0:
            median = GetMedian(use_vals)
            values[st] = [median, len(use_vals)]


    # Create new Summary Statistics table with median
    #
    if arcpy.Exists(new_table):
        arcpy.Delete_management(new_table)
    arcpy.CreateTable_management(os.path.split(new_table)[0],os.path.basename(new_table))

    # Get field names and types
    for field in arcpy.ListFields(source_fc):
        if field.name in [case_field, value_field]:
            ftype = field.type
            name = field.name
            length = field.length
            pres = field.precision
            scale = field.scale
            if name == value_field:
                if new_table.endswith('.dbf'):
                    name = 'MED_' + value_field[:6]
                else:
                    name = 'MED_' + value_field
                value_field2 = name
            arcpy.AddField_management(new_table,name,ftype,pres,scale,length)
            
    # Add frequency field
    arcpy.AddField_management(new_table,'FREQUENCY','LONG')

    # Insert rows
    with arcpy.da.InsertCursor(new_table, [case_field, value_field2, 'FREQUENCY']) as rows:
        for k,v in sorted(values.iteritems()):
            rows.insertRow((k, v[0], v[1]))
            
    # report results
    print 'Created %s' %os.path.basename(new_table)
    arcpy.AddMessage('Created %s' %os.path.basename(new_table))

    # .dbf's are automatically given a 'Field1' field...Clean this up
    try:
        if new_table.endswith('.dbf'):
            arcpy.DeleteField_management(new_table, 'Field1')
    except:
        pass
    print 'Done'


if __name__ == '__main__':

##    # testing
##    source_fc = r'C:\Testing\Test.gdb\CSR_by_TWP'
####    new_table = r'C:\Testing\Test.gdb\Median_CSR' #gdb test
##    new_table = r'C:\Testing\Median_CSR.dbf'  #dbf test
##    case_field = 'NAME'
##    value_field = 'AVE_CSR'

    # Script tool params
    source_fc = arcpy.GetParameterAsText(0)
    new_table = arcpy.GetParameterAsText(1)
    case_field = arcpy.GetParameterAsText(2)
    value_field = arcpy.GetParameterAsText(3)

    GetMedianValues(source_fc, new_table, case_field, value_field)

    


Hope this helps

EDIT: I should mention this is a 10.1 + version.
KatjaKrivoruchko
New Contributor III

Thanks for posting this, very helpful! I have updated it to Python 3.x, which is what ArcGIS Pro ships with:

'''

Written By Caleb Mackey

4/17/2013

Updated to Python 3.x (ArcGIS Pro 1.0) by Katja Krivoruchko

4/16/2015

Calculates Median Statistic

'''

import arcpy, os, sys, traceback

# env settings

arcpy.env.overwriteOutput = True

arcpy.env.qualifiedFieldNames = False

def GetMedian(in_list):

    sorted_list = sorted(in_list)

    median = int(round(len(sorted_list) / 2))

    if len(sorted_list)%2==0:

        med_val = float(sorted_list[median-1]

                        + sorted_list[median]) / 2

    else:

        med_val = sorted_list[median]

    return med_val

def GetMedianValues(source_fc, new_table, case_field, value_field):

    

    ''' Generates a table with Median Values, summarized by case_field. If the

        goal is to get the median for the entire table, use a case field that has

        the same value for all records.

        source_fc - input feature class to compute median statistics for

        new_table - output table

        case_field - similar to dissolve field, computes stats based on unique values in this field

        value_field - field that contains the actual values for statistics; must be numeric

    '''

    # Get unique value list for query

  

    with arcpy.da.SearchCursor(source_fc, [case_field]) as rows:

        un_vals = list(set(r[0] for r in rows))

    lyr = arcpy.MakeFeatureLayer_management(source_fc,'source_layer')

    values = {}

    # Get Median UseValue for each station name

    for st in un_vals:

        query = '"{0}" = \'{1}\''.format(case_field, st)

        arcpy.SelectLayerByAttribute_management(lyr, 'NEW_SELECTION', query)

        use_vals = []

        with arcpy.da.SearchCursor(lyr, [value_field]) as rows:

            for row in rows:

                if row[0] != None:

                    use_vals.append(row[0])

        if len(use_vals) > 0:

            median = GetMedian(use_vals)

            values[st] = [median, len(use_vals)]

    # Create new Summary Statistics table with median

    #

    if arcpy.Exists(new_table):

        arcpy.Delete_management(new_table)

    arcpy.CreateTable_management(os.path.split(new_table)[0],os.path.basename(new_table))

    # Get field names and types

    for field in arcpy.ListFields(source_fc):

        if field.name in [case_field, value_field]:

            ftype = field.type

            name = field.name

            length = field.length

            pres = field.precision

            scale = field.scale

            if name == value_field:

                if new_table.endswith('.dbf'):

                    name = 'MED_' + value_field[:6]

                else:

                    name = 'MED_' + value_field

                value_field2 = name

            arcpy.AddField_management(new_table,name,ftype,pres,scale,length)

            

    # Add frequency field

    arcpy.AddField_management(new_table,'FREQUENCY','LONG')

    # Insert rows

    with arcpy.da.InsertCursor(new_table, [case_field, value_field2, 'FREQUENCY']) as rows:

        for k,v in sorted(values.items()):

            rows.insertRow((k, v[0], v[1]))

            

    # report results

    print ('Created %s' %os.path.basename(new_table)  )

    arcpy.AddMessage('Created %s' %os.path.basename(new_table))

    # .dbf's are automatically given a 'Field1' field...Clean this up

    try:

        if new_table.endswith('.dbf'):

            arcpy.DeleteField_management(new_table, 'Field1')

    except:

        pass

    #print 'Done'

if __name__ == '__main__':

    # Script tool params

    source_fc = arcpy.GetParameterAsText(0)

    new_table = arcpy.GetParameterAsText(1)

    case_field = arcpy.GetParameterAsText(2)

    value_field = arcpy.GetParameterAsText(3)

    GetMedianValues(source_fc, new_table, case_field, value_field)

DanHoffman1
New Contributor III

Thank you Caleb...worked like a charm!  I naively thought that 'median' was just going to be a simple calculation function in the Field Calculator...thank god for your script!

by Anonymous User
Not applicable

No problem, glad to see this is still getting used!

0 Kudos
DouglasSands
Occasional Contributor II
One more thing, you can replace all of this:


valueList.sort
listLength = len(valueList)
if  listLength == 0:
    print "Every value was null! Exiting script..."; sys.exit()
elif listLength % 2 == 0: #even, so get the mean of the 2 center values
    medianValue = (valueList[listLength / 2] + valueList[listLength / 2 - 1]) / 2.0

else:
    medianValue = valueList[listLength / 2]


With:
medianValue = numpy.median(valueList)


By importing numpy at the beginning of your script. The list also does not need to be sorted and you don't have to put in the logic to handle all of the list length issues. If there is an empty list, numpy will return "nan" (not a number). After calculating this, you can use numpy.isnan() to test and exit:
if numpy.isnan(medianValue):
    print "Every value was null! Exiting script..."
    sys.exit()
DanPatterson_Retired
MVP Emeritus

Once numeric values are obtained from a field, via whatever means, medians are relatively easy to determine using numpy even if the field contains NaN's

>>> import numpy as np
>>> a = np.array([1,2,np.NaN,3.,4,np.NaN],dtype='float64')
>>> np.median(a)
3.5
>>> np.nanmedian(a)
2.5
>>>
SamanehHejazi1
New Contributor

that's why I love numpy and panda

0 Kudos