Removing Duplicate Feature Points/Records

10-28-2010 04:59 AM
JasonHodgkins
New Contributor
I have a point shapefile with name, address, etc. in the attribute table. I want to delete all the duplicate records that share the same name. For example, I have 10 records that have SMITH in the name field. How can I remove records so that only 1 record is listed for SMITH? Can someone suggest a better way than sorting the data and doing it manually, as there are thousands of records? Is there a tool, a selection method, or a script/Python code I could easily apply to perform this task?
9 Replies
RichardFairhurst
MVP Honored Contributor
Since this is a point shapefile are you wanting to delete the actual points or are you wanting a separate table derived from the points that has unique values for that field?  For either case you can perform a Summary, but to delete the actual points you have to perform a few more steps.

A Summary on the NAME field will create a new sorted table with the unique values of that field and a count of how many occurrences of each value are in the point shapefile.  If that is all you want, that is all you have to do.

If you don't care which point is chosen to represent the unique Name value and you want to actually delete points, you would need to do a Summary, but also add a Min, Max or First summary on some other field that contains a value that represents a unique key for each of the points.  The Summary tool may not let you use the ObjectID field directly, so you may want to add a temporary field of type Long and calculate the ObjectID into it.  Then you could do a Min summary on the temporary OID field.

Once you have the new Summary table with the summary field for the unique key, join the Summary table to the point class on the summary unique key field.  If you want to create a new exported copy of the unique points, select all records where the joined table ObjectID is not Null, break the join and export the selected records.  If you want to actually delete the unmatched points from the original point shapefile and have no backup, select all records where the joined table ObjectID is Null, break the join and delete the selected records in an Edit session.  This method does not take into account any spatial characteristics of the points, so the point that is left will be basically random (but every unique NAME value will be preserved in the final result).
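The Summary-and-join logic above can be sketched in plain Python. This is only an illustration on in-memory records, not arcpy code: the function name `split_keep_and_delete` and the `OID`/`NAME` keys are assumptions made for the example, and the "keep the lowest OID" rule stands in for the Min summary on the temporary OID field.

```python
def split_keep_and_delete(records, name_field="NAME", oid_field="OID"):
    """Return (keep_ids, delete_ids), keeping the lowest OID for each name.

    Mirrors a Summary on NAME with a Min statistic on a unique key field,
    followed by a join that separates matched from unmatched points.
    """
    min_oid = {}
    for rec in records:
        name, oid = rec[name_field], rec[oid_field]
        # Track the smallest OID seen so far for this name.
        if name not in min_oid or oid < min_oid[name]:
            min_oid[name] = oid
    keep = set(min_oid.values())
    # Everything that did not "match" the summary table is a duplicate.
    delete = sorted(r[oid_field] for r in records if r[oid_field] not in keep)
    return sorted(keep), delete


# Hypothetical sample data: three SMITH points and one JONES point.
points = [
    {"OID": 0, "NAME": "SMITH"},
    {"OID": 1, "NAME": "JONES"},
    {"OID": 2, "NAME": "SMITH"},
    {"OID": 3, "NAME": "SMITH"},
]
keep_ids, delete_ids = split_keep_and_delete(points)
print(keep_ids)    # OIDs that would be exported as the unique points
print(delete_ids)  # OIDs that would be deleted in an edit session
```

As in the manual workflow, which point survives for each name has nothing to do with its location; it is simply the one with the smallest key.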

If you want to preserve the point spatial data but reduce the number of attribute rows to a single row, you should convert your points to multipoints using the Dissolve tool.  That will preserve all your point geometry within a single multi-point feature for each unique NAME value.

If you want multiple fields to be used to select records with unique combinations rather than one field, I typically like to create a concatenated version of those field values and use that as a summary field with the above method.  Some sort of script is another alternative for dealing with multiple field unique combinations.
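A minimal sketch of the concatenated-key idea, again in plain Python rather than arcpy. The helper `composite_key`, the field names and the `|` separator are all assumptions for illustration. The separator matters because, without one, different field combinations can collapse into the same string; it should be a character that never occurs in the field values themselves.

```python
def composite_key(record, fields, sep="|"):
    """Join several field values into one string usable as a summary field."""
    return sep.join(str(record[f]) for f in fields)


# Two hypothetical records that differ, but concatenate to the same text
# when no separator is used.
a = {"NAME": "AB", "ADDRESS": "C"}
b = {"NAME": "A", "ADDRESS": "BC"}

fields = ["NAME", "ADDRESS"]
key_a = composite_key(a, fields)
key_b = composite_key(b, fields)
naive_a = composite_key(a, fields, sep="")
naive_b = composite_key(b, fields, sep="")

print(key_a, key_b)      # distinct keys
print(naive_a, naive_b)  # identical keys, so the records would be merged
```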

There may be other options depending on the final result you want and whether or not you want some spatial characteristic to be considered for choosing the points to keep and the points to be deleted.
DanLee
Esri Regular Contributor
If you have ArcGIS 10, there is a Delete Identical tool that lets you specify a field and delete all but one of the duplicates.

Regards,
SeanCook
New Contributor
And if you don't have the right license, something along these lines will get you there from the Python window:

import arcpy

# '10MileQ3' is the layer and 'COMPID' the field checked for duplicates.
rows = arcpy.UpdateCursor('10MileQ3')

seen = set()

for row in rows:
    value = row.getValue('COMPID')
    if value in seen:
        # A row with this value has already been kept, so delete this one.
        rows.deleteRow(row)
    else:
        seen.add(value)

# Release the cursor so the lock on the data is dropped.
del rows
JohnTangenberg
New Contributor II
I have been using Delete Identical in a model, with only the Shape field checked, to delete stacked polygons after intersecting census blocks with multiple overlapping buffers. I discovered that not all of the records with the same shape are being deleted. Because of the spatial joins that happen after this intersect, I thought a workaround would be to place the attributes onto centroid points. The duplicate points show no spatial difference even when zoomed in to 1:0. Even when run outside the model, Find Identical sees the records but Delete Identical does not. I only found this by doing a set of QC calculations checking that all the census numbers added up to the original block areas and populations; a few (not all) came up as high as 200% (so far). I need to process some 60,000 census blocks this way, so I need better reliability in deleting duplicate records. Any suggestions? I am using ArcInfo Desktop, the features are stored in a file geodatabase, and all the data is in the NAD_1983_UTM_Zone_11N projection.
DanLee
Esri Regular Contributor
Unfortunately there is no easy workaround for Delete Identical. It might be quicker to write some code that works out what's identical from the output table of Find Identical and deletes all but one feature in each group.
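As a rough sketch of what such code might do, the grouping step could look like the following. It is plain Python working on (feature ID, sequence number) pairs, under the assumption that the Find Identical output table pairs each input feature ID with a group sequence number (IN_FID and FEAT_SEQ in the Esri documentation, as far as I know). The function name `fids_to_delete` and the "keep the lowest ID" rule are illustrative choices, and the actual deletion would still have to be done with a cursor or a selection.

```python
def fids_to_delete(find_identical_rows):
    """Given (in_fid, feat_seq) pairs, return the feature IDs to delete.

    Features sharing a feat_seq are treated as identical; the lowest
    in_fid in each group is kept and the rest are flagged for deletion.
    """
    groups = {}
    for in_fid, feat_seq in find_identical_rows:
        groups.setdefault(feat_seq, []).append(in_fid)

    doomed = []
    for fids in groups.values():
        # Skip the first (lowest) ID so one feature per group survives.
        doomed.extend(sorted(fids)[1:])
    return sorted(doomed)


# Hypothetical output table: features 2, 3 and 5 are identical (group 2).
table_rows = [(1, 1), (2, 2), (3, 2), (4, 3), (5, 2)]
to_delete = fids_to_delete(table_rows)
print(to_delete)
```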

Could you do me a favor? Could you send to me (dlee@esri.com) just one case where you see that Find Identical finds duplicated features (polygons), but Delete Identical won't delete them properly? That would help us to investigate the issue. Thanks!
JohnTangenberg
New Contributor II
I did work around it by adding XY coordinates to the points and deleting again based on the XY values. It found 15 records. Lee - I will run the process again without the XY coordinates and you can compare. - John
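The idea of comparing on coordinate values instead of the shape itself can be sketched as below. This is plain Python, not the tool's actual comparison: the function `drop_coincident`, the tuple layout and the `decimals` setting are assumptions for illustration. Rounding snaps coordinates to a grid, which is cruder than a true XY tolerance, since two points very close together but on opposite sides of a rounding boundary would still be treated as different.

```python
def drop_coincident(points, decimals=3):
    """Return the IDs of points to keep, one per rounded (x, y) location.

    points is an iterable of (oid, x, y) tuples; the first point seen at
    each rounded location is kept.
    """
    seen = set()
    kept = []
    for oid, x, y in points:
        key = (round(x, decimals), round(y, decimals))
        if key not in seen:
            seen.add(key)
            kept.append(oid)
    return kept


# Hypothetical UTM-style coordinates: points 1 and 2 are stacked to within
# a fraction of a millimetre, point 3 is ten metres away.
sample = [
    (1, 500000.0, 3750000.0),
    (2, 500000.0004, 3750000.0),
    (3, 500010.0, 3750000.0),
]
kept_ids = drop_coincident(sample)
print(kept_ids)
```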
CindyMeeker
New Contributor
... I discovered that not all of the records with the same shape are being deleted. ....


I have had the same problem when running the Delete Identical tool within Desktop on a set of points using only shape to compare the points.  When I run the Find Identical tool on the same set of points (25059 points total) it finds 62, but Delete Identical found none until I added a couple of feet to the XY tolerance. I know that these points will not be that close, but this will not always be an acceptable workaround.
AndresSevtsuk
New Contributor
Having the same issue: FindIdentical detects a number, but DeleteIdentical doesn't catch most of them. Unfortunately FindIdentical doesn't seem to have any use in actually selecting duplicates...  Is there any resolution to this yet? Need a tool to select duplicates.

Thanks,

Andres
DanLee
Esri Regular Contributor
The problem has been fixed in the upcoming ArcGIS 10.0 SP3.

For the time being, you can try the following workaround:
1. Run Find Identical tool to get an output table.
2. Run Frequency tool on the FEAT_SEQ field in the output table.
3. Select FREQUENCY value > 1. If there are records selected, it means there are duplicates.
4. Run Delete Identical tool.
5. Repeat steps 1 - 3. If there are still duplicates, run the Delete Identical tool again, and keep repeating until no duplicates remain.
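The check in steps 2 and 3 amounts to counting how often each sequence number appears in the Find Identical output. A minimal sketch in plain Python, assuming the sequence values have already been read out of the table into a list (the function name `duplicate_groups` is made up for the example):

```python
from collections import Counter


def duplicate_groups(feat_seq_values):
    """Return {sequence number: count} for groups with more than one feature.

    Equivalent to running Frequency on the sequence field and selecting
    FREQUENCY > 1; an empty result means no duplicates remain.
    """
    counts = Counter(feat_seq_values)
    return {seq: n for seq, n in counts.items() if n > 1}


# Hypothetical sequence values before and after a Delete Identical pass.
before = duplicate_groups([1, 2, 2, 3, 2])
after = duplicate_groups([1, 2, 3])
print(before)  # groups that still contain duplicates
print(after)   # empty when the data is clean
```

An empty dictionary corresponds to the stopping condition in step 5.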

Sorry about the inconvenience. Thanks for John's sample data. Hope you can run the tool successfully with SP3.

Thanks.