Training sample for image classification of maps

EllaWes13 · ‎01-26-2024

Hi!

I'm using ArcGIS Pro on desktop and I would consider myself a beginner, so bear with me.

I'm looking for some general tips and tricks on how to make good training samples for accurate image classification. I have a bunch of satellite images in which I need to classify areas into four classes: forest, water, buildings/roads and pastures. I've tried making a few training samples, but I notice that ArcGIS Pro sometimes mistakes, for example, the shadow of a tree for water (since they're almost the same colour), or sometimes the opposite. Although I've sometimes spent hours making a training sample, trying to be as meticulous as possible, the classification still contains mistakes.

In general, how do you go about making a training sample as accurate as possible (and tbh as simply as possible)? Should I spend hours upon hours making training samples where I include every pixel on the map as a sample? Is it important to include at least one sample of each variant of every class? By that I mean that pastures, for example, can obviously look very different on satellite images: they can be green, brown or white in colour, etc. Do I need to include at least one green sample, at least one brown one, and so on, for ArcGIS Pro to know that all of those are pastures? Is there anything I can do about shadows or dark green forest being classified as water?

Thanks for any and all help!

ValerieCaden-Baptiste · ‎01-26-2024

Overall, your training data should be representative of each of your classes and show the variability within each class. Each training sample should be homogeneous and have no overlap with any other class. Your samples should also not have any "fuzzy" boundaries: for example, don't draw right up to the edge of a class when collecting a sample. Your training samples should contain a significant number of pixels, especially if you are using the maximum likelihood classifier. For example, a 10 by 10 block of pixels equals 100 pixels, which is a reasonable size for a training polygon and is statistically significant.
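As a rough way to check the sample-size point above, here is a minimal sketch that estimates how many pixels a training polygon covers from its area and the raster cell size. The function name, the example polygon areas, the 10 m cell size and the 100-pixel floor are all illustrative assumptions (the floor simply reuses the 10 by 10 example); none of them are ArcGIS Pro settings.

```python
def estimated_pixel_count(area_m2: float, cell_size_m: float) -> int:
    """Approximate number of raster cells covered by a polygon of the given area."""
    if cell_size_m <= 0:
        raise ValueError("cell_size_m must be positive")
    return int(area_m2 // (cell_size_m ** 2))


# Hypothetical training polygons: (class name, polygon area in square metres).
samples = [("forest", 12000.0), ("water", 2500.0), ("pastures", 40000.0)]

CELL_SIZE_M = 10.0  # assumed image resolution
MIN_PIXELS = 100    # assumed floor, borrowed from the 10 x 10 example

# Collect the polygons that are probably too small to be useful.
small = []
for name, area in samples:
    pixels = estimated_pixel_count(area, CELL_SIZE_M)
    if pixels < MIN_PIXELS:
        small.append((name, pixels))

print(small)
```

With these made-up numbers only the water polygon falls below the floor, so that is the one you would redraw larger or supplement with more samples.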

Yes, you should have multiple samples for each class to show its variability. Instead of aiming for a specific number of training areas, it is important to consider how good and representative those areas are. That being said, the documentation below states that "Parametric classifiers, such as the maximum likelihood classifier, need a statistically significant number of samples to produce a meaningful probability density function. To achieve statistically significant samples, you should have 20 or more samples per class."
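To keep track of the "20 or more samples per class" guideline quoted above, a small tally like the following can help. This is only a sketch: the list of labels is made up, and in practice you would read the class names from wherever your training samples are stored.

```python
from collections import Counter

MIN_SAMPLES_PER_CLASS = 20  # figure from the documentation quoted above


def classes_needing_samples(sample_labels, classes, minimum=MIN_SAMPLES_PER_CLASS):
    """Return {class name: number of samples still missing} for under-sampled classes."""
    counts = Counter(sample_labels)
    return {c: minimum - counts[c] for c in classes if counts[c] < minimum}


classes = ["forest", "water", "buildings/roads", "pastures"]

# Hypothetical labels, one entry per training polygon drawn so far.
labels = ["forest"] * 25 + ["water"] * 8 + ["pastures"] * 20

shortfall = classes_needing_samples(labels, classes)
print(shortfall)
```

The count alone says nothing about quality, so it is still worth checking that the samples for a class such as pastures cover its green, brown and white variants.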

It sounds like you may be using a pixel-based classification approach. I would also look into using image segmentation to classify objects. Instead of classifying pixels, the process classifies segments, which can be thought of as super pixels. Each segment, or super pixel, is represented by a set of attributes used by the classifier tools to produce the classified image.
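To make the "super pixel" idea concrete, here is a toy sketch of how a segment can be summarised by an attribute (here just the mean pixel value) instead of each pixel being treated on its own. The tiny grids and the function name are invented for illustration, and this is not the segmentation algorithm ArcGIS Pro uses; it only shows the kind of per-segment summary a classifier could work from.

```python
from collections import defaultdict


def segment_means(values, segment_ids):
    """Return {segment id: mean pixel value} for two equally shaped 2D lists."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, id_row in zip(values, segment_ids):
        for value, seg in zip(value_row, id_row):
            totals[seg] += value
            counts[seg] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}


# Hypothetical single-band brightness values for a 2 x 3 image.
brightness = [
    [10, 12, 80],
    [11, 13, 82],
]

# Hypothetical segment id for each pixel (1 = dark region, 2 = bright region).
segments = [
    [1, 1, 2],
    [1, 1, 2],
]

means = segment_means(brightness, segments)
print(means)
```

Because the classifier then sees one summary per segment, a few dark shadow pixels inside a forest segment have less influence than they would in a pixel-by-pixel classification.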

Resources

Use Training Samples Manager: https://pro.arcgis.com/en/pro-app/latest/help/analysis/image-analyst/training-samples-manager.htm#:~....

Understanding segmentation and classification: https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-analyst/understanding-segmentation-a...

Segmentation: https://pro.arcgis.com/en/pro-app/latest/help/analysis/image-analyst/segmentation.htm
