Hi,
To train a cannabis cultivation classification model, you will need:
1. An input raster (NAIP or other high-resolution imagery)
2. A feature layer representing the cannabis farms

Make sure both have the same spatial reference. In your case, pixel classification is more suitable than object detection.
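Here is why pixel classification fits. Its training labels are a mask with one class value per pixel, not the bounding boxes used in object detection. The sketch below is a minimal pure-Python illustration of burning farm polygons into such a mask. It is not how the ArcGIS export tooling is implemented. The names `point_in_polygon` and `rasterize_labels` are hypothetical. The sketch assumes a north-up raster with no rotation, single-ring polygons without holes, and polygon coordinates already in the raster's spatial reference. That last assumption is why both inputs need the same spatial reference.

```python
# Hypothetical helpers for illustration only. Assumes a north-up raster
# (no rotation), single-ring polygons without holes, and polygon coordinates
# already in the same spatial reference as the raster.

def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is point (x, y) inside the polygon ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_labels(polygons, origin_x, origin_y, cell_size, rows, cols,
                     class_value=1):
    """Burn farm polygons into a per-pixel label mask (0 = background)."""
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Centre of pixel (r, c); y decreases as the row index increases
            x = origin_x + (c + 0.5) * cell_size
            y = origin_y - (r + 0.5) * cell_size
            if any(point_in_polygon(x, y, ring) for ring in polygons):
                mask[r][c] = class_value
    return mask


if __name__ == "__main__":
    # A 4 x 4 tile with its top-left corner at (0, 4) and 1-unit pixels,
    # containing one square "farm" from (1, 1) to (3, 3)
    farm = [(1, 1), (3, 1), (3, 3), (1, 3)]
    for row in rasterize_labels([farm], 0.0, 4.0, 1.0, 4, 4):
        print(row)
```

Running it prints a mask where only the four pixels whose centres fall inside the farm are labelled 1. This per-pixel label is what a pixel classification model learns to predict for every pixel of the imagery.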
You can refer to this notebook for the whole workflow (exporting training data, training a deep learning model, and using the trained model to make predictions): https://developers.arcgis.com/python/samples/extracting-slums-from-satellite-imagery/
That notebook shows how to train a pixel classification model to classify slums using 3-band satellite imagery.