Annotated image data is valuable both for assessing the performance of an image processing pipeline and as training data for machine learning methods such as deep learning. When assessing the performance of a CellProfiler pipeline, for example a pipeline that segments nuclei, the annotated image data are used as the ground truth. The performance of the pipeline can be quantified by comparing the segmentation output to the ground truth and calculating a comparison metric, such as the Jaccard Index or F1 Score. Annotated images are also essential as training data for deep learning applications; see, for example, the 2018 Data Science Bowl. An in-depth discussion of how the Data Science Bowl images were annotated can be found on the Kaggle forum.
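As a rough illustration of what these metrics measure, here is a minimal sketch that computes a pixel-level Jaccard Index and F1 Score for two binary masks with NumPy. The function names are hypothetical, and this is not how CellProfiler computes its own measurements; object-level scoring additionally needs a step that matches predicted objects to ground-truth objects.

```python
import numpy as np

def pixel_jaccard(pred, truth):
    """Pixel-level Jaccard index: |pred AND truth| / |pred OR truth|."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)

def pixel_f1(pred, truth):
    """Pixel-level F1 score, 2TP / (2TP + FP + FN), equal to the Dice coefficient."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()   # foreground in both masks
    fp = np.logical_and(pred, ~truth).sum()  # predicted foreground only
    fn = np.logical_and(~pred, truth).sum()  # ground-truth foreground only
    if tp + fp + fn == 0:
        return 1.0
    return float(2 * tp / (2 * tp + fp + fn))

# Two overlapping 3x3 squares: 4 shared pixels, 14 pixels in the union.
truth = np.zeros((6, 6), dtype=bool)
truth[1:4, 1:4] = True
pred = np.zeros((6, 6), dtype=bool)
pred[2:5, 2:5] = True
print(round(pixel_jaccard(pred, truth), 3))  # 0.286
print(round(pixel_f1(pred, truth), 3))       # 0.444
```

For a single pair of masks the two scores are related by F1 = 2J / (1 + J), so either one orders two candidate segmentations of the same image in the same way.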
This blog post outlines a method for annotating image data using CellProfiler together with another open-source program, GIMP. The method is best suited to annotating objects by their exact boundaries, as opposed to annotating an image with bounding boxes or centroids; tools that are adept at bounding-box annotation include labelme and labelbox.io. GIMP is used to draw outlines around objects of interest and to label background regions. An outline image is exported from GIMP, and a CellProfiler pipeline then converts this outline image into a label image.
Creating a set of annotations
- Install CellProfiler and GIMP.
- Download the annotation tutorial files.
Open the image to annotate in GIMP.
- Check whether the image mode is RGB. If it is not, change the mode with Image > Mode > RGB.
Adjust the brightness and contrast to clearly show objects of interest.
- Colors > Brightness-Contrast…
Create the annotation layers. It is useful to dedicate a layer to each class of object.
- Create a layer and name it “nuclei”.
- Create a second layer and name it “background”.
Customize the pencil tool.
- The foreground color will be red (255,0,0)
- The background color will be blue (0,0,255)
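These two colors are what later distinguish outlines from background. As a minimal sketch, assuming the exported image has been read into an 8-bit RGB or RGBA NumPy array (the function and constant names are hypothetical), the colors can be separated into masks by exact matching. GIMP's pencil tool draws hard-edged strokes without anti-aliasing, which is what makes exact color matching workable.

```python
import numpy as np

# Hypothetical constants matching the pencil colors set above.
OUTLINE_RGB = np.array([255, 0, 0])     # red foreground: object outlines
BACKGROUND_RGB = np.array([0, 0, 255])  # blue background: background marks

def split_annotation_colors(image):
    """Return boolean (outline, background) masks from an RGB(A) array.

    Only exact color matches count. In an RGBA image, fully transparent
    pixels are ignored; this assumes empty areas were exported with alpha 0.
    """
    image = np.asarray(image)
    rgb = image[..., :3]
    if image.shape[-1] == 4:
        visible = image[..., 3] > 0
    else:
        visible = np.ones(image.shape[:2], dtype=bool)
    outline = np.all(rgb == OUTLINE_RGB, axis=-1) & visible
    background = np.all(rgb == BACKGROUND_RGB, axis=-1) & visible
    return outline, background

# Tiny RGBA example: one red pixel, one blue pixel, the rest transparent.
demo = np.zeros((2, 2, 4), dtype=np.uint8)
demo[0, 0] = (255, 0, 0, 255)
demo[1, 1] = (0, 0, 255, 255)
outline, background = split_annotation_colors(demo)
print(outline.sum(), background.sum())  # 1 1
```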
Draw an outline around each nucleus in the nuclei layer.
- Make sure the annotation layer is selected and highlighted.
Helpful GIMP shortcuts:
- Hold “Ctrl” and scroll the mouse wheel to zoom in and out.
- Hold “Spacebar” and move the mouse to pan.
Draw at least 1 pixel of background in the background layer.
- Note: any region or pixel enclosed by outlines in the nuclei layer that belongs to the background must be labeled with the background color. Otherwise, it will be treated as a nucleus.
- Note: in this annotation method, a pixel can have only a single label.
- Nuclei touching the image border do not need an outline along the border.
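The rules above follow from how enclosed regions are interpreted. The sketch below is a hypothetical illustration, not the CellProfiler pipeline from the tutorial files: it assumes the outline and background pixels are already available as two boolean NumPy arrays, groups the non-outline pixels into connected regions, treats any region containing a background mark as background, and numbers every other region as an object. The image edge stops the flood fill, which is why border nuclei need no outline along the edge.

```python
from collections import deque

import numpy as np

def outlines_to_labels(outline, background):
    """Hypothetical outline-to-label conversion.

    Non-outline pixels are grouped into 4-connected regions. A region that
    contains any background-marked pixel becomes background (0); every
    other region becomes an object labeled 1, 2, 3, ... Outline pixels are
    left as 0 in this sketch.
    """
    outline = np.asarray(outline, dtype=bool)
    background = np.asarray(background, dtype=bool)
    h, w = outline.shape
    labels = np.zeros((h, w), dtype=np.int32)
    visited = outline.copy()  # outline pixels never join a region
    next_label = 1
    for r in range(h):
        for c in range(w):
            if visited[r, c]:
                continue
            # Flood-fill one region of non-outline pixels.
            region, touches_background = [], False
            queue = deque([(r, c)])
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                touches_background |= bool(background[y, x])
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if not touches_background:
                for y, x in region:
                    labels[y, x] = next_label
                next_label += 1
    return labels

# A square outline enclosing a 3x3 interior.
outline = np.zeros((7, 7), dtype=bool)
outline[1, 1:6] = True  # top edge
outline[5, 1:6] = True  # bottom edge
outline[1:6, 1] = True  # left edge
outline[1:6, 5] = True  # right edge
background = np.zeros((7, 7), dtype=bool)
background[0, 0] = True  # a single background mark outside the outline
labels = outlines_to_labels(outline, background)
print(labels.max())  # 1: the enclosed interior is the only object
print(outlines_to_labels(outline, np.zeros_like(background)).max())  # 2
```

Without the background mark, the surrounding region becomes a second object, the same mistake that unmarked background enclosed between nuclei would cause. Regions use 4-connectivity so that a one-pixel pencil line that steps diagonally still seals the region it encloses.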
Export the outline image.
- Hide the image layer so that only the outlines and background marks are visible.
- File > Export As…
- Open CellProfiler and import the outline image(s).
- Running CellProfiler will produce a folder of label images and image masks.
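A label image and a set of per-object masks carry the same information. As a minimal sketch, assuming a NumPy label image in which 0 is background and each positive integer is one object (the helper name is hypothetical, and this is not how CellProfiler itself writes its masks), one mask per object can be derived like this:

```python
import numpy as np

def label_image_to_masks(labels):
    """Split an integer label image into one binary mask per object.

    Assumes 0 marks background and each positive value is one object.
    Returns {object label: uint8 mask with 1 on the object and 0 elsewhere}.
    """
    labels = np.asarray(labels)
    object_ids = [int(v) for v in np.unique(labels) if v != 0]
    return {i: (labels == i).astype(np.uint8) for i in object_ids}

labels = np.array([[0, 1, 1],
                   [0, 0, 2],
                   [3, 0, 2]])
masks = label_image_to_masks(labels)
print(sorted(masks))      # [1, 2, 3]
print(masks[2].tolist())  # [[0, 0, 0], [0, 0, 1], [0, 0, 1]]
```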
Testing a pipeline against ground truth
- Run each CellProfiler prediction pipeline. A label image will be exported from each pipeline.
Run the CellProfiler compare ground truth pipeline.
- Import the prediction label images and the ground truth.
- The segmentations from the prediction pipelines can then be compared quantitatively for accuracy.
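The comparison can also be made at the level of whole objects rather than pixels. As a minimal sketch, assuming both segmentations are NumPy label images with 0 as background, a predicted object counts as correct when its intersection over union (IoU) with a ground-truth object exceeds a threshold; IoU thresholding of this kind is common in segmentation benchmarks. The function name and the 0.5 default are illustrative assumptions, not the settings of the tutorial's compare ground truth pipeline.

```python
import numpy as np

def object_f1(pred_labels, true_labels, iou_threshold=0.5):
    """Object-level (precision, recall, F1) between two label images.

    With iou_threshold >= 0.5, each object can exceed the threshold with at
    most one partner, so matches are one-to-one without an assignment step.
    """
    if iou_threshold < 0.5:
        raise ValueError("this sketch assumes iou_threshold >= 0.5")
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    true_ids = [i for i in np.unique(true_labels) if i != 0]
    tp = 0
    for p in pred_ids:
        p_mask = pred_labels == p
        for t in true_ids:
            t_mask = true_labels == t
            inter = np.logical_and(p_mask, t_mask).sum()
            if inter == 0:
                continue
            union = np.logical_or(p_mask, t_mask).sum()
            if inter / union > iou_threshold:
                tp += 1
                break  # no other ground-truth object can also match
    fp = len(pred_ids) - tp  # predicted objects with no match
    fn = len(true_ids) - tp  # ground-truth objects that were missed
    precision = tp / (tp + fp) if pred_ids else 0.0
    recall = tp / (tp + fn) if true_ids else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    return precision, recall, f1

truth = np.zeros((4, 8), dtype=int)
truth[0:2, 0:2] = 1
truth[0:2, 4:6] = 2
pred = np.zeros((4, 8), dtype=int)
pred[0:2, 0:2] = 1  # matches ground-truth object 1 exactly
pred[3, 6:8] = 2    # spurious object; ground-truth object 2 is missed
print(object_f1(pred, truth))  # (0.5, 0.5, 0.5)
```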
- Annotation: A piece of metadata or a label, created by a human or a separate machine method, that specifies an object or region of interest within an image or a general characteristic of the image. An annotation can take many forms, such as a bounding box around an object in an image or an outline of an object within an image. An outline requires more complex storage than a bounding box because it is stored either as an array of pixel locations or as a copy of the image containing the outline information.
- Deep learning: A machine learning technique that involves the design and training of models that perform automated tasks such as segmentation, detection, or classification. The word “deep” refers to the many layers of small computational units in the neural network, which together form a nonlinear classifier that can perform remarkably well.
- Label: Synonymous with annotation. A label image is an image in which individual objects are distinguished from each other by using integer values as pixel intensities. For example, all pixels belonging to the first object in an image have a value of 1, all pixels belonging to the second object have a value of 2, and so on.
- Mask: A binary image that corresponds to a single object in the source image. The background pixels have a value of 0 and the foreground pixels belonging to the object have a value of 1. A separate mask image will exist for each individual object in an image.
- Test data: A set of annotated images kept separate from the training data for testing the performance of a machine learning method. The separation of training and test data is important to test if a method has overfit the training data.
- Training data: A set of annotated images set aside for training supervised machine learning methods, including deep learning.
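Keeping test data separate can be as simple as a seeded random split of the annotated image files before any training happens. A minimal sketch in plain Python; the function name, the file names, and the 20% test fraction are hypothetical choices.

```python
import random

def split_train_test(image_names, test_fraction=0.2, seed=0):
    """Randomly split annotated images into disjoint (train, test) lists."""
    names = sorted(image_names)  # fixed starting order so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(names)
    n_test = max(1, round(len(names) * test_fraction)) if names else 0
    return names[n_test:], names[:n_test]

images = [f"image_{i:02d}.png" for i in range(10)]
train, test = split_train_test(images)
print(len(train), len(test))  # 8 2
print(set(train) & set(test))  # set()
```

Fixing the seed means the same images end up in the test set every time, so a method is never accidentally tuned on images it is later scored on.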