NSF BBSRC Imaging Flow Cytometry Project
(July 15, 2015–June 30, 2018) [Project abstract at nsf.gov] [Project Outcomes Report (search here, using Fed Award ID 1458626 (PDF))]
Non-technical description of the problem:
There are a variety of cell types circulating in human blood, each with distinct morphologies and functions. It is not trivial for clinicians and biologists to examine a blood sample (typically 10 ml) and reach an accurate conclusion about the profiles, ratios, roles, health and underlying disorders of different blood cell types in such a complex mixture. Imaging flow cytometers can capture images from thousands of cells per second but the software to analyze images from these instruments has been limited, preventing useful applications in biomedical research and clinical practice.
The goal of this project is to develop and demonstrate software to mine data from imaging flow cytometers. In theory, the images these instruments capture can be analyzed to precisely measure hundreds of features related to a cell's appearance ("morphology"); this project will develop advanced machine-learning software to do so, unlocking otherwise hidden information within the images. The software will be developed, improved, and validated in several demonstration experiments involving the cell cycle, the component cells of primary blood, immune cell activation, and stem cell identity. The aim is to use as few fluorescent biomarkers as possible, or none at all, reducing or eliminating the need to perturb cells. The resulting open-source software will be freely available to scientists worldwide for both applied and clinical research, and will be accompanied by user-friendly training materials and in-person workshops. The project is collaborative and interdisciplinary and includes training early-career scientists in computational biology via the existing Scientists without Borders program.
The project involves close collaboration with researchers using imaging flow cytometers and builds on successful interdisciplinary work in biological data mining. To devise the novel software and methodology needed to mine the large datasets acquired by imaging flow cytometry, the team will develop algorithms to seamlessly import data from an imaging cytometer, robustly segment cells, quality-filter them (e.g., for debris and blur), and quantify morphological parameters (usually hundreds) for each cell (usually thousands), including various measures of size, shape, and texture. Using these features, trained machine-learning algorithms will identify cell phenotypes of interest or otherwise characterize the state of cells in driving biological projects from project partners who use imaging flow cytometry in a host of biological research studies. The aim is to use as few fluorescent biomarkers as possible, or none at all, reducing or eliminating the need to perturb cells. The project will give the scientific community a validated, open-source software toolbox of image processing and machine learning algorithms readily usable by biologists.
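The segment, quality-filter, and quantify steps described above can be illustrated with a minimal, dependency-free sketch. It is not the project's software or CellProfiler's algorithms: the toy image, the intensity threshold, the `min_area` cutoff, and the function names `segment`, `measure`, and `quality_filter` are all assumptions made for illustration, and a simple flood fill stands in for robust segmentation.

```python
from collections import deque

# Hypothetical 5x8 grayscale image: one 3x3 "cell" and one 1-pixel speck of debris.
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 5, 6, 5, 0, 0, 0, 0],
    [0, 6, 9, 6, 0, 0, 7, 0],
    [0, 5, 6, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def segment(image, threshold):
    """Group pixels brighter than `threshold` into 4-connected objects."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def measure(image, pixels):
    """Compute a few size, shape, and intensity features for one object."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    values = [image[y][x] for y, x in pixels]
    area = len(pixels)
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    mean = sum(values) / area
    return {
        "area": area,                          # size
        "extent": area / (height * width),     # shape: fill of bounding box
        "aspect_ratio": width / height,        # shape
        "mean_intensity": mean,
        "intensity_variance": sum((v - mean) ** 2 for v in values) / area,  # crude texture proxy
    }


def quality_filter(features, min_area=4):
    """Discard objects too small to be cells (a stand-in for debris removal)."""
    return [f for f in features if f["area"] >= min_area]


objects = segment(IMAGE, threshold=2)
features = [measure(IMAGE, pixels) for pixels in objects]
cells = quality_filter(features, min_area=4)
print(len(objects), "objects found;", len(cells), "kept after filtering")
```

A real pipeline would measure hundreds of features per cell and would also filter on focus quality; the five features here only indicate the kind of per-cell table that the downstream machine-learning steps consume.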
Progress as of Year 2, April 2017:
We developed prototype workflows that use CellProfiler, deep learning, or a combination of both to extract features from imaging flow cytometry (IFC) images, and then applied machine learning to profile the phenotypes of the collected cells.
Thus far, we have found that extracting features of cells from images, followed by machine learning, can capture information that is not visible to the human eye, which can eliminate the expense, time, and potential perturbation of adding fluorescent markers to cells (Publication 1). We developed and are currently improving a user-friendly workflow for imaging flow cytometry to aid biologists in applying these techniques to many important biological and clinical problems (Publication 2). The workflow includes all major steps of the analysis:
- ingesting input data (accepting CIF files from imaging flow cytometry and TIFF files from conventional microscopy);
- feature extraction, including pre-defined features from image analysis software (CellProfiler) or deep learning features from neural networks either pre-configured or adapted/constructed by the researcher;
- machine learning classification (support vector machines, tree-based classifiers, and neural networks); and
- visualizing the output using unsupervised dimensionality reduction such as t-SNE or PCA.
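As a rough illustration of the final visualization step, the sketch below projects a small per-cell feature table onto its first two principal components (PCA) using only the Python standard library. The feature names and values are invented for illustration, the helper names are hypothetical, and power iteration with deflation is used purely to keep the example dependency-free; in practice a library implementation of PCA or t-SNE would be the usual choice.

```python
from math import sqrt

# Hypothetical feature table: one row per cell, columns are
# (area, eccentricity, mean_intensity). The first three rows mimic one
# phenotype and the last three another; the numbers are made up.
FEATURES = [
    [50.0, 0.10, 20.0],
    [55.0, 0.15, 22.0],
    [52.0, 0.12, 19.0],
    [90.0, 0.60, 35.0],
    [95.0, 0.55, 38.0],
    [92.0, 0.65, 36.0],
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: leading eigenvector and eigenvalue of a symmetric matrix."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [dot(row, vec) for row in matrix]
        norm = sqrt(dot(product, product))
        vec = [v / norm for v in product]
    eigenvalue = dot(vec, [dot(row, vec) for row in matrix])
    return vec, eigenvalue


def pca_2d(rows):
    """Return 2-D scores for each row plus the two principal axes."""
    data = standardize(rows)
    cov = covariance(data)
    pc1, lam1 = dominant_eigenpair(cov)
    d = len(cov)
    # Deflation: remove the first component so power iteration finds the second.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = dominant_eigenpair(deflated)
    scores = [(dot(row, pc1), dot(row, pc2)) for row in data]
    return scores, (pc1, pc2)


scores, (pc1, pc2) = pca_2d(FEATURES)
for x, y in scores:
    print(f"{x:+.3f}  {y:+.3f}")
```

Plotting the two score columns as a scatter plot gives the kind of two-dimensional overview the workflow produces; with this toy table the two invented phenotypes fall on opposite sides of the first component. The sign of each component is arbitrary, so only the relative positions of cells are meaningful.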
In the final year of the project, using experimental data from project collaborators, we will optimize and validate these methodologies in diverse biological applications such as:
- Developing a label-free imaging flow cytometry assay for cell-cycle phase identification.
- Quantifying cellular heterogeneity in hematological data, especially in leukemia and immune system activation of allergic patients from clinical trials.
Our work aims to remove the bottleneck of manual cytometric data processing: filtering, gating, profiling, and visualizing are time-consuming steps that all require significant effort from scientists. Removing this bottleneck can facilitate automated phenotyping, which in turn will broaden the use of imaging flow cytometry in both basic science and translational medicine.
Publications supported, at least in part, by this grant:
Publication 1: [link] In Blasi, et al. (Nat Commun 2016), we demonstrated label-free prediction of DNA content and quantification of the mitotic cell cycle phases by applying supervised machine learning to morphological features extracted from brightfield and the typically ignored darkfield images of cells from an imaging flow cytometer. This method facilitates non-destructive monitoring of cells, avoiding the potentially confounding effects of fluorescent stains while maximizing the number of available fluorescence channels. The method is effective for cell cycle analysis of mammalian cells, both fixed and live, and accurately assesses the impact of a cell cycle mitotic phase blocking agent.
Publication 2: [link] In Hennig, et al. (Methods 2016), we created an open-source pipeline to mine the rich information in digital imagery from the raw data of an imaging flow cytometer. Proprietary .cif files are imported into the open-source software CellProfiler, where an image processing pipeline identifies cells and subcellular compartments, allowing hundreds of morphological features to be measured. This high-dimensional data can then be analyzed with machine learning and clustering on user-friendly platforms such as CellProfiler Analyst.
- Blasi T, Hennig H, Summers HD, Theis FJ, Cerveira J, Patterson JO, Davies D, Filby A, Carpenter AE, Rees P (2016). Label-free cell cycle analysis for high-throughput imaging flow cytometry. Nat Commun 7:10256 / doi: 10.1038/ncomms10256. PMID: 26739115. PMCID: PMC4729834. [pdf]
- Hennig H, Rees P, Blasi T, Kamentsky L, Hung J, Dao D, Carpenter AE, Filby A (2016). An open-source solution for advanced imaging flow cytometry data analysis using machine learning. Methods 112:201-210 / doi: 10.1016/j.ymeth.2016.08.018. PMID: 27594698. PMCID: PMC5231320. [pdf]