Help! I want to use CellProfiler Analyst but I ran my pipeline with ExportToSpreadsheet!

David R. Stirling

We’ve all done this before – CellProfiler Analyst needs a database file, but your CellProfiler pipeline that took days to run was set up to export spreadsheets. To make matters worse, you also need a .properties file explaining what to do with that database. Fortunately, you don’t need to re-run the entire experiment to get your data into CellProfiler Analyst. This blog post will cover how to convert a spreadsheet into a database and make a usable .properties file with which to load it into CellProfiler Analyst.

Step 1: Creating a database from output spreadsheets

 

A database file is largely just a series of tables. Because of this we can generate a database from a series of spreadsheets. For this workflow we’ll be using SQLite databases, since these are easy to work with and don’t require a server.

To create a database, on either Mac or PC we recommend installing DB Browser for SQLite from https://sqlitebrowser.org/. This free program provides all the tools we’ll need to turn .csv files into a database. If we open DB Browser, we should see a “New database” button in the top left. Select this and provide a destination for your new database (.db) file. Once that’s done you’ll be asked to start defining a table. Since we’re going to be importing our tables we can just hit “Cancel” to return to the main screen.

The main DB Browser window
The main DB Browser window.

 

We’re now looking at an empty database. To turn our spreadsheets into tables we need to import them. In the top menu hit “File - Import - Table from CSV file…

In the resulting popup, navigate to and select all the spreadsheets that we need to import. You should then see the screen below, which will allow you to configure the import.


Import interface for DB Browser

 

The settings will largely be automatic if we’re using CSV files, but importantly we need to import the selected sheets individually as separate tables. To do this, uncheck all but one of the input sheets (you can click on it to get a preview in the bottom half of the window), enter a name for the table at the top of the window and hit OK. That sheet will be added to the database and removed from the list, but the importer will remain open until we’ve finished with all the sheets we’re opening. Select, name and then import the remaining sheets to complete the database.

If you’ve exported spreadsheets to multiple subfolders, you can repeat the import sequence for the contents of each folder. Entering the same “Table name” when importing additional sheets will allow you to merge datasets together, which may be useful when combining multiple runs (e.g. Nuclei.csv from experiment 1 and experiment 2). If there are large numbers of subfolders to combine, you may want to try using concatenating files with csvstack before importing, or generating the database with cytominer-database.

If that’s been done correctly, you should now be able to see all the imported tables within your new database. Press “File - Write Changes” to save the database. We’ve now finished with DBBrowser for now. This software can also be useful for viewing your database tables, which can be done by clicking the “Browse Data” tab.


Example of a generated database.

 

Step 2: Creating a CellProfiler Analyst .properties file

 

Alongside the database, CellProfiler Analyst also needs a .properties file to tell it where the database is and which images, objects and settings you’d like to use with that database. There are two ways we can create this file.

 

Option 1 – Have CellProfiler do most of it for you

 

While we don’t want to run the entire analysis again, we can instead construct a pipeline using just a single image set and run that with the ExportToDatabase module to generate a .properties file for us to use with our database.

Open up the CellProfiler pipeline you previously used to perform the analysis, then remove all but a single set of images. We won’t be using the results data from this run so which set of images is not important, what matters is that we can run the pipeline workflow from start to end. At the end of your pipeline, add an “ExportToDatabase” module and enable the option to generate a .properties file (see below). You should also disable any other Export/SaveImage modules to avoid overwriting results from your original run. Run this pipeline and collect the .properties file it produces.


ExportToDatabase settings for generating a .properties file.

 

We now need to edit the .properties file to have it use the database we made from the full results database, which we made in Step 1. The .properties file can be opened in any text editor. Any line preceded by # will be ignored by CellProfiler Analyst, which can be used to disable unwanted options or leave yourself notes or comments. See the end of this section for an example.

The key parameters that you may need to change are as follows:

Database Info

db_type  should be sqlite (since we made this using DB Browser)
db_sqlite_file  needs to point to the full file path of the database we created in Step 1.

Database Tables

The image and object table parameters need to reference a table from the database we created. ExportToDatabase likes to generate tables named with the format “Title_Per_Object”, but the tables we’ve imported manually will probably be called “Title_Object”. We need to edit the names for each table type to match what we created in our database.

Database Columns

Just like the tables themselves, the spreadsheets we obtained from ExportToSpreadsheet may not share exactly the same column names as those from ExportToDatabase. Most notably, in our properties file the default object_id setting Number_Object_Number should be changed to just ObjectNumber. Some other properties file settings will probably need the prefix removed: for example, Image_Group_Number becomes Group_Number. The same applies to the Location_ settings, which are particularly important for the classifier module.

The best way to do this is to open the database in DB Browser and search for the relevant columns in your custom database, then edit your properties file to match.

Once all the table and column names are correct, you should be ready to go. The rest of the parameters were supplied by ExportToDatabase and should be valid if you didn’t change the pipeline. The resulting file should look something like the following example.


Example .properties file after editing to better reflect how data typically comes out of ExportToSpreadsheet.

 

Option 2 – Roll your own .properties file

An alternative approach is to create a .properties file from scratch. You can download a template file from this page

The guide on that page will explain all of the options you need to specify, and is also useful for understanding some of the other settings that can be used with CellProfiler Analyst.

 

Other useful .properties file settings

The image_tile_size setting allows you to change the size of the thumbnails displayed when using the classifier tool. Default is 50 pixels, but if your objects are much smaller or larger in diameter you may want to adjust this setting to provide a more useful thumbnail showing the entire object.

If you’re using multi-channel images, CellProfiler Analyst will be set up to interpret them as grayscale by default. Image settings are stored as a comma-separated list. The channels_per_image setting can be used to tell CellProfiler Analyst that an image contains multiple channels. In the example below we have 3 images, of which Image2 is multi-channel and Image3 is brightfield.

Channels_per_image example

When we provide values for image_channel_colors, the values will be applied sequentially according to the channels_per_image setting. This means that when reading this list it will take 1 color for Image1 (magenta), the next 3 for Image2 (red, green, blue) and the final color (grey) for Image3. Be sure to provide enough colors for all the channels you’re using. The same color can be used more than once, and you’ll be able to toggle their display (or override the color specified by the properties file) in CellProfiler Analyst itself.

image_channel_blend_modes is an optional setting that allows you to alter how multiple channels are mixed together for display. Values are provided in a similar way to the channel colors. The options are add and subtract, meaning whether an image’s intensity values are added or subtracted from the other channels when making a final pixel intensity value. ‘Subtract’ mode is primarily useful when working with brightfield images, which have a dark signal against a light background.

You should now be ready to use CellProfiler Analyst. When you launch CellProfiler Analyst, it will ask to load a properties file. If you load the file you’ve created but the program closes, there’s probably an incorrect setting somewhere. Be sure to double-check that the database location and table/column names match the database that you’re trying to load, and that the computer on which you are running CellProfiler Analyst has permissions to read the database and image files. If everything is correct you should see the screen below, in which case you can start analysing!


The main CellProfiler Analyst window.

 

Hopefully this guide will cover most scenarios in which you need to generate a database and properties file from a CellProfiler run that was set to export spreadsheets. If you’re still having trouble, please feel free to make a post over at the image.sc forum for further help!