Institute for Frontier Life and Medical Sciences, Kyoto University

A clustering-independent method for finding differentially expressed genes in single-cell transcriptome data

Alexis Vandenbon1,2* & Diego Diez3

(1 Institute for Frontier Life and Medical Sciences, Kyoto University. 2 Institute for Liberal Arts and Sciences, Kyoto University. 3 Immunology Frontier Research Center, Osaka University)
* To whom correspondence should be addressed. Email: alexisvdb*infront.kyoto-u.ac.jp (Replace the ∗ with @)

Nature Communications (2020), DOI: 10.1038/s41467-020-17900-3

Abstract

A common analysis of single-cell sequencing data includes clustering of cells and identifying differentially expressed genes (DEGs). How cell clusters are defined has important consequences for downstream analyses and the interpretation of results, but is often not straightforward. To address this difficulty, we present singleCellHaystack, a method that enables the prediction of DEGs without relying on explicit clustering of cells. Our method uses Kullback–Leibler divergence to find genes that are expressed in subsets of cells that are non-randomly positioned in a multidimensional space. Comparisons with existing DEG prediction approaches on artificial datasets show that singleCellHaystack has higher accuracy. We illustrate the usage of singleCellHaystack through applications on 136 real transcriptome datasets and a spatial transcriptomics dataset. We demonstrate that our method is a fast and accurate approach for DEG prediction in single-cell data. singleCellHaystack is implemented as an R package and is available from CRAN and GitHub.
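The core idea in the abstract, scoring a gene by how far the positions of its expressing cells diverge from the positions of all cells, can be illustrated with a small toy example. The sketch below is not the singleCellHaystack implementation. It assumes a 2D embedding scaled to the unit square, a fixed 4 x 4 grid of bins, binary detection calls, and a simple label-permutation test for significance. All function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def grid_distribution(coords, n_bins=4):
    """Fraction of the given cells falling in each bin of an n_bins x n_bins grid on [0, 1)^2."""
    counts = [0] * (n_bins * n_bins)
    for x, y in coords:
        i = min(int(x * n_bins), n_bins - 1)
        j = min(int(y * n_bins), n_bins - 1)
        counts[i * n_bins + j] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no cells given; cannot build a distribution")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q); bins with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gene_score(coords, detected, n_bins=4):
    """Divergence between where the expressing cells sit (P) and where all cells sit (Q)."""
    q = grid_distribution(coords, n_bins)
    p = grid_distribution([c for c, d in zip(coords, detected) if d], n_bins)
    return kl_divergence(p, q)


def permutation_p_value(coords, detected, n_perm=200, n_bins=4, seed=0):
    """Empirical p-value: how often randomly reassigned detection labels score as high."""
    rng = random.Random(seed)
    observed = gene_score(coords, detected, n_bins)
    shuffled = list(detected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gene_score(coords, shuffled, n_bins) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 400 cells laid out on a regular 20 x 20 lattice.
coords = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]

# Gene A is detected only in one corner of the embedding (non-random positioning).
localized = [x < 0.25 and y < 0.25 for x, y in coords]
# Gene B is detected in every 7th cell, spread roughly evenly over the embedding.
scattered = [k % 7 == 0 for k in range(len(coords))]

score_localized = gene_score(coords, localized)   # all of P in 1 of 16 bins: log(16)
score_scattered = gene_score(coords, scattered)   # close to 0
p_localized = permutation_p_value(coords, localized)
```

In this toy setting the corner-restricted gene receives a much higher divergence than the evenly spread gene, and no clustering step is needed to rank them. The fixed grid and hard binning are simplifications made for this example; the paper and package documentation describe how the actual method estimates the distributions and assesses significance.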

singleCellHaystack R package on GitHub

singleCellHaystack R package on CRAN

Below are two example applications of our method: one on a single-cell RNA-seq dataset (Figure 1) and one on a spatial transcriptomics dataset (Figure 2).

Figure 1: Example application of singleCellHaystack on a bone marrow tissue dataset. a t-SNE plot of the 5250 cells in this dataset. The color scale shows the number of genes detected in each cell. b–f Expression patterns of five high-scoring DEGs predicted by our method.

Figure 2: Prediction of DEGs using singleCellHaystack in spatial transcriptomics data of the mouse anterior brain. a–f Expression levels (normalized counts) per bead in the anterior1 slice of the mouse brain are shown for the six highest-scoring genes returned by our method. Each panel shows circles representing the 2696 beads superimposed on a slice of the anterior mouse brain. The locations of the circles correspond to the 2D coordinates of the beads, and their colors reflect the expression of each gene.