JSC [Java statistical API]
|
1.0 |
|
Java |
scLVM
|
0.99.3 |
Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC & Stegle O, 2015. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells, Nature Biotechnology, doi: 10.1038/nbt.3102. |
R |
GPy
|
1.5.6 |
GPy is a Gaussian Process (GP) framework written in Python, from the Sheffield machine learning group. |
Python |
limix
|
0.8.0.dev0 |
Limix is a flexible and efficient linear mixed model library with interfaces to Python. |
Python |
h5py
|
2.6.0 |
The h5py package is a Pythonic interface to the HDF5 binary data format. |
Python |
NA
|
0.18.1 |
NA |
NA |
numpy
|
1.11.2 |
|
Python |
scipy
|
0.18.1 |
|
Python |
Java JDK
|
1.8.0_111 |
|
Java |
Python
|
2.7.5 |
|
Python |
ComBat [sva package]
|
3.24.4 |
The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Specifically, the sva package contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 bioRxiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility (see Leek and Storey 2007 PLoS Genetics, 2008 PNAS, or Leek et al. 2011 Nat. Reviews Genetics).
Johnson, WE, Rabinovic, A, and Li, C (2007). Adjusting batch effects in microarray expression data using Empirical Bayes methods. Biostatistics 8(1):118-127. |
R |
R version & default packages [stats] (K-means, PCA, Hierarchical clustering, distance, correlation)
|
3.3.1 |
|
R |
SC3
|
1.3.11 |
A tool for unsupervised clustering and analysis of single cell RNA-Seq data.
Kiselev VY, Kirschner K, Schaub MT, Andrews T, Yiu A, Chandra T, Natarajan KN, Reik W, Barahona M, Green AR and Hemberg M (2016). “SC3 - consensus clustering of single-cell RNA-Seq data.” bioRxiv. doi: 10.1101/036558, http://biorxiv.org/content/early/2016/09/02/036558. |
R |
MDS [MASS Package]
|
7.3-45 |
Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).
Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 |
R |
PAM / Silhouette Plot [cluster package]
|
2.0.6 |
Methods for Cluster analysis. Much extended the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K. (2016). cluster: Cluster Analysis Basics and Extensions. |
R |
ZIFA
|
0.1 |
Emma Pierson and Christopher Yau, ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis, Genome Biology, 16:241, 2015, DOI: 10.1186/s13059-015-0805-z |
Python |
Rtsne
|
0.13 |
An R wrapper around the fast T-distributed Stochastic Neighbor Embedding implementation by Van der Maaten (see <https://github.com/lvdmaaten/bhtsne/> for more information on the original implementation).
L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.
|
R |
Pagoda / SCDE
|
2.5.0 |
The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework, which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following publication: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi: 10.1038/nmeth.3734).
Peter V Kharchenko, Lev Silberstein and David T Scadden, Bayesian approach to single-cell differential expression analysis, Nature Methods 11, 740–742 (2014) doi:10.1038/nmeth.2967 |
R |
Limma / Voom
|
3.32.2 |
Data analysis, linear models and differential expression for microarray data.
Charity W Law, Yunshun Chen, Wei Shi and Gordon K Smyth, voom: precision weights unlock linear model analysis tools for RNA-seq read counts, Genome Biology, 15:R29, 2014 DOI: 10.1186/gb-2014-15-2-r29 |
R |
edgeR
|
3.18.1 |
Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce read counts, including ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE and CAGE.
Mark D. Robinson, Davis J. McCarthy, and Gordon K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics. 2010 Jan 1; 26(1): 139–140. doi: 10.1093/bioinformatics/btp616. PMCID: PMC2796818 |
R |
DESeq2
|
1.16.1 |
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Love MI, Huber W and Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, pp. 550. doi: 10.1186/s13059-014-0550-8. |
R |
SCAN / UPC
|
2.18.0 |
SCAN is a microarray normalization method to facilitate personalized-medicine workflows. Rather than processing microarray samples as groups, which can introduce biases and present logistical challenges, SCAN normalizes each sample individually by modeling and removing probe- and array-specific background noise using only data from within each array. SCAN can be applied to one-channel (e.g., Affymetrix) or two-channel (e.g., Agilent) microarrays. The Universal exPression Codes (UPC) method is an extension of SCAN that estimates whether a given gene/transcript is active above background levels in a given sample. The UPC method can be applied to one-channel or two-channel microarrays as well as to RNA-Seq read counts. Because UPC values are represented on the same scale and have an identical interpretation for each platform, they can be used for cross-platform data integration.
Piccolo SR, Withers MR, Francis OE, Bild AH and Johnson WE (2013). “Multi-platform single-sample estimates of transcriptional activation.” Proceedings of the National Academy of Sciences of the United States of America, 110(44), pp. 17778-17783. doi: 10.1073/pnas.1305823110. |
R |
data.table
|
1.10.4 |
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
https://github.com/Rdatatable/data.table/wiki |
R |
Gene Ontology
|
2017-Jun |
|
Database |
KEGG
|
2016-Nov |
|
Database |
MSigDB from GSEA
|
2016-Nov |
|
Database |
Gene Atlas
|
2016-Nov |
|
Database |
Ensembl
|
2017-Mar |
Ensembl Database (GTF files) |
Database |
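
The pinned versions listed above can be compared against a local environment. What follows is only a minimal sketch, assuming the Python entries (numpy, scipy, h5py) are importable under those names and expose a `__version__` attribute. The names `EXPECTED`, `installed_version` and `check_versions` are hypothetical helpers introduced for illustration and are not part of any listed tool.

```python
import importlib

# Versions copied from the table above (assumption: module name == table name).
EXPECTED = {
    "numpy": "1.11.2",
    "scipy": "0.18.1",
    "h5py": "2.6.0",
}


def installed_version(name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in sorted(check_versions(EXPECTED).items()):
        status = "ok" if ok else "MISMATCH"
        print("{0}: expected {1}, found {2} [{3}]".format(name, wanted, found, status))
```

The `lookup` argument is injectable so the comparison can be exercised without the packages installed. The R and Java entries would need an analogous check in their own tooling.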