YASMA - Yet Another Statistical Microarray Analysis
One problem arising in the analysis of gene expression data from
microarrays is the identification of differentially expressed
genes. Some popular rules of thumb exist, for example accepting two-fold
over- or under-expression as significant. Although such rules can be
quite useful for a quick overview, statistics offers a
more thorough approach. By considering the amount of variability
between replicates of the same experiment, statistical significance
in the form of p-values can be assigned to differential gene
expression levels.
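To make the contrast with the two-fold rule concrete, here is a minimal sketch in plain Python (not YASMA's R interface). The log ratios are invented for illustration, the function names are hypothetical, and the one-sample t statistic and sign-flip permutation test are simple stand-ins for the ANOVA-based methods described below.

```python
import itertools
import math
import statistics


def t_statistic(log_ratios):
    """One-sample t statistic for H0: the mean log2 ratio is zero."""
    n = len(log_ratios)
    mean = statistics.fmean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return mean / (sd / math.sqrt(n))


def sign_flip_p_value(log_ratios):
    """Two-sided p-value from all 2^n sign flips of the replicates."""
    n = len(log_ratios)
    observed = abs(statistics.fmean(log_ratios))
    hits = 0
    for signs in itertools.product((1, -1), repeat=n):
        flipped = [s * x for s, x in zip(signs, log_ratios)]
        # small tolerance so that ties with the observed value are counted
        if abs(statistics.fmean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** n


# Hypothetical log2(mutant / wild type) ratios, six replicate arrays per gene.
gene_noisy = [3.1, -0.9, 2.8, -0.4, 2.6, -0.6]       # mean above 1, but erratic
gene_steady = [0.62, 0.55, 0.60, 0.58, 0.66, 0.59]   # mean below 1, but consistent

for name, ratios in (("noisy", gene_noisy), ("steady", gene_steady)):
    mean = statistics.fmean(ratios)
    print(
        f"{name}: mean log2 ratio {mean:.2f}, "
        f"two-fold rule {'passes' if abs(mean) >= 1 else 'fails'}, "
        f"t = {t_statistic(ratios):.2f}, "
        f"p = {sign_flip_p_value(ratios):.4f}"
    )
```

In this made-up example the erratic gene passes the two-fold rule on average yet gets the larger p-value, while the consistent gene fails the rule but is well supported by its replicates.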
YASMA is an add-on library for the
R statistical package
and can be used to analyse simple replicated experiments. For
example, we may be interested in bacterial genes that are over- or
under-expressed in mutants compared to the wild type. For this purpose,
multiple mRNA preparations are hybridized on several arrays. As long as the
same number of arrays is used for each preparation, a straightforward
ANOVA and an analysis of variance components
can be applied to the series of experiments (a balanced
factorial design).
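For a single gene, the balanced design just described can be sketched as follows. This is plain Python rather than YASMA's R interface, the expression values and function name are hypothetical, and testing the strain effect against the preparation mean square is the textbook choice for a nested random factor, not a documented detail of the package. Because the design is balanced, every sum of squares comes directly from group means, with no design matrix.

```python
from statistics import fmean

# Hypothetical log2 expression values for ONE gene:
# strain -> mRNA preparations -> replicate arrays (2 x 2 x 2, balanced).
data = {
    "wild_type": [[7.9, 8.1], [8.3, 8.5]],
    "mutant": [[9.4, 9.6], [9.0, 9.2]],
}


def balanced_nested_anova(data):
    """Sums of squares for arrays nested in preparations nested in strains."""
    strains = list(data.values())
    n_prep = len(strains[0])
    n_arr = len(strains[0][0])
    for preps in strains:
        if len(preps) != n_prep or any(len(arrs) != n_arr for arrs in preps):
            raise ValueError("design is not balanced")

    values = [y for preps in strains for arrs in preps for y in arrs]
    grand = fmean(values)

    ss_strain = ss_prep = ss_resid = 0.0
    for preps in strains:
        strain_mean = fmean([y for arrs in preps for y in arrs])
        ss_strain += n_prep * n_arr * (strain_mean - grand) ** 2
        for arrs in preps:
            prep_mean = fmean(arrs)
            ss_prep += n_arr * (prep_mean - strain_mean) ** 2
            ss_resid += sum((y - prep_mean) ** 2 for y in arrs)

    df_strain = len(strains) - 1
    df_prep = len(strains) * (n_prep - 1)
    df_resid = len(strains) * n_prep * (n_arr - 1)
    ms_strain = ss_strain / df_strain
    ms_prep = ss_prep / df_prep
    return {
        "ss_total": sum((y - grand) ** 2 for y in values),
        "ss_strain": ss_strain,
        "ss_prep": ss_prep,
        "ss_resid": ss_resid,
        "ms_strain": ms_strain,
        "ms_prep": ms_prep,
        "ms_resid": ss_resid / df_resid,
        # strain effect tested against preparation-to-preparation variation
        "f_strain": ms_strain / ms_prep,
    }


result = balanced_nested_anova(data)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

The three component sums of squares add up to the total, which is a quick sanity check on any implementation of this kind.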
At the moment the package contains:
- Routines for inspecting the correlation between array
replicates and for removing low expression genes that cause
low correlation.
- A method for interpolating missing data using estimates
from an ANOVA analysis.
- Routines for fast ANOVA analysis of data typical for
microarray experiments (balanced factorial designs with nested
factors). The design matrix for such analysis is usually very large
(due to the effects involving genes) and cannot be dealt with
efficiently by standard ANOVA routines based on design matrix
evaluations.
- Routines for an analysis of variance components:
cultures, arrays, etc. are random representatives of possible
experiments and need to be analyzed in a slightly different way than
in a simple ANOVA.
- A routine to calculate the optimal design of an experiment based
on the relative costs of cultures, arrays, etc., and on estimates of
variability from a prior experiment.
- Calculation of p-values derived from the results
of the analysis of variance components. If residuals show different
amounts of variance (which is often the case), hierarchical
bootstrapping of residuals is a more reliable way to derive p-values.
- Implementations of additional standard tests (t-statistic) and
of the hierarchical model methods of Newton et al. (see the
Statistics notes below) for multiple experiments.
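The variance components and optimal design items in the list above can be illustrated with a minimal sketch in plain Python (not YASMA's R interface). It assumes the simplest nested case, arrays within cultures, and uses method-of-moments estimates together with the classical two-stage sampling result for the best number of arrays per culture. The mean squares, costs, and function names are hypothetical, and the package's own routines may handle more factors or work differently.

```python
import math


def variance_components(ms_culture, ms_array, arrays_per_culture):
    """Method-of-moments estimates for a balanced nested design.

    E[MS_array]   = var_array
    E[MS_culture] = var_array + arrays_per_culture * var_culture
    """
    var_array = ms_array
    # a negative estimate is truncated to zero
    var_culture = max(0.0, (ms_culture - ms_array) / arrays_per_culture)
    return var_culture, var_array


def optimal_arrays_per_culture(var_culture, var_array, cost_culture, cost_array):
    """Arrays per culture that minimise the variance of the mean at fixed cost."""
    if var_culture == 0.0:
        return math.inf  # no culture-to-culture variation: replicate arrays only
    return math.sqrt((cost_culture / cost_array) * (var_array / var_culture))


def variance_of_mean(var_culture, var_array, n_cultures, arrays_per_culture):
    """Variance of the overall mean for n_cultures x arrays_per_culture arrays."""
    return var_culture / n_cultures + var_array / (n_cultures * arrays_per_culture)


# Hypothetical mean squares from a prior experiment with 2 arrays per culture.
var_culture, var_array = variance_components(0.50, 0.10, 2)

# Hypothetical relative costs: one culture costs as much as 8 arrays.
best = optimal_arrays_per_culture(var_culture, var_array, 8.0, 1.0)
print(f"var_culture = {var_culture:.3f}, var_array = {var_array:.3f}")
print(f"optimal arrays per culture = {best:.2f}")

# Two designs with the same total cost of 40 units:
# 4 cultures x 2 arrays (4 * (8 + 2)) versus 2 cultures x 12 arrays (2 * (8 + 12)).
for n_cultures, n_arrays in ((4, 2), (2, 12)):
    v = variance_of_mean(var_culture, var_array, n_cultures, n_arrays)
    print(f"{n_cultures} cultures x {n_arrays} arrays: variance of mean = {v:.4f}")
```

The design choice to note is that extra arrays only reduce the array-level part of the variance, so when cultures vary a lot it pays to spend the budget on more cultures rather than more arrays per culture.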
As the name suggests, YASMA is intended as a complement to the
SMA
microarray analysis package. High on my list of future extensions are
the incorporation of regression analysis for time series data
and extensions to more general designs (e.g., Latin square designs).
Downloads
- yasma_0.20.tar.gz (17/02/03), an R library for the
ANOVA analysis of replicated microarray experiments (balanced
factorial designs only).
- YASMA tutorial (v0.19, 14/05/02), which provides a quick tour through a
typical ANOVA analysis of microarray experiments
(A4 version for printing).
- replicates.pdf (14/05/02), a manuscript explaining our
approach to normalization, filtering, ANOVA analysis, analysis of
variance components, and significance analysis for differential
expression in detail.
- Statistics notes, which are primarily for my own
reference, but which other people may find useful as well. I will
update them continuously. At the moment the notes contain some background
and additional explanations of the hierarchical models approach of
M.A. Newton et al., On differential variability of
expression ratios: Improving statistical inference about gene
expression changes from microarray data, Journal of Computational
Biology 8:37-52, 2001 (abstract, paper),
and of how to extend it to a multiple slide analysis.
This is joint work with
Sharon Kendall,
Neil Stoker,
Shamit Soneji, and the
Bugs group.
Lorenz Wernisch
May 2001