Retrieve upstream sequences
Extract upstream sequences
There are currently a few genomes in our PostgreSQL database. The data
are from GenBank.
- Retrieval of more than the direct upstream sequence of a gene
is possible. This is sometimes necessary if the gene belongs to an
operon.
- gap
Defines the operon as a chain of
genes where tandem genes must have smaller gaps between them than this
value.
- min gap
If a gap between genes of an operon is
smaller than this value it is not included as one of the upstream
sequences, otherwise it is.
- Max operon seq
The operon upstream sequence is the
upstream sequence of the whole operon (the last upstream sequence in
the chain). One can set a maximum extraction length for it.
- Min operon seq
The operon upstream sequence is not
printed if it is smaller than this value.
- The upstream region of a gene is defined as its own upstream
sequence together with the upstream sequences of all the genes before it in the
same operon. The following extraction modes are available.
- Gene upstream sequences only
Only the immediate
upstream sequence (the first in the chain) is retrieved.
- All upstream sequences of gene in operon
The whole chain of upstream sequences is retrieved. The starting
gene for the operon search is indicated in parentheses, e.g.
(Rv0350/1), (Rv0350/2), ...
- Operon upstream sequence only
Only the
last upstream sequence in the chain is retrieved. The starting gene for the
operon search is indicated in parentheses, e.g. (Rv0350/o).
- Note that duplicate sequences are suppressed. If several genes
share an operon, the operon sequence retrieval mode may return
only one upstream sequence for all of them.
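The interplay of the four parameters and the extraction modes can be illustrated with a small sketch. It is not the code behind this service. It assumes forward-strand genes sorted by start position, and the names Gene, upstream_chain and operon_upstreams are hypothetical. The label numbering and the choice to stop the operon upstream sequence at the preceding gene are also assumptions, modelled on the examples above.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 0-based start on the forward strand
    end: int    # end position, exclusive


def upstream_chain(genes, index, gap, min_gap, max_operon_seq, min_operon_seq):
    """Return (label, start, end) intervals upstream of genes[index].

    Assumes genes are sorted by start and all lie on the forward strand.
    """
    query = genes[index].name
    chain = []
    i = index
    # Walk upstream while the neighbouring gene is closer than `gap`.
    while i > 0 and genes[i].start - genes[i - 1].end < gap:
        lo, hi = genes[i - 1].end, genes[i].start
        if hi - lo >= min_gap:  # gaps shorter than `min_gap` are skipped
            chain.append((f"{query}/{len(chain) + 1}", lo, hi))
        i -= 1
    # genes[i] is now the first gene of the operon. Its upstream region is
    # the operon upstream sequence, capped at `max_operon_seq` bases and
    # (assumption) never reaching into the preceding gene.
    hi = genes[i].start
    prev_end = genes[i - 1].end if i > 0 else 0
    lo = max(prev_end, hi - max_operon_seq)
    if hi - lo >= min_operon_seq:  # too-short operon upstreams are dropped
        chain.append((f"{query}/o", lo, hi))
    return chain


def operon_upstreams(genes, indices, **params):
    """Operon-only mode for several genes, keeping each interval once."""
    seen = {}
    for idx in indices:
        for label, lo, hi in upstream_chain(genes, idx, **params):
            if label.endswith("/o"):
                seen.setdefault((lo, hi), label)
    return [(label, lo, hi) for (lo, hi), label in seen.items()]


genes = [Gene("A", 100, 200), Gene("B", 400, 500),
         Gene("C", 520, 600), Gene("D", 605, 700)]
params = dict(gap=50, min_gap=10, max_operon_seq=150, min_operon_seq=20)

print(upstream_chain(genes, 3, **params))
print(operon_upstreams(genes, [2, 3], **params))
```

With these toy coordinates B, C and D form one operon. The 5-base gap in front of D is below min gap and is skipped, and because C and D share the operon, the operon-only call reports a single upstream interval for both.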
Gibbs and AlignACE
These are two Gibbs sampling programs: Gibbs and
AlignACE.
The programs are based on similar principles
(AlignACE takes a bit longer because it does more
iterations).
- width
The width of the expected motif.
- expected
The approximate number of motifs expected. Note
that one sequence may contain several occurrences of the same motif or
none at all.
- seed
These are probabilistic programs that use the date to
seed the random number generator. To repeat a result,
note the seed after a run and enter it explicitly next
time.
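Why the seed matters can be shown with a minimal sketch. It is not taken from Gibbs or AlignACE. It only assumes that such a sampler starts from randomly chosen motif positions, and the function names make_seed and sample_start_positions are hypothetical.

```python
import random
import time


def make_seed(seed=None):
    """Return the given seed, or derive one from the clock if none is given."""
    return int(time.time()) if seed is None else seed


def sample_start_positions(seq_lengths, width, seed):
    """Pick one random motif start per sequence, reproducibly for a seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randrange(length - width + 1) for length in seq_lengths]


seed = make_seed()  # note this value down after the run
first = sample_start_positions([120, 80, 200], width=12, seed=seed)
again = sample_start_positions([120, 80, 200], width=12, seed=seed)
print(seed, first, first == again)
```

Entering the noted seed again reproduces the same starting positions, and hence the same run. A fresh clock-derived seed generally gives a different one.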
Palindrome
Palindrome first looks for approximate palindromes in upstream
sequences. Words that are highly palindromic are then used to extract
similar words in an iterated fashion to generate motifs. This is a
deterministic algorithm, so no seed is needed.
- width
The width of the expected motif.
- p-value palindrome
The more mirror sites there are in a palindrome the lower (better) the
p-value. For very small values only true palindromes are accepted at all.
- p-value iteration
Initially a motif consists of only one palindrome. In an iterative
search it is expanded by further palindromes matching the growing
motif. The lower the iteration p-value the harder it is to find
matching palindromes, and consequently the motifs comprise fewer
sequences.
This algorithm is similar to one described in
M. S. Gelfand, E. V. Koonin, and A. A. Mironov, Prediction of
transcription regulatory sites in Archaea by a comparative genomic
approach,
Nucleic Acids Res., 28(3):695-705, 2000.
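As an illustration of the palindrome p-value, the sketch below scores a word by its number of mirror sites. It is not the scoring used by Palindrome or by the cited paper. It assumes that a mirror site is a pair of positions whose bases are complementary when the word is read from both ends, and that the p-value is a binomial tail under uniformly random bases. The function names are hypothetical.

```python
from math import comb

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mirror_sites(word):
    """Count position pairs (i, len-1-i) holding complementary bases."""
    half = len(word) // 2  # the middle base of an odd-length word is ignored
    return sum(COMPLEMENT[word[i]] == word[-1 - i] for i in range(half))


def palindrome_p_value(word):
    """Chance of at least this many mirror sites for uniformly random bases."""
    n = len(word) // 2
    k = mirror_sites(word)
    # Binomial tail: each pair is complementary with probability 1/4.
    return sum(comb(n, j) * 0.25**j * 0.75**(n - j) for j in range(k, n + 1))


for word in ("GAATTC", "GAATTA"):
    print(word, mirror_sites(word), palindrome_p_value(word))
```

GAATTC is a true palindrome with three mirror sites out of three, so it gets the lower p-value of the two words. A strict threshold would accept it and reject GAATTA, which has only two.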
Lorenz Wernisch
July 2000