DKFZ
Molecular Genome Analysis
 

3 of 5

Documentation







Formats



Several sequence formats and pattern formats are available, and any sequence format can be combined with any pattern format:


Sequence input
When importing sequence data, you can choose between FASTA format, with one or more records, and a single sequence without a header. As a general rule, empty lines are tolerated between sequences, but not between a header and its sequence. The sequence can be entered as continuous text or split across several lines. (All three sequence formats are accepted in every pattern mode.)

Plain sequence
datadatadadada

Fasta
>header
datadatadata

(Multi-) Fasta
>header
datadatadata
>header
datadatadata
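
The sequence input rules above can be sketched as a small parser. This is only an illustration, not the 3of5 implementation: the function name `parse_sequences` is hypothetical, and the choice to report errors with `ValueError` is an assumption.

```python
def parse_sequences(text):
    """Parse plain-sequence or (multi-)FASTA input.

    Returns a list of (header, sequence) tuples; header is None for a
    sequence entered without a header line. Hypothetical helper, not
    part of 3of5.
    """
    records = []
    header = None
    chunks = []
    in_record = False
    expect_data = False  # True directly after a header line

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Empty lines are fine between records, not after a header.
            if expect_data:
                raise ValueError("empty line between header and sequence")
            continue
        if line.startswith(">"):
            if expect_data:
                raise ValueError("header without sequence data")
            if in_record:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
            in_record = True
            expect_data = True
        else:
            # Sequence data may be continuous or split across lines.
            in_record = True
            chunks.append(line)
            expect_data = False

    if expect_data:
        raise ValueError("header without sequence data")
    if in_record:
        records.append((header, "".join(chunks)))
    return records
```

Under these assumptions, a sequence split over several lines is joined into one string, and an empty line directly after a header is rejected.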



Pattern input
As with sequence data, pattern data can be imported in FASTA format or as records without a header line. In addition, similar patterns can be grouped, in which case the output page lists the results ordered by group. First, however, one of these pattern formats (modes) has to be selected via the check boxes located above the pattern input field in the 3of5 interface. (This is in contrast to the sequence formats, all of which are always accepted, as mentioned above.)

Only text mode
This quick mode is intended for straightforward sequence searches. More than one pattern record is possible.
A[KL].L

Fasta mode
Patterns can also be entered in multi-FASTA format. No empty line is tolerated between header and data.
>kinase pattern 1
[KL].{1,3}[KL]

>kinase pattern 2
[KL].{1,4}[PL]


Fasta grouped
The purpose is to put individual patterns into groups with common properties. For example, a set of patterns may contain several kinase patterns; you can group all of them by starting the group with the special header line ">>". As a result, the output page will list all kinase matches together.

>>kinase patterns

>kinase pattern 1
[KL].{1,3}[KL]

>kinase pattern 2
[KL].{1,4}[PL]

>>all others
>nls
[KR][KR]
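
To illustrate how the grouped format is structured, the following sketch reads such input into groups. It is not the 3of5 parser: the function name `parse_grouped_patterns` is hypothetical, and it assumes that each pattern occupies exactly one line and that every pattern belongs to a ">>" group.

```python
def parse_grouped_patterns(text):
    """Read 'Fasta grouped' pattern input into a dict.

    Maps group name -> list of (pattern name, pattern) tuples.
    Hypothetical helper; assumes one pattern line per header.
    """
    groups = {}
    group = None  # current ">>" group name
    name = None   # current ">" pattern name awaiting its data line

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # No empty line is tolerated between header and data.
            if name is not None:
                raise ValueError("empty line between header and pattern")
            continue
        if line.startswith(">>"):
            group = line[2:].strip()
            groups.setdefault(group, [])
            name = None
        elif line.startswith(">"):
            name = line[1:].strip()
        else:
            if group is None or name is None:
                raise ValueError("pattern without group or header")
            # Whitespace inside a pattern carries no meaning; drop it.
            groups[group].append((name, "".join(line.split())))
            name = None
    return groups
```

Applied to the example above, this yields two groups, "kinase patterns" with two entries and "all others" with the single "nls" pattern. Python dictionaries keep insertion order, so the groups come back in the order they were entered.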


Which characters are tolerated?

In sequence input
Header: nearly all characters are tolerated
Data: only characters of the amino acid alphabet and the special character "." are allowed
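
A check of sequence data against this rule might look like the sketch below. The exact alphabet accepted by 3of5 is not stated here, so the set of the 20 standard one-letter amino acid codes is an assumption, as are the case-insensitive comparison and the helper name `invalid_characters`.

```python
# Assumption: the 20 standard one-letter amino acid codes.
# The alphabet actually accepted by 3of5 may differ.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
ALLOWED = AMINO_ACIDS | {"."}  # "." is the permitted special character


def invalid_characters(sequence):
    """Return the sorted list of characters that are not allowed.

    Hypothetical helper; compares case-insensitively (an assumption).
    """
    return sorted({c for c in sequence.upper() if c not in ALLOWED})
```

An empty result means that the sequence contains only allowed characters under these assumptions.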

In pattern input
Header: no special bracket symbols are allowed
Data: only characters that belong to the pattern syntax are tolerated

Use of space characters
Space characters may be used freely in the input form to make the pattern syntax (especially brackets) easier to read. 3of5 removes these space characters automatically in the subsequent processing steps.
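
The effect of this whitespace removal can be sketched as follows. The helper name `normalize_pattern` is hypothetical. Treating the cleaned pattern as an ordinary Python regular expression is also an assumption made only for this demonstration; the 3of5 pattern syntax may differ.

```python
import re


def normalize_pattern(pattern):
    """Remove all whitespace from a pattern (hypothetical helper)."""
    return re.sub(r"\s+", "", pattern)


# Spaces added for readability do not change the pattern.
spaced = "[KL] .{1,3} [KL]"
compact = normalize_pattern(spaced)

# Assumption: the simple pattern from "Only text mode" behaves like a
# regular expression; this may not hold for all 3of5 patterns.
match = re.search(normalize_pattern("A [KL] . L"), "MAKGLT")
```

Here `compact` is the pattern without spaces, and `match` is the first hit of the cleaned pattern in the illustrative sequence "MAKGLT".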