

Molecular sequences

The third section of the data file contains the molecular sequences. Indels (-) and ambiguity codes (purine (R), pyrimidine (Y), unknown (N or ?)) are allowed. Sequences can be written in one of two formats. The first is the non-interleaved format, which consists of an identifying label for each sequence followed by the whole sequence. An example is:
2 8 DNA

Mouse ACCGUGGU
  UCCAUAAA
Rat ACUGUGGC
  UCGAUAUA
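
As a rough illustration, the sequence lines of a non-interleaved block could be read with a sketch like the one below. It is not the program's own reader. The function name `parse_non_interleaved` is hypothetical, and the rule that an indented line continues the previous sequence is an assumption taken from the layout of the example above rather than from a stated rule. The header line is assumed to have been removed already.

```python
def parse_non_interleaved(lines):
    """Return {label: sequence} from non-interleaved sequence lines.

    Assumption (from the example layout only): a line that starts with
    whitespace continues the sequence of the most recent label.
    """
    sequences = {}
    current = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace():
            if current is None:
                raise ValueError("continuation line found before any label")
            # Spaces inside a sequence are only formatting, so drop them.
            sequences[current] += "".join(line.split())
        else:
            # The first whitespace-delimited token is the label.
            label, *chunks = line.split()
            if label in sequences:
                raise ValueError(f"duplicate label: {label}")
            sequences[label] = "".join(chunks)
            current = label
    return sequences


example = [
    "Mouse ACCGUGGU",
    "  UCCAUAAA",
    "Rat ACUGUGGC",
    "  UCGAUAUA",
]
parsed = parse_non_interleaved(example)
```

Here `parsed["Mouse"]` holds the two Mouse lines joined into one string, and likewise for Rat.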

Labels cannot contain spaces, but the sequence itself can be split across multiple lines and broken into blocks with spaces. The alternative is the interleaved format, which allows the sequences to be split into homologous blocks. The non-interleaved example given above could equivalently be written:
2 8 DNA

Mouse ACCG
Rat ACUG
UGGUUCCAUAAA  
UGGCUCGAUAUA  

Note that only the first interleaved block should contain labels. Subsequent interleaved blocks are assumed to have the same labels, listed in the same order.
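
The labelling rule for interleaved blocks can be sketched as follows. This is only an illustration, not the program's own reader. The function name `parse_interleaved` is hypothetical, and the sketch assumes that the number of sequences is already known from the header and that every block holds exactly one line per sequence.

```python
def parse_interleaved(lines, n_sequences):
    """Return {label: sequence} from interleaved sequence lines.

    Assumptions: the first n_sequences non-blank lines carry a label,
    and every later block has one unlabelled line per sequence, in the
    same order as the first block.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) < n_sequences or len(rows) % n_sequences != 0:
        raise ValueError("line count is not a multiple of the sequence count")

    labels = []
    sequences = []
    # First block: label followed by the first chunk of each sequence.
    for row in rows[:n_sequences]:
        labels.append(row[0])
        sequences.append("".join(row[1:]))

    # Later blocks: no labels, so the position within the block decides
    # which sequence a line belongs to.
    for i, row in enumerate(rows[n_sequences:]):
        sequences[i % n_sequences] += "".join(row)

    return dict(zip(labels, sequences))


interleaved = [
    "Mouse ACCG",
    "Rat ACUG",
    "UGGUUCCAUAAA",
    "UGGCUCGAUAUA",
]
merged = parse_interleaved(interleaved, n_sequences=2)
```

Applied to the interleaved example above, `merged` contains the same two sequences as the non-interleaved version of the data.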


Gowri-Shankar Vivek 2003-04-24