
The Sequence Structure

We have already seen how sequences can be extracted using selection and the Entry structured type.
> Entry(4)['SEQ'];
MGAQQGKDRGAHSGGGGSGAPVSCIGLSSSPVASVSPHCISSSSGVSSAP ..(1520).. 
SLRQISNALNR
This statement returns, as a string, a copy of the sequence stored in Entry(4) of the Sample/SH2 database.

However, it is sometimes more convenient to reference a sequence in the database than to make a copy of it. The Match structures, which we explore in the chapter The Pairwise Comparison of Sequences, require such references. To reference a sequence in Darwin, we give the offset of the sequence from the start of DB[string]. The Sequence structure makes this easy. Given an Entry structure, it returns the offset of the end of the opening <SEQ> tag of that entry (that is, of the position where the sequence itself begins) in the form of an unevaluated Sequence function call. If the Entry structure holds several entries, one offset is returned per entry.

> Sequence(Entry(1));
Sequence(367)
> Sequence(Entry(2));
Sequence(1338)
> Sequence(Entry(76, 77, 78));
Sequence(74267,75202,76197)
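The offsets returned by Sequence can be modelled in the same way. The sketch below uses a hypothetical toy database string and 0-based Python offsets (Darwin's own offsets may be counted differently). For each requested entry it finds the position just past the opening <SEQ> tag and wraps the results in a small Sequence class that, like Darwin, prints as an unevaluated call. The class and the helper sequence_of are our own illustrative names.

```python
from dataclasses import dataclass

# Hypothetical toy database in the same tagged layout as a Darwin database.
DB_STRING = (
    "<E><ID>P1</ID><SEQ>MGAQQGK</SEQ></E>"
    "<E><ID>P2</ID><SEQ>SLRQISNALNR</SEQ></E>"
)


@dataclass(frozen=True)
class Sequence:
    """A reference to sequences: only offsets into the database string, no copy."""
    offsets: tuple[int, ...]

    def __repr__(self) -> str:
        # Print like an unevaluated call, e.g. Sequence(19,55).
        return "Sequence(" + ",".join(str(o) for o in self.offsets) + ")"


def sequence_of(db: str, *entries: int) -> Sequence:
    """Offsets just past the opening <SEQ> tag of each given entry (1-based)."""
    offsets = []
    for n in entries:
        start = -1
        for _ in range(n):
            start = db.index("<E>", start + 1)  # start of the n-th entry
        offsets.append(db.index("<SEQ>", start) + len("<SEQ>"))
    return Sequence(tuple(offsets))


print(sequence_of(DB_STRING, 1))     # Sequence(19)
print(sequence_of(DB_STRING, 1, 2))  # Sequence(19,55)
```

Each offset points at the first residue of its sequence, so DB_STRING[19] is 'M'. Only integers are stored, no matter how long the sequences are.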

We can combine the Entry, Offset and Sequence structured types to obtain the offset of the sequence contained in an entry when all we have is some offset into DB[string] that falls within that entry.

> offset_from_DF := 45000:
> entry_number := Entry(Offset(offset_from_DF));
entry_number := Entry(45)
> seq := Sequence(entry_number);
seq := Sequence(44545)
> seq;
Sequence(44545)
> print(seq);
MKERVKEMKVFGCRLNFWNHIGHEPDQFQNQRRQRRVLQPRIQRAAVSPNSSTTNSQ
FSLQHNSSGSLGGGVGGGLGGGGSLGLGGGGGGGGSCTPTSLQPQSSLTTFKQSPTL
LNGNGNLLDANMPGGIPTPGTPNSKAKDNSHFVKLVVALYLGKAIEGGDLSVGEKNA
EYEVIDDSQEHWWKVKDALGNVGYIPSNYVQAEALLGLERYEWYVGYMSRQRAESLL
KQGDKEGCFVVRKSSTKGLYTLSLHTKVPQSHVKHYHIKQNARCEYYLSEKHCCETI
PDLINYHRHNSGGLACRLKSSPCDRPVPPTAGLSHDKWEIHPIQLMLMEELGSGQFG
VVRRGKWRGSIDTAVKMMKEGTMSEDDFIEEAKVMTKLQHPNLVQLYGVCTKHRPIY
IVTEYMKHGSLLNYLRRHEKTLIGNMGLLLDMCIQVSKGMTYLERHNYIHRDLAARN
CLVGSENVVKVADFGLARYVLDDQYTSSGGTKFPIKWAPPEVLNYTRFSSKSDVWAY
GVLMWEIFTCGKMPYGRLKNTEVVERVQRGIILEKPKSCAKEIYDVMKLCWSHGPEE
RPAFRVLMDQLALVAQTLTD
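The chain Offset, Entry, Sequence, print can also be sketched in Python. This is an illustration under the same assumptions as before: a toy tagged database string and 0-based offsets. The helpers entry_of_offset, sequence_offset and sequence_text are hypothetical, not Darwin functions. The last step reads from the reference up to the closing </SEQ> tag, which is conceptually what printing a Sequence has to do.

```python
# Hypothetical toy database in the same tagged layout as a Darwin database.
DB_STRING = (
    "<E><ID>P1</ID><SEQ>MGAQQGK</SEQ></E>"
    "<E><ID>P2</ID><SEQ>SLRQISNALNR</SEQ></E>"
)


def entry_of_offset(db: str, offset: int) -> int:
    """1-based number of the entry whose <E>...</E> span contains offset."""
    # Last "<E>" that starts at or before offset.
    start = db.rfind("<E>", 0, offset + len("<E>"))
    if start < 0 or offset >= db.index("</E>", start) + len("</E>"):
        raise ValueError(f"offset {offset} is not inside an entry")
    return db.count("<E>", 0, start) + 1  # entries before this one, plus one


def sequence_offset(db: str, n: int) -> int:
    """Offset just past the opening <SEQ> tag of entry n (1-based)."""
    start = -1
    for _ in range(n):
        start = db.index("<E>", start + 1)
    return db.index("<SEQ>", start) + len("<SEQ>")


def sequence_text(db: str, offset: int) -> str:
    """Read the sequence that starts at offset, up to the closing </SEQ> tag."""
    return db[offset:db.index("</SEQ>", offset)]


some_offset = 40                      # any position inside the second entry
n = entry_of_offset(DB_STRING, some_offset)
seq = sequence_offset(DB_STRING, n)
print(n, seq)                         # 2 55
print(sequence_text(DB_STRING, seq))  # SLRQISNALNR
```

As in the Darwin session above, the sequence offset (55) can be smaller than the offset we started from, because the starting offset only needs to lie somewhere inside the entry.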

Darwin also offers a more direct way to obtain the offset of an entry's sequence: selecting on an Entry structure with the option 'SequenceOffset', or simply 'SO', returns a Sequence structure containing that offset.

> Entry(1)['SO'];
Sequence(367)


Gaston Gonnet
1998-09-15