
   
Reading from Pipes

The readlines, OpenPipe, and ReadLine commands allow one to read a file line by line. They are extremely useful when transforming very large data files that cannot fit into the memory offered by your system. This is typically the case when transforming genetic databases (especially DNA databases such as EMBL) into the Darwin format.

The readlines and OpenPipe commands both prepare a file to be read line by line via a pipe.
readlines(fn : string)
The readlines function accepts a file name (or path and file name) fn as a parameter and establishes a pipe between Darwin and this file.
OpenPipe(cmd : string)
The OpenPipe command accepts a string cmd as a parameter, which it passes to the operating system to execute. If the operating system succeeds, OpenPipe acts in the same manner as readlines and establishes a pipe between Darwin and the output of cmd. If the operating system reports an error while attempting to execute cmd, OpenPipe echoes this error back to the terminal.

Note that only one pipe can be established at a time. However, the pipe does not affect any redirection of input/output caused by WriteFile/AppendFile commands.

> readlines('Sample/P00519');          # establish a pipe between Darwin and this file.
>                                      # it can now be read line by line
> OpenPipe('zcat Sample/P00524.Z');
>               # via the Unix command 'zcat', decompress this file and establish a pipe.

Once a file has been opened and a pipe established, we can read it line by line via the ReadLine command.
ReadLine(p);

The ReadLine command reads from the open pipe up to the next newline character (\n). Typically, the open pipe is the standard input stream (the keyboard). The readlines and OpenPipe commands redirect the pipe to a file. We can read the first ten lines of the compressed file Sample/P00524.Z as follows:

> ShortRead := proc( filename )
>   OpenPipe('zcat '.filename);
>   for i from 1 to 10 do
>     t := ReadLine();
>     lprint(i, ':', t);
>   od;
> end:

> ShortRead('Sample/P00524.Z');
1 :    SRC_RSVSR      STANDARD;      PRT;   526 AA.
2 : AC   P00524;
3 : DT   21-JUL-1986 (REL. 01, CREATED)
4 : DT   01-JUL-1989 (REL. 11, LAST SEQUENCE UPDATE)
5 : DT   01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
6 : DE   TYROSINE-PROTEIN KINASE TRANSFORMING PROTEIN SRC (EC 2.7.1.112) (P60-
7 : DE   SRC).
8 : GN   V-SRC.
9 : OS   ROUS SARCOMA VIRUS (STRAIN SCHMIDT-RUPPIN).
10 : OC   VIRIDAE; SS-RNA ENVELOPED VIRUSES; POSITIVE-STRAND; RETROVIRIDAE;
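
The same pattern should carry over to an uncompressed file opened with readlines. The following is only a sketch built from the commands shown above: the procedure name ShortReadPlain and the line-count parameter n are hypothetical, and it assumes that readlines may be called from within a procedure in the same way as OpenPipe and that the file contains at least n lines.

> ShortReadPlain := proc( filename, n )
>   readlines(filename);          # assumed: opens the pipe on the plain file
>   for i from 1 to n do
>     t := ReadLine();            # next line from the open pipe
>     lprint(i, ':', t);
>   od;
> end:

> ShortReadPlain('Sample/P00519', 5);

Since only one pipe can be established at a time, a procedure like this reads everything it needs from one file before another is opened.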






The readlines, OpenPipe and ReadLine commands have displayed irregular behaviour at times. We have found that most (if not all) of these problems disappear when you execute these commands from within a procedure. There are some very subtle differences between executing a statement directly in the global Darwin environment and executing a statement from within a procedure. At the time you define a procedure, Darwin does some rather extensive analysis of your routine. Beyond issues such as the scoping of local and global variables, it "looks beyond" ReadLine commands. When a statement is executed outside of a procedure, no such "look ahead" is possible and Darwin can become confused.



Gaston Gonnet
1998-09-15