The readlines, OpenPipe, and ReadLine commands allow one to read a file line by line. They are extremely useful when transforming very large data files that cannot fit into your system's memory. This is typically the case when converting genetic databases (especially DNA databases such as EMBL) into the Darwin format.
The readlines and OpenPipe commands both prepare a file to be
read line by line through a pipe.
readlines(fn : string)
The readlines function accepts a filename (or path and filename)
fn as a parameter and establishes a pipe between Darwin and this file.
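The point of a pipe is that only the current line has to be in memory. As a rough analogue (Python, not Darwin; the names `iter_lines` and `read_lines_from_file` are hypothetical), iterating over an open file object gives the same lazy, line-at-a-time behaviour:

```python
import io
from typing import Iterator, TextIO


def iter_lines(stream: TextIO) -> Iterator[str]:
    """Yield lines from an open text stream one at a time, newline stripped.

    Iterating over a Python file object reads it lazily, so only the
    current line is held in memory, however large the file is.
    """
    for line in stream:
        yield line.rstrip("\n")


def read_lines_from_file(path: str) -> Iterator[str]:
    """Open `path` and stream its lines; the file is closed when iteration ends."""
    with open(path, encoding="utf-8") as handle:
        yield from iter_lines(handle)


# Demo on an in-memory stream standing in for a large file (made-up text).
demo = io.StringIO("line one\nline two\n")
print(list(iter_lines(demo)))
```

This is a sketch of the idea only; it makes no claim about how Darwin implements readlines internally.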
OpenPipe(cmd : string)
The OpenPipe command accepts a string cmd as a
parameter, which it passes to the operating system to execute.
If the operating system succeeds, OpenPipe acts in the same manner as readlines and
establishes a pipe between Darwin and the output of cmd. If
the operating system reports an error while attempting to execute cmd,
OpenPipe echoes this error back to the terminal.
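A comparable pattern in Python (an analogue, not Darwin's API; `open_pipe` is a hypothetical name) runs a command and streams its standard output line by line. Standard error is left attached to the terminal, so any error message the command prints appears there, loosely matching the behaviour described above:

```python
import subprocess
import sys
from typing import Iterator, List


def open_pipe(cmd: List[str]) -> Iterator[str]:
    """Run `cmd` and yield its standard output one line at a time.

    stderr is inherited, so the command's own error messages go straight
    to the terminal. A non-zero exit status raises CalledProcessError
    once the output has been fully consumed.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:  # read lazily, never the whole output at once
            yield line.rstrip("\n")
    # Leaving the `with` block closes stdout and waits for the process.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


if __name__ == "__main__":
    # Portable demo: the current Python interpreter plays the role of the command.
    for text in open_pipe([sys.executable, "-c", "print('hello'); print('pipe')"]):
        print(text)
```

On a Unix system, `open_pipe(["zcat", "Sample/P00524.Z"])` would be the counterpart of the OpenPipe example in this section, assuming `zcat` is installed.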
Note that only one pipe can be open at a time. However, the pipe does not affect any redirection of input/output caused by the WriteFile/AppendFile commands.
> readlines('Sample/P00519'); # establish a pipe between Darwin and this file.
> # it can now be read line by line
> OpenPipe('zcat Sample/P00524.Z');
> # via the Unix command 'zcat', decompress this file and establish a pipe.
Once a file has been opened and a pipe established, we can read it
line by line via the ReadLine command.
ReadLine(p);
The ReadLine command reads from the open pipe up to the next
newline character (\n). Typically, the open pipe is the
standard input stream (the keyboard); readlines and OpenPipe redirect it to a file or to the output of a command.
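The "read up to the next newline" rule can be illustrated with a small Python function (an analogue only; `read_line` is hypothetical, and returning `None` at end of input is a convention chosen for this sketch, not a statement about what Darwin's ReadLine returns). It defaults to standard input, mirroring the default pipe described above:

```python
import io
import sys
from typing import Optional, TextIO


def read_line(stream: TextIO = sys.stdin) -> Optional[str]:
    """Return the next line without its trailing newline, or None at end of input."""
    line = stream.readline()  # reads up to and including the next '\n'
    if line == "":            # '' means end of input; a blank line comes back as '\n'
        return None
    return line[:-1] if line.endswith("\n") else line


# Demo: the last line has no trailing newline, and the second line is empty.
src = io.StringIO("first\n\nlast")
print(read_line(src), read_line(src) == "", read_line(src), read_line(src))
```

Distinguishing an empty line from end of input is the main design point: both look "empty", but only one should stop a reading loop.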
We can read the first ten lines of the compressed file Sample/P00524.Z
as follows:
> ShortRead := proc( filename )
>   OpenPipe('zcat '.filename);
>   for i from 1 to 10 do
>     t := ReadLine();
>     lprint(i, ':', t);
>   od;
> end:
> ShortRead('Sample/P00524.Z');
1 : SRC_RSVSR STANDARD; PRT; 526 AA.
2 : AC P00524;
3 : DT 21-JUL-1986 (REL. 01, CREATED)
4 : DT 01-JUL-1989 (REL. 11, LAST SEQUENCE UPDATE)
5 : DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
6 : DE TYROSINE-PROTEIN KINASE TRANSFORMING PROTEIN SRC (EC 2.7.1.112) (P60-
7 : DE SRC).
8 : GN V-SRC.
9 : OS ROUS SARCOMA VIRUS (STRAIN SCHMIDT-RUPPIN).
10 : OC VIRIDAE; SS-RNA ENVELOPED VIRUSES; POSITIVE-STRAND; RETROVIRIDAE;
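For comparison, here is a Python sketch of the same "first ten lines, numbered" task (an analogue, not Darwin code; `short_read` is a hypothetical name). It uses `itertools.islice`, so nothing past the first `n` lines is read, and a file with fewer than `n` lines simply produces fewer rows:

```python
import io
from itertools import islice
from typing import Iterable, List


def short_read(lines: Iterable[str], n: int = 10) -> List[str]:
    """Format the first `n` lines as 'i : text', like ShortRead's lprint output."""
    rows = []
    for i, line in enumerate(islice(lines, n), start=1):
        text = line.rstrip("\n")
        rows.append(f"{i} : {text}")
    return rows


# Demo on a small in-memory stream (made-up text).
sample = io.StringIO("alpha\nbeta\ngamma\n")
for row in short_read(sample, n=2):
    print(row)
```

To read a compressed file, the `lines` argument could be the `stdout` of `subprocess.Popen(["zcat", path], stdout=subprocess.PIPE, text=True)`, assuming `zcat` is available.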