3.18. Controlling Data Flow
3.18.1. Video for this section
Video for this section is not available yet.
3.18.2. Stdout and Stdin
Most processes initiated by Unix commands write their output to the standard
output channel (officially called stdout), that is, they write to the
terminal screen. Similarly, most programs take their input from the standard
input (stdin), i.e., they read input from the keyboard. There is also the
standard error (stderr), where processes write their error messages; by
default, this is also the terminal screen.
Here is one rather abstract example.
We have already seen how to use the cat command to write the contents of a file to the screen.
This time, however, type cat without specifying a file to read. Just type:
cat
Without specifying a file, the cat
program has no content to print to the screen.
After typing cat
and ENTER, the cursor goes to the beginning of the next line and waits.
sci[~]>cat
Type a few words on the keyboard and press the ENTER key. You should see your words as you type, then each time you hit ENTER, the word(s) you typed should be printed again.
sci[~]>cat
Here is some text[ENTER]
Here is some text
and some more[ENTER]
and some more
you get the idea[ENTER]
you get the idea
Finally, type Ctrl-d
(the CTRL
key and the “d” key, simultaneously).
What happened?
If you run the cat command without specifying a file to read, then
instead of reading from a file, it reads from stdin (standard input, i.e. the
keyboard) and writes to stdout (standard output, the screen) until it
receives the end-of-file indication (CTRL-D).
In Unix, we can redirect both the input and the output of commands, which can be VERY powerful.
3.18.3. Redirecting the Output: >
We use the > symbol to redirect the output of a command to a file.
In your unixplay/
directory is a little program called squares.py
which simply prints out integers and their squares.
cd
to your unixplay/
directory, and do a long list (use your alias ll, or do ls -lF).
Notice that squares.py is shown with a trailing *:
sci[unixplay]>ll
total 44
-rw-rw-r--. 1 jhetrick jhetrick 22 Sep 24 15:58 anotherFile
drwxrwxr-x. 2 jhetrick jhetrick 4096 Sep 24 15:58 dirONE/
drwxrwxr-x. 3 jhetrick jhetrick 4096 Sep 24 15:58 dirTWO/
-rw-rw-r--. 1 jhetrick jhetrick 53 Sep 24 15:58 file_unixplay.txt
drwxrwxr-x. 2 jhetrick jhetrick 4096 Sep 24 15:58 gammaData/
-rw-rw-r--. 1 jhetrick jhetrick 292 Sep 24 15:58 ints.dat
-rw-rw-r--. 1 jhetrick jhetrick 4273 Sep 24 15:58 scifi_list.txt
-rwxrw-r--. 1 jhetrick jhetrick 468 Nov 21 10:07 squares.py*
-rw-rw-r--. 1 jhetrick jhetrick 38 Sep 24 15:58 testfile
-rw-rw-r--. 1 jhetrick jhetrick 26 Sep 24 15:58 yetanother
We’ve already met the trailing / decoration, indicating a directory, as well
as the trailing @ showing links (shortcuts). Now you are seeing the *, which
indicates that the file is executable: it’s a program or script that can be
run, and it will do something.
Go ahead and type
squares.py
If called with no arguments (i.e. just by itself), it prints the first 10 integers and their squares. (Did it?)
If called with a min and max, like this:
squares.py 20 30
it prints the integers and their squares from the min to the max (20 and 30 in this case). These numbers are called the arguments to the command. The output is shown below.
20 400
21 441
22 484
23 529
24 576
25 625
26 676
27 729
28 784
29 841
30 900
Now let’s redirect the output to a file. Type this:
squares.py > squares.out
Verify that the output file is there by listing (l
) and then have a look
at the file squares.out
with less
. Remember, you have an alias for less
in your .bashrc
file. You can use the alias m
for less
, so typing
m squares.out
should show you the first 10 integers and their squares, in your newly created data file,
squares.out
.
When doing scientific computing we will often use this method to catch
the output of a program in a file. We can then use other tools on the file, such as sort
or plot the data with gnuplot
.
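As a rough sketch of this catch-then-process workflow, the lines below save the output of one command in a file and then run sort on that file. Here seq merely stands in for a program that produces data, and numbers.out is a hypothetical file name in a temporary directory; neither is part of the unixplay/ setup.

```shell
# Work in a throwaway directory so nothing in unixplay/ is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Catch the output of a command in a file (seq stands in for a program).
seq 1 5 > numbers.out

# Use another tool on the saved file: sort it in reverse numerical order.
sort -rn numbers.out
```

The saved file is left unchanged by sort; only what is printed to the screen is reordered.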
Try it again, but give squares.py
some arguments.
Catch the output in the same file as above, again:
squares.py 20 30 > squares.out
Check the output file again with less
(or m
).
Notice that the previous squares.out
file, with integers 1 through 10, has been overwritten
by the second use of the redirect >
.
Note
It is important to remember that when using >
by itself, the
redirection output file is first cleared before the output is collected.
Your previous file “squares.out” has been lost and replaced with
the new output.
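The overwrite behavior in the note can be checked with a small sketch. It assumes seq as a stand-in for squares.py and a hypothetical file name, demo.out, in a temporary directory.

```shell
# Work in a throwaway directory so nothing in unixplay/ is touched.
workdir=$(mktemp -d)
cd "$workdir"

# First run: catch the numbers 1 to 3 in demo.out.
seq 1 3 > demo.out
cat demo.out

# Second run: ">" empties demo.out before writing, so only 7 to 9 remain.
seq 7 9 > demo.out
cat demo.out
```

The second cat shows only the lines from the second run, because the first three lines were discarded when the file was reopened with >.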
3.18.4. Append Output to a File: >>
Instead of using > for redirection, we can use the double
redirect symbol, >>, to append data to an existing file, as
opposed to overwriting it. This is very handy if we do a computation
over and over and want to add the result to the end of the output file
after each iteration.
Look at your squares.out
file with less
.
It should contain the squares of the integers from 20 to 30.
Now, do
squares.py 31 40 >> squares.out
Verify that the file squares.out
now contains the integers and squares from 20 to 40.
The last command appended the results from 31 to 40 to the existing
file squares.out
which already had 20 through 30.
Note
Using >>
does NOT erase, then rewrite the output file. It preserves the output file,
and adds to it.
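The "computation over and over" pattern might look like the following sketch, where each pass of a loop appends one line. The file name results.out and the loop itself are illustrative assumptions, not part of the unixplay/ files.

```shell
# Work in a throwaway directory so nothing in unixplay/ is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Each pass computes one square and appends it with ">>".
# The first ">>" creates results.out, since it does not exist yet.
for n in 1 2 3 4; do
    echo "$n $((n * n))" >> results.out
done

# All four lines are present; nothing was overwritten.
cat results.out
```

Had the loop used > instead of >>, only the line from the last pass would be left in the file.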
3.18.5. Redirecting the Input: <
We can use the <
symbol to redirect the input to a command from somewhere
else, usually a file.
In the unixplay/
directory is another executable file called: cubes.py
Run this program by typing cubes.py
.
This time, the program will prompt you for input. Instead of taking command line arguments
like squares.py
above, it asks you to input the numbers:
sci[unixplay]>cubes.py
Enter MIN:
Enter a small integer, like 3, and then ENTER.
sci[unixplay]>cubes.py
Enter MIN: 3
Enter MAX:
Do the same for the maximum integer (enter a number greater than the MIN integer).
sci[unixplay]>cubes.py
Enter MIN: 3
Enter MAX: 8
3 27
4 64
5 125
6 216
7 343
8 512
The program outputs the cubes of the integers from MIN to MAX.
This program is a little different than squares.py
. squares.py
used “command line arguments”: items given on the command line after
the name of the program that are passed to the program to modify its
behavior.
cubes.py prompts you for input, asking you questions, to which
you input data. So it is expecting YOU to give it data, via
stdin (the keyboard).
We can put this input data into a file, then redirect the input to
cubes.py from that file.
First, let’s put the input numbers in a file. Since we haven’t learned
how to edit a text file yet, we can use cat
as we
did above. With no file to read, cat, by itself,
will read from stdin (the keyboard) and write to stdout (the screen).
We can use output redirection, >
, to redirect that output to a
file!
When we run cubes.py
, it wants us to input two numbers, the MIN and MAX integers.
So, we want to have a file that contains the two numbers that we
would input to cubes.py
if we were typing those inputs by hand
from the keyboard. In our example above I used 3
and 8.
We type cat > in.cubes
with no filename specified for cat to read,
then redirect the output of cat to a file called in.cubes
.
sci[unixplay]>cat > in.cubes[ENTER]
The cursor will go to the beginning of the next line, like it did above, as cat
waits for
you to type something.
So, type
3
8
[CTRL-D]
Now, do a listing to see that your new file, in.cubes
is there.
If you want to see what the last file produced was (it should be in.cubes
), use the
lt
alias I put in your .bashrc file. Do
lt
and you should see in.cubes
at the top of the list.
Have a look at in.cubes
with less
. It should contain two lines, with: 3 and 8.
Now use this file as the input to cubes.py
by redirecting input using <
sci[unixplay]>cubes.py < in.cubes
Enter MIN: Enter MAX:
3 27
4 64
5 125
6 216
7 343
8 512
This should produce the integers from 3 to 8 and their cubes. Does it?
Notice that when cubes.py runs, it prints the prompts “Enter MIN:” and “Enter MAX:”, but
now reads its input from the file in.cubes instead of the keyboard.
We used the < to redirect the input to cubes.py
so that stdin comes from a file instead of from the keyboard.
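Since the source of cubes.py is not shown here, the sketch below uses a hypothetical shell function, ask_cubes, that imitates its prompts. It is only an assumption about how such a program might read its input, but it shows how < feeds the answers from a file.

```shell
# Work in a throwaway directory so nothing in unixplay/ is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for cubes.py: prompts for MIN and MAX on stdin.
ask_cubes() {
    printf 'Enter MIN: '
    read -r min
    printf 'Enter MAX: '
    read -r max
    echo
    i=$min
    while [ "$i" -le "$max" ]; do
        echo "$i $((i * i * i))"
        i=$((i + 1))
    done
}

# Put the two answers in a file, one per line.
printf '3\n8\n' > in.cubes

# "<" connects in.cubes to stdin, so each read takes a line from the file.
ask_cubes < in.cubes
```

Both prompts appear on one line because nobody presses ENTER on the keyboard; the answers come from the file, and typed input is never echoed to the screen.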
3.18.6. < and >
You can use both input and output redirection at the same time! (I can hear the sound of your mind being blown).
command < in.file > out.file
will make command read input from the file in.file and catch the output in the file out.file.
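A minimal sketch of the combined form, using wc -l as the command. The contents of in.file are made up for illustration, and the work is done in a temporary directory.

```shell
# Work in a throwaway directory so nothing in unixplay/ is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Input file with three lines (made-up contents).
printf 'alpha\nbeta\ngamma\n' > in.file

# wc -l reads its input from in.file; its line count is caught in out.file.
wc -l < in.file > out.file

# Nothing was printed by wc; the count is in the file.
cat out.file
```

The order of the two redirections on the command line does not matter to the shell; writing the input first simply reads more naturally.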
Try this yourself. Use input and output redirection as described above to save the cubes of
integers from 0 to 20 to a file called cubes_0_20.out.
3.19. The Pipe: |
Previously we learned a few really useful commands: grep, sort, and wc. Be assured, there are many more.
It turns out that we can string these together so that the output of one command serves as the input to the next, using the “pipe”. This is kind of like redirecting the output and input to a file, but instead of using files on the hard drive, the data is passed between programs in memory (which is much faster and more convenient).
The “pipe” is the vertical bar “|” on your keyboard, above the “\” backslash.
Recall our file, scifi_list.txt
, which contains the year of
publication, rank, author, and title of 100 of the best science
fiction works. Let’s ask some questions:
Suppose you want to know the rank in popularity (column 2 in
scifi_list.txt) of the books by Robert Heinlein. Have a look at
the file to remind yourself what’s there. Heinlein’s books are mixed
in with the rest, in no particular order. grep comes to the rescue:
grep Heinlein scifi_list.txt
produces
1961 6 Heinlein, Robert A Stranger in a Strange Land
1973 41 Heinlein, Robert A Time Enough For Love
1958 93 Heinlein, Robert A Have Space-Suit - Will Travel
1957 84 Heinlein, Robert A Citizen Of the Galaxy
1951 88 Heinlein, Robert A The Puppet Masters
1956 80 Heinlein, Robert A The Door Into Summer
1966 17 Heinlein, Robert A The Moon is a Harsh Mistress
1959 12 Heinlein, Robert A Starship Troopers
Then, to get these ordered by popularity rank, you could redirect the output
of the grep
command to a file, then sort the file in a two step process, like this:
grep Heinlein scifi_list.txt > heinlein.out
sort -k 2 -n heinlein.out
However, the pipe allows you to do this in one step:
grep Heinlein scifi_list.txt | sort -k 2 -n
which yields
1961 6 Heinlein, Robert A Stranger in a Strange Land
1959 12 Heinlein, Robert A Starship Troopers
1966 17 Heinlein, Robert A The Moon is a Harsh Mistress
1973 41 Heinlein, Robert A Time Enough For Love
1956 80 Heinlein, Robert A The Door Into Summer
1957 84 Heinlein, Robert A Citizen Of the Galaxy
1951 88 Heinlein, Robert A The Puppet Masters
1958 93 Heinlein, Robert A Have Space-Suit - Will Travel
We “piped” the output of the grep command to the sort command (where we used the switches “-k 2 -n” in order to sort on the second “k”olumn in numerical order).
We can even chain pipes together:
grep Heinlein scifi_list.txt | sort -k 2 -n | wc
produces simply
8 73 402
The output of the sorted grep contains 8 lines, 73 words, 402 bytes.
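To try the two-step and one-step versions side by side without the real data file, here is a sketch that builds a small stand-in list. The file sample_list.txt is hypothetical: it holds three Heinlein lines copied from the listing above plus one made-up placeholder line for grep to filter out.

```shell
# Work in a throwaway directory so nothing in unixplay/ is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Small stand-in for scifi_list.txt (the "Placeholder" line is made up).
printf '%s\n' \
    '1961 6 Heinlein, Robert A Stranger in a Strange Land' \
    '1900 50 Placeholder, Author Not A Real Entry' \
    '1973 41 Heinlein, Robert A Time Enough For Love' \
    '1959 12 Heinlein, Robert A Starship Troopers' > sample_list.txt

# Two steps, with a file in between.
grep Heinlein sample_list.txt > heinlein.out
sort -k 2 -n heinlein.out > two_step.out

# One step, with a pipe instead of the intermediate file.
grep Heinlein sample_list.txt | sort -k 2 -n > one_step.out

# cmp is silent and succeeds when the two files are identical.
cmp two_step.out one_step.out && echo "same result"

# Chain one more pipe to count the matching lines.
grep Heinlein sample_list.txt | sort -k 2 -n | wc -l
```

Both routes give the same sorted lines; the pipe version just never creates heinlein.out.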
3.20. Command Summary
Data Flow | Meaning |
---|---|
command > file | redirect output to a file (catch/save output of command) |
command >> file | catch and append output of command to the end of a file |
command < file | read input for a command from a file |
cmd1 \| cmd2 | pipe the output of cmd1 to the input of cmd2 |