3.18. Controlling Data Flow

3.18.1. Video for this section

Video for this section is not available yet.

3.18.2. Stdout and Stdin

Most processes started by Unix commands write their output to the standard output channel (officially called stdout), which by default is the terminal screen. Similarly, most programs take their input from the standard input (stdin), which by default is the keyboard. There is also the standard error (stderr), where processes write their error messages; by default it also goes to the terminal screen.

Here is one rather abstract example.

We have already seen how to use the cat command to write the contents of a file to the screen.

This time, however, type cat without specifying a file to read. Just type:

cat

Without specifying a file, the cat program has no content to print to the screen.

After typing cat and ENTER, the cursor goes to the beginning of the next line and waits.

sci[~]>cat

Type a few words on the keyboard and press the ENTER key. You should see your words as you type, then each time you hit ENTER, the word(s) you typed should be printed again.

sci[~]>cat
Here is some text[ENTER]
Here is some text
and some more[ENTER]
and some more
you get the idea[ENTER]
you get the idea

Finally, type Ctrl-d (the CTRL key and the “d” key, simultaneously).

What happened?

If you run the cat command without specifying a file to read, it reads from stdin (standard input, i.e. the keyboard) instead, and writes to stdout (standard output, the screen) until it receives end-of-file, which Ctrl-D sends from the keyboard.
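The same experiment can be sketched without any typing. In this minimal example, printf stands in for the keyboard and the pipe symbol | (introduced in section 3.19) connects its output to the stdin of cat. The variable name echoed is only an illustration.

```shell
# cat with no file argument copies its stdin to its stdout.
# printf plays the role of the keyboard; when printf finishes,
# cat sees end-of-file, just as it does when you press Ctrl-D.
echoed=$(printf 'Here is some text\nand some more\n' | cat)

# Show what cat wrote to its stdout.
printf '%s\n' "$echoed"
```

The two lines come back unchanged, which is all that cat does when it has no file to read.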

In Unix, we can redirect both the input and the output of commands, which can be VERY powerful.


3.18.3. Redirecting the Output: >

We use the > symbol to redirect the output of a command to a file. In your unixplay/ directory is a little program called squares.py which simply prints out integers and their squares.

cd to your unixplay/ directory, and do a long listing (use your alias ll, or do ls -lF). Notice that squares.py is shown with a trailing *:

sci[unixplay]>ll
total 44
-rw-rw-r--. 1 jhetrick jhetrick   22 Sep 24 15:58 anotherFile
drwxrwxr-x. 2 jhetrick jhetrick 4096 Sep 24 15:58 dirONE/
drwxrwxr-x. 3 jhetrick jhetrick 4096 Sep 24 15:58 dirTWO/
-rw-rw-r--. 1 jhetrick jhetrick   53 Sep 24 15:58 file_unixplay.txt
drwxrwxr-x. 2 jhetrick jhetrick 4096 Sep 24 15:58 gammaData/
-rw-rw-r--. 1 jhetrick jhetrick  292 Sep 24 15:58 ints.dat
-rw-rw-r--. 1 jhetrick jhetrick 4273 Sep 24 15:58 scifi_list.txt
-rwxrw-r--. 1 jhetrick jhetrick  468 Nov 21 10:07 squares.py*
-rw-rw-r--. 1 jhetrick jhetrick   38 Sep 24 15:58 testfile
-rw-rw-r--. 1 jhetrick jhetrick   26 Sep 24 15:58 yetanother

We’ve already met the trailing / decoration, indicating a directory, as well as the trailing @ showing links (shortcuts). Now you are seeing the *, which indicates that the file is executable: it’s a program or script that can be run, and it will do something.

Go ahead and type

squares.py

If called with no arguments (i.e. just by itself), it prints the first 10 integers and their squares. (Did it?)

If called with a min and max, like this:

squares.py 20 30

it prints the integers from min to max (20 and 30 in this case) and their squares. These numbers are called the arguments to the command. The output is shown below.

20      400
21      441
22      484
23      529
24      576
25      625
26      676
27      729
28      784
29      841
30      900
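The source of squares.py is not shown in this section, so the following is only a sketch of how a command can pick up its arguments. It uses a hypothetical shell function, squares_demo, which is not the course's script; the defaults of 1 and 10 are assumed from the behavior described above.

```shell
# Hypothetical stand-in for squares.py (NOT the actual course script).
# $1 and $2 are the first and second arguments;
# ${1:-1} means "use $1, or 1 if no first argument was given".
squares_demo() {
    min=${1:-1}
    max=${2:-10}
    i=$min
    while [ "$i" -le "$max" ]; do
        # Print the integer, a tab, and its square.
        printf '%d\t%d\n' "$i" "$((i * i))"
        i=$((i + 1))
    done
}

squares_demo 20 22
```

Calling squares_demo with no arguments falls back to the defaults, mirroring what squares.py is described as doing.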

Now let’s redirect the output to a file. Type this:

squares.py > squares.out

Verify that the output file is there by listing (l) and then have a look at the file squares.out with less. Remember, you have an alias for less in your .bashrc file. You can use the alias m for less, so typing

m squares.out

should show you the first 10 integers and their squares, in your newly created data file, squares.out.

When doing scientific computing we will often use this method to catch the output of a program in a file. We can then use other tools on the file, such as sort, or plot the data with gnuplot.

Try it again, but give squares.py some arguments. Catch the output in the same file as above, again:

squares.py 20 30 > squares.out

Check the output file again with less (or m).

Notice that the previous squares.out file, with integers 1 through 10, has been overwritten by the second use of the redirect >.

Note

It is important to remember that when using > by itself, the redirection output file is first cleared before the output is collected. Your previous file “squares.out” has been lost and replaced with the new output.
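The overwriting behavior can be checked safely with a throwaway file. This is a minimal sketch that uses printf in place of squares.py and a scratch directory from mktemp; the file name demo.out is just an example.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

printf 'first run\n'  > demo.out   # creates demo.out
printf 'second run\n' > demo.out   # empties demo.out, then writes to it

cat demo.out                       # only "second run" is left
```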

3.18.4. Append Output to a File: >>

Instead of using > for redirection, we can use the double redirect symbol, >>, to append data to an existing file rather than overwriting it. This is very handy if we do a computation over and over and want to add the result to the end of the output file after each iteration.

Look at your squares.out file with less. It should contain the squares of the integers from 20 to 30. Now, do

squares.py 31 40 >> squares.out

Verify that the file squares.out now contains the integers and squares from 20 to 40. The last command appended the results from 31 to 40 to the existing file squares.out which already had 20 through 30.

Note

Using >> does NOT erase, then rewrite the output file. It preserves the output file, and adds to it.
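The "computation over and over" case might look like the sketch below. It assumes a simple for loop and a throwaway file named loop.out in a scratch directory; neither is part of the course files.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Each pass through the loop appends one line to loop.out.
# The first >> creates the file because it does not exist yet.
for n in 1 2 3; do
    printf '%d\t%d\n' "$n" "$((n * n))" >> loop.out
done

cat loop.out    # all three lines are there
```

With > in place of >>, each pass would wipe the file and only the last line would survive.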

3.18.5. Redirecting the Input: <

We can use the < symbol to redirect the input to a command from somewhere else, usually a file.

In the unixplay/ directory is another executable file called: cubes.py

Run this program by typing cubes.py.

This time, the program will prompt you for input. Instead of taking command line arguments like squares.py above, it asks you to input the numbers:

sci[unixplay]>cubes.py
Enter MIN:

Enter a small integer, like 3, and then ENTER.

sci[unixplay]>cubes.py
Enter MIN: 3
Enter MAX:

Do the same for the maximum integer (enter a number greater than the MIN integer).

sci[unixplay]>cubes.py
Enter MIN: 3
Enter MAX: 8

3       27
4       64
5       125
6       216
7       343
8       512

The program outputs the cubes of the integers from MIN to MAX.


This program is a little different than squares.py. squares.py used “command line arguments”: items given on the command line after the name of the program that are passed to the program to modify its behavior.

cubes.py prompts you for input, asking you questions to which you respond with data. So it is expecting YOU to give it data via stdin, the keyboard.

We can put this input data into a file, then redirect the input to cubes.py from that file.

First, let’s put the input numbers in a file. Since we haven’t learned how to edit a text file yet, we can use cat as we did above. Without a file to read, cat, by itself, will read from stdin (the keyboard) and write to stdout (the screen). We can use output redirection, >, to redirect that output to a file!

When we run cubes.py, it wants us to input two numbers, the MIN and MAX integers.

So, we want to have a file that contains the two numbers that we would input to cubes.py if we were typing those inputs by hand from the keyboard. In our example above I used 3 and 8.

We type cat > in.cubes with no filename specified for cat to read, then redirect the output of cat to a file called in.cubes.

sci[unixplay]> cat > in.cubes [ENTER]

The cursor will go to the beginning of the next line, like it did above, as cat waits for you to type something.

So, type

3
8
[CTRL-D]

Now, do a listing to see that your new file, in.cubes, is there.

If you want to see what the last file produced was (it should be in.cubes), use the lt alias I put in your .bashrc file. Do

lt

and you should see in.cubes at the top of the list.

Have a look at in.cubes with less. It should contain two lines, with: 3 and 8.


Now use this file as the input to cubes.py by redirecting input using <

sci[unixplay]>cubes.py < in.cubes
Enter MIN: Enter MAX:
3       27
4       64
5       125
6       216
7       343
8       512

This should produce the integers from 3 to 8 and their cubes. Does it?

Notice that when cubes.py runs, it still prints the prompts “Enter MIN:” and “Enter MAX:”, but it now reads the answers from the file in.cubes instead of the keyboard. The two prompts end up on the same line because nobody presses ENTER at the terminal.

We used < so that the stdin of cubes.py comes from a file instead of from the keyboard.
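To see why a prompting program works unchanged with <, here is a minimal sketch. The shell function cubes_demo is a hypothetical stand-in for cubes.py (whose source is not shown here); it reads from stdin and does not care whether stdin is the keyboard or a file.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical stand-in for cubes.py (NOT the actual course script):
# prompt, read two integers from stdin, print each integer and its cube.
cubes_demo() {
    printf 'Enter MIN: '; read -r min
    printf 'Enter MAX: '; read -r max
    printf '\n'
    i=$min
    while [ "$i" -le "$max" ]; do
        printf '%d\t%d\n' "$i" "$((i * i * i))"
        i=$((i + 1))
    done
}

printf '3\n8\n' > in.cubes     # the two answers, one per line
cubes_demo < in.cubes          # stdin now comes from in.cubes
```

Each read consumes one line of in.cubes, in order, exactly as it would consume one typed line.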

3.18.6. < and >

You can use both input and output redirection at the same time! (I can hear the sound of your mind being blown).

command < in.file > out.file

will make command read input from the file in.file and catch the output in the file out.file.
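As a small illustration of the pattern, the sketch below uses sort -n as the command, with made-up numbers in a scratch directory. The file names in.file and out.file follow the line above.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

printf '30\n4\n17\n' > in.file     # sample input: three unsorted numbers

# sort reads its stdin from in.file and its stdout goes to out.file.
sort -n < in.file > out.file

cat out.file                       # the same numbers, in numerical order
```

Nothing appears on the screen when sort runs, because both ends of the command are connected to files.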

Try this yourself. Use input and output redirection as described above to save the cubes of integers from 0 to 20 to a file called cubes_0_20.out.

3.19. The Pipe: |

Previously we learned a few really useful commands: grep, sort, and wc. Be assured, there are many more.

It turns out that we can string these together so that the output of one command serves as the input to the next, using the “pipe”. This is kind of like redirecting the output and input to a file, but instead of using files on the hard drive, we can pass all the data between programs in memory (which is much faster and more convenient).

The “pipe” is the vertical bar “|” on your keyboard, above the “\” backslash.

Recall our file, scifi_list.txt, which contains the year of publication, rank, author, and title of 100 of the best science fiction works. Let’s ask some questions:

Suppose you want to know the rank in popularity (column 2 in scifi_list.txt) of the books by Robert Heinlein. Have a look at the file to remind yourself what’s there. Heinlein’s books are mixed in with the rest, randomly. grep comes to the rescue: grep Heinlein scifi_list.txt produces

1961    6       Heinlein, Robert A      Stranger in a Strange Land
1973    41      Heinlein, Robert A      Time Enough For Love
1958    93      Heinlein, Robert A      Have Space-Suit - Will Travel
1957    84      Heinlein, Robert A      Citizen Of the Galaxy
1951    88      Heinlein, Robert A      The Puppet Masters
1956    80      Heinlein, Robert A      The Door Into Summer
1966    17      Heinlein, Robert A      The Moon is a Harsh Mistress
1959    12      Heinlein, Robert A      Starship Troopers

Then, to get these ordered by popularity rank, you could redirect the output of the grep command to a file, then sort the file in a two step process, like this:

grep Heinlein scifi_list.txt > heinlein.out
sort -k 2 -n heinlein.out

However, the pipe allows you to do this in one step:

grep Heinlein scifi_list.txt | sort -k 2 -n

which yields

1961    6       Heinlein, Robert A      Stranger in a Strange Land
1959    12      Heinlein, Robert A      Starship Troopers
1966    17      Heinlein, Robert A      The Moon is a Harsh Mistress
1973    41      Heinlein, Robert A      Time Enough For Love
1956    80      Heinlein, Robert A      The Door Into Summer
1957    84      Heinlein, Robert A      Citizen Of the Galaxy
1951    88      Heinlein, Robert A      The Puppet Masters
1958    93      Heinlein, Robert A      Have Space-Suit - Will Travel

We “piped” the output of the grep command to the sort command, where we used the switches “-k 2 -n” to sort on the second “k”olumn (-k 2 selects the second field as the sort key) in numerical order (-n).
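The two-step version and the piped version should give identical results. The sketch below checks this on three made-up lines laid out like the listing above; sample.txt, step.out, two_step.out and piped.out are example names only.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Made-up sample data: year, rank, author, title.
printf '1973\t41\tAuthor, A\tTitle One\n'  > sample.txt
printf '1961\t6\tAuthor, A\tTitle Two\n'  >> sample.txt
printf '1950\t9\tOther, B\tTitle Three\n' >> sample.txt

# Two steps, with a temporary file in between:
grep Author sample.txt > step.out
sort -k 2 -n step.out > two_step.out

# One step, with a pipe and no temporary file:
grep Author sample.txt | sort -k 2 -n > piped.out

# cmp prints nothing and succeeds when the two files are identical.
cmp two_step.out piped.out && echo "same result"
```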

We can even chain pipes together:

grep Heinlein scifi_list.txt | sort -k 2 -n | wc

produces simply

8      73     402

The output of the sorted grep contains 8 lines, 73 words, 402 bytes.
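The three numbers from wc are easier to read on a tiny input whose counts can be checked by hand. This is a minimal sketch with made-up text; the variable name counts is only an illustration.

```shell
# wc reads stdin when it is given no file, so it can end a pipeline.
# The input is 2 lines, 3 words, and 14 bytes (newlines included).
counts=$(printf 'one two\nthree\n' | wc)

# Left unquoted on purpose, so the padding spaces that wc adds
# collapse to single spaces.
echo $counts
```

The output is read left to right as lines, words, bytes, the same order as in the Heinlein example.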

3.20. Command Summary

Data Flow          Meaning
command > file     redirect output to a file (catch/save output of command)
command >> file    catch and append output of command to the end of a file
command < file     read input for a command from a file
cmd1 | cmd2        pipe the output of cmd1 to the input of cmd2

3.21. Homework

Homework 3 is here