Controlling Data Flow
=====================

.. image:: imgs/magritte-not-a-pipe.jpg
   :width: 300px
   :align: center

Video for this section
----------------------

Video for this section is not available yet.

Stdout and Stdin
----------------

Most processes initiated by Unix commands write their output to the
**standard output** channel (officially called ``stdout``), which by default
is the terminal screen. Similarly, most programs take their input from the
**standard input** (``stdin``), which by default is the keyboard. There is
also the **standard error** (``stderr``), where processes write their error
messages; by default it too goes to the terminal screen.

Here is one rather abstract example. We have already seen how to use the
``cat`` command to write the contents of a file to the screen. This time,
however, type ``cat`` without specifying a file to read. Just type:

``cat``

Without a file to read, the ``cat`` program has no content to print to the
screen. After you type ``cat`` and ENTER, the cursor goes to the beginning of
the next line and waits.

::

    sci[~]>cat

Type a few words on the keyboard and press the **ENTER** key. You should see
your words as you type; then, each time you hit ENTER, the line you typed
should be printed again.

::

    sci[~]>cat
    Here is some text[ENTER]
    Here is some text
    and some more[ENTER]
    and some more
    you get the idea[ENTER]
    you get the idea

Finally, type ``Ctrl-D`` (the ``CTRL`` key *and* the "**d**" key,
simultaneously).

What happened? If you run the ``cat`` command without specifying a file to
read, then instead of reading from a file, it reads from **stdin**
(*standard input*, i.e. the **keyboard**) and writes to **stdout**
(*standard output*, the **screen**) until it receives the **End of File**
marker (``Ctrl-D``).

In Unix, we can **redirect both the input and the output of commands**, which
can be VERY powerful.

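
The phrase "by default" matters here: each of the three channels is only
*usually* connected to the terminal. As a rough sketch (the function name
``describe_stream`` is made up for this illustration, and the channel numbers
0, 1 and 2 are the conventional Unix numbering for stdin, stdout and stderr),
the shell can be asked where each channel currently points:

.. code-block:: bash

    # Hypothetical helper: report whether a standard channel is attached
    # to the terminal. 0 = stdin, 1 = stdout, 2 = stderr.
    describe_stream() {
        local fd="$1"
        local name="$2"
        if [ -t "$fd" ]; then
            echo "$name (fd $fd): attached to the terminal"
        else
            echo "$name (fd $fd): redirected somewhere else"
        fi
    }

    describe_stream 0 stdin
    describe_stream 1 stdout
    describe_stream 2 stderr

Typed at an ordinary prompt, all three lines should say "attached to the
terminal". Once input or output is redirected, the corresponding line changes.
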
|

Redirecting the Output: ``>``
-----------------------------

We use the **>** symbol to redirect the output of a command to a file.

In your ``unixplay/`` directory is a little program called ``squares.py``
which simply prints out integers and their squares.

``cd`` to your ``unixplay/`` directory, and do a *long list* (use your alias
``ll``, or do ``ls -lF``). Notice that ``squares.py`` is shown with a
trailing ``*``.

::

    sci[unixplay]>ll
    total 44
    -rw-rw-r--. 1 jhetrick jhetrick   22 Sep 24 15:58 anotherFile
    drwxrwxr-x. 2 jhetrick jhetrick 4096 Sep 24 15:58 dirONE/
    drwxrwxr-x. 3 jhetrick jhetrick 4096 Sep 24 15:58 dirTWO/
    -rw-rw-r--. 1 jhetrick jhetrick   53 Sep 24 15:58 file_unixplay.txt
    drwxrwxr-x. 2 jhetrick jhetrick 4096 Sep 24 15:58 gammaData/
    -rw-rw-r--. 1 jhetrick jhetrick  292 Sep 24 15:58 ints.dat
    -rw-rw-r--. 1 jhetrick jhetrick 4273 Sep 24 15:58 scifi_list.txt
    -rwxrw-r--. 1 jhetrick jhetrick  468 Nov 21 10:07 squares.py*
    -rw-rw-r--. 1 jhetrick jhetrick   38 Sep 24 15:58 testfile
    -rw-rw-r--. 1 jhetrick jhetrick   26 Sep 24 15:58 yetanother

We've already met the trailing ``/`` decoration, indicating a *directory*, as
well as the trailing ``@`` showing *links* (shortcuts). Now you are seeing the
``*``, which indicates that the file is **executable**: it's a program or
*script* that can be run, and it will do something.

Go ahead and type

``squares.py``

If called with no *arguments* (i.e. just by itself), it prints the first 10
integers and their squares. (**Did it?**)

If called with a **min** and **max**, like this:

``squares.py 20 30``

it prints the integers and squares between the **min** and **max** numbers
(20 and 30 in this case). These numbers are called the **arguments** to the
command. The output is shown below.

::

    20 400
    21 441
    22 484
    23 529
    24 576
    25 625
    26 676
    27 729
    28 784
    29 841
    30 900

Now let's **redirect the output to a file**.

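
Before doing that, it may help to see how little a program like this needs to
do. The real ``squares.py`` is not reproduced in these notes, so the shell
function below is only a hypothetical stand-in: the name ``squares`` and the
defaults of 1 and 10 are assumptions taken from the behaviour described above.
It prints to *stdout* and knows nothing about files.

.. code-block:: bash

    # Hypothetical stand-in for squares.py (not the course script itself).
    # Prints "n n*n" for n = min..max; defaults to 1..10 with no arguments.
    squares() {
        local min="${1:-1}"
        local max="${2:-10}"
        local n="$min"
        while [ "$n" -le "$max" ]; do
            echo "$n $((n * n))"
            n=$((n + 1))
        done
    }

    squares 20 22    # prints: 20 400, 21 441, 22 484 (one pair per line)

Because the function simply writes to *stdout*, it is the shell, not the
program, that decides whether those lines land on the screen or somewhere else.
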
Type this:

``squares.py > squares.out``

Verify that the output file is there by listing (``l``) and then have a look
at the file ``squares.out`` with ``less``.

Remember, you have an alias for ``less`` in your ``.bashrc`` file. You can use
the alias ``m`` for ``less``, so typing

``m squares.out``

should show you the first 10 integers and their squares, in your newly created
data file, ``squares.out``.

When doing scientific computing we will often use this method to catch the
output of a program in a file. We can then use other tools on the file, such
as ``sort``, or plot the data with ``gnuplot``.

Try it again, but give ``squares.py`` some arguments. Catch the output in the
same file as above:

``squares.py 20 30 > squares.out``

Check the output file again with ``less`` (or ``m``). Notice that the
*previous* ``squares.out`` file, with integers 1 through 10, has been
overwritten by the second use of the redirect ``>``.

.. note::

   It is important to remember that when using ``>`` by itself, the output
   file is first cleared before the output is collected. Your previous file
   ``squares.out`` has been lost and replaced with the new output.

Append Output to a File: ``>>``
-------------------------------

Instead of using ``>`` for redirection, we can use the *double* redirect
symbol ``>>`` to **append** data to an existing file, as opposed to
overwriting it. This is very handy if we do a computation over and over and
want to add the result to the end of the output file after each iteration.

Look at your ``squares.out`` file with ``less``. It should contain the squares
of the integers from 20 to 30.

Now, do

``squares.py 31 40 >> squares.out``

Verify that the file ``squares.out`` now contains the integers and squares
from 20 to 40. The last command **appended** the results from 31 to 40 to the
existing file ``squares.out``, which already had 20 through 30.

.. note::

   Using ``>>`` does NOT erase and then rewrite the output file. It preserves
   the output file and adds to it.

Redirecting the Input: ``<``
----------------------------

We can use the ``<`` symbol to redirect the input to a command *from*
somewhere else, usually a file.

In the ``unixplay/`` directory is another executable file called:

``cubes.py``

Run this program by typing ``cubes.py``. This time, the program will *prompt
you for input*. Instead of taking *command line arguments* like ``squares.py``
above, it *asks* you to input the numbers:

::

    sci[unixplay]>cubes.py
    Enter MIN:

Enter a small integer, like 3, and then ENTER.

::

    sci[unixplay]>cubes.py
    Enter MIN: 3
    Enter MAX:

Do the same for the maximum integer (enter a number greater than the MIN
integer).

::

    sci[unixplay]>cubes.py
    Enter MIN: 3
    Enter MAX: 8
    3 27
    4 64
    5 125
    6 216
    7 343
    8 512

The program outputs the cubes of the integers from MIN to MAX.

|

This program is a little different from ``squares.py``. ``squares.py`` used
"*command line arguments*": items given on the command line after the name of
the program that are passed to the program to modify its behavior.
``cubes.py`` prompts you for input, asking you questions, to which you input
data. So it is expecting YOU to give it data via *stdin* (the keyboard).

We can put this input data into a file, then *redirect the input* to
``cubes.py`` from that file.

First, let's put the input numbers in a file. Since we haven't learned how to
edit a text file yet, we can use ``cat`` as we did above. Without a file to
read, ``cat``, by itself, will read from *stdin* (the keyboard) and write to
*stdout* (the screen). We can use output redirection, ``>``, to redirect that
output to a file!

When we run ``cubes.py``, it wants us to input two numbers, the MIN and MAX
integers. So, we want to have a file that contains the two numbers that we
would input to ``cubes.py`` if we were typing those inputs by hand from the
keyboard. In our example above I used **3** and **8**.

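
To make the contrast with command line arguments concrete, here is a sketch of
a program that takes its numbers from *stdin* instead. It is a hypothetical
shell stand-in, not the actual ``cubes.py`` (which is not shown in these
notes); the function name ``cubes`` is an assumption, and only the prompts and
the output format are copied from the session above.

.. code-block:: bash

    # Hypothetical stand-in for cubes.py (not the course script itself).
    # It ignores its arguments and reads MIN and MAX from stdin, one per line.
    cubes() {
        local min max n
        printf 'Enter MIN: '
        read -r min                # takes the next line from stdin
        printf 'Enter MAX: '
        read -r max                # takes the line after that
        n="$min"
        while [ "$n" -le "$max" ]; do
            echo "$n $((n * n * n))"
            n=$((n + 1))
        done
    }

Each ``read`` simply takes the next line from *stdin*. The function has no way
of telling whether that line was typed at the keyboard or came from somewhere
else, which is exactly what makes input redirection possible.
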
We type ``cat > in.cubes`` with no filename specified for *cat* to read,
*then* redirect the output of *cat* to a file called ``in.cubes``.

::

    sci[unixplay]>cat > in.cubes[ENTER]

The cursor will go to the beginning of the next line, like it did above, as
``cat`` waits for you to type something. So, type

::

    3
    8
    [CTRL-D]

Now, do a listing to see that your new file, ``in.cubes``, is there. If you
want to see what the last file produced was (it should be ``in.cubes``), use
the ``lt`` alias I put in your ``.bashrc`` file. Do ``lt`` and you should see
``in.cubes`` at the top of the list.

Have a look at ``in.cubes`` with ``less``. It should contain two lines, with
**3** and **8**.

|

Now use this file *as the input* to ``cubes.py`` by redirecting input using
``<``

::

    sci[unixplay]>cubes.py < in.cubes
    Enter MIN: Enter MAX: 3 27
    4 64
    5 125
    6 216
    7 343
    8 512

This should produce the integers from 3 to 8 and their cubes. **Does it?**

Notice that when ``cubes.py`` runs, it prints the prompts "*Enter MIN:*" and
"*Enter MAX:*", but now reads from the file ``in.cubes`` instead of *stdin*
(the keyboard). We used the ``<`` to redirect the input to ``cubes.py`` from a
file instead of from *stdin*.

< and >
-------

You can use both input and output redirection at the same time! (*I can hear
the sound of your mind being blown*).

``command < in.file > out.file``

will make **command** read input from the file **in.file** and catch the
output in the file **out.file**.

Try this yourself. Use input and output redirection as described above to
save the cubes of integers from 0 to 20 to a file called ``cubes_0_20.out``.

The Pipe: **|**
===============

Previously we learned a few really useful commands: **grep**, **sort**, and
**wc**. Be assured, there are many more. It turns out that we can string these
together so that the output of one command serves as the input to the next,
using the "**pipe**".
This is kind of like redirecting the output and input to a file, but instead
of using files on the hard drive, we can pass all the data between programs in
memory (which is much faster and more convenient).

|

The "*pipe*" is the vertical bar "**|**" on your keyboard, above the
"**\\**" backslash.

Recall our file, ``scifi_list.txt``, which contains the year of publication,
rank, author, and title of 100 of the best science fiction works. Let's ask
some questions:

Suppose you want to know the rank in popularity (column 2 in
``scifi_list.txt``) of the books by Robert Heinlein. Have a look at the file
to remind yourself what's there. Heinlein's books are mixed in with the rest,
in no particular order.

**grep** comes to the rescue:

``grep Heinlein scifi_list.txt``

produces

::

    1961 6 Heinlein, Robert A Stranger in a Strange Land
    1973 41 Heinlein, Robert A Time Enough For Love
    1958 93 Heinlein, Robert A Have Space-Suit - Will Travel
    1957 84 Heinlein, Robert A Citizen Of the Galaxy
    1951 88 Heinlein, Robert A The Puppet Masters
    1956 80 Heinlein, Robert A The Door Into Summer
    1966 17 Heinlein, Robert A The Moon is a Harsh Mistress
    1959 12 Heinlein, Robert A Starship Troopers

Then, to get these ordered by popularity rank, you could redirect the output
of the ``grep`` command to a file, then sort the file in a two-step process,
like this:

.. code-block:: bash

    grep Heinlein scifi_list.txt > heinlein.out
    sort -k 2 -n heinlein.out

However, the pipe allows you to do this in one step:

.. code-block:: bash

    grep Heinlein scifi_list.txt | sort -k 2 -n

which yields

::

    1961 6 Heinlein, Robert A Stranger in a Strange Land
    1959 12 Heinlein, Robert A Starship Troopers
    1966 17 Heinlein, Robert A The Moon is a Harsh Mistress
    1973 41 Heinlein, Robert A Time Enough For Love
    1956 80 Heinlein, Robert A The Door Into Summer
    1957 84 Heinlein, Robert A Citizen Of the Galaxy
    1951 88 Heinlein, Robert A The Puppet Masters
    1958 93 Heinlein, Robert A Have Space-Suit - Will Travel

We "*piped*" the output of the **grep** command to the **sort** command
(where we used the switches "**-k 2 -n**" in order to sort on the second
**k**\olumn in **n**\umerical order).

We can even chain pipes together:

``grep Heinlein scifi_list.txt | sort -k 2 -n | wc``

produces simply

::

    8 73 402

The output of the sorted grep contains 8 lines, 73 words, 402 bytes.

Command Summary
===============

+---------------------+--------------------------------------------------------------+
| Data Flow           | Meaning                                                      |
+=====================+==============================================================+
| command > file      | redirect output to a file (catch/save output of command)     |
+---------------------+--------------------------------------------------------------+
| command >> file     | catch and *append* output of command to the end of a file    |
+---------------------+--------------------------------------------------------------+
| command < file      | read input for a command from a file                         |
+---------------------+--------------------------------------------------------------+
| cmd1 \| cmd2        | pipe the output of cmd1 to the input of cmd2                 |
+---------------------+--------------------------------------------------------------+

Homework
========

Homework 3 is `here `_
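
As a warm-up for the homework, the four forms in the command summary can be
tried together in a few lines. This is only a sketch: the file names
``nums.txt`` and ``sorted.txt`` are invented for the illustration, and it works
in a throwaway directory so that nothing in ``unixplay/`` is touched.

.. code-block:: bash

    # Work in a scratch directory (created just for this experiment).
    scratch=$(mktemp -d)
    cd "$scratch" || exit 1

    printf '3\n2\n1\n' > nums.txt      # >  : create nums.txt (or overwrite it)
    printf '6\n5\n4\n' >> nums.txt     # >> : append three more lines
    sort -n < nums.txt > sorted.txt    # <  and > : read one file, write another
    sort -n nums.txt | wc -l           # |  : count the sorted lines, no file needed

Afterwards ``sorted.txt`` should hold the numbers 1 through 6 in order, and the
last command should report six lines.
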