Unix Essentials Class
Fundamental UNIX Concepts
UNIX Architecture · UNIX File system · UNIX Processes - ps · Man pages - man · Pagers - more and less
We've already touched on a few fundamental UNIX concepts, particularly the notion of a session. During a session, the user enters commands that the shell interprets and executes. Another name for a shell is command interpreter. The shell is actually the top layer in a stack of software layers that ultimately turn the millions of transistors in the CPU and memory on and off.
The earliest computers were huge, clunky things that were literally programmed with 0's and 1's - programmers actually told the computer exactly where to stick the 0's and 1's and how to combine them. A big program might be hundreds of lines of 0's and 1's which were completely unreadable to anyone but an expert.
Modern computers are very different: commands and programs are written in something much closer to ordinary language, and large programs can run to millions of lines of code.
To make this workable at all, let alone manageable, the software on the computer had to be divided into layers, each handling a different level of detail. This division into layers forms a plan called the system architecture, which governs everything about the computer. Most users will work with at least two of the four common layers of system architecture; power users will work with all four.
The kernel
At the lowest level, the computer has to:
There is a specific piece of software that controls all these low-level actions, called the kernel. On my computer, the kernel is in the /boot directory.
Usually, the kernel file has a recognizable name, like vmlinuz on Linux systems or vmunix on BSD-derived systems such as SunOS 4. On my system, vmlinuz is actually a symbolic link (like a Windows shortcut) to the "real" kernel, vmlinuz-2.2.12-20.
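The idea of a symbolic link can be tried out safely without touching /boot. The sketch below assumes a POSIX shell with mktemp and readlink available. It builds a scratch directory containing an empty stand-in file named vmlinuz-2.2.12-20 and a link named vmlinuz that points to it. The scratch directory and the variable name are only for illustration, and nothing here is a real kernel.

```shell
# Work in a throwaway directory so that /boot is never touched.
scratch=$(mktemp -d)

# An empty stand-in for the versioned kernel file (not a real kernel).
touch "$scratch/vmlinuz-2.2.12-20"

# Create the symbolic link: ln -s TARGET LINKNAME
ln -s vmlinuz-2.2.12-20 "$scratch/vmlinuz"

# In a long listing, a link's line starts with "l" and ends with "-> target".
ls -l "$scratch"

# readlink prints the name the link points to.
readlink "$scratch/vmlinuz"
```

Running ls -l /boot on a real Linux system may show a similar arrow, although the exact file names vary from system to system.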
The point is that there is a specific executable file on the computer that gets loaded into RAM at boot time and literally runs the computer. This program is the "real" UNIX - all the other programs (hundreds, perhaps thousands) are just helper programs that allow us to talk to the computer in something easier to understand than meaningless sequences of 0's and 1's.
Device driver level
The kernel is really rather simple-minded. It just moves data around and does calculations, all in base 2. In fact, the kernel proper does not really talk to all the hardware, just the CPU and memory. Every other piece of hardware has a special piece of software that sends and receives data from the kernel and interprets it for the hardware. These special programs are called device drivers. The collection of all device drivers forms what is called the device driver layer. Most of the device drivers are considered part of the kernel and are in fact built into the kernel.
Application level
User programs form yet another layer of software. These are programs and utilities like word processors, spreadsheets, editors, graphical user interfaces, etc. Usually, these are written in some sort of programming language (like C, C++, Java) that has to be compiled into executable files. There are typically hundreds of these programs on a computer.
Shell level
Tying all of these together is a program that allows the user to talk to the computer and execute these other programs. This program is called the shell, command interpreter or user interface. Without it, there would be no way to navigate the file system, execute other programs, look at files, make files, move files or delete files. The shell is the program the users constantly talk to, and the shell in turn talks to the other programs.
Most of what we will be doing in this class is learning to use the shell.
These four layers (the kernel, the device drivers, user programs and the shell) make up the UNIX system architecture. Ordinary users just use the shell and user programs; system administrators will typically be concerned with all four layers.
[Aside: In Windows, the graphical shell is Windows Explorer, which provides the desktop, My Computer, the file browser windows and various pop-up windows. Whatever form it takes, the actions it performs are the same - start and exit programs, manage files, navigate the file system. There is another shell on Windows, the DOS prompt (cmd on NT-based versions), which is a command line interface somewhat similar to UNIX shells.]
Files
The UNIX system is organized using a file system. Literally everything in UNIX is a file - all the directories are files, the programs are files, the devices are files. In practice, we divide the files in a UNIX system into categories by type of file:
We have already been introduced to the root directory and some of the subdirectories of the root directory. We'll come back to the matter of files later on.
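One way to see that directories and devices are files too is to look at the first character of a long listing, which encodes the type of file. The sketch below assumes a POSIX shell; the helper name file_type is made up for this illustration and is not a standard command.

```shell
# Print the first character of the long listing for a path.
# d = directory, c = character device, l = symbolic link, - = regular file.
file_type() {
    ls -ld "$1" | cut -c1
}

file_type /            # the root directory
file_type /dev/null    # a device file
```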
Standard Input and Output
Above we considered a program called "cat". We'll use cat as an example program to study the input and output behavior of a typical program. A lot of standard UNIX programs are called filters - they take input from the standard input (by default, the keyboard), do something to it, and write the result to the standard output (by default, the screen). We'll have cat output to a file called "file1" by redirecting the standard output to a file. This is accomplished simply by typing cat, then a greater than sign (>), then the name of the file we want to write, that is, cat > file1, followed by a line of text.
What you can't see here is the ^D typed after the line of text. That caused cat to exit and brought up the second command prompt ($). Then, "ls" was used to look in our directory to see the new file "file1" that was just created.
Then, cat was invoked again, this time with "file1" as an argument, producing the contents of the file "file1". So, the action of "cat" is just to write its standard input to its standard output.
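The steps just described can be replayed as a small script. This is only a sketch: it works in a scratch directory, the line of text is a made-up example, and a here-document (the lines between <<'EOF' and EOF) stands in for what would otherwise be typed at the keyboard and ended with ^D.

```shell
# Use a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Redirect cat's standard output to file1. The here-document supplies the
# input that would otherwise be typed at the keyboard and ended with ^D.
cat > file1 <<'EOF'
This is a line of text.
EOF

ls          # file1 now exists
cat file1   # with a file name as argument, cat prints that file
```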
If that seems a bit unclear, let's try another example. This time, since we already have "file1", let's use it as input for cat by redirecting again, this time with a "less than" symbol: cat < file1.
What happened here? The cat program read the file "file1" instead of keyboard input, and placed its output on the screen (standard output).
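As a sketch of input redirection on its own, the following recreates file1 in a scratch directory (the text in it is a made-up example) and then feeds it to cat through standard input.

```shell
cd "$(mktemp -d)"
echo "This is a line of text." > file1   # recreate file1 for this sketch

# Standard input now comes from file1 rather than the keyboard.
cat < file1
```

The visible result is the same as cat file1. The difference is that here the shell opens the file and hands it to cat as standard input, so cat never sees the file name.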
Let's try another experiment: this time, let's again use "file1" as redirected input, and make another file, "file2", the redirected output, by typing cat < file1 > file2.
We then used cat to look at the contents of file2. What is useful to notice here is that we can create files by redirecting the standard output of a program to a file, and that cat is good for looking at the contents of a file.
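Both redirections can be combined in one command. The sketch below assumes the same made-up file1 as before and uses cmp, a standard utility that compares two files, to confirm that file2 ends up as a copy.

```shell
cd "$(mktemp -d)"
echo "This is a line of text." > file1

# Read from file1 and write to file2 in a single command.
cat < file1 > file2

cat file2

# cmp -s is silent and succeeds only if the two files are identical.
cmp -s file1 file2 && echo "file2 is a copy of file1"
```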
This might seem a bit confusing, but cat is a very simple program. Let's do another example:
What happened here is that I redirected the output of cat to file2, and entered a line of text from the standard input (the keyboard - I just typed it in, hit return, and then hit ^D to exit cat). Then I ran cat again with file2 as the argument, producing the line of text I just entered. What happened to the previous contents of file2? It was overwritten - the new contents replaced the old contents. Running "cat file2" just displayed the contents of file2 by writing it to the standard output.
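The overwriting behavior is easy to check for yourself. In this sketch the two lines of text are made-up examples, and a here-document again stands in for typed input.

```shell
cd "$(mktemp -d)"
echo "old contents" > file2
cat file2

# Redirecting with > empties file2 before the new text is written.
cat > file2 <<'EOF'
new contents
EOF

cat file2
```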
The null device
There are a lot of devices in the /dev directory - we shall take a look at them later in more detail. Let's look at one of them, a very special device called the null device or /dev/null. This device is empty: running cat /dev/null prints nothing.
In fact, anything written to it disappears:
First, I ran cat on file1 to show its contents. Then, I did it again, this time redirecting the output to /dev/null. You will remember from before that doing that with file2 wrote the contents of file1 to file2. Here, redirecting the output to /dev/null did nothing - another cat on /dev/null showed no output.
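A minimal sketch of the same experiment, assuming a made-up file1 in a scratch directory:

```shell
cd "$(mktemp -d)"
echo "This is a line of text." > file1

# Output sent to /dev/null is discarded, so nothing appears on the screen.
cat file1 > /dev/null

# Reading /dev/null yields nothing at all, so this also prints nothing.
cat /dev/null

# file1 itself is unchanged.
cat file1
```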
Let's make a new file called file3 by redirecting the output of cat to it, show that it has something in it by doing cat file3, then redirect cat /dev/null to file3 (cat /dev/null > file3).
What happened? The file comes up empty! Redirecting the empty contents of /dev/null to file3 replaced everything that was in the file with nothing.
The lesson here is that /dev/null is empty and cannot be filled. It is designed to be an empty device and stay that way. It can be used as a way to write to somewhere and guarantee that the written content disappears, and also as a way to "zero" out files so they are empty.
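Zeroing out a file this way can be sketched as follows. The text placed in file3 is a made-up example, and the -s test (true only when a file has a size greater than zero) is used to confirm the result.

```shell
cd "$(mktemp -d)"
echo "some text" > file3
cat file3

# Copy the (empty) contents of /dev/null over file3.
cat /dev/null > file3

# -s is true only for files with a size greater than zero.
if [ -s file3 ]; then
    echo "file3 still has contents"
else
    echo "file3 is empty"
fi
```

Note that the file still exists afterwards; only its contents are gone.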
When a UNIX program is running, it occupies RAM (memory) and stays in memory as long as it runs. The memory is reclaimed when the program exits. During the time it runs, the program in memory is called a process. We can look at the programs that are running by using the ps utility:
The output of ps shows two programs running - bash and ps. Each process has a process ID or PID. Both are running on the "terminal" pts/1, and neither process was consuming CPU time when ps took its "snapshot". UNIX keeps a table of running processes called the process table. This table collects the name of the process and useful data about the process such as who is running it, how much CPU time is being consumed and where the process is running (the session or "terminal" or "tty" the process is run from).
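To watch a process appear in the process table and then leave it, something like the following can be tried. It is a sketch that assumes a POSIX shell and the standard ps utility; the sleep command is just a convenient long-running program, and the variable name child is made up.

```shell
# $$ expands to the PID of the shell that is running these commands.
echo "shell PID: $$"

# Start a long-running background process; $! holds its PID.
sleep 30 &
child=$!

# With no options, ps lists your processes attached to the current terminal.
ps

# -p restricts the listing to the given PID.
ps -p "$child"

# Stop the background process and wait until it is gone.
kill "$child"
wait "$child" 2>/dev/null || true
```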
Adding some options to the ps command shows a more complete picture of what is happening in the computer:
From this we can see that the ps command without options shows only the processes being run by the user student. Adding options in this last case shows all of the processes being run by all users including root. From this we can see that there is a lot going on in the computer even though we can't usually see it.
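The exact options used above are not reproduced here. As a sketch, the POSIX options -e (every process) and -f (full listing) give a similar system-wide view on most systems, and -o selects individual columns.

```shell
# -e selects every process on the system; -f adds columns such as the
# owner (UID) and the parent process ID (PPID).
ps -ef

# Count processes per user: -o user= prints only the owner, with no header.
ps -e -o user= | sort | uniq -c
```

The second command gives a quick feel for how many processes belong to root and other users compared with your own.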
There are hundreds of programs on a typical UNIX system, some of them very complicated. Fortunately, there is almost always documentation for these programs in the form of man (short for manual) pages. Modern systems frequently have other documentation systems as well, but the man pages are the traditional online documentation system. To use the man pages, one types "man <program_name>" like this:
To get out of the current man page, type "q" - the program exits immediately (no need to hit return).
Suppose you wanted more information on how to use the man pages. There is a man entry for man:
(note that much of the screen output has been omitted)
Notice the colon at the bottom of the entry - that is a signal that the man entry was too big to fit in one screen full of text and that there is more info off screen. Hit the spacebar to get another screen full (omitted here).
Suppose you were not quite sure of the name of a command. You can search the man pages by keyword with "man -k <keyword>".
On many systems, the command "apropos" is a synonym for "man -k". It searches the short descriptions of the man pages for the argument and outputs a list of man pages that might have something to do with the search topic.
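Collected in one place, the man commands discussed so far look like this. It is a sketch that assumes the man pages and their keyword index are installed; the keyword "directory" is just an example search term.

```shell
# Read the manual page for ls; press q to leave the pager.
man ls

# The manual has an entry describing man itself.
man man

# Search the one-line descriptions of all pages for a keyword.
keyword=directory
man -k "$keyword"

# On many systems apropos gives the same result as man -k.
apropos "$keyword"
```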
The GNU project treats man pages as secondary documentation: they are still included with GNU programs, but the full, maintained manual is usually written in Texinfo. On Linux systems, this documentation is read with the info program, invoked by typing the word info:
The info program is self-documenting. You are welcome to try it yourself. There is a brief online tutorial (type "h" at the opening screen). We won't go any further than mentioning it here since not all versions of UNIX use it.
Try typing the command:
There is so much information returned (1300 blocks are in the directory, and 138 lines are returned, most of them omitted here for brevity) that it flies off the screen faster than you can read it. Only the last screenful can be seen. This is fixed by using pagers - programs that display only one screenful of information at a time.
Now try writing the output of the same command to a file called "list" by adding > list to the end of the command.
Try looking at the contents of list using cat list.
The same long output again blows past the screen. Now try more list.
The first part of the output (in fact, 52% of it) is displayed. You know there is more to the file because of the --More--(52%) prompt at the bottom of the screen. Hit the spacebar - more of the file is displayed. Keep going until you get back to a command prompt. Instead of the spacebar, you can hit the enter key to scroll through the rest of the file one line at a time.
Now try the same experiments with less.
Notice the similarity to the "more" output. Less is actually a more sophisticated version of more, and its name is a typical UNIX word game. On some systems, there is another pager "pg", but not on Linux. We'll come back to pagers later.
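The whole pager experiment can be sketched as below. The directory /usr/bin is an assumption, chosen only because it is large on most systems; any directory with a long listing will do. It also assumes more and less are installed.

```shell
cd "$(mktemp -d)"

# Save a long listing in a file called list.
# /usr/bin is an assumed example of a large directory.
ls -l /usr/bin > list

# cat prints the whole file at once, so the top scrolls out of view.
cat list

# more shows one screen at a time: spacebar for the next screen,
# Enter for the next line, q to quit.
more list

# less works similarly and also lets you scroll backwards.
less list
```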