Processes - fork and exec

Programs and processes

It is important to distinguish between programs and processes. A program is a set of instructions (and usually some data, such as initialized variables) stored on a device, typically a hard drive. A process is a running instance of a program: it is created by the kernel and contains all the structures needed to load and run the code and data of the program.

Multitasking

Like in real life, processes are created, executed and terminated. This life cycle is an integral part of any operating system.

Linux is no exception: it is a multitasking system that runs several processes at the same time.

Well, this is how we feel it behaves. If we change the time scale and look at what happens within one second, it still holds true that processes run at the same time, but we will notice that each process/task is in fact run serially, in very quick succession. A graphical analogy is how a flip book or a zoetrope works: when static pictures are shown fast enough, our eyes and brain perceive continuous movement. This is true, at least, for a single-core CPU, where a process is given a very small share of CPU time and is then replaced very quickly by another process. So quickly that we cannot distinguish this constant swapping.

To realize how fast this is in multitasking systems, bear in mind that each process is given some CPU cycles to execute its code. A cycle is one tick of an internal clock (oscillator). The speed of these oscillations is measured in megahertz (MHz) or gigahertz (GHz). A 3 GHz CPU performs 3,000,000,000 cycles per second (yes, 9 zeros!). As a rough simplification, assume that a core executes one instruction per cycle (real CPUs may need several cycles for one instruction, or complete several instructions in one cycle). If the CPU has 4 cores, 4 instructions (probably from different processes) can be executed in 1 cycle. This means that a quad-core CPU can run about 12,000,000,000 instructions in 1 second. There is plenty of room to run processes (4 at a time on a quad-core CPU), giving each one, for instance, 1000 cycles (so about 1000 machine code instructions) before switching to the next one. The amount of CPU time a process gets in one turn is known as a time slice or quantum; the 1000 cycles used here are only an illustrative figure. More information in the scheduler post.
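The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the one-instruction-per-cycle rule and the 1000-cycle slice are the simplifications used in the text, not properties of real hardware, and the function names are made up for this example.

```python
CLOCK_HZ = 3_000_000_000  # 3 GHz, as in the text
CORES = 4                 # quad-core CPU


def instructions_per_second(clock_hz, cores):
    # Simplification from the text: one instruction per cycle and per core.
    return clock_hz * cores


def slice_duration_seconds(cycles, clock_hz):
    # How long a time slice of `cycles` clock cycles lasts.
    return cycles / clock_hz


if __name__ == "__main__":
    print(instructions_per_second(CLOCK_HZ, CORES))
    print(slice_duration_seconds(1000, CLOCK_HZ))
```

With these figures, a slice of 1000 cycles lasts about a third of a microsecond, which gives an idea of why the swapping cannot be perceived.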

Nowadays everybody expects a multitasking system: when we move the mouse, its pointer moves across the screen (this is already multitasking) while we listen to a song in our favorite music player. Find more about this in Wikipedia.

Since its birth, Unix (and therefore its derivatives) has had the ability to create new processes, and so to do multitasking. This ability is implemented by the fork/exec process-creation model. This is, of course, also the case in Linux, where fork() is one of the most important system calls, if not the most important.

Let’s review this process creation model in more detail. At the end, a more modern view (threads and so on) is also covered.

fork()

The “classical” fork() system call will create a new process (a child) which is an identical copy in memory of the parent process. The child process has a different (new) PID and a new process structure in the kernel.

Because the child process is a copy of the parent, the instruction pointer is at the same instruction in both processes, so both will run the next instruction after the fork() call. To tell the two apart, fork() returns zero in the child. In the parent, it returns the PID of the child, or -1 if the fork failed.
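The return-value convention can be sketched with Python's os module, which is a thin wrapper around the same system calls. The function name fork_once is made up for this example. One difference from C worth noting: on failure Python's os.fork() raises OSError instead of returning -1.

```python
import os


def fork_once():
    """Fork a child and return (child_pid, child_exit_code) in the parent."""
    parent_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        # fork() returned 0: we are in the child.
        # The child has its own PID, and its parent is the original process.
        ok = os.getpid() != parent_pid and os.getppid() == parent_pid
        os._exit(0 if ok else 1)  # leave at once, never run the parent's code
    # fork() returned a positive number: we are in the parent,
    # and `pid` is the PID of the new child.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, code = fork_once()
    print(f"parent {os.getpid()} created child {child_pid}, exit code {code}")
```

The child leaves through os._exit() so that it never falls through into code meant for the parent.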

As a copy of the parent process, the child process inherits the environment, resource limits, umask, controlling terminal, current working directory, root directory, signal mask and other process resources, all of them duplicated from the parent.

Current versions of the Linux kernel optimize the speed and resource usage of the fork() system call by postponing the copy of the parent's memory. Parent and child initially share the same memory pages, and a page is only copied when one of the two processes writes to it. This technique is called Copy-On-Write (COW).
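Copy-on-write itself is invisible to a program; what can be observed is its effect: after the fork, a write in one process never shows up in the other. A minimal sketch, with a made-up function name, where the child reports what it sees through a pipe:

```python
import os


def child_write_is_private():
    """Return (value seen by parent, value seen by child) after the child writes."""
    data = ["parent value"]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            # This write only touches the child's copy of the memory.
            data[0] = "child value"
            os.write(write_fd, data[0].encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    seen_by_child = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    # The parent's copy is untouched.
    return data[0], seen_by_child


if __name__ == "__main__":
    print(child_write_is_private())
```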

COW is an important improvement, as many child processes are created only to run a completely new program, which does not need any of the parent's structures or resources. New programs are run with the exec() system call, described below.

It is important to note that some resources duplicated by a fork() call are still shared between both processes. Open file descriptors are an example: parent and child share the same file offset, so if both write to the same file their output can end up intermixed. This is why it is important not only to distinguish who is who, but also to control access to shared resources. Think about closing files: a terminating process closes its own descriptors before exiting, but the file stays open in the other process. Record locks set by the parent with fcntl() are not inherited by the child process.
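The shared file offset can be observed with a small experiment. In this sketch (the function name is made up) the parent opens a temporary file, the child writes first, and the parent writes only after waiting for the child. Because the offset is shared, the parent's data lands after the child's instead of overwriting it.

```python
import os
import tempfile


def shared_offset_demo():
    """Return the file contents after child and parent wrote through one descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        pid = os.fork()
        if pid == 0:
            try:
                os.write(fd, b"child\n")  # advances the shared offset
            finally:
                os._exit(0)
        os.waitpid(pid, 0)            # serialize: let the child write first
        os.write(fd, b"parent\n")     # continues where the child stopped
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 1024)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(shared_offset_demo())
```

Without the waitpid() call the two writes could happen in any order, which is exactly the intermixed output the text warns about.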

There are two main cases for new process creation. As commented in the booting post, the initialization of the system needs to execute many different, unrelated processes. In order to do so, fork() is combined with exec() (see below).

The other case is when the parent needs to wait for a child to finish. Bash is a clear example: the parent (where the command to run is entered) forks a new process and waits for it. The child process replaces itself with the code of the given command via the exec() call.

wait()

When needed, a parent can wait for a child to terminate via the wait() system call. Calling wait() is also a way to prevent concurrent access to shared resources, as the parent is “frozen” in the wait() call until the child finishes. When this happens, the parent resumes execution after the wait().

There are several versions of wait(). Please refer to the documentation for a thorough and up-to-date description.

Bear in mind that if a process launches several child processes, wait() will return when any child finishes. Look at the different wait() flavors to choose the right one for your program.
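The "any child" behavior can be sketched as follows. The function name and the sleep-based children are assumptions made for the illustration: each child sleeps for a different time, and the parent records the order in which os.wait() reports them.

```python
import os
import time


def wait_for_all(delays):
    """Fork one child per delay; return child indexes in order of termination."""
    children = {}
    for index, delay in enumerate(delays):
        pid = os.fork()
        if pid == 0:
            try:
                time.sleep(delay)  # pretend to work for a while
            finally:
                os._exit(0)
        children[pid] = index
    finished = []
    while children:
        # wait() returns as soon as ANY child has terminated.
        pid, _status = os.wait()
        if pid in children:
            finished.append(children.pop(pid))
    return finished


if __name__ == "__main__":
    print(wait_for_all([0.9, 0.1, 0.5]))
```

To wait for one specific child, os.waitpid(pid, 0) can be used instead of os.wait().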

exec()

exec() (like wait()) is a family of calls. The main purpose of exec() is to replace the program running in the current process with the program given as a parameter. In other words, the caller’s address space is overwritten by the program passed as an argument to exec() by the caller, while the process itself (and its PID) stays the same. This new program starts execution immediately.

The different exec() calls differ in the parameters they take. Look at the different exec() flavors to choose the right one for your program. In the end, all flavors call execve().

Note that execve() discards almost everything from the calling program (code, data, stack, heap) except open file descriptors, which stay open unless they are marked close-on-exec!
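The fact that descriptors survive an exec() is what makes output redirection possible. In the following sketch (run_and_capture is a made-up name) the child points its standard output at a pipe and then calls os.execvp(), one of the exec() flavors exposed by Python. The new program knows nothing about the pipe; it simply writes to descriptor 1. Note that Python creates pipe descriptors as close-on-exec by default, while the copy made by os.dup2() is inheritable.

```python
import os
import sys


def run_and_capture(argv):
    """Run argv in a child process; return (stdout bytes, exit code)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)      # descriptor 1 (stdout) now is the pipe
            os.execvp(argv[0], argv)  # replaces this process; returns only on error
        finally:
            os._exit(127)             # reached only if exec failed
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                 # EOF: every write end has been closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_and_capture([sys.executable, "-c", "print('hello')"]))
```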

How does all this work?

As you have probably already noticed, fork(), wait() and exec() are closely interrelated. It is important to understand that fork() and exec() can be combined in many ways. This is a clear difference between Unix/Linux and other operating systems, where fork() and exec() are combined in a single call, like “spawn” in DOS/Windows.

It is quite common to see the child process calling exec() after a parent fork(). This is the standard way to launch new programs: an already running process calls fork() to create another process, which is a copy of itself. If the child calls exec(), it will load a new program and replace the current child code and data with the code and data of the new program. The parent process might (or might not) wait for the new process running the loaded program to terminate. As commented previously, waiting for the child is how bash runs commands entered at the shell prompt.
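What a shell does for each command can be condensed into one function. This is a minimal sketch of the fork/exec/wait pattern, not how bash is actually implemented; the name run_command and the exit code 127 for a failed exec (a convention borrowed from shells) are choices made for this example.

```python
import os
import sys


def run_command(argv):
    """Fork, exec argv in the child, wait in the parent; return the exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # the child becomes the new program
        finally:
            os._exit(127)             # reached only if exec failed
    # The parent blocks here until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)       # the child was killed by a signal


if __name__ == "__main__":
    print(run_command([sys.executable, "-c", "raise SystemExit(3)"]))
```

A parent that does not need the result could skip the waitpid() call and go on with its own work, which is the "might not wait" case mentioned above.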

Nevertheless, there are other ways to use fork() and exec(). Imagine that we need a process listening on a TCP port for any incoming connection. The normal step in this case is to launch a child to handle the incoming request, while the parent keeps listening for new incoming requests. When the child has finished serving the request, it terminates. The parent can control the number of launched children to limit how many child processes run concurrently at a given moment, etc. There are many TCP socket programming samples. Find here a basic example without and with fork().
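The fork-per-connection idea can be sketched in a few lines. Everything specific here is an assumption made for the illustration: the function name, the toy "service" (sending the request back in upper case) and the fixed number of connections to accept, which only exists so that the example terminates.

```python
import os
import socket


def serve_forking(listener, max_connections):
    """Accept max_connections clients, handling each one in a forked child."""
    children = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            try:
                listener.close()            # the child never accepts
                data = conn.recv(1024)
                conn.sendall(data.upper())  # toy service: upper-case echo
                conn.close()
            finally:
                os._exit(0)                 # request served, child terminates
        conn.close()                        # the child owns the connection now
        children.append(pid)
    for pid in children:
        os.waitpid(pid, 0)                  # reap the children, no zombies left
    return len(children)


if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))           # port 0: let the system pick one
    server.listen(5)
    print("listening on", server.getsockname())
    serve_forking(server, 1)
    server.close()
```

A real server would loop forever, reap children as they finish and put a limit on how many of them run at once.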

NOTE: Talking about listening on ports, see the different approach (event-driven vs process-driven) of the NGINX server for handling incoming connections.

clone() - Processes and threads

There is a way to use a “lighter” process in Linux. Although in Linux everything is a process, these lighter processes are called threads. They are lighter because their creation does not imply a full copy of the parent. clone() is the system call that creates a thread, but it is based on the standard process model, structures, etc. In other words, from a kernel point of view, threads are processes with less private information. clone() duplicates fewer resources than fork(), depending on the given parameters. As you can read in the clone() man page, “clone() allows the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.” Therefore, depending on the parameters of clone(), a programmer can control whether to create a full “heavy” process or a “thin” thread.

To be accurate, since version 2.3.3 of the C library (glibc), fork() is a wrapper around the clone() system call. In fact, forking is a special case of cloning. Nowadays you will find a lot of literature that talks about processes and threads interchangeably. A program running with a single thread can be considered a “classic” process, whose code is executed in that single thread.
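The practical difference between sharing and copying memory can be seen by doing the same update once in a thread and once in a forked child. This sketch does not call clone() directly: it relies on Python's threading module, which on Linux creates threads through the C library. The function name is made up for the example.

```python
import os
import threading


def thread_vs_fork():
    """Increment a counter in a thread and in a forked child; return the parent's view."""
    shared = {"thread": 0, "fork": 0}

    def bump(key):
        shared[key] += 1

    # A thread shares the memory of its creator: the update is visible.
    worker = threading.Thread(target=bump, args=("thread",))
    worker.start()
    worker.join()

    # A forked child works on its own copy: the update stays in the child.
    pid = os.fork()
    if pid == 0:
        try:
            bump("fork")
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return shared


if __name__ == "__main__":
    print(thread_vs_fork())
```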

Dead processes, Zombies and Orphans

As commented at the beginning of this post, processes (like many other things in life) go through different states.

In Unix/Linux, the concepts are the same but the terminology varies a bit. Find the states coded in the Linux kernel in sched.h or in the array.c definitions for the /proc filesystem.

Basically, a process can be in one of the following states:

Code (1)   Meaning
D          Uninterruptible sleep (usually IO)
R          Running or runnable (on run queue)
S          Interruptible sleep (waiting for an event to complete)
T          Stopped, either by a job control signal or because it is being traced
W          Paging (not valid since the 2.6.xx kernel)
X          Dead (should never be seen)
Z          Defunct (“zombie”) process, terminated but not reaped by its parent

(1) - Codes shown by the ps command in the ‘s’ or ‘stat’ column
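The same one-letter codes can be read without ps. On Linux, the third field of /proc/<pid>/stat holds the state letter. The sketch below is Linux-specific and uses made-up helper names; the only subtle point is that the command name in the second field is wrapped in parentheses and may itself contain spaces or parentheses.

```python
import os

STATE_NAMES = {
    "D": "uninterruptible sleep",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped",
    "X": "dead",
    "Z": "zombie",
}


def parse_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # Split after the LAST ")" so that odd command names cannot confuse us.
    after_command = stat_line.rsplit(")", 1)[1]
    return after_command.split()[0]


def process_state(pid):
    """Read the state of a live process (requires a Linux /proc filesystem)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_state(stat_file.read())


if __name__ == "__main__":
    code = process_state(os.getpid())
    print(code, STATE_NAMES.get(code, "other"))
```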

The termination (death) of a process, whether because the end of the code has been reached or because of an external event (e.g. “kill”), makes the process transition from the “TASK_RUNNING” state to “EXIT_ZOMBIE” and, once its exit status has been collected, to an ephemeral “EXIT_DEAD” state.

In a “correct” termination, the parent (remember that almost all processes have a parent) fetches the termination status of a child (via the wait() system call). In fact, on any process termination the kernel deallocates the memory and resources used by the process. This is done in the kernel exit() system call, invoked when a program calls the exit() function of its language library. At this point, the state of the process changes to “EXIT_ZOMBIE”. As commented, the process is already deallocated (no code, no data, etc.), but its process descriptor is kept so that the parent can learn the exit status of the terminated child.

The parent gets the child exit status via the wait() system call. When the parent is signaled that a child has ended, the wait() system call returns (remember that wait() is “blocking” the parent), reads the child status from the process descriptor and removes the child's entry from the process descriptor table. The parent can get the child information via the “wstatus” argument of the wait() call.

If for any reason the parent does not reap the child status, the child entry in the process descriptor table remains (and therefore keeps consuming a process ID), and the child remains a “ZOMBIE” process. If the parent itself terminates without reaping it, the system assigns a new parent to the child. This parent is normally init (PID 1), which takes care of reaping the child processes left as “zombies” by their parents.
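A zombie is easy to produce on purpose: let a child exit and simply delay the wait(). In this sketch (the function name and the delay are arbitrary choices) the exit status is still available when the parent finally asks for it, which is the whole point of keeping the zombie around. Where a Linux /proc filesystem is available, the state letter is read as well.

```python
import os
import time


def reap_late(exit_code, delay):
    """Let a child become a zombie for `delay` seconds, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # the child terminates at once
    time.sleep(delay)        # nobody has waited yet: the child is a zombie
    state = None
    try:
        with open(f"/proc/{pid}/stat") as stat_file:
            # Third field, after the command name in parentheses.
            state = stat_file.read().rsplit(")", 1)[1].split()[0]
    except OSError:
        pass                 # no /proc here (not Linux), skip the check
    _, status = os.waitpid(pid, 0)  # reaping removes the zombie entry
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(reap_late(7, 0.5))
```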

It is possible for a parent to intentionally ignore the status of its children. This can be done by ignoring the SIGCHLD signal, setting its disposition to SIG_IGN. In this case, the system removes the child entry from the process descriptor table as soon as the child terminates (see “Signal Handling and Masking” at signals).
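The effect of ignoring SIGCHLD can be checked from the parent: once the child is gone there is nothing left to reap, so waiting for it fails with ECHILD, which Python reports as ChildProcessError. This is a sketch of the behavior as it works on Linux; the function name is made up, and the previous signal disposition is restored at the end.

```python
import os
import signal


def ignored_child_leaves_no_zombie():
    """Return True if a child of a SIGCHLD-ignoring parent cannot be reaped."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)
        try:
            # Blocks until the child is gone, then fails: its entry was
            # removed by the system, so there is no status to collect.
            os.waitpid(pid, 0)
        except ChildProcessError:
            return True
        return False
    finally:
        # Restore whatever disposition was active before.
        if previous is None:
            previous = signal.SIG_DFL
        signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print(ignored_child_leaves_no_zombie())
```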

The function release_task() is in charge of the final cleanup. It is invoked by wait() when the parent collects the child exit status, or by do_exit() when the parent does not care about its child.

It is also possible that a parent dies (intentionally or unintentionally) and leaves child processes running. In this situation, the children are orphan processes and the system also assigns them to the init process (so in fact they have a “new” parent).
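The adoption of an orphan can be observed with a double fork. In this sketch (the function name and the timeout are arbitrary) an intermediate process creates a grandchild and exits right away; the grandchild polls getppid() until it changes and reports both values through a pipe. The new parent is normally PID 1, but on some systems another process (a so-called subreaper) takes that role, so the example only checks that the parent changed.

```python
import os
import time


def orphan_new_parent(timeout=5.0):
    """Return (old parent PID, new parent PID) as seen by an orphaned process."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            os.close(read_fd)
            middle_pid = os.getpid()
            if os.fork() == 0:
                try:
                    # Grandchild: wait until the original parent has died.
                    deadline = time.monotonic() + timeout
                    while (os.getppid() == middle_pid
                           and time.monotonic() < deadline):
                        time.sleep(0.01)
                    report = f"{middle_pid} {os.getppid()}"
                    os.write(write_fd, report.encode())
                finally:
                    os._exit(0)
        finally:
            os._exit(0)  # the intermediate process dies, orphaning its child
    os.close(write_fd)
    os.waitpid(middle, 0)
    data = b""
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    old_parent, new_parent = (int(field) for field in data.split())
    return old_parent, new_parent


if __name__ == "__main__":
    print(orphan_new_parent())
```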

To cut this long story short, here is a summary of a possible life cycle for a process:

This is just “a possible” cycle, as processes can hang, be orphaned, etc.