Linux Processes vs Threads: NPTL and clone() Explained

Most backend programmers have been asked an interview question: “In Linux, what is the difference between a process and a thread?”

I believe you can easily answer this question:

A process is the smallest unit of resource allocation, while a thread is the smallest unit of CPU scheduling.
Each process contains at least one thread. Processes have independent address spaces, process IDs, file descriptors, environment variables, blah blah…
Threads within the same process share the process’s address space and system resources, but each thread has its own independent thread ID, registers, stack, errno, blah blah…

Just when you are secretly pleased with your fluent answer, the interviewer follows up: “If a process first creates multiple threads using pthread_create and then calls fork to create a child process, will the child process have multiple threads? What problems might this cause?”, “How does the Linux kernel differentiate between threads and processes?”…

Thread Models

There isn’t just one way to implement multithreading. Due to differing support from the kernel and user space, implementations mainly fall into these categories. Their primary difference lies in whether the thread scheduler resides within the kernel or outside it.

Kernel-Level Thread Model

The kernel-level thread model is the simplest and most direct approach. This model is also known as the “1:1 thread model” because the number of kernel-scheduled entities matches the number of user-created threads one-to-one.

The key to implementing a kernel-level thread model is kernel support. Linux uses this thread model (both the earlier LinuxThreads and the current NPTL).

User-Level Thread Model

In contrast to the kernel-level model, the user-level thread model is called the “N:1 thread model”. User space is crucial here because threads are implemented entirely in user space. All the threads a program creates within a single process map to only one kernel scheduling entity. Such a model can even be implemented on simple kernels that don’t natively support threads. However, the user-space code becomes more complex, because thread management and scheduling must be handled there.

The advantage of this model is that thread switching is very cheap, because the application decides which thread runs without kernel intervention. Its main disadvantages are that it cannot use multiple cores, since all threads run on a single kernel entity, and that a blocking system call in one thread stalls every thread in the process.

GNU Pth implements the N:1 thread model.

Hybrid Thread Model

Mapping M user threads onto N kernel entities results in the “M:N thread model”. This model can be seen as a combination of the first two, aiming to achieve the true parallelism of the 1:1 model and the low-cost switching of the N:1 model. The idea is sound, but the model itself is somewhat complex.

Light-weight Process (LWP)

Light-weight Process is a term that can easily cause confusion. In Unix System V and Solaris, there was a separate LWP layer residing in user space. Multiple LWPs within the same process shared address space and system resources. Each LWP could host one or more user threads.

Linux does not have a separate LWP layer. Each user thread corresponds directly to a kernel thread. In this context, LWP often refers to the kernel thread, while “thread” denotes the user thread.

POSIX Threads

POSIX defines a set of standards independent of programming languages. Most Unix-like operating systems support POSIX Threads (usually via libpthread). It’s important to note that native Windows systems do not support POSIX Threads.

Linux Threads Library Implementations

LinuxThreads

LinuxThreads was the widely used thread library before Linux kernel version 2.6. It implemented an LWP-based 1:1 thread model, where one thread entity corresponded to one light-weight process.

Its specific implementation involved a manager thread within each process responsible for thread management. This manager thread was created and started when the process first called pthread_create(), and subsequently, it created and managed other threads.

This approach seemed clever but had many issues:

Process ID problem: According to the POSIX standard, all threads within the same process should share the same process ID and parent process ID. LinuxThreads could not achieve this, because each thread was a separate kernel process with its own PID.
Signal handling problem: Asynchronous signals are delivered on a per-process basis. In LinuxThreads, each thread was essentially a process, and the kernel had no notion of a thread group. Consequently, signals such as SIGSTOP and SIGCONT could not be applied to all threads at once; they could only suspend or resume a specific thread, not the entire process.
Thread count limit problem: LinuxThreads set the maximum number of threads per process to 1024. This number was also limited by the total number of processes allowed in the system, because the underlying implementation used processes.
Manager thread problem: The manager thread could easily become a bottleneck, a common issue with this type of architecture. The manager thread was also responsible for cleaning up user threads. Although it masked most signals, if it died unexpectedly, user threads had to be cleaned up manually. User threads also had no way of knowing the manager thread’s status, so later thread creation requests would go unhandled.
Synchronization problem: Thread synchronization in LinuxThreads relied heavily on signals. Going through the kernel’s complex signal handling machinery made this approach persistently inefficient.
Other POSIX compliance issues: Many system calls with process-wide semantics (e.g., nice, setuid, setrlimit) affected only the calling thread in LinuxThreads.
Real-time issues: Threads were introduced partly for real-time considerations, but LinuxThreads did not support features such as scheduling options at the time. This lack of real-time focus wasn’t limited to LinuxThreads; standard Linux itself had few real-time considerations back then.

NPTL

The problems with LinuxThreads described above, especially the compatibility issues, were frequently criticized, and many in the Linux community worked on improving the thread library. The most successful effort was NPTL, the Native POSIX Thread Library. NPTL became the standard implementation in Linux starting from kernel version 2.6.

Fundamentally, NPTL is still an LWP-based 1:1 thread model. However, compared to LinuxThreads, NPTL introduced significant improvements. The biggest difference is that NPTL does not use a manager thread; instead, core thread management is handled directly within the kernel, thanks to enhanced kernel support.

Let’s see how NPTL addressed the LinuxThreads problems:

Process ID: The kernel supports creating new tasks (threads) that share the same process ID as the original process. Thus, all threads report the same PID (which is actually the thread group ID, TGID). It’s also possible to distinguish the main thread, preventing thread lists from cluttering process listings.
Signal handling: The kernel implements the POSIX-required thread signal handling mechanism. Signals sent to the process (TGID) are delivered by the kernel to an appropriate thread for handling.
Thread count limit: The kernel was extended to handle an arbitrary number of threads (limited by system resources).
Manager thread: The tasks of the manager thread are handled by the extended clone system call. The exit_group system call was added to terminate the entire process (all threads).
Synchronization: The kernel provides the futex (fast userspace mutex) mechanism for inter-thread synchronization. In the uncontended case a futex operation completes entirely in user space; the kernel is entered only when a thread has to sleep or be woken, which resolves the efficiency problems of signal-based synchronization.

Of course, NPTL is not perfectly compliant with the POSIX standard either.

NGPT

The NPTL project described above was led by Red Hat. Around the same time, IBM funded another project, NGPT (Next Generation POSIX Threading), which is worth mentioning because it implemented the M:N thread model.

According to a notice on the official NGPT website in March 2003, considering the growing acceptance of NPTL and to avoid confusion caused by different thread library versions, NGPT ceased further development and shifted to supportive maintenance.

NPTL Implementation

We now know that in Linux, both processes and threads correspond to a task_struct in the kernel. We typically use fork() to create processes and pthread_create() to create threads.

What is the relationship between fork and pthread_create? Let’s first look at the underlying system call for pthread_create—clone:

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

Let’s examine the flags parameter:

CLONE_PARENT: Sets the parent of the new process to be the same as the parent of the calling process, rather than the calling process itself.
CLONE_FILES: The child process shares the file descriptor table with the parent process.
CLONE_VM: The child process shares the memory space with the parent process.
CLONE_THREAD: Places the child process in the same thread group as the parent process.
…

Why introduce the clone system call? Because in the kernel both fork and clone are thin wrappers around do_fork (renamed kernel_clone in Linux 5.10), which in turn is a wrapper around copy_process. So, in the kernel’s world, there’s no fundamental difference between a new process created by fork and a new thread created by pthread_create. They are just tasks created with different parameters controlling resource sharing. (Process, thread, task: to the kernel these terms all describe the same kind of object.) Let’s look at a simplified excerpt of task_struct in the Linux kernel:

struct task_struct {
    /* ... */
    pid_t pid;    /* unique per task; what gettid() returns */
    pid_t tgid;   /* thread group ID; what getpid() returns */
    /* ... */
};

In Linux kernel version 2.6 and later, the task_struct includes the tgid field (thread group id). For the “main thread” (the initial thread in a process), tgid equals pid. For other threads created within the process, tgid equals the tgid (and pid) of the main thread, which is the process ID. Each thread also has its own unique pid (kernel’s task ID). As mentioned earlier, passing the CLONE_THREAD flag to the clone system call sets the new task’s tgid to the parent’s tgid. Using tgid, the kernel or related utilities can determine whether a task_struct represents a process (main thread) or just another thread within a process, and decide whether to display it in process listings like ps.

Let’s look at an example program:

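#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

void *thread_func(void *arg) {
    (void)arg;
    for (int i = 0; i < 5; i++) {
        pid_t pid = getpid();                  /* process ID (kernel tgid) */
        pthread_t thread_id = pthread_self();  /* library-level thread handle */
        long tid = syscall(SYS_gettid);        /* kernel task ID; SYS_gettid is 186 on x86_64 */
        /* pthread_t is an unsigned long in glibc; the cast is only for printing. */
        printf("pid: %d, thread_id: %lX, tid: %ld\n",
               (int)pid, (unsigned long)thread_id, tid);
        sleep(1);
    }
    sleep(50);  /* keep the thread alive so /proc and ps can be inspected */
    return NULL;
}

int main(void) {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_func, NULL);
    if (fork() == 0) {
        printf("child process: %d\n", (int)getpid());
    } else {
        printf("parent process: %d\n", (int)getpid());
    }
    /* Both parent and child reach this call. The thread does not exist in the
       child; in the sample output the child's join returns at once. */
    pthread_join(thread, NULL);

    printf("hello from %d\n", (int)getpid());
    return 0;
}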

Output of the above code (Note: The exact order of output lines, thread IDs, and PIDs might vary slightly due to scheduling.):

pid: 20819, thread_id: 7F3275384700, tid: 20820
parent process: 20819
child process: 20821
hello from 20821
pid: 20819, thread_id: 7F3275384700, tid: 20820
pid: 20819, thread_id: 7F3275384700, tid: 20820
pid: 20819, thread_id: 7F3275384700, tid: 20820
pid: 20819, thread_id: 7F3275384700, tid: 20820
hello from 20819

From the output above, we can observe:

During the program’s execution, ls /proc will show directories /proc/20819 (parent process) and /proc/20821 (child process). Under NPTL, the thread 20820 will likely be visible within the parent’s task directory, e.g., /proc/20819/task/20820, but will not exist as a separate /proc/20820 directory.
getpid() returns the process ID, which is the kernel’s tgid (thread group ID). To get the kernel’s per-thread ID (which the kernel calls pid), you need the gettid system call. glibc only gained a gettid() wrapper in version 2.30, so the program invokes it through syscall(); on the x86_64 architecture its system call number is 186:

#define __NR_gettid 186

During execution, tools like ps -eLf or top -H show individual threads, whereas plain ps aux lists one entry per process.
Running kill 20820 names the thread’s kernel PID, but kill delivers its signal (SIGTERM by default) to the whole thread group, so the entire process 20819 terminates.
The child process 20821 exits after printing its message. Because the parent in this program never calls wait() for it, ps shows the child’s state as Z (zombie): it has terminated, but its exit status hasn’t been collected by the parent.
When fork() is called, the parent process’s memory is copied to the child process (typically using copy-on-write). However, the other threads in the parent process are not duplicated. The child process starts with only one thread: a copy of the thread that called fork(). If the parent used multi-threaded synchronization (such as mutexes), the state of these locks is copied to the child. But since the other threads (potentially holding locks) do not exist in the child, nobody will ever release those locks, and the child can easily deadlock if it tries to acquire them. For this reason POSIX only guarantees that async-signal-safe functions work in the child of a multithreaded process until it calls exec, and the general recommendation is to call one of the exec() family functions immediately after fork() in the child.
