What is PID 0?

I get nerd-sniped a lot. People offhandedly ask something innocent, and I lose the next several hours (or in this case, days) comprehensively figuring out the answer. Usually this ends up as a rant thread on Mastodon or in some private chat group or other. But for once I have the energy to write one up for the blog.

Today’s innocent question:

Is there a reason UIDs start at 0 but PIDs start at 1?

The very short version: Unix PIDs do start at 0! PID 0 just isn’t shown to userspace through traditional APIs. PID 0 starts the kernel, then retires to a quiet life of helping a bit with process scheduling and power management. Also the entire web is mostly wrong about PID 0, because of one sentence on Wikipedia from 16 years ago.

There’s a slightly longer short version right at the end, or you can stick with me for the extremely long middle bit!

But surely you could just google what PID 0 is, right? Why am I even publishing this?

The internet is wrong

At time of writing, if you go ask the web about PID 0, you’ll get a mix of incorrect and misleading information, and almost no correct answers.

After figuring out the truth, I asked Google, Bing, DuckDuckGo and Kagi what PID 0 is on Linux. I looked through the top 20 results for each, as well as whatever knowledge boxes and AI word salads they organically gave me. That’s 2 pages of results on Google, for reference.

All of them failed to produce a fully correct answer. Most had a single partially correct answer somewhere in the first 20 results, but never near the top or showcased. DDG did best, with the partially correct answer at number 4. Google did the worst, with no correct answer at all. And in any case, the incorrect answers were so prevalent and consistent with each other that you wouldn’t believe the one correct site anyway.

Interestingly, the top two results on all engines were identical: a Stack Overflow answer that is wrong, and a spammy-looking site that seems to have embraced LLM slop, because partway through failing to explain PID 0 it randomly shifts to talking about PID loops, from control system theory, before snapping out of it a paragraph later and going back to Unix PIDs.

Going directly to the source of the LLM slop fared slightly better, on account of them having stolen from books as well as the web, but they still make shit up in the usual amount. I was able to get a correct answer though, using the classic prompting technique of already knowing the answer and retrying until I got good RNG.

If we set aside the few entirely wrong answers (“there is no PID 0”, “it launches init then exits”, “it’s part of systemd”, “it’s the entire kernel”, “it spins in an infinite loop and nothing else”), the most common answer follows a single theme: PID 0 has something to do with paging, swap space, or virtual memory management in some way.

This theme comes straight from, where else? Wikipedia’s article on PIDs, which said:

There are two tasks with specially distinguished process IDs: swapper or sched has process ID 0 and is responsible for paging, and is actually part of the kernel rather than a normal user-mode process. Process ID 1 is usually the init process primarily responsible for starting and shutting down the system.

That text has been on Wikipedia for 16 years, and in that time has been quoted, paraphrased and distorted across the web to the point that it’s displaced the truth. It’s a pretty funny dynamic, and also a bit sad, given the source code for Linux and the BSDs is right there, you can just check.

(Later note: after I published this, someone went and updated the article to have the correct information. The link above takes you to the old version so that the rest of this explanation still makes sense, but at time of writing this update, the current version of the PID article is accurate.)

To explain why Wikipedia was inaccurate here, we need to take a little history lesson.

The history of PID 0 in Unix

As I said in the opening TLDR, PID 0 does some scheduling and power management, and no paging. It’s what the scheduler runs when it has nothing else for a CPU core to do.

The exact implementation obviously varies across kernels and versions, but all the ones I inspected follow the same broad pattern: when PID 0 gets to run, it tries to find something else that could run in its place. Failing that, it puts the current CPU core to sleep until something else wakes it back up, and then loops around and starts over.
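That pattern is simple enough to sketch. Here’s a toy simulation of it in Python, purely illustrative: none of the names here are real kernel APIs, it just mimics the run-or-sleep loop described above.

```python
import collections

# Toy model of the idle loop: a run queue of tasks and a CPU core that
# "sleeps" when there is nothing to run. Illustrative only.
run_queue = collections.deque(["kthreadd", "init"])
wakeups = 0  # how many times the core went to sleep

def cpu_sleep():
    # Stand-in for halting the core in a low-power state until an
    # interrupt wakes it back up.
    global wakeups
    wakeups += 1

def do_idle_once():
    """One pass of the idle loop: switch to a runnable task if one
    exists, otherwise put the core to sleep."""
    if run_queue:
        return f"switched to {run_queue.popleft()}"
    cpu_sleep()
    return "slept"

log = [do_idle_once() for _ in range(3)]
print(log)  # ['switched to kthreadd', 'switched to init', 'slept']
```

The real thing is vastly more elaborate, but the skeleton is the same: find work, or sleep until work appears.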

Don’t take my word for it. Here’s do_idle in the Linux kernel, which is called in an infinite loop by PID 0. nohz_run_idle_balance tries to find alternate work. The while loop puts the core to sleep. After wakeup, schedule_idle lets the scheduler take over and put the core to work again.

But maybe that’s just linux, I hear you say. Okay, here’s sched_idletd in the FreeBSD kernel. tdq_idled tries to steal runnable tasks from another core. Failing that, cpu_idle puts the core to sleep. Rinse, repeat.

Okay sure but these are modern kernels, maybe it was different in the olden days? Okay, how about sched in 4.3BSD, from the summer of 1986? Computers are getting smaller and OSes more compact, so the scheduler and idle loop are now smushed into one routine. It tries to find something to schedule, and failing that sleeps until an external event wakes it back up.

Incidentally, this is the origin of the vague allegation that PID 0 is sometimes called “sched”: in earlier Unixes, the function that implements PID 0 is literally called sched.

Still not sure? Maybe it’s just a weird BSD thing that leaked into Linux?

Okay fine, here’s sched in Unix V4, the first known version of the Unix kernel written in C. Again the scheduler and idle loop are firmly intertwined, and there’s also some PDP-11 esoterics that are confusing to modern eyes, but the same bones are there: find a runnable process and switch to it, or idle and then try again.

You could go further back. The source code for Unix V1 is out there, as well as an early prototype on PDP-7. However, it’s all in PDP assembler, uses some mnemonics that don’t seem to be listed in the surviving assembler references I could find, and the kernel’s structured a fair bit differently from the C version.

That said, if you want to go digging, I believe the swap routine is the meat of the scheduler. And finally we get a clear idea of the root of the Wikipedia claim: in the earliest Unix implementation, the scheduler was sometimes nicknamed the “swapper.”

It was called that because, now that we’re back at the beginning of Unix, one routine encompasses not only scheduling and idling, but also moving entire process memory images between the small core memory and secondary storage. Hard drives, in this case: references in the kernel code, as well as the Computer History wiki, confirm that Bell Labs’s PDP-11 at the time ran an RS11 disk for the core OS and process swapping, and an RK03 for the user filesystem.

(Sidebar! This is where the / vs. /usr split comes from. /usr was the part of early Unix stored on the RK03 disk, whereas the smaller root filesystem was on the RS11. Unless you’re still running on a PDP-11 with single RS11 and RK03 disks, a split /usr is vestigial and causes a variety of problems in early boot)

So now the history is hopefully fairly clear. In the first Unix the world at large saw (Unix V5), entry zero in the process table initialized the kernel, then looped in the sched function, defined in slp.c. Those two names clearly telegraph the loop’s primary functions. However, the scheduling algorithm is quite simple at this point, and so almost all of sched’s code is concerned with swapping process images in and out of core memory in order to make scheduling happen. Thinking of this function as the “swapper” is reasonable, even if the original source code never uses that name.

This essential structure survives to this day, with a lot more complications. Whole-process swapping gave way to demand paging, and so PID 0 stopped concerning itself with even a little memory management. As both the scheduling algorithms and the mechanics of idling a CPU became more complex, scheduling and idling were split out into separate pieces of code, and you end up with what we’ve had for at least two decades: the function implementing PID 0 has sched or idle in its name, and has a supporting role in doing those two things.

Going back to the Wikipedia article, it seems the author of that edit wanted to write “swapping”, in the classic Unix V5 sense of swapping out whole processes as a consequence of scheduling. But the edit didn’t clarify that “swapping” was being used in an archaic sense that was likely to confuse the modern reader. Furthermore, the edit wrote “paging” rather than “swapping”. I don’t know why, but my guess is that it’s because the canonical article for this general memory management concept is titled “Memory paging”, whereas “swapping” is a disambiguation page. In the moment of making the edit, I could definitely see myself swapping the term out for the seemingly preferred one.

Unfortunately, in this particular context, replacing “swapping” with “paging” makes the sentence incorrect. And there it sat for 16 years, slowly leaking into the rest of the web as people quoted wikipedia at each other and paraphrased or elaborated further in the wrong direction.

Okay, end of rant about how the web is turning to ash in our hands. It’d be nice if it didn’t, or at least it’d be nice if half the industry wasn’t breathlessly building ways to spray more petrol on what’s left. So it goes. Back to PID 0 now.

Are those functions really PID 0?

Above, I claim by fiat that the functions I’m linking to are PID 0. Tracing all of them would take a lot more words, but I’ll demonstrate the point on Linux and leave you to trace the others. I encourage you to do so! It’s remarkable how similar to each other different kernels are in this area, both across current OSes and over time. They’ve become more complex, but the family tree is still evident.

Disclaimer: the Linux kernel is a very complex beast. I’m not going to walk through every single thing the kernel does before reaching do_idle. Think of this as signposts to help orient you, not a comprehensive breakdown. This was written using the 6.9 kernel source code, so if you’re visiting from the future: hello! I hope your dilithium matrix is cycling well, and things may have changed.

We begin! The bootloader jumps to the first instruction of kernel code. The first few steps from here are extremely specific to the CPU architecture and nearby chipset hardware. I’m going to skip that and begin at start_kernel, where the machine has been set up to a common baseline and architecture-independent kernel code takes over (albeit still assisted by arch-specific helpers).

At this point, start_kernel is the only thing running on the machine (yes I know about ring minus 1 and SMM and so on, I said I was simplifying). On multicore systems, the bootloader/firmware/hardware arranges for a single CPU core to be running, called the bootstrap core. That single thread of execution is what we’re looking at, and it’s all we get until the kernel starts the other cores itself.

The first thing to get called is set_task_stack_end_magic(&init_task). Well that looks relevant! It’s a very simple function that writes a magic number to the top of init_task’s stack space to detect overflows. init_task is statically defined in init_task.c, and the leading comment tells us it’s the first task. What’s a task though?

task_struct, PIDs TIDs TGIDs and oh no

Here we have to take a detour into something very confusing: the Linux kernel and its userspace disagree on the meaning of PID.

In the kernel, the unit of running things is the task_struct. It represents one thread of execution, rather than a whole process. To the kernel, a PID identifies a task, not a process. task_struct.pid is the identifier for that one thread only.

The kernel still needs to represent the concept of a userspace process somehow, but it’s not a nice crunchy data structure you can point at. Instead, threads are collected into “thread groups”, and groups are identified by a thread group identifier, or TGID. Userspace calls thread groups processes, and thus the kernel TGID is called the PID in userspace.

To add confusion, these numbers are often the same. When a new thread group is created (e.g. when userspace runs fork()), the new thread is given a new thread ID, and that ID also becomes the new group’s TGID. So for single-threaded processes, kernel TID and TGID are identical, and asking either the kernel or userspace what this thing’s “PID” is would give you the same number. But that equivalence breaks once you spawn more threads: the new thread gets its own thread ID (which is what the kernel calls a PID), but inherits its parent’s thread group ID (which userspace calls a PID).
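You can watch this split in action from userspace. On Linux, Python’s os.getpid() reports the TGID (the userspace “PID”), while threading.get_native_id() (Python 3.8+) reports the kernel’s per-thread ID. This is a quick demonstration, not anything kernel-specific:

```python
import os
import threading

# The main thread's view of its identity.
main_pid = os.getpid()               # the TGID: userspace's "PID"
main_tid = threading.get_native_id() # the kernel's per-thread ID

results = {}

def worker():
    # Same process (thread group), so getpid() is unchanged...
    results["pid"] = os.getpid()
    # ...but this thread gets its own kernel-level thread ID.
    results["tid"] = threading.get_native_id()

t = threading.Thread(target=worker)
t.start()
t.join()

assert results["pid"] == main_pid  # one thread group...
assert results["tid"] != main_tid  # ...two distinct kernel tasks
```

On Linux specifically, the initial thread’s TID also equals the TGID, matching the fork() behavior described above.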

To add even more confusion, the arrival of containers forces threads and processes to have multiple identities. The thing that’s PID 1 in a docker container is very much not the same as PID 1 outside the container. This is tracked in a separate pid struct, which keeps track of the different thread IDs a task_struct has, depending on which PID namespace is asking.

I’m a userspace enjoyer by day, so when I started this rabbithole I interpreted “PID 0” in the question as an analog to the PID 1 I know, that /bin/init thing. But now the question is ambiguous! PID 0 could mean thread 0, or it could mean thread group 0.

At the beginning of the kernel, the answer is fortunately easy: init_task represents PID 0 by everyone’s definition. It’s the thread with ID 0 (which is the PID according to the kernel), it’s the only thread in the group with ID 0 (which is the PID according to userspace), and no child PID namespaces exist yet, so there are no other numbers for init_task to be.

This will get muddier later on because thread group 0 is going to grow more threads, so in userspace terms we’ll have a PID 0 process that contains several threads, one of which has TID 0.

In the rest of this post I’m going to try and say “task” or “thread” to mean a single thread of execution, the thing described by a task_struct; and “thread group” for the thing userspace would call a process. But it’s not just you, it’s terribly confusing.

Erratum: some folks pointed out that I got two details above wrong! I’m grateful for the corrections, which are as follows.

It’s true in general that TIDs and TGIDs are sometimes the same as described above, but it’s possible to construct a fresh userspace process in which the single initial thread has a TID that doesn’t match its TGID. If you execve() in a multithreaded process from any thread other than the initial thread, the kernel will kill all other threads, and make the exec-ing thread the leader of the thread group. The TID of the thread doesn’t change, and so the new process will execute on a thread whose TID doesn’t match its TGID.

The second error is more specific to this post’s topic: on Linux, all threads within thread group 0 have thread ID 0! It’s explicitly special-cased in a few places, and as far as I can tell is the only place in the kernel where multiple definitely different threads have the exact same identity.

Okay, back to the code walk…

The path to the idle task

So, we know init_task is the PID 0 we’re looking for, albeit now it’s actually two different PID 0s at the same time because it’s the thread with ID 0 within the thread group with ID 0. How do we know that init_task describes the currently-executing CPU context?

There’s a few things. We know we’re the only thread of execution currently happening, and init_task is described as the first task, aka the first thread. That sounds like us. It’s using init_stack as its stack, which is the stack we’re currently using (proving this requires digging into arch-specific code and gcc linker scripts, so I’m going to skip it, but have fun!). Its __state is TASK_RUNNING, which means it’s either running right now, or it’s runnable and waiting for CPU time. The kernel scheduler isn’t initialized yet, so there can’t really be any other runnable task at this point. This could be a setup for an elaborate trolling, but the evidence suggests that this init_task is us. And spoiler, we’re not being trolled, init_task is indeed the initial thread that executes start_kernel.

At this point a lot of early kernel initialization happens. We can skip over all that for our purposes, and pick up at the call to sched_init. This function does basic initialization of the CPU scheduler’s data structures. A lot happens here because the scheduler is a large beast; we’ll just peek at a couple of relevant excerpts:

    /*
     * The idle task doesn't need the kthread struct to
     * function, but it is dressed up as a per-CPU
     * kthread and thus needs to play the part if we want
     * to avoid special-casing it in code that deals with
     * per-CPU kthreads.
     */

    /*
     * Make us the idle thread. Technically, schedule()
     * should not be called from this thread, however
     * somewhere below it might be, but because we are the
     * idle thread, we just pick up running again when this
     * runqueue becomes "idle".
     */
    init_idle(current, smp_processor_id());

The first comment describes the currently executing thread as “the idle task,” and mentions that it’s a special kernel thread: most kernel threads are run by kthreadd, which is task 2 and doesn’t exist yet. If you’re on Linux, ps ax | grep kthreadd will show that kthreadd is PID 2 in userspace, in addition to also being thread/task ID 2 in the kernel.

The init_idle call explicitly tells the scheduler that the currently running thread is the “idle thread” for the bootstrap CPU core. current is a pointer to the currently-running task_struct, which at this point in execution points to init_task. The implementation of current is another very architecture-specific piece of code, so I’m going to encourage you to go poke at it if curious, and move right along.

Going back to start_kernel, the remaining initialization code doesn’t concern us, so we can skip straight to the call to rest_init. This function is short and sweet: it spawns task 1, which will become the init process in userspace; and task 2 for kthreadd, which manages all future kernel threads.

We’ll be following the life of task 1, and although it will someday become PID 1 in userspace, to start it’ll run kernel_init. Not yet though. These new tasks exist and are known to the scheduler, but they’re not running yet because we haven’t asked the scheduler to do its thing yet. (caveat: in some kernel configurations, the scheduler may get a chance to switch to task 1 and 2 sooner than what I’m about to describe, but these first tasks are orchestrated such that the outcome is nearly identical.)

Finally, rest_init calls cpu_startup_entry, which goes into an infinite loop of calling do_idle. And here we are, we’ve become the idle task on the bootstrap CPU core. On the first iteration, we don’t put the CPU to sleep because there are other runnable tasks (the two we just made). So we drop to the bottom of do_idle, and go into schedule_idle. The scheduler finally gets to run, and we switch away from task 0. kthreadd in task 2 isn’t terribly interesting, it does a little initialization then yields the CPU again until something else asks to create kernel threads. Let’s follow task 1 instead, it’s much more fun.

Task 1 starts at kernel_init. This does even more kernel initialization, including bringing up all device drivers and mounting either the initramfs or the final root filesystem. And then, at last, it calls run_init_process to drop out of kernel mode and execute userspace’s init program. If init(1) asks the kernel who it is, it’ll be told that it is thread 1, which is part of thread group 1. Or thread 1 in PID 1, in the conventional userspace vocabulary.
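Incidentally, this parent-child relationship is one of the few places PID 0 peeks out at userspace: there’s no /proc/0 directory, but PID 0 is listed as the parent of PID 1. A quick check, assuming a Linux system with procfs mounted (field 4 of /proc/\<pid\>/stat is the parent PID):

```python
import os

# PID 0 has no /proc entry of its own...
assert not os.path.exists("/proc/0")

# ...but it shows up as the parent of PID 1. The comm field (in
# parentheses) may contain spaces, so split after the closing paren:
# what follows is "state ppid ...".
with open("/proc/1/stat") as f:
    after_comm = f.read().rsplit(")", 1)[1].split()

ppid = int(after_comm[1])
print("parent of PID 1:", ppid)  # parent of PID 1: 0
```

The same trick on /proc/2/stat (kthreadd, where visible) also reports a parent of 0, matching the rest_init story above.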

It was a surprise to me that task/pid 1 does a whole bunch of kernel work before it morphs into the familiar userspace process! A large chunk of what I think of as the kernel booting technically happens in PID 1, albeit in a very different looking universe to init(1) in userspace. Why not do those bits in task 0, like the earlier bits of init?

PID 0 in multicore systems

If you’ve been following carefully so far, you may be wondering about the other CPU cores. So far we’ve run entirely single-threaded, and when we initialized the scheduler we explicitly told it to pin task 0 to the bootstrap core. When does that change?

The answer is, in task 1! The first thing kernel_init does is start up all other CPU cores. This means the bulk of the boot process that happens in kernel_init can make use of all available CPU power, rather than being stuck on a single thread. Starting CPU cores is quite intricate, but the exciting bit for our purposes is the call to smp_init. In turn, it calls fork_idle for each non-bootstrap core, creating a new idle thread and pinning it to that core.

This is where the “PID 0” term gets muddy, because these new idle tasks have non-zero thread IDs, but they are still part of thread group 0. So, in userspace parlance, PID 0 is a process that contains one pinned thread per core, with thread 0 pinned to the bootstrap core.

Erratum: some folks pointed out that the above paragraph is wrong! I’m grateful for the correction, which is as follows. As mentioned in the erratum earlier in the post, idle tasks are special-cased, and all idle threads across all cores share the same identity: thread ID 0, and thread group ID 0. This happens in a few separate places in code because of the different sets of fields that record TID and TGID, but it’s all within the fork_idle call.

First, fork_idle calls the common copy_process function to make a new task as a copy of the currently running task. Normally this would allocate a new TID for the new task. However, there is a special case that skips allocation of a new struct pid if the caller signals that it’s making an idle task. Then, fork_idle calls init_idle_pids, which further explicitly resets all the task’s identifiers to match init_struct_pid, which is the identity of init_task. As a result, every idle task on every CPU core shares an identity with the init_task we’ve followed through early kernel boot, and they all have PID 0 under both the kernel and userspace’s definition of a PID.

After that, smp_init runs bringup_nonboot_cpus, which does architecture-specific incantations to wake up the cores. As each core starts, it does a bit of arch-specific setup to make itself presentable, then runs cpu_startup_entry and do_idle, just like the bootstrap core did with task 0. All CPU cores are now alive and can run tasks, and kernel_init proceeds with the rest of boot.

I’m bad at conclusions

And that’s it! To summarize:

PID 0 does exist: it’s the one thread that starts the kernel, running on the bootstrap CPU core.

PID 0 runs early kernel initialization, then becomes the bootstrap CPU core’s idle task, and plays a minor supporting role in scheduling and power management.

PID 0 has done this, with different degrees of fanciness but the same broad strokes, since the first Unix kernels. You can go read the source code of many of them and see for yourself! That’s cool.

PID 0 no longer has anything to do with memory management. In early Unix kernels it did some incidental memory management as part of process scheduling, but it stopped doing that many decades ago.

On Linux, “PID” is ambiguous because userspace and the kernel use “PID” to refer to different values: the TID for the kernel, and the TGID for userspace. The kernel’s definition wins in practice for PID 0, because none of the entities that make up PID 0 are visible to userspace through the traditional Unix APIs.

On multicore Linux systems, every CPU core gets an idle thread. All those idle threads are part of thread group 0, which userspace would call PID 0. They are also a special case in the kernel, and all share the single thread ID 0.

Seemingly all Q&A websites on the internet function primarily by paraphrasing Wikipedia. This is made evident and awkward when Wikipedia accidentally makes the web repeat incorrect information for 16 years.

This conclusion used to say that I now need to figure out how to submit an edit to wikipedia while complying with the various policies on self-promotion, sourcing, primary research and so on… But it looks like someone’s already made an edit, and provided additional sources for the important modifications. Thanks, mysterious benefactor!

Thanks for joining me on this chronicling of how I end up going on very large sidequests when presented with short, odd questions.