The main thread name and process name are the same thing on Linux!

Posted on Jun 22, 2018

On typical days, we software engineers are usually stuck on head-scratching bugs instead of actually writing interesting software. I had made some small changes, and suddenly our end-to-end tests were timing out with no useful clues. The harness emits a crazy number of log lines, so digging through them was difficult.

Fortunately, the problem was easily reproducible on a local VM, and by commenting out code piece by piece I was able to isolate it to a change of the main thread's name, which I had made to get some debugging information. All consumer OSes support some kind of thread-naming facility, which debuggers and process monitors then use.

On Linux this is done by pthread_setname_np(). According to the man page, this is accomplished by procfs shenanigans, so one can also write directly to a file. What it doesn’t tell you is how this affects the process itself.
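As a rough illustration, pthread_setname_np() can be reached from Python through ctypes. This is only a sketch, assuming Linux with glibc and Python 3; the helper name set_current_thread_name is made up for this example, and pthread_t is assumed to be an unsigned long, which holds on glibc but is not guaranteed elsewhere.

```python
import ctypes

# Assumption: Linux with glibc. CDLL(None) exposes the symbols that are
# already loaded into the interpreter, including the pthread functions.
_libc = ctypes.CDLL(None, use_errno=True)

# Assumption: pthread_t is an unsigned long (true on glibc).
_libc.pthread_self.restype = ctypes.c_ulong
_libc.pthread_setname_np.argtypes = [ctypes.c_ulong, ctypes.c_char_p]
_libc.pthread_setname_np.restype = ctypes.c_int


def set_current_thread_name(name):
    """Name the calling thread. Returns 0 on success, else an errno value."""
    return _libc.pthread_setname_np(_libc.pthread_self(), name.encode())
```

The name is limited to 15 bytes plus the terminating NUL; longer names are rejected with ERANGE rather than truncated.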

On Linux, threads are ‘lightweight processes’. Each thread has a kernel ID that is unique across the system, just like process IDs. The kernel nomenclature for threads is tasks. Each process's entry in /proc (the current process is accessible at /proc/self) has a directory called task/ with an entry for each thread. It is really easy to see this:

Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getpid()
9165
>>> os.listdir('/proc/self/task')
['9165']

As shown, the main thread maps to a task with the same ID as the process. Since Linux 2.6.33, the name is written to /proc/self/task/<task id>/comm. Commands like top/htop and others can read this out. This change is also reflected in /proc/self/task/<task id>/status.

>>> open('/proc/self/task/9165/comm').readline()
>>> open('/proc/self/task/9165/status').readline()
>>> with open('/proc/self/task/9165/comm', 'w') as f:
...   f.write('new name')
>>> open('/proc/self/task/9165/comm').readline()
'new name\n'
>>> open('/proc/self/task/9165/status').readline()
'Name:\tnew name\n'
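The same inspection can be scripted. The following is a small sketch in Python 3 (the session above is Python 2.7) that maps every task of a process to its name by reading the comm files; the function name task_names is hypothetical.

```python
import os


def task_names(pid='self'):
    """Map each kernel task ID of a process to its name, read from procfs."""
    names = {}
    for tid in os.listdir('/proc/%s/task' % pid):
        try:
            with open('/proc/%s/task/%s/comm' % (pid, tid), 'rb') as f:
                raw = f.read().rstrip(b'\n')
        except OSError:
            # The thread exited between listdir() and open().
            continue
        names[int(tid)] = raw.decode(errors='replace')
    return names


if __name__ == '__main__':
    for tid, name in sorted(task_names().items()):
        marker = ' (main thread)' if tid == os.getpid() else ''
        print('%d\t%s%s' % (tid, name, marker))
```

The entry whose task ID equals os.getpid() is the main thread.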

What the man pages don’t tell you is that changing the task name of the main thread also changes the process name!

nsmnikhil@penguin:~$ ps -xo 'pid tid cmd comm' | grep 9165
 9165  9165 python                      new name
>>> open('/proc/self/status').readline()
'Name:\tnew name\n'

Our test harness uses psutil to interact with processes, and one of its routines waits for the process to start by searching for it by name. Unfortunately for me, psutil.Process().name() uses procfs to get the name on Linux. The harness was looking for a process whose name I had changed! It kept waiting for the process to turn up, which it never did, and that caused the timeouts!
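To make the failure mode concrete, here is a simplified stand-in for a wait-by-name routine. It is not psutil's implementation and not our harness's code, just a sketch that reads /proc/<pid>/comm directly; the function names and the timeout and poll parameters are invented for illustration.

```python
import os
import time


def pids_with_name(name):
    """Return the PIDs whose /proc/<pid>/comm matches `name`."""
    wanted = name.encode()
    matches = []
    for entry in os.listdir('/proc'):
        if not entry.isdigit():
            continue
        try:
            with open('/proc/%s/comm' % entry, 'rb') as f:
                if f.read().rstrip(b'\n') == wanted:
                    matches.append(int(entry))
        except OSError:
            # The process exited, or is inaccessible, since listdir().
            continue
    return matches


def wait_for_process(name, timeout=5.0, poll=0.1):
    """Poll until a process called `name` appears; None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        found = pids_with_name(name)
        if found:
            return found[0]
        time.sleep(poll)
    return None
```

If the target process renames its main thread right after starting, a lookup like this never matches the expected name and simply runs into the timeout.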

The easiest fix was to ignore the name change when the calling thread was the main thread. Patch landed!
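The guard can be sketched roughly as follows. This is not the patch that landed, only an illustration in Python 3.8 or later (it relies on threading.get_native_id()); set_thread_name is a hypothetical helper that writes the name through procfs.

```python
import os
import threading


def set_thread_name(name):
    """Rename the calling thread unless it is the main thread.

    Returns True if the name was applied, False if it was skipped.
    """
    tid = threading.get_native_id()
    if tid == os.getpid():
        # The main thread's task ID equals the PID, so renaming it would
        # also rename the process. Skip it.
        return False
    with open('/proc/self/task/%d/comm' % tid, 'w') as f:
        f.write(name)
    return True
```

Worker threads still get their debugging names, while the process keeps the name that external tools look for.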

The userspace thread implementation provides pthread_t as the unique thread identifier, which is not the same as the kernel task ID. There is no way to infer the mapping from pthread_t to the kernel task ID without accessing non-public structures. Instead, one may use the gettid() system call to obtain the task ID of the calling thread. At the time of writing, neither libc nor Python provides a wrapper for it, but it can be called by using syscall() directly. I do not know of a publicly documented way to get the task ID for another thread.
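Calling gettid() through syscall() might look like the sketch below, written for Python 3 with ctypes. The syscall number differs per architecture; the table covers only a few common ones and is an assumption of this example, as is the helper name gettid.

```python
import ctypes
import platform

# Assumption: gettid syscall numbers for a few common Linux architectures.
_SYS_GETTID = {
    'x86_64': 186,
    'aarch64': 178,
    'i386': 224,
    'i686': 224,
    'armv7l': 224,
}

_libc = ctypes.CDLL(None, use_errno=True)


def gettid():
    """Return the kernel task ID of the calling thread."""
    number = _SYS_GETTID.get(platform.machine())
    if number is None:
        raise NotImplementedError(
            'no gettid syscall number known for %s' % platform.machine())
    return _libc.syscall(number)
```

Python 3.8 and later also offer threading.get_native_id(), which returns the same value without hard-coded syscall numbers.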

pthread_t is usually a pointer to a struct that is unique per thread, and the address itself is used as the ID. In fact, in another piece of implementation-specific, don’t-rely-on-this knowledge, CPython simply casts the pthread_t to an integer and uses that as the threading.Thread.ident thread identifier.
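This relationship can be checked from Python 3 by comparing pthread_self() with threading.get_ident(). As before, this is a sketch that assumes CPython on Linux with glibc, where pthread_t is an unsigned long; it relies on exactly the implementation detail that should not be depended upon.

```python
import ctypes
import threading

_libc = ctypes.CDLL(None)
# Assumption: pthread_t is an unsigned long (true on glibc).
_libc.pthread_self.restype = ctypes.c_ulong


def pthread_self():
    """Return the calling thread's pthread_t as a plain integer."""
    return _libc.pthread_self()


if __name__ == '__main__':
    print('pthread_self():', pthread_self())
    print('threading.get_ident():', threading.get_ident())
```

On such a system the two values are expected to match, and neither is the kernel task ID.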

I also found this lovely bit about procfs abuse in LXC.