On a typical day, we software engineers are more often stuck on head-scratching bugs than actually writing interesting software. I had made some small changes and suddenly our end-to-end tests were timing out, with no useful clues. The harness emits a crazy number of log lines, so digging through them was difficult.
Fortunately, the problem was easily reproducible on a local VM, and with some piecemeal commenting I was able to isolate it to changing the thread name of the main thread, which I had done to get some debugging information. All consumer OSes support some kind of thread naming facility, which is then used by debuggers and process monitors.
On Linux this is done by pthread_setname_np(). According to the man page, this is accomplished by procfs shenanigans, so one can also write directly to a file. What they don’t tell you is how this affects the process itself.
On Linux, threads are ‘lightweight processes’. Each thread has a kernel ID that is unique across the system, just like process IDs. The kernel nomenclature for threads is tasks. Each process’s entry in /proc (the current process is accessible at /proc/self) has a directory called task/ with an entry for each thread. It is really easy to see this:
    Python 2.7.13 (default, Nov 24 2017, 17:33:09)
    [GCC 6.3.0 20170516] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.getpid()
    9165
    >>> os.listdir('/proc/self/task')
    ['9165']
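The session above has a single thread, so only one task shows up. As a minimal sketch (written for Python 3 rather than the 2.7 interpreter used in this post, and with helper names that are purely illustrative), starting a few idle threads should make extra entries appear under task/:

```python
import os
import threading


def list_task_ids(pid="self"):
    """Return the kernel task IDs of a process as a sorted list of ints."""
    return sorted(int(t) for t in os.listdir("/proc/{}/task".format(pid)))


def task_ids_with_workers(n_workers):
    """Start n_workers idle threads, snapshot the task list, then clean up."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(n_workers)]
    for w in workers:
        w.start()
    try:
        # Each running thread has its own entry under /proc/self/task.
        return list_task_ids()
    finally:
        stop.set()
        for w in workers:
            w.join()


if __name__ == "__main__":
    print("pid:", os.getpid())
    print("tasks:", task_ids_with_workers(3))
```

The process ID should still be in the list, since the main thread is the task whose ID equals the PID.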
As shown, the main thread maps to a task with the same ID as the process. Since Linux 2.6.33, the name is written to /proc/self/task/<task id>/comm. Commands like top/htop and others can read this out. A change to it is also reflected in the Name field of /proc/self/task/<task id>/status:
    >>> open('/proc/self/task/9165/comm').readline()
    'python\n'
    >>> open('/proc/self/task/9165/status').readline()
    'Name:\tpython\n'
    >>>
    >>> with open('/proc/self/task/9165/comm', 'w') as f:
    ...     f.write('new name')
    ...
    >>> open('/proc/self/task/9165/comm').readline()
    'new name\n'
    >>> open('/proc/self/task/9165/status').readline()
    'Name:\tnew name\n'
What the man pages don’t tell you is that changing the task name for the main thread also changes the process name!
    nsmnikhil@penguin:~$ ps -xo 'pid tid cmd comm' | grep 9165
    9165 9165 python new name
    ...

    >>> open('/proc/self/status').readline()
    'Name:\tnew name\n'
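For contrast, renaming a task that is not the main thread should leave the process name alone. A rough sketch, assuming Python 3.8+ for Thread.native_id (which reports the kernel task ID); the helper names are made up for illustration:

```python
import threading


def read_comm(tid):
    """Read the name of one task of the current process."""
    with open("/proc/self/task/{}/comm".format(tid)) as f:
        return f.read().rstrip("\n")


def write_comm(tid, name):
    """Rename one task of the current process by writing to its comm file."""
    with open("/proc/self/task/{}/comm".format(tid), "w") as f:
        f.write(name)


def process_name():
    """Read the process name as procfs reports it."""
    with open("/proc/self/comm") as f:
        return f.read().rstrip("\n")


def rename_worker_demo(new_name="worker-demo"):
    """Rename a worker thread; return (process name before, after, worker name)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)
    worker.start()
    try:
        before = process_name()
        # Keep the name short: the kernel stores at most 15 characters.
        write_comm(worker.native_id, new_name)
        return before, process_name(), read_comm(worker.native_id)
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    print(rename_worker_demo())
```

Only the worker's comm changes here; the process name follows the main thread's task, which this sketch never touches.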
Our test harness uses psutil to interact with processes, and one of the routines waits for the process to start by searching for it by name. Unfortunately for me, psutil.Process().name() uses procfs to get the name on Linux. The harness was looking for a process whose name I had changed! It kept waiting for the process to turn up, which it never did, and that caused the timeouts!
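To make the failure mode concrete, here is a hypothetical stand-in for that kind of wait-by-name routine. It does not use psutil and is not our harness code; it scans /proc/<pid>/comm directly, and the function names and timeout values are assumptions for illustration only:

```python
import os
import time


def find_pids_by_name(name):
    """Return the PIDs whose procfs name (comm) equals `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open("/proc/{}/comm".format(entry)) as f:
                if f.read().rstrip("\n") == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited, or is not readable, since the listing.
            continue
    return pids


def wait_for_process(name, timeout=5.0, interval=0.1):
    """Poll until a process called `name` appears; return its PID or None."""
    deadline = time.monotonic() + timeout
    while True:
        pids = find_pids_by_name(name)
        if pids:
            return pids[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

If the target process renames its main thread before the poll sees it, a lookup like this never matches the expected name and simply runs out the clock.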
The easiest fix was to ignore the name change when the calling thread was the main thread. Patch landed!
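The actual patch is not shown here. As a rough Python sketch of the same idea (assuming Python 3.8+ for threading.get_native_id(), with a hypothetical helper name), the rename becomes a no-op whenever the caller is the main thread, which is detected by the task ID matching the process ID:

```python
import os
import threading


def set_current_thread_name(name):
    """Rename the calling thread, unless it is the main thread.

    Returns True if the name was changed and False if the call was skipped.
    """
    tid = threading.get_native_id()
    if tid == os.getpid():
        # Main thread: renaming it would also rename the whole process.
        return False
    with open("/proc/self/task/{}/comm".format(tid), "w") as f:
        f.write(name)
    return True
```

Worker threads still get their debugging names, while anything that looks the process up by name keeps working.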
The userspace thread implementation provides pthread_t as the unique thread identifier. pthread_t is not the same as the kernel task ID, and there is no way to infer the mapping from one to the other without accessing non-public structures. Instead, one may use the gettid() system call to obtain the task ID of the calling thread. When I wrote this, it was not exposed by libc, nor by Python, but it could be called by using syscall() directly. (Since then, glibc 2.30 has added a gettid() wrapper and Python 3.8 has added threading.get_native_id().) I do not know of a publicly documented way to get the task ID for another thread.
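For the curious, calling gettid() through syscall() from Python can be sketched with ctypes. The syscall numbers below are architecture-specific and only cover a few common Linux targets (and assume the interpreter is built for the machine's native ABI), so treat the table as an assumption to verify against your own headers:

```python
import ctypes
import os
import platform

# SYS_gettid differs per architecture; this table is deliberately incomplete.
_SYS_GETTID = {
    "x86_64": 186,
    "aarch64": 178,
    "i386": 224,
    "i686": 224,
    "armv7l": 224,
}


def gettid():
    """Return the kernel task ID of the calling thread via syscall()."""
    number = _SYS_GETTID.get(platform.machine())
    if number is None:
        raise NotImplementedError(
            "unknown SYS_gettid for {}".format(platform.machine()))
    # CDLL(None) exposes the symbols already loaded into this process,
    # which includes libc's syscall().
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.syscall(number)


if __name__ == "__main__":
    # In the main thread the task ID and the process ID are the same number.
    print(gettid(), os.getpid())
```

On Python 3.8 or newer, threading.get_native_id() reports the same value without any of this.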
pthread_t is usually a pointer to a struct that is unique per thread, and the address itself is used as the ID. In fact, in another implementation-specific detail that you should not rely on, CPython simply casts the pthread_t to uint64_t and uses that as the Python thread identifier.
I also found this lovely bit about procfs abuse in LXC.