Library Design Gotchas: Configuration loading

A friend was complaining about this library they were trying to use that was failing to load a configuration from a file. The resulting dive into the code inspired this post about inappropriate choices made when designing how a library is configured. It isn’t my intention to pick on pyart. I appreciate the hard work the developers did to create it and open source it. It is just the example at hand.

Retrieving function arguments while unwinding the stack

When a debugger, profiler or crash reporter is unwinding the call stack, can it reliably retrieve the function arguments of every function in the stack? I originally intended for this to be a later part of the Sampling Profilers series, but a recent discussion with Ben Frederickson, and his subsequent py-spy implementation helped crystallize my thoughts and I figured I’d write it down separately. Retrieving function arguments is “trivial” in certain cases and pure guesswork in others.

Sampling Profiler Internals: Suspending Threads

This is part 2 of the Sampling Profilers Internals series. Introduction Suspending threads Stack unwinding Symbolication Presenting profile output Extending the profiler to managed languages As described in the introduction, a sampling profiler captures the stack of each thread every few milliseconds. To do this, it is preferable to suspend the thread 1. We don’t want the stack changing under us as we are profiling.

Sampling Profiler Internals: Introduction

Sampling profilers are useful tools for performance analysis of programs. I’ve spent a lot of time over the past several months digging into various implementations of sampling profilers and reading a lot about symbolication, threads, stack unwinding and other gory operating system details. This series of posts attempts to summarize my understanding. Introduction Suspending threads Stack unwinding Symbolication Presenting profile output Extending the profiler to managed languages Background High CPU usage is a problem that comes up often in widely used software.

Diving into the Python call stack (PyGotham 2018)

I gave a talk at PyGotham 2018 about how Python implements stack frames and how Dropbox leveraged that to improve crash reporting using Crashpad. I also contributed to the Dropbox Tech Blog post that goes into great detail on the crash reporting pipeline. The talk was less about Crashpad and more about Python internals. There is a video and slides. As part of preparing for the talk, I wrote the following post.

The main thread name and process name are the same thing on Linux!

On typical days, we software engineers are usually stuck due to head scratching bugs, instead of actually writing interesting software. I had made some small changes and suddenly our end-to-end tests were timing out, with no useful clues. The harness emits a crazy number of log lines, so digging through them was difficult. Fortunately, the problem was easily reproduceable on a local VM and with some piecemeal commenting I was able to isolate it to changing the thread name of the main thread, that I had done for some debugging information.

Why does my stack have an extra 4 bytes? Digging into Clang's return value implementation

I was futzing around with some C code a few days ago and noticed that executables generated by Clang would sometimes have an extra 4 bytes on the stack. This was just for the main function. We can verify this is Compiler Explorer. Try switching to GCC and this doesn’t happen. This was interesting, so I spent a few hours over the holiday digging into why and how this happens.

CppCon 2017 Talks I enjoyed

I spent a recent holiday listening to several CppCon talks. I’m hooked! I was impressed by the generally high quality of the talks. The lack of “use my/my company’s framework which is so awesome!” talks was refreshing. In addition, the use cases where C++ trumps most competition are often either performance sensitive, or correctness sensitive under strong constraints. Of course, this means contending with the syntactic and semantic complexities C++ throws at you.

Using Windows Job Objects for Process Tree Management

Using child processes to perform various tasks is a standard construct in larger programs. The simplest reason is this gets you memory isolation and resource management for free, with the OS managing scheduling and file descriptors and other resources. A common requirement when using multiple processes is the ability to wait on or kill one or more of these children. It is not always possible to record process IDs at fork(), since the fork may happen in a library that does not give you such access.

Debugging MacOS file locking with DTrace

A few days ago, I was stymied at work by a set of tests that had intermittent failures on OSX but not Windows. There was a process which would try to obtain an exclusive lock on a file, using the lock-on-open provided by the BSD/MacOS O_EXLOCK flag to open(2). It also used O_NONBLOCK; if the file was locked by another process, it could be skipped. The process would hold the lock and remove (unlink(2)) the file, before close(2)-ing the descriptor.