CppCon 2017 Talks I Enjoyed
I spent a recent holiday listening to several CppCon talks. I’m hooked!
I was impressed by the generally high quality of the talks. The lack of “use my/my company’s framework, which is so awesome!” talks was refreshing. The use cases where C++ trumps most of the competition tend to be either performance-sensitive or correctness-sensitive under strong constraints. Of course, this means contending with the syntactic and semantic complexities C++ throws at you. The result is an engineering mentality biased toward investing in quality and maintainability, or in sheer, bare-metal speed, both of which lead to interesting conversation topics. So the talks are usually about either the impact of low-level tinkering to account for real hardware, or the processes required to validate functionality in… hard-to-unit-test situations, which often calls for novel solutions.
Dr. Titus Winters
This was my favorite one of the bunch for a couple of reasons. The talk is about managing emerging complexity in source code and tackling maintainability across several years, thousands of developers and millions of lines of code, straight from the horse’s mouth.
The talk frames dependency management as a process issue rather than a technology issue, where, by technology, I mean any and all of:
- Package managers
- Semantic Versioning
- Dependency Pinning
My favorite quote of the talk is:
Software engineering is programming integrated over time
I’ll let the talk best explain why that is spot on.
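Another “law” that several Googlers’ talks referenced was Hyrum’s Law, which I thought was a great way to phrase the fundamental problem in dependency management: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”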
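As a small illustration of Hyrum’s Law (my own example, not one from the talk; the function names are hypothetical): `std::unordered_set` promises nothing about iteration order, yet nothing stops a caller from depending on it. The fragile version keeps working only until a library upgrade changes the hash function or bucket layout.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>

// Fragile: the standard leaves unordered_set iteration order unspecified, so
// this returns whichever element the current hash/bucket layout puts first.
// Any library upgrade is free to change that "observable behavior".
std::string first_tag_fragile(const std::unordered_set<std::string>& tags) {
  assert(!tags.empty());
  return *tags.begin();
}

// Robust: depend only on what the contract promises (the set's contents) and
// impose the ordering we actually want ourselves.
std::string first_tag_robust(const std::unordered_set<std::string>& tags) {
  assert(!tags.empty());
  return *std::min_element(tags.begin(), tags.end());
}
```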
Google’s solution to this problem is to chuck technology for change management and instead use technology for change enablement. That is, as long as users of the library promise to use the library as specified by library authors, library authors will ship tools and processes to make upgrades as seamless as possible.
This requires an organization-wide cultural setup that is given enough resources; as such, it probably isn’t applicable to a startup. At Google’s scale, however, the cost of the developers assigned to the task and of the cultural change pales in comparison to the ability to improve code across millions of machines seamlessly. This is particularly useful for time or memory optimizations, where Googlers routinely refer to making changes that, say, reduce fleet-wide RAM usage by 0.1%. Assuming Google has 10⁶ servers with 32 GB of RAM each, that would let them save ~1,000 computers and the associated power and cooling costs.
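Spelling out that back-of-the-envelope estimate as compile-time arithmetic (both inputs, 10⁶ servers and 32 GB per server, are my assumptions, not published figures):

```cpp
#include <cstdint>

// Assumed fleet size and per-server RAM; neither is an official Google number.
constexpr std::uint64_t kServers = 1'000'000;
constexpr std::uint64_t kRamPerServerGB = 32;
constexpr std::uint64_t kFleetRamGB = kServers * kRamPerServerGB;  // 32,000,000 GB (32 PB)

// A 0.1% reduction is 1/1000 of the fleet's RAM...
constexpr std::uint64_t kSavedGB = kFleetRamGB / 1000;  // 32,000 GB
// ...which is the RAM of this many servers.
constexpr std::uint64_t kMachinesFreed = kSavedGB / kRamPerServerGB;

static_assert(kMachinesFreed == 1000, "0.1% of 10^6 servers is ~1000 servers");
```

Note that the per-server RAM cancels out: with identical machines, 0.1% of the fleet’s RAM is always 0.1% of the servers, whatever their size.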
Living at head (always building against the latest commit of every dependency, rather than a pinned release) is facilitated by a mono-repo, which makes updates and synchronization easy; a build-from-source toolchain (with extreme caching of build artifacts!); and build tools that allow querying the dependency graph. In addition, Clang-based tools (clang-tidy & co.), which can parse C++ semantically, are another requirement for large-scale source analysis and transformation.
Considering that linters and analysis tools are typically easier to write, or already exist, for other languages, adapting live-at-head to other platforms and organizations is not impossible. I see two major constraining factors:
- Test coverage is typically not as aggressively enforced in other code bases.
- Higher-level and dynamic languages do not suffer from the language complexity and ABI issues that often trip up C++ libraries. Instead, they are affected by a more pernicious problem that manifests at run time, often in production: the lack of static types. Having recently been bitten by several of these, I’m still not sure how to solve this entirely, apart from aggressively driving up unit test coverage. Good analysis tools exist (for example, we use MyPy at Dropbox), but they often break down at I/O boundaries, such as when dealing with schema migrations.
Takes a look at duplicate initialization when linking static and dynamic libraries that define the same globals. It was a good introduction to exactly how static vs. dynamic linking is done.
Particularly :headdesk: for me was that the order of the linker’s command-line arguments affects this!
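A minimal sketch of the kind of global that gets bitten by this, assuming the common self-registration pattern (`registry`, `Registrar` and `g_plugin` are hypothetical names, not from the talk). In a single binary the constructor runs exactly once, before `main`:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Construct-on-first-use: the registry is built the first time it is needed,
// so it does not depend on cross-translation-unit initialization order.
std::vector<std::string>& registry() {
  static std::vector<std::string> names;
  return names;
}

struct Registrar {
  explicit Registrar(const char* name) {
    // Catches a double registration, which can happen if this global's
    // initializer runs more than once.
    assert(std::find(registry().begin(), registry().end(), name) == registry().end());
    registry().push_back(name);
  }
};

// A global whose constructor has a side effect, run during static initialization.
Registrar g_plugin("my_plugin");
```

If the same object file ends up in both an executable (through a static library) and a shared library that executable loads, the initializer can run once per image; depending on the platform and symbol visibility, that means either two separate registries or the same object being constructed twice. Argument order matters because a traditional Unix linker such as GNU ld scans static archives left to right and only pulls in an archive member if it resolves a symbol that is still undefined at that point.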
Quick look at a collection of algorithms that are simple to implement and understand, yet very useful. I was already aware of Reservoir Sampling, but Heavy Hitters seems like a really useful load-shedding algorithm.
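Two minimal sketches. Reservoir sampling here is the textbook Algorithm R; for heavy hitters I’ve used the Misra-Gries summary, which may or may not be the exact variant presented in the talk. Both take a `std::vector` for brevity, although the point of both is that they work in a single pass over a stream of unknown length with bounded memory.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_map>
#include <vector>

// Reservoir sampling (Algorithm R): a uniform random sample of k items from a
// stream, using O(k) memory and a single pass.
template <typename T, typename URBG>
std::vector<T> reservoir_sample(const std::vector<T>& stream, std::size_t k, URBG& rng) {
  std::vector<T> reservoir;
  reservoir.reserve(k);
  for (std::size_t i = 0; i < stream.size(); ++i) {
    if (i < k) {
      reservoir.push_back(stream[i]);  // fill the reservoir first
    } else {
      // Keep item i with probability k / (i + 1), replacing a random slot.
      std::uniform_int_distribution<std::size_t> pick(0, i);
      const std::size_t j = pick(rng);
      if (j < k) reservoir[j] = stream[i];
    }
  }
  return reservoir;
}

// Heavy hitters via Misra-Gries: with k - 1 counters, every item occurring
// more than n / k times in a stream of n items is guaranteed to survive.
// Survivors may include false positives, so counts are only candidates.
template <typename T>
std::unordered_map<T, std::size_t> misra_gries(const std::vector<T>& stream, std::size_t k) {
  assert(k >= 2);
  std::unordered_map<T, std::size_t> counters;
  for (const T& item : stream) {
    auto it = counters.find(item);
    if (it != counters.end()) {
      ++it->second;
    } else if (counters.size() < k - 1) {
      counters.emplace(item, 1);
    } else {
      // No free counter: decrement every counter, dropping those that hit zero.
      for (auto c = counters.begin(); c != counters.end();) {
        if (--c->second == 0) {
          c = counters.erase(c);
        } else {
          ++c;
        }
      }
    }
  }
  return counters;
}
```

For load shedding, one way to apply this (my assumption, not the talk’s) is to feed it client IDs per time window and treat any surviving client with a large count as a candidate for throttling.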
This was a fascinating journey through Google’s hash table implementation and, by extension, how hash maps are implemented in std and in other languages. In particular, I liked the discussion of how L1 cache misses became the major bottleneck, and thus of optimizing so that the initial lookup and the subsequent iterations needed to confirm identity fit in cache lines.
Also see Raymond Hettinger’s “Modern Python Dictionaries – A confluence of a dozen great ideas”.
tl;dr: don’t bother hand-optimizing code until all other avenues are exhausted.
I first discovered Compiler Explorer a few months ago, and it is an amazing tool for poking around compiler-generated assembly! This is a funny and insightful talk from its author about all the magical optimizations modern compilers perform to keep C++’s promise of speed and zero-cost abstractions.
The fact that clang can do full-function optimization of common access patterns and turn them into single x86 instructions was mindblowing.
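Here is the kind of example I mean: a hand-written bit-counting loop (the function name is mine) that clang recognizes as a population count. With optimizations on and a target that has the `POPCNT` instruction (for example `-O2 -mpopcnt`, or `-march=native` on a CPU that supports it), recent clang versions should compile the whole function down to a single `popcnt`; check the exact output for your compiler version in Compiler Explorer.

```cpp
#include <cstdint>

// Kernighan's trick: each iteration clears the lowest set bit, so the loop
// runs once per set bit. Optimizers can replace the entire loop with popcnt.
constexpr int count_set_bits(std::uint64_t x) {
  int count = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++count;
  }
  return count;
}
```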
I liked the focus on deadlock detection across all stages of the software development life cycle. The comprehensiveness is commendable. The topological-sort approach to detecting deadlocks is well known, but implementing it at various stages to account for different restrictions is cool, as is exploiting the similarity between the two graph formulations to pick mutex-as-node or thread-as-node depending on which one the stack provides easier access to.
The fact that bcc ships with a deadlock detector that is “pluggable” means that it should be easy to adapt this to other resource checks and other languages. You’d just have a “backend” tool generate something bcc’s Python script could parse into a graph.
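A minimal sketch of the mutex-as-node variant, assuming the lock-acquisition edges have already been collected somehow (instrumentation, a sanitizer, or an eBPF probe): an edge `A -> B` means some thread took `B` while holding `A`, and a failed topological sort (Kahn’s algorithm here) means the graph has a cycle, i.e. a potential lock-order inversion. `LockOrderGraph` is a hypothetical name, and this is not bcc’s implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>

class LockOrderGraph {
 public:
  // Record that `acquired` was locked while `held` was already held.
  void add_edge(const std::string& held, const std::string& acquired) {
    assert(!held.empty() && !acquired.empty());
    if (edges_[held].insert(acquired).second) ++indegree_[acquired];
    indegree_.emplace(held, 0);  // track nodes that only appear as sources
  }

  // Kahn's topological sort: repeatedly remove nodes with no incoming edges.
  // Nodes that can never be removed lie on, or downstream of, a cycle.
  bool has_potential_deadlock() const {
    std::map<std::string, std::size_t> indeg = indegree_;
    std::queue<std::string> ready;
    for (const auto& [node, d] : indeg) {
      if (d == 0) ready.push(node);
    }
    std::size_t removed = 0;
    while (!ready.empty()) {
      const std::string node = ready.front();
      ready.pop();
      ++removed;
      auto it = edges_.find(node);
      if (it == edges_.end()) continue;
      for (const std::string& next : it->second) {
        if (--indeg[next] == 0) ready.push(next);
      }
    }
    return removed != indeg.size();
  }

 private:
  std::map<std::string, std::set<std::string>> edges_;
  std::map<std::string, std::size_t> indegree_;
};
```

The check flags the inversion even if the two threads never actually deadlocked during the run, which is what makes the approach useful in testing.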
I’ve spent a lot of time in the past few months poking around process address spaces, including building DLLs/dylibs with custom sections and reading arbitrary process memory to improve crash analysis, so this talk was particularly fun! The speaker dives into the DLL format, discusses various tools that ship with Windows to inspect DLLs, and explains how they are loaded into a process.
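For the curious, this is roughly what a custom section looks like from the source side, as a hedged GCC/Clang-only sketch: `CrashInfo`, `g_crash_info`, `looks_like_crash_info` and the `crashinfo` section name are hypothetical, and MSVC would use `#pragma section` with `__declspec(allocate(...))` instead.

```cpp
#include <cstring>

// Hypothetical record a crash handler or out-of-process reader could find by
// section name rather than by symbol lookup.
struct CrashInfo {
  char magic[8];
  unsigned version;
  char build_id[32];
};

// Mach-O section names take the form "segment,section"; ELF uses a plain name.
#if defined(__APPLE__)
#define CRASH_INFO_SECTION __attribute__((section("__DATA,__crashinfo"), used))
#else
#define CRASH_INFO_SECTION __attribute__((section("crashinfo"), used))
#endif

CRASH_INFO_SECTION CrashInfo g_crash_info = {"CRSHINF", 1, "dev-build"};

// The sanity check a reader would run after mapping the section's bytes.
bool looks_like_crash_info(const CrashInfo& info) {
  return std::memcmp(info.magic, "CRSHINF", sizeof(info.magic)) == 0 && info.version == 1;
}
```

After building, `readelf -x crashinfo <binary>` on Linux or `otool -s __DATA __crashinfo <binary>` on macOS should dump the section’s bytes.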
There are several other great talks I’ve not got around to watching yet, so I recommend checking out the CppCon YouTube channel. In particular, a lot of engineers from Bloomberg and high-frequency-trading firms talked about optimizing for the fast path, which, while interesting, doesn’t come up often in the work I do.