Library Design Gotchas: Configuration loading
A friend was complaining that a library they were trying to use, pyart, was failing to load its configuration from a file. The resulting dive into the code inspired this post about inappropriate choices made when designing how a library is configured.
It isn’t my intention to pick on pyart. I appreciate the hard work the developers did to create it and open source it. It is just the example at hand.
In the instance above, simply importing pyart within the production environment at my friend’s job was failing, and the backtrace pointed to the configuration file loader. The
library tries to load its configuration from a file. Unfortunately, in the
production environment, the Python code is bundled into a custom archive. Their
environment adds additional module loaders to the Python import machinery so
that import foo statements can import from the archive. pyart is using
SourceFileLoader, which is not aware of this. It tries to treat
the archive as a directory and fails. Reading more of that code, pyart is
violating several conventions, which makes configuring the library difficult.
Keep configuration within the API
By its nature, a library cannot control the execution environment users will use it in. This means the sole method of configuring the library must not depend on the execution environment. In this case it is loading a file and retrieving constants from it. In another case it might be reading values from environment variables. These methods constrain the places your library can be used.
At the lowest level a library should offer a configuration option that is
entirely within the bounds of the language it is written in, as that is the
only constraint to which the user of your library is also bound. In Python
this can mean letting a user pass a
Configuration object to the library
initialization routine. In C, this may involve a struct. This allows the user
to retrieve configuration from wherever they want, create the configuration
object, then start the library. They might be retrieving one setting from the
user’s location, another from the current temperature, and a third from a radio
transmission! They might be running code on embedded hardware that has no
filesystem! This will still work because clearly they have enough
infrastructure to get the language runtime executing.
If you’d like to expose a default configuration, it is trivial to export a
DefaultConfiguration object or constant that carries
the right values.
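As a minimal sketch of this idea, the configuration can be a plain dataclass that the caller builds and hands to the library. Every name here (Configuration, DEFAULT_CONFIGURATION, Library, and the individual settings) is hypothetical and chosen for illustration; none of it is pyart's actual API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Configuration:
    """Plain-data settings; nothing here touches files or the environment."""
    units: str = "metric"
    fill_value: float = -9999.0
    verbose: bool = False


# A ready-made default that callers can use as-is or copy and adjust.
DEFAULT_CONFIGURATION = Configuration()


class Library:
    """Hypothetical entry point that only accepts an explicit configuration."""

    def __init__(self, config: Configuration = DEFAULT_CONFIGURATION):
        self.config = config

    def describe(self) -> str:
        return f"units={self.config.units}, fill_value={self.config.fill_value}"


# The caller decides where each value comes from (a file, a sensor, a
# hard-coded constant) and hands the finished object to the library.
custom = replace(DEFAULT_CONFIGURATION, units="imperial")
lib = Library(custom)
print(lib.describe())
```

Because the object is frozen, the shared default cannot be changed by accident; callers derive their own copy with replace() instead.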
Configuration should not be global state
When the user passes some configuration to your library, avoid modifying global library state with it. There are rare situations when an application needs to store configuration in global state, but I’ve yet to encounter a situation where a library has to do this.
Global state is bad for a bunch of other reasons. My top two are forced singletons and difficulty testing.
If an application wants to use the same library in multiple places in the code base, global state makes it impossible to use different configurations at those two points.
Testing also becomes annoying. Tests that the user of your library writes can no longer safely run in parallel. If they want to speed up test execution, they now have to jump through a lot of hoops. If the user is not aware of the internal global state, they will encounter hard-to-reproduce race conditions that will leave them scratching their head.
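To make the contrast concrete, here is a small sketch in which the configuration lives on each instance rather than in module-level state. Formatter and FormatterConfig are made-up names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatterConfig:
    precision: int = 2


class Formatter:
    """Hypothetical library object: configuration lives on the instance."""

    def __init__(self, config: FormatterConfig):
        self._config = config

    def format(self, value: float) -> str:
        return f"{value:.{self._config.precision}f}"


# Two differently configured instances coexist in one process, and a test
# can build its own instance without affecting any other test.
coarse = Formatter(FormatterConfig(precision=1))
fine = Formatter(FormatterConfig(precision=4))
print(coarse.format(3.14159), fine.format(3.14159))
```

With a module-level precision setting instead, the second configuration would silently overwrite the first.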
Once your basic configuration is moved into a data structure, it is definitely nice to provide utility functions that can read a configuration from a byte stream or a file. This is particularly useful if your configuration is complicated, or you need some validation beyond just “this is a string, this is a bool”. When the user is in a circumstance where they can use these helpers, they will be glad you provided them. Just don’t let it be the only way to configure the library. Don’t give in to the temptation to mix I/O and parsing!
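One way to keep I/O and parsing apart is sketched below, assuming a JSON configuration format purely for the sake of the example. Settings, parse_settings and load_settings are hypothetical names. The parser works on a string, so it is usable even where there is no filesystem, and the file helper is only a thin wrapper around it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    name: str
    debug: bool = False


def parse_settings(text: str) -> Settings:
    """Pure parsing and validation: no file or network access."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("configuration must be a JSON object")
    if not isinstance(raw.get("name"), str):
        raise ValueError("'name' must be a string")
    debug = raw.get("debug", False)
    if not isinstance(debug, bool):
        raise ValueError("'debug' must be a boolean")
    return Settings(name=raw["name"], debug=debug)


def load_settings(path: str) -> Settings:
    """Thin I/O convenience wrapper around parse_settings."""
    with open(path, encoding="utf-8") as handle:
        return parse_settings(handle.read())


# Callers with no usable filesystem can still go through the parser.
settings = parse_settings('{"name": "radar", "debug": true}')
print(settings)
```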
Keep imports and initialization separate
I feel the situation is particularly bad in Python-land because arbitrary code execution is allowed during import. Avoid running code during an import. The import exists to declare classes and functions! Initialization must be a separate process. The library user may not want to use your library for several days after their app first starts running!
If pyart had not tried to initialize itself as soon as it was imported, things
would have been usable even with the previous two violations. The application
could’ve imported pyart along with all its other imports. Then it could read the
default_configs.py file using some code that knew how to find the file in
a production environment, write it out to a normal directory, modify the
environment variable, then initialize pyart. Now the application code has to do
all this before it can even import pyart. This makes the application code ugly.
It has the import statement somewhere two levels deep in some function.
If you need global state, guard it with a boolean flag (accessed in a
thread-safe way) that records whether
init() or a similar function has been called,
and make the other parts of the library API fail if it hasn’t.
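A rough sketch of such a guard follows, assuming a module with a single init() function. The names init, greeting and _require_init are invented for this example. Importing the module only defines things; nothing is configured until the caller explicitly asks for it.

```python
import threading

_lock = threading.Lock()   # protects the two module-level variables below
_initialized = False
_settings = None


def init(settings: dict) -> None:
    """Explicit initialization; importing this module never calls it."""
    global _initialized, _settings
    with _lock:
        if _initialized:
            raise RuntimeError("init() has already been called")
        _settings = dict(settings)
        _initialized = True


def _require_init() -> dict:
    """Return the stored settings, or fail if init() was never called."""
    with _lock:
        if not _initialized:
            raise RuntimeError("call init() before using this library")
        return _settings


def greeting() -> str:
    """Example API function that refuses to run before init()."""
    return f"hello from {_require_init()['name']}"


try:
    greeting()
except RuntimeError as error:
    print(error)

init({"name": "example"})
print(greeting())
```

The application is then free to import the module at the top of a file and call init() whenever its own configuration is ready.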
You may have a legitimate use case where you’d like users to be able to use parts of your library as an application to do some basic tasks. For example your library may be able to process a CSV or something.
The library should ship with separate scripts that use the library as a library, and ask the user to run those scripts as binaries. In languages like Python, you can even hide the script-like behavior behind
if __name__ == '__main__': do_some_work()
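Spelled out a little more, the pattern might look like the sketch below. count_rows and main are hypothetical names, and the CSV row count stands in for whatever basic task the library supports. The library function works on plain text, and only main touches the command line, files and standard output.

```python
import csv
import io
import sys


def count_rows(csv_text: str) -> int:
    """Library function: counts data rows, excluding the header."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    next(reader, None)  # skip the header row if there is one
    return sum(1 for _ in reader)


def main(args: list[str]) -> None:
    """Script behavior: the only place that reads files and prints."""
    if len(args) != 1:
        print("usage: count_rows.py FILE.csv", file=sys.stderr)
        return
    with open(args[0], newline="", encoding="utf-8") as handle:
        print(count_rows(handle.read()))


# Runs only when the file is executed as a script, never on import.
if __name__ == "__main__":
    main(sys.argv[1:])
```

A real script would probably also return an exit status; that is left out here to keep the sketch short.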
Library design is always challenging as authors have to balance usability and configurability. Exposing a nice way to configure the library leads to a pleasant initial experience for the user as they learn to use your library, and also lets them adapt a library to the needs of their application without jumping through hoops. It makes them happy!