Building Better Teams

An Intuitive Guide to Dependencies

Anyone working on a modern software project has, at one time or another, descended through various levels of Dependency Hell. These problems can seem difficult to trace or solve. But why do we have them, what can we do about them, and what role does SemVer play?

A released package can mean many things to different people, so it’s important to be precise here. When we talk about “a released package”, we mean “one, and only one, exact and unchanging set of files”.

Your project might use “TestingFramework” and have installed “1.4.6”, but that is one specific package release; 1.4.7 is an entirely different package release. The difference is small but important: a package release can belong to a series of releases, yet each release is still an unchanging, specific set of code. Re-tagging code doesn’t revoke the first package release. It just introduces a new package release under the same tag, a rule violation that can fool a package management system (and sometimes break it). It’s not a “rewrite” of a release, but a reassignment of a tag.
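One way to make “one exact, unchanging set of files” concrete is to fingerprint a release by its contents. This is a minimal sketch, not how any particular package manager works. It assumes a hypothetical in-memory `tags` registry that maps a tag name to a SHA-256 digest of a release’s files, and it shows that re-tagging only moves the pointer while the original release keeps its identity.

```python
import hashlib

def release_digest(files: dict[str, bytes]) -> str:
    """Fingerprint an exact set of files: the same files always give the same digest."""
    h = hashlib.sha256()
    for path in sorted(files):  # sort so file order doesn't matter
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()

# Hypothetical registry: a tag is only a name pointing at one release digest.
tags: dict[str, str] = {}

v146 = {"testing_framework.py": b"VERSION = '1.4.6'\n"}
tags["1.4.6"] = release_digest(v146)
original = tags["1.4.6"]

# Re-tagging: a different set of files is published under the same tag name.
v146_retagged = {"testing_framework.py": b"VERSION = '1.4.6'  # hotfix\n"}
tags["1.4.6"] = release_digest(v146_retagged)

# The original release still has its own identity; only the tag moved.
print(original == release_digest(v146))  # True
print(tags["1.4.6"] == original)         # False
```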

Packages

Packages by themselves are simply islands of code, adrift and lost in a virtual, endless space. Zip files, git repositories, commits, local folders… they are all “packages”. Even if you don’t manage them correctly or use any popular tooling, they still exist in “package space”.

Unstructured “package space” contains all releases.

Tagged versions

One strategy for picking specific packages is quite simple: pin down the list of released packages required. This is most commonly a “lock file”, which follows the rule “There is exactly one list of packages needed (even among dependencies of dependencies), and only this list will satisfy the requirement.”

Lock files ensure idempotent installations.
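An install from a lock file can be sketched as a plain loop with no resolution step. This is a minimal illustration under stated assumptions: the lock file format, the digests, the in-memory `REGISTRY`, and the `fetch` callback are all hypothetical. Recording a content hash next to each pin does mirror what real lock files often do.

```python
# Hypothetical lock file: exactly one release per package, including
# dependencies of dependencies. Versions and digests are made up.
LOCK = {
    "foo": {"version": "21.0.0", "sha256": "aaa111"},
    "bar": {"version": "1.0.0", "sha256": "bbb222"},
    "baz": {"version": "21.0.0", "sha256": "ccc333"},
}

# Hypothetical registry: (name, version) -> (digest, payload).
REGISTRY = {
    ("foo", "21.0.0"): ("aaa111", "foo files"),
    ("bar", "1.0.0"): ("bbb222", "bar files"),
    ("baz", "21.0.0"): ("ccc333", "baz files"),
}

def fetch(name: str, version: str) -> tuple[str, str]:
    return REGISTRY[(name, version)]

def install_from_lock(lock, fetch_release):
    """Install exactly what the lock file lists: no resolution, no recursion."""
    installed = {}
    for name, pin in sorted(lock.items()):
        digest, payload = fetch_release(name, pin["version"])
        # Refuse anything that differs from what was recorded when locking.
        if digest != pin["sha256"]:
            raise RuntimeError(f"{name} {pin['version']}: contents differ from lock file")
        installed[name] = payload
    return installed

first = install_from_lock(LOCK, fetch)
second = install_from_lock(LOCK, fetch)
print(first == second)  # True: the same lock file always yields the same install
```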

There are other ways to pin down specific versions. For example, before NPM had a “shrinkwrap” (their version of a lock file), it was common to see projects with all node_modules committed to ensure predictable installs.

One important point about tagging is that it adds a layer of indirection between a name and a package. Normally a tag maps to exactly one package release, but re-tagging sometimes creates additional releases under the same tag. Imagine using a dependency management system that followed the rules “Install all packages directly required by the project using a tag reference” and “If a package requires more packages, install those by tag also”.

Installing Foo and Bar packages from their tags, at one moment in time.

One of the risks with this system is that an author can re-release a tag. In the graph above, the author of Foo released 21.0.0 and, a few days later, discovered the package had the wrong contents, so they re-released the tag. Any project using Foo 21.0.0 will suddenly get an entirely different package. The problem is compounded because someone also re-released Baz 21.0.0! It’s easy to see how a project in a system like this could break suddenly and unexpectedly without any change to its own code.
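The re-tagging hazard can be reproduced in a few lines. In this minimal sketch the `registry` dict and the package contents are hypothetical. An install that trusts only the tag silently receives the new files. An install that compares against a digest recorded earlier notices the change; many lock files record such a digest, for example the `integrity` field in NPM’s package-lock.json.

```python
import hashlib
from typing import Optional

def digest(contents: bytes) -> str:
    return hashlib.sha256(contents).hexdigest()

# Hypothetical registry: (package, tag) -> published contents.
registry = {("foo", "21.0.0"): b"wrong contents"}

# Day 1: a project installs Foo 21.0.0 and records what it received.
recorded = digest(registry[("foo", "21.0.0")])

# Days later, the author re-releases the same tag with different contents.
registry[("foo", "21.0.0")] = b"fixed contents"

def install(name: str, tag: str, expected: Optional[str] = None) -> bytes:
    """Fetch a release by tag, optionally checking a previously recorded digest."""
    contents = registry[(name, tag)]
    if expected is not None and digest(contents) != expected:
        raise RuntimeError(f"{name} {tag} was re-tagged: contents changed")
    return contents

print(install("foo", "21.0.0"))  # b'fixed contents' (silently different)
try:
    install("foo", "21.0.0", expected=recorded)
except RuntimeError as err:
    print(err)  # foo 21.0.0 was re-tagged: contents changed
```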

Another challenge with this system is performance. Propagating through each dependency to find the next level of dependencies is of course slower than simply installing from a lock file.
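The tag-following rules described earlier can be sketched as a simple worklist. This is a hypothetical illustration: the `REGISTRY` contents, including the Bar version, are made up. A real tool would fetch each release’s metadata over the network at every step, which is why this walk is slower than reading a lock file. The sketch raises on a conflicting tag rather than attempting any resolution.

```python
# Hypothetical registry keyed by (package, tag); each release lists the
# tags of the packages it requires. The Bar version is made up.
REGISTRY = {
    ("foo", "21.0.0"): {"requires": {"baz": "21.0.0"}},
    ("bar", "1.0.0"): {"requires": {"baz": "21.0.0"}},
    ("baz", "21.0.0"): {"requires": {}},
}

def install_by_tag(requirements: dict[str, str], registry) -> dict[str, str]:
    """Install direct requirements by tag, then follow each release's own tags."""
    installed: dict[str, str] = {}
    pending = list(requirements.items())
    while pending:
        name, tag = pending.pop()
        if name in installed:
            if installed[name] != tag:
                raise RuntimeError(f"conflict: {name} {installed[name]} vs {tag}")
            continue
        installed[name] = tag
        # One metadata lookup per release, level after level; a real tool
        # would make a network request here.
        pending.extend(registry[(name, tag)]["requires"].items())
    return installed

result = install_by_tag({"foo": "21.0.0", "bar": "1.0.0"}, REGISTRY)
print(result)  # {'bar': '1.0.0', 'baz': '21.0.0', 'foo': '21.0.0'}
```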

Semantic Versioning

A big advantage of tagged releases is the ability to build more rules and meaning into the graph. The most common example is Semantic Versioning (SemVer): a system that indicates whether the next release in a series (a) fixes a bug, (b) introduces a feature, or (c) breaks the public API of the previous releases. Users of SemVer often draw the boundaries between those three categories differently. Even so, it has proven quite useful and has quickly been adopted by many major ecosystems.
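The three kinds of change map onto the MAJOR.MINOR.PATCH fields of a version number. The function below is a minimal sketch of that mapping. `Change` and `next_version` are hypothetical names, and the sketch ignores pre-release labels, build metadata, and the special meaning SemVer gives to 0.x versions.

```python
from enum import Enum

class Change(Enum):
    FIX = "bug fix"               # (a) bumps PATCH
    FEATURE = "new feature"       # (b) bumps MINOR, resets PATCH
    BREAKING = "breaking change"  # (c) bumps MAJOR, resets MINOR and PATCH

def next_version(current: str, change: Change) -> str:
    """Compute the next MAJOR.MINOR.PATCH release number for a given kind of change."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change is Change.BREAKING:
        return f"{major + 1}.0.0"
    if change is Change.FEATURE:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(next_version("1.4.6", Change.FIX))       # 1.4.7
print(next_version("1.4.6", Change.FEATURE))   # 1.5.0
print(next_version("1.4.6", Change.BREAKING))  # 2.0.0
```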

The rules are quite simple. Instead of pinning a version, there is a constraint rule. Imagine a package has released four tags: 1.0.0, 1.1.1, 2.0.0, and 2.1.0. Different dependency constraints create different sets of possible dependencies:

  • 1.1.1: This is pinned to one exact tag.

  • ^2.0.0: This permits 2.0.0, 2.1.0

  • ^1.0.0: This permits 1.0.0, 1.1.1

  • ~1.1.0: This permits 1.1.1

  • ~1.1.0 || ^2.0.0: This permits 1.1.1, 2.0.0, 2.1.0
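The list above can be checked mechanically. This sketch implements only the subset of NPM-style range syntax used in the examples: exact pins, `^`, `~`, and `||`. `satisfies` is a hypothetical helper. It assumes major versions of at least 1 (NPM treats `^0.x` more strictly) and ignores pre-release tags. Running it reproduces the permitted sets listed above.

```python
def parse(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(p) for p in v.split("."))
    return major, minor, patch

def satisfies(version: str, constraint: str) -> bool:
    """Check a version against exact pins, ^X.Y.Z, ~X.Y.Z and `||` alternatives."""
    v = parse(version)
    for alt in constraint.split("||"):
        alt = alt.strip()
        if alt.startswith("^"):
            low = parse(alt[1:])
            high = (low[0] + 1, 0, 0)           # same major version
            ok = low <= v < high
        elif alt.startswith("~"):
            low = parse(alt[1:])
            high = (low[0], low[1] + 1, 0)      # same major and minor version
            ok = low <= v < high
        else:
            ok = v == parse(alt)                # exact pin
        if ok:
            return True
    return False

RELEASES = ["1.0.0", "1.1.1", "2.0.0", "2.1.0"]
for c in ["1.1.1", "^2.0.0", "^1.0.0", "~1.1.0", "~1.1.0 || ^2.0.0"]:
    print(c, [r for r in RELEASES if satisfies(r, c)])
```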

These constraints are incredibly powerful, allowing dependencies to be negotiated automatically instead of specified manually in full detail. Imagine a situation where our project uses packages Foo and Bar, and Foo itself also depends on Bar (often with a different version constraint).

It’s easy to imagine how complex a real-world example becomes.
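A brute-force resolver makes the negotiation concrete. In this hypothetical sketch, each constraint is written as a half-open version range, so `^1.0.0` becomes `[1.0.0, 2.0.0)`. Foo 1.x needs Bar 1.x, Foo 2.x needs Bar 2.x, and the resolver tries combinations newest-first. Real resolvers use much smarter backtracking or SAT-style search, because trying every combination grows exponentially with the number of packages.

```python
from itertools import product

# Hypothetical releases. Each constraint is a half-open range [low, high),
# so ^1.0.0 is written ((1, 0, 0), (2, 0, 0)).
RELEASES = {
    "foo": {
        (1, 0, 0): {"bar": ((1, 0, 0), (2, 0, 0))},  # Foo 1.x needs Bar ^1.0.0
        (2, 0, 0): {"bar": ((2, 0, 0), (3, 0, 0))},  # Foo 2.x needs Bar ^2.0.0
    },
    "bar": {(1, 2, 0): {}, (2, 0, 0): {}, (2, 3, 0): {}},
}

def allowed(version, rng) -> bool:
    low, high = rng
    return low <= version < high

def resolve(project):
    """Return the first newest-first combination of releases that satisfies the
    project's constraints and every chosen release's own constraints."""
    names = sorted(RELEASES)
    choices = [sorted(RELEASES[name], reverse=True) for name in names]
    for combo in product(*choices):
        picked = dict(zip(names, combo))
        rules = list(project.items())
        for name, version in picked.items():
            rules.extend(RELEASES[name][version].items())
        if all(allowed(picked[dep], rng) for dep, rng in rules):
            return picked
    return None  # no combination works: a dependency conflict

# The project's own ^1.0.0 on Bar forces the resolver back to Foo 1.0.0.
print(resolve({"foo": ((1, 0, 0), (3, 0, 0)), "bar": ((1, 0, 0), (2, 0, 0))}))
# {'bar': (1, 2, 0), 'foo': (1, 0, 0)}

# Requiring Foo 2.x and Bar 1.x at the same time cannot be satisfied.
print(resolve({"foo": ((2, 0, 0), (3, 0, 0)), "bar": ((1, 0, 0), (2, 0, 0))}))
# None
```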

Most package managers come with a few caveats and language-specific challenges to be aware of. For instance, some ecosystems, such as Node.js with NPM, permit recursively installed (nested) dependencies. This means the package manager never has to check for conflicts between package constraints. On the other hand, projects with many dependencies can quickly grow to gigabytes in size. Worse still, if two different versions of a package pass objects outside their libraries, the application may end up handling two different versions of the same object (from different but similar package releases).
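That last problem is easiest to see in code. The sketch below is a Python analogy, not NPM itself. `load_bar` is a hypothetical stand-in for loading one installed copy of Bar, and each call produces a distinct `Widget` class. In the same way, two nested copies of a JavaScript package produce two distinct constructors that fail each other’s `instanceof` checks.

```python
from types import SimpleNamespace

def load_bar(version: str) -> SimpleNamespace:
    """Simulate loading one installed copy of package Bar (hypothetical)."""
    class Widget:
        def __init__(self, size: int) -> None:
            self.size = size
    return SimpleNamespace(VERSION=version, Widget=Widget)

# Nested layout: the application and Foo each get their own copy of Bar.
bar_for_app = load_bar("2.1.0")   # e.g. node_modules/bar
bar_for_foo = load_bar("1.1.1")   # e.g. node_modules/foo/node_modules/bar

# Foo creates a Widget with its private copy and passes it to the application.
widget_from_foo = bar_for_foo.Widget(3)

# The application checks it against *its* copy of Bar: same name, different class.
print(type(widget_from_foo).__name__)                   # Widget
print(isinstance(widget_from_foo, bar_for_app.Widget))  # False
```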

A System of Rules

SemVer is powerful, but it is only one kind of rule for specifying packages. Package lock files are another kind of rule. Committing dependencies is also a rule for finding packages in dependency space, just not a very efficient one. Automatically navigating package compatibility and interoperability is still a relatively new concept, and SemVer is the leading idea in the industry.

Dependency Hell is a result of any system of rules. There is a huge benefit to using these systems, and at the same time, many drawbacks. Some of the causes of Dependency Hell should be visible now: re-tagging, complex dependency graphs, conflicting dependencies at install, or recursive dependencies that conflict at runtime.

At the moment, this is the state of the art. However, not all ecosystems implement SemVer, and a few projects outside the mainstream are trying new ways to innovate in this area. The dependency tools of the future are sure to bring greater benefits and more interesting challenges. For now, minimizing Dependency Hell is a distributed problem that requires all package authors to develop a more intuitive understanding of these problems.

Brian Graham