This is not a tutorial. This is more like "here's a technique I used in the past and I'm writing it down so someone could do it again." Implementing this will require custom coding that I did not write for you. If you have ideas about how to do this concretely (or differently), please leave a comment!

The subject of this article is a TypeScript codebase, but the same ideas could apply to projects written in any language.

The Problem

Once upon a time, I worked on a large single-page React app written in TypeScript. When the app got big enough, we started getting frequent runtime crashes due to circular imports (for example, if A imports B and B imports A, A's exports haven't been initialized yet when B gets loaded). We'd fix one cycle, but another would soon appear somewhere else. We had to do something to keep this problem from impeding us.

There are a lot of reasons one might want to remove circular dependencies, and it's clear to me that keeping a codebase free of cycles is beneficial. Here are what I see as the two main advantages:

- Prevent some runtime errors when the app is being loaded (the problem we had above).
- Enable smaller compilation units, which sets the stage for all sorts of great stuff like incremental builds/tests, code splitting, and more.

A Solution

Our codebase was big, so no one person could go in and remove all the cycles at once. Without a way to track them, it was far too easy for people to unknowingly introduce new cycles in the course of their normal work, and also easy for a person removing a cycle to accidentally add a different one. We needed to make sure that we could remove them one by one, over time, without slipping back.

My solution looked like this:

- Add a local shell command that generates a list of all the import cycles in the codebase.
- That list gets checked in to source control so we can easily compare the cycles in our locally-changed codebase to whatever was there before.
- As part of every build (local and on the CI), the same command gets run to generate a list of all the cycles in the codebase. If the newly-generated file and the checked-in file are not an exact match, print the differences and fail the build. If a developer was changing cycles intentionally, they could first manually generate the file (by running the command) and check it in along with their changes.
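Here's a minimal sketch of how those steps could be wired together. It is not the code I actually ran: every name in it (`Mode`, `CycleCheckIO`, `runCycleCheck`, the `{ cycles }` file shape) is made up for illustration, and the file reading/writing and cycle detection are passed in as functions so the logic stays independent of any particular tool. It only reports that there is a mismatch; printing the actual differences is left out.

```typescript
// Hypothetical sketch: all names here are illustrative, not from a real tool.
type Mode = "check" | "write";

interface CycleCheckIO {
  detectCycles: () => string[][];            // wraps whatever cycle detector you use
  readBaseline: () => string | null;         // checked-in file contents, null if missing
  writeBaseline: (contents: string) => void; // overwrites the checked-in file
}

// Serialize deterministically so that an exact string comparison is meaningful.
function serialize(cycles: string[][]): string {
  return JSON.stringify({ cycles }, null, 2) + "\n";
}

function runCycleCheck(mode: Mode, io: CycleCheckIO): { ok: boolean; message: string } {
  const fresh = serialize(io.detectCycles());

  if (mode === "write") {
    // Run manually by a developer who changed cycles on purpose.
    io.writeBaseline(fresh);
    return { ok: true, message: "Cycle list regenerated; check it in with your changes." };
  }

  // "check" mode: run by every build, locally and on CI.
  if (io.readBaseline() === fresh) {
    return { ok: true, message: "" };
  }
  return {
    ok: false,
    message: "Import cycles differ from the checked-in list. Regenerate it or fix your imports.",
  };
}
```

A build script would call `runCycleCheck("check", io)` and exit with a non-zero code when `ok` is false.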

This model was successful in keeping the number of cycles on a mostly-downward trajectory over time. Forcing developers to manually "lock in" their changes every time they added or removed cycles (and supplying helpful error messages at build time) helped them understand the consequences of their changes and refactor their code accordingly to make sure cycles were being removed, not added.

How to detect the cycles

There are many ways to do this, but ideally it needs to be fast.

I used the Webpack circular-dependency-plugin, which has a big disadvantage: it only detects runtime imports, because TypeScript type-only imports are stripped out by the time Webpack runs. The plugin is not terribly slow, but it's not terribly fast either. I'm pretty sure we could do better.

There's also https://www.npmjs.com/package/eslint-plugin-import (its import/no-cycle rule), but it will run very slowly on large codebases. The linter (by design) analyzes each file in isolation, so it duplicates a lot of work when traversing the dependency graph.

If I were to do this again and couldn't find an existing tool that's fast enough, I'd consider writing a custom standalone tool. We should be able to get the imports from each file using the TypeScript compiler API and analyze the cycles using a linear-time algorithm such as Tarjan's strongly connected components algorithm, which runs in O(V + E) for V files and E imports. (Strictly speaking, that finds groups of mutually dependent files rather than a list of individual cycles.) The downside here is that we wouldn't get in-IDE feedback (like we would with ESLint) when a cycle exists.
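To make the Tarjan idea more concrete, here's a rough sketch (not something I built). It assumes the imports have already been extracted and resolved into an `ImportGraph`, a made-up type that maps each file to the files it imports; getting that map out of the TypeScript compiler API is not shown. `findCyclicGroups` is likewise a hypothetical name. It returns each group of files that can all reach one another through imports, which is a coarser result than a list of individual cycles.

```typescript
// Hypothetical input shape: each file maps to the (already resolved) files it imports.
type ImportGraph = Map<string, string[]>;

// Tarjan's strongly connected components algorithm, keeping only the
// components that contain a cycle (more than one file, or a self-import).
function findCyclicGroups(graph: ImportGraph): string[][] {
  let nextIndex = 0;
  const index = new Map<string, number>();   // discovery order of each file
  const lowlink = new Map<string, number>(); // lowest index reachable from each file
  const stack: string[] = [];
  const onStack = new Set<string>();
  const groups: string[][] = [];

  function visit(file: string): void {
    index.set(file, nextIndex);
    lowlink.set(file, nextIndex);
    nextIndex++;
    stack.push(file);
    onStack.add(file);

    for (const dep of graph.get(file) ?? []) {
      if (!index.has(dep)) {
        visit(dep);
        lowlink.set(file, Math.min(lowlink.get(file)!, lowlink.get(dep)!));
      } else if (onStack.has(dep)) {
        lowlink.set(file, Math.min(lowlink.get(file)!, index.get(dep)!));
      }
    }

    // `file` is the root of a component: pop the whole component off the stack.
    if (lowlink.get(file) === index.get(file)) {
      const group: string[] = [];
      let member: string;
      do {
        member = stack.pop()!;
        onStack.delete(member);
        group.push(member);
      } while (member !== file);
      const selfImport = (graph.get(file) ?? []).indexOf(file) !== -1;
      if (group.length > 1 || selfImport) groups.push(group.sort());
    }
  }

  graph.forEach((_deps, file) => {
    if (!index.has(file)) visit(file);
  });
  return groups;
}
```

The recursion keeps the sketch short, but a very deep import chain could overflow the call stack, so a real tool might want an iterative version.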

File format

As for the file generated by the tool, I used a JSON format so it could be easily parsed by scripts but also read by humans. Importantly, this file must be normalized, so that the cycles will be reported the same even if some code has moved around in minor ways. I normalized the raw results in three steps:

- Sort cycles by length, descending. That makes it easy to find the biggest ones to refactor first.
- Rotate each cycle so that its first-alphabetical filename is at the start. So, B->C->A and C->A->B would both be reported as A->B->C.
- Within each length and after rotating, remove duplicates and sort alphabetically.
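The normalization above could look something like this. It's a sketch with invented names (`Cycle`, `rotateToSmallest`, `normalizeCycles`), and it assumes each cycle is an array of file names in import order, with the last file importing the first. It rotates and de-duplicates first and sorts last, which should give the same result as the order described above.

```typescript
// Assumed shape: file names in import order; the last file imports the first.
type Cycle = string[];

// Rotate so the alphabetically first file name leads, preserving import order.
function rotateToSmallest(cycle: Cycle): Cycle {
  let start = 0;
  for (let i = 1; i < cycle.length; i++) {
    if (cycle[i] < cycle[start]) start = i;
  }
  return cycle.slice(start).concat(cycle.slice(0, start));
}

function normalizeCycles(raw: Cycle[]): Cycle[] {
  // Rotations of the same cycle collapse onto the same key, which removes duplicates.
  const unique = new Map<string, Cycle>();
  for (const cycle of raw) {
    const rotated = rotateToSmallest(cycle);
    unique.set(JSON.stringify(rotated), rotated);
  }

  const result: Cycle[] = [];
  unique.forEach((cycle) => result.push(cycle));

  // Longest cycles first; alphabetical within each length.
  // Plain string comparison keeps the order independent of the machine's locale.
  return result.sort((a, b) => {
    if (a.length !== b.length) return b.length - a.length;
    const ka = a.join("\n");
    const kb = b.join("\n");
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}
```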

Also in the output file, I included a JSON object that mapped each file name to the number of cycles in which it appeared (sorted in descending order by count). This also proved to be a helpful tool in deciding which file to break down next.
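A sketch of how that per-file count might be computed, again with a hypothetical name (`countCyclesPerFile`) and assuming the cycles are arrays of file names. It relies on JSON objects keeping string keys in insertion order, which holds for ordinary file paths but not for keys that look like integers.

```typescript
// Hypothetical helper: counts how many cycles each file takes part in,
// returned as an object whose keys are ordered by descending count.
function countCyclesPerFile(cycles: string[][]): Record<string, number> {
  const counts = new Map<string, number>();
  for (const cycle of cycles) {
    for (const file of cycle) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }

  const entries: [string, number][] = [];
  counts.forEach((count, file) => entries.push([file, count]));

  // Highest count first; ties broken alphabetically so the output is stable.
  entries.sort((a, b) => {
    if (a[1] !== b[1]) return b[1] - a[1];
    return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
  });

  const sorted: Record<string, number> = {};
  for (const [file, count] of entries) sorted[file] = count;
  return sorted;
}
```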

Build output and manual inspection

The command-line tool had two modes (specified via args): one to generate the file in memory (run by the build script), and another to actually write it to disk (run by developers). Running it in the build script makes sure that cycles aren't being added, and also that a new version of the tracking file gets checked in whenever cycles are removed. In my solution, the build generated the cycle list in memory and compared it to the one on disk. Depending on the result, one of the following could happen:

- If the files were an exact match, quietly succeed/continue the build.
- If the files were not a match, compare the array of normalized cycles to figure out which were added and which were removed. Display this information to the user.
  - If only removals were found, invite the user to manually regenerate the file and build again. With the regenerated file, subsequent builds would pass.
  - If there were additions (or a mixture of removals and additions), invite the user to manually inspect the list and decide what to do (either check in the new file anyway, or change their code to avoid adding cycles).
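The branching above could be sketched like this. The names (`Outcome`, `CycleDiff`, `diffCycles`, `describeDiff`) and the message wording are invented for illustration, and the sketch assumes both lists have already been normalized so that the same cycle always serializes the same way.

```typescript
// Hypothetical names; both inputs are assumed to be normalized cycle lists.
type Outcome = "match" | "only-removals" | "has-additions";

interface CycleDiff {
  added: string[][];
  removed: string[][];
  outcome: Outcome;
}

function diffCycles(checkedIn: string[][], fresh: string[][]): CycleDiff {
  const key = (cycle: string[]): string => JSON.stringify(cycle);
  const before = new Set(checkedIn.map(key));
  const after = new Set(fresh.map(key));

  const added = fresh.filter((cycle) => !before.has(key(cycle)));
  const removed = checkedIn.filter((cycle) => !after.has(key(cycle)));

  // Any addition (even mixed with removals) needs a human decision.
  const outcome: Outcome =
    added.length > 0 ? "has-additions" : removed.length > 0 ? "only-removals" : "match";
  return { added, removed, outcome };
}

// Builds the message shown to the developer when the build fails.
function describeDiff(diff: CycleDiff): string {
  const lines: string[] = [];
  for (const cycle of diff.added) lines.push("+ " + cycle.join(" -> "));
  for (const cycle of diff.removed) lines.push("- " + cycle.join(" -> "));
  if (diff.outcome === "only-removals") {
    lines.push("Cycles were removed. Regenerate the cycle list and build again.");
  } else if (diff.outcome === "has-additions") {
    lines.push("New cycles were found. Inspect the list, then either check in the regenerated file or change your imports.");
  }
  return lines.join("\n");
}
```

The build would continue quietly on `"match"` and fail with the `describeDiff` message otherwise.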

Because the file is normalized and checked in to source control, easily-inspectable diffs would show up on pull requests so everyone is aware of the cycles being changed. And, of course, the CI build would fail unless the checked-in file matched the freshly-calculated result.

Conclusion and next steps

A technique like this can help a team remove cycles over time on a large codebase, if you're willing to write some custom build steps. This system would remain in place to track the cycles until the count gets down to zero, at which point it could be replaced with a much simpler build step that simply fails if any cycles are detected (no need for the tracking file).

Please comment if you know of another good cycle-detection tool that could work with this system!

How to remove circular dependencies from a fast-moving codebase
