Why Your Company's Documentation Sucks

2021-03-10 - (8 min read)

There is a common trope about software engineers that they hate to write and maintain documentation. From a shallow view of all of the companies I have worked for, this stereotype seems to hold. Each company has had absolutely garbage documentation: it didn't exist, it was poorly organized, it was completely out of date, and/or it was written terribly.

Yet, the engineers I have worked with have never been unhelpful; all of them would go out of their way to write long Slack comments or emails or even ad-hoc documents in order to explain concepts or projects. These were all forms of documentation, and extremely useful, but were ephemeral and lost in time (or linked-to in a messy manner).

Interestingly, when a manager came in and tried to force everyone to use "proper" documentation tools, the documentation would get worse! Instead of using ad-hoc methods, which caused you to get yelled at, people just stopped -- they didn't start using the "proper" documentation tools. Well, they did, for a little while, but inevitably the "proper" documentation just decayed and eventually nobody used it. My favorite behavior was engineers linking to the documentation in public Slack channels, and then immediately following up in private to explain that the docs were terrible and to provide a shorthand explanation (the ad-hoc method, but in private).

The amount of time and effort spent on ad-hoc documentation shows that software engineers don't hate writing documentation. We all know that it saves time and effort, and helps make our coworkers more autonomous. So why the hell is documentation consistently so bad, and why do engineers spend more effort on ad-hoc documentation than using the proper documentation tooling that the company specifies?

The issue is that the documentation tools that so many companies have standardized on are absolutely garbage for actually writing documentation. And it isn't just because they are buggy, slow, and/or UX disasters (Confluence, I'm looking at you!). The deeper issue is the way that these documentation tools attempt to organize information. I call it the Tree approach versus the Graph approach.

The Tree Approach

The obvious way to organize information is with a tree. For instance, look at the way filesystems work. We have a hierarchy of files and everything is organized within this system. To find some piece of information X, we can simply start at the top and travel down, choosing the branch which makes the most sense for what we are looking for. For instance, the documentation for a set of projects at a company might look like so:

Projects
|- Project A
|  |- Architecture Overview
|  |- Setup Guide
|  `- ...
|
|- Project B
|  |- Architecture Overview
|  |- Setup Guide
|  `- ...
|
`- Project C
   |- ...

As software engineers, we know that a balanced tree like this has a lookup time of O(log(n)). This seems great: as our collective knowledge grows, the amount of effort to find any given part of it is logarithmic in the size of that knowledge, which means we should always be able to find what we're looking for in a short amount of time. And this is true, for a perfectly organized body of knowledge.
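As a rough sketch of that arithmetic, assume an idealized hierarchy in which every page has the same number of children. The function name and the branching factor of 10 are made up for illustration; they don't come from any real documentation tool.

```python
def levels_needed(num_docs: int, branching: int) -> int:
    """Return the smallest depth d such that branching**d >= num_docs.

    Assumes a perfectly balanced hierarchy (an idealized, hypothetical model).
    """
    if num_docs < 1 or branching < 2:
        raise ValueError("need at least 1 document and a branching factor >= 2")
    depth, capacity = 0, 1
    while capacity < num_docs:
        capacity *= branching  # one more level multiplies the pages we can reach
        depth += 1
    return depth


for n in (1_000, 1_000_000):
    print(n, "documents ->", levels_needed(n, branching=10), "choices")
```

Under these assumptions, going from a thousand documents to a million only doubles the number of choices a reader has to make, from 3 to 6.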

The issue is when we have a piece of knowledge we want to document that pertains to both Project A and Project B -- where does it go in this hierarchy? Do we duplicate the information to both sections? Do we create a new section entirely just to deal with documenting these edge-cases? Do we just put it in Project A and somehow link to it somewhere in Project B's documentation? Or do we just put it in Project A because it is slightly more relevant and just ignore Project B? Each person trying to add to the collective knowledge has to go through this thought process, and they will not make the same decision. This is documentation rot, and it makes it so that you are not confident where some bit of information might be if it doesn't perfectly fit into the hierarchy.

Looking back at our knowledge lookup time, it is clear that it is only O(log(n)) if we actually make the right decision on each branch. But if our information is poorly organized and we don't make the right decision on a branch, we are now in backtracking territory. Unlike computers, humans are really bad at backtracking, because we think "well maybe I just missed it when I went down branch X." This leads to frustration, this leads to annoyance, and it leads to engineers losing trust in the documentation. And if they don't trust the documentation they won't use it, and if they don't use it they won't add to it.
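To make the cost of a wrong turn concrete, here is a minimal sketch that counts how many pages a reader opens during a depth-first search of a tiny hierarchy. The page "Shared Auth Notes" is a hypothetical document that is relevant to both projects but was filed under Project A; the function name and the search model are my own simplification, not a measurement of real readers.

```python
from typing import Optional

# Hypothetical hierarchy: each page maps to its child pages.
docs = {
    "Projects": {
        "Project A": {
            "Architecture Overview": {},
            "Setup Guide": {},
            "Shared Auth Notes": {},  # relevant to A and B, filed under A
        },
        "Project B": {
            "Architecture Overview": {},
            "Setup Guide": {},
        },
    }
}


def pages_opened(tree: dict, root: str, target: str, first_guess: str) -> Optional[int]:
    """Depth-first search from `root`, trying `first_guess` before its siblings.

    Returns how many pages were opened before `target` was found,
    or None if the target is not in the tree.
    """
    opened = 0
    stack = [(root, tree[root])]
    while stack:
        name, children = stack.pop()
        opened += 1
        if name == target:
            return opened
        # Stable sort: the guessed branch comes first, the rest keep their order.
        ordered = sorted(children.items(), key=lambda item: item[0] != first_guess)
        # Push in reverse so the first entry is the next one popped.
        stack.extend(reversed(ordered))
    return None


right = pages_opened(docs, "Projects", "Shared Auth Notes", first_guess="Project A")
wrong = pages_opened(docs, "Projects", "Shared Auth Notes", first_guess="Project B")
print("guessed Project A first:", right, "pages opened")
print("guessed Project B first:", wrong, "pages opened")
```

Even in this toy tree, one wrong guess at the top means opening everything under Project B before getting anywhere, and the penalty grows with the size of the branch that was wrongly chosen.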

The Graph Approach

Let us consider two of the most widely revered documentation systems: Wikipedia and the Arch Linux Wiki. Anyone who has used either knows that they are just incredible. In terms of managing information and collective knowledge, I don't know of anything that is better than Wikipedia or the Arch Linux Wiki. This is in large part due to the wonderful job of the maintainers of these bodies of knowledge (huge thanks to these folks), but I would argue it is also because they organize their information in a different way from a strict hierarchy.

Both take a graph approach to organizing data. Instead of a hierarchy in which everything is organized, each document is basically free-floating. Rather than a document's location determining which other documents it is related to, links between documents create those relationships. Look at a random Wikipedia page and see how many links there are to other related pages. This dense network of links between pages is what makes these documentation systems the best around.

Let us consider the lookup time for any given piece of information. This is a bit trickier to determine since it is dependent on the density of links. In fact, I don't think we can determine the algorithmic runtime without specifying a few other parameters. It is certainly less than O(n), and likely approximates O(log(n)), but it doesn't actually matter. What matters is worst-case lookup time.

If we reach the wrong page, rather than having to potentially backtrack to the root, we can just do a random walk around the page we are at. In a minimally organized system, this should still lead us to the information we want, because the pages will be linking to other pages that are relevant.
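A minimal sketch of that idea, assuming a tiny, hypothetical set of pages whose links are stored as a plain dictionary. I use a breadth-first search rather than a literal random walk, since it gives the best case for how few clicks separate a wrong page from the right one; all page names and the function name are invented for illustration.

```python
from collections import deque
from typing import Dict, Optional, Set

# Hypothetical wiki: each page maps to the pages it links to.
links: Dict[str, Set[str]] = {
    "Project A": {"Project A Setup Guide", "Shared Auth Notes"},
    "Project B": {"Project B Setup Guide", "Shared Auth Notes"},
    "Project A Setup Guide": {"Project A"},
    "Project B Setup Guide": {"Project B"},
    "Shared Auth Notes": {"Project A", "Project B"},
}


def hops_between(links: Dict[str, Set[str]], start: str, target: str) -> Optional[int]:
    """Fewest link clicks from `start` to `target`, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, distance = queue.popleft()
        for neighbor in links.get(page, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Landing on the wrong page still leaves the right one a couple of clicks away.
print(hops_between(links, "Project B Setup Guide", "Shared Auth Notes"))
```

The point is not the exact number but that the search starts from wherever you happen to be, rather than from the root.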

More importantly, new pieces of information can just be added, and there are no decisions that need to be made about where to put that information. If a piece of information is relevant to Project A and Project B, we just add it as a new document and link from both Project A and Project B. This is now the natural process, and scales organically. Of course, some care is required to prevent this from becoming a complete mess, but it isn't particularly difficult to put some guidelines on this.
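Sketched in code, adding a cross-cutting document is a single operation with no placement decision. This assumes links are kept as a dictionary of sets and that every link is recorded in both directions; the helper name and the page "Shared Auth Notes" are hypothetical.

```python
from typing import Dict, Iterable, Set


def add_document(links: Dict[str, Set[str]], title: str, related: Iterable[str]) -> None:
    """Add a free-standing page and link it, in both directions, to each related page."""
    links.setdefault(title, set())
    for page in related:
        links.setdefault(page, set()).add(title)  # related page -> new page
        links[title].add(page)                    # new page -> related page


links: Dict[str, Set[str]] = {"Project A": set(), "Project B": set()}

# One call, no debate about which project "owns" the page.
add_document(links, "Shared Auth Notes", related=["Project A", "Project B"])
print(sorted(links["Shared Auth Notes"]))
```

The new page belongs to neither project, yet it is reachable from both.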

It should come as no surprise that the ad-hoc documentation methods I described above are effectively the graph approach: free-standing documentation that is linked by people (rather than within the documents). Yet instead of leaning into this natural way of doing things, I have seen companies use tools which fight against this way of organizing information.

Conclusion

Every company I have worked at has used a tree-based approach to organizing documentation, and they have all been absolutely horrible. Two of the largest and most popular sources of documentation, Wikipedia and the Arch Linux Wiki, both use a graph-based approach and are widely loved and easy to use. The first group has paid employees whose job is partly to keep the documentation up to date, while the second is full of volunteers. If that is not argument enough to ditch Confluence for MediaWiki then frankly I don't know what is.

Of course, this does not mean you should not have other guidelines for your documentation. A graph-based system does not solve all of the problems. For instance, the four types of documentation are important to keep in mind and use. A mess will always be a mess, regardless of whether it is in a hierarchy or a graph.

Stop using hierarchy-based documentation. It doesn't scale, it requires far more effort, and it is simply holding your organization back.

Do start using a graph-based system for documentation. It scales easily and engineers will naturally contribute.