Experiences with Legacy Systems
Making sure a legacy system becomes healthy again is a struggle for any mature company. Dealing with it requires patience, perseverance, and strategy.
👋 Hey, I’m Milan. This is Organizing Automation, my newsletter on building state-of-the-art digital products and the organizations that deliver them.
A recent assignment has had me looking into a legacy system. Although it’s not the oldest system, it’s massive. Trying to understand it with a team - and add new features along the way - is a big challenge. My team is tasked with maintenance, functional updates, and technical improvement. The system is at the core of the organization, but adding new functionality becomes harder and harder. Slow updates mean that alternative systems are created alongside it, which leads to a difficult and incoherent system landscape.
A recent conference visit showed me that alongside all of the exciting new topics like LLMs, dealing with legacy systems was a steady topic that got a lot of attention. It is natural for long-running systems to turn legacy, and all mature organizations must deal with this. The project has increased my depth of understanding when it comes to building systems for the long term.
I want to share my journey with legacy. By diving into the difficulties of my specific situation and sharing our hard-earned lessons, I hope to help you deal with similar issues. We are definitely not done with this project, but we’ve come a long way. I’d love to hear your own experiences and suggestions for handling systems like this effectively.
Before we start, an important note: legacy systems require you to leave your judgment at the door. It’s easy to look at the final result and call out everything that’s wrong with it, but that does not give credit to the team that built the application and had to make many choices along the way. Everything has a reason and made sense to the people that built it at the time, with their knowledge and ability. By not wasting any time with judgments, you can stay focused on improvement.
What Makes It Legacy
The Data Model
As an information analyst, I often start understanding the technology of a new system in the database. It tells me which objects exist, what relationships exist between them, and the size of the system. Understanding the objects that the system works on helps me contextualize its behavior.
In this project, the data architecture is complicated. The database is not normalized: the same data occurs in many places and is kept in sync by stored procedures. There are few foreign key constraints. Table names are far removed from real-life objects, containing vague words like “Item”, “Import”, or “Staging”.
This makes it hard to understand what is actually happening in the system. Discussions quickly become confusing, because they do not reference conceptual objects. The system has drifted away from its business context.
Solving this is not as simple as renaming the tables. In many cases, the data in one table does not actually represent a (conceptual) object, but a collection of bits and pieces that work together. We ended up in this situation because each table was designed to fill a frontend page without needing to join extra tables.
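One way to get a first overview of a schema like this is to list the tables that declare no foreign keys at all. The sketch below is only an illustration: it assumes a SQLite database and uses made-up table names (`customer`, `invoice`, `item_staging`), while the system described here uses stored procedures and would need the equivalent catalog queries of its own database engine.

```python
import sqlite3


def tables_without_foreign_keys(conn: sqlite3.Connection) -> list[str]:
    """Return the user tables that declare no foreign key constraints."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    unconstrained = []
    for (name,) in rows:
        # PRAGMA foreign_key_list yields one row per foreign key column;
        # an empty result means the table references nothing.
        quoted = name.replace('"', '""')
        foreign_keys = conn.execute(
            f'PRAGMA foreign_key_list("{quoted}")'
        ).fetchall()
        if not foreign_keys:
            unconstrained.append(name)
    return unconstrained


def build_demo_database() -> sqlite3.Connection:
    """Create a tiny in-memory schema (hypothetical tables, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id)
        );
        -- Duplicates customer data by name instead of referencing it.
        CREATE TABLE item_staging (id INTEGER PRIMARY KEY, customer_name TEXT);
        """
    )
    return conn
```

The result is a list of tables to investigate, not a list of problems. A root table such as `customer` legitimately has no outgoing foreign keys, while a table such as `item_staging` may be hiding an undeclared relationship.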
No Hints
The most difficult part of our journey: the system worked, but it was undocumented and had few automated tests. The code was complicated (partly because of the complicated data model) and took a lot of time to understand.
The system was running, meaning people were depending on it, but we didn’t know all of the things it could (and would) do. This together with the lack of tests made change extra risky - any time we made a change, we ran the risk of accidentally knocking some other functionality over.
The architecture of the system was not described and not self-explanatory. Additionally, the system landscape around our system had grown, and there was no clear demarcation of which system had which responsibilities. Every system has an architecture though, and when none is (purposefully) chosen, one emerges anyway. The result is usually not favourable. Not being able to read documented architecture tradeoffs made it a lot harder to understand the rationale of the code.
All of this made it hard to know where to begin. We decided to simply get started, do what we could to create understanding, and monitor our progress. Let’s now look at the lessons we learned and how we turned legacy into vintage [1].
Turning Legacy Into Vintage
It starts with the business
Any useful system (especially core ones like the one I was looking at) plays a role in an organization. That usually means there are people interacting with it. One of the most valuable things we did to build understanding was to talk to those people about the parts of the system they used, what they used them for, and why. This is a continuous effort that improves your understanding over time.
The goal is to understand the actual domain objects that might exist and see how they behave and interact. Although you may still not know exactly what the system does, you do know what is expected of it and which domain underlies it.
When we understood that there were many domains in the application, an initial attempt was made to separate these domains.
Knowing that there are multiple business domains puts everything in perspective. However, drawing the lines between domains can be very tedious, especially at the detailed level. We therefore did not want to commit too early to definitive new domains, and we kept most of the code in the existing monorepo setup.
Remove, Refactor, Rebuild
When you understand what the system does, a great first question is: “should it (still) do this?” Any code that you can get rid of, because it’s no longer relevant or another solution exists, can be cleaned up. We ran into quite a few features that were no longer used, had never been used in the first place, or were used by only one or two users. These were good candidates for removal, which took them off the team’s cognitive load.
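Finding such candidates can be partly automated if you have some record of who uses what. The following is a minimal sketch under assumptions: it presumes you can extract (feature, user) pairs from access logs or audit tables and that you have a list of known features. The feature and user names are invented for illustration.

```python
from collections import defaultdict


def removal_candidates(usage_events, known_features, max_users=2):
    """Map each rarely used feature to its number of distinct users.

    usage_events: iterable of (feature, user) pairs, e.g. parsed from logs.
    known_features: every feature the system offers, including unused ones.
    max_users: features with at most this many distinct users are flagged.
    """
    users_per_feature = defaultdict(set)
    for feature, user in usage_events:
        users_per_feature[feature].add(user)

    candidates = {}
    for feature in sorted(known_features):
        # Features that never appear in the events count as zero users.
        user_count = len(users_per_feature.get(feature, ()))
        if user_count <= max_users:
            candidates[feature] = user_count
    return candidates


# Hypothetical usage data for illustration.
events = [
    ("export_pdf", "ann"),
    ("export_pdf", "bob"),
    ("export_pdf", "cy"),
    ("legacy_import", "ann"),
    ("legacy_import", "ann"),
]
features = {"export_pdf", "legacy_import", "fax_report"}
candidates = removal_candidates(events, features)
```

Here `candidates` contains `fax_report` (no users) and `legacy_import` (one user). A low count is only a prompt for a conversation with those users. A feature used by one person once a year can still be essential.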
Because the system was still live, we employed both refactor and rebuild strategies to improve the system. Rebuilding was hard because it required us to be confident in the domain we had outlined, and the rebuilt part still needed to integrate with the rest of the system. Refactoring efforts, however, were usually unable to remove the complexity, as the code was just too tangled. The strangler pattern, where functionality is moved to a new implementation piece by piece while the old system keeps handling the rest, is very useful when tackling legacy software.
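To make the strangler idea concrete, here is a minimal sketch of a routing facade. It is not how our system is built. The class, the handler functions, and the operation names are all hypothetical, and in practice the routing often lives in a proxy or API gateway rather than in application code.

```python
from typing import Callable, Dict

LegacyHandler = Callable[[str, dict], dict]
NewHandler = Callable[[dict], dict]


class StranglerFacade:
    """Route each operation to a new implementation once it is migrated."""

    def __init__(self, legacy_handler: LegacyHandler) -> None:
        self._legacy = legacy_handler
        self._migrated: Dict[str, NewHandler] = {}

    def migrate(self, operation: str, new_handler: NewHandler) -> None:
        """Send this operation to the new implementation from now on."""
        self._migrated[operation] = new_handler

    def rollback(self, operation: str) -> None:
        """Send this operation back to the legacy system."""
        self._migrated.pop(operation, None)

    def handle(self, operation: str, request: dict) -> dict:
        new_handler = self._migrated.get(operation)
        if new_handler is not None:
            return new_handler(request)
        # Everything that has not been migrated still goes to legacy.
        return self._legacy(operation, request)


def legacy_system(operation: str, request: dict) -> dict:
    """Stand-in for the existing system (hypothetical)."""
    return {"handled_by": "legacy", "operation": operation}


def new_invoice_service(request: dict) -> dict:
    """Stand-in for a rebuilt piece of functionality (hypothetical)."""
    return {"handled_by": "new", "operation": "create_invoice"}


facade = StranglerFacade(legacy_system)
facade.migrate("create_invoice", new_invoice_service)
```

Callers only ever talk to the facade, so each operation can be moved over (or rolled back) on its own, without a big-bang switch.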
Monitor and Test
If the system is live, you must make sure it keeps working while you change it. This is where monitoring and testing come into play.
There is a balance between monitoring and testing - if you invest heavily in testing, you need to monitor less, because the chance of mistakes slipping through is small. We had no automated test suite, which meant that monitoring was a top priority. In our scenario, it is okay if something breaks, but we need to know about it as soon as possible to take appropriate action, like rolling back or repairing data. Monitoring gave us more confidence, so that we did not have to rely only on best-effort manual tests.
In addition, we put a policy in place that every change, including bug fixes, had to come with tests that would prevent regression. This made fixing more tedious, but allowed us to slowly get more of a grip on the system.
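As an illustration of that policy, a bug fix could ship with two kinds of tests: one that pins the behaviour that already worked, and one that reproduces the reported bug. The sketch below uses Python’s built-in `unittest`. The `normalize_postcode` helper and the bug it fixes are invented for the example.

```python
import unittest


def normalize_postcode(raw: str) -> str:
    """Hypothetical legacy helper, shown after the bug fix.

    Assumed bug: input such as "1234 ab" used to be stored unchanged,
    so the same postcode ended up in the database in several spellings.
    """
    return raw.replace(" ", "").upper()


class NormalizePostcodeRegressionTest(unittest.TestCase):
    def test_already_normalized_input_is_unchanged(self):
        # Pins the behaviour that worked before the fix.
        self.assertEqual(normalize_postcode("1234AB"), "1234AB")

    def test_lowercase_input_with_space_is_normalized(self):
        # Reproduces the reported bug, so it cannot silently return.
        self.assertEqual(normalize_postcode("1234 ab"), "1234AB")
```

Saved in a test module, this runs with `python -m unittest`. The first test matters as much as the second: in a system without a safety net, a fix that breaks existing behaviour is the bigger risk.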
Prioritize and Dedicate
Understanding legacy software requires a huge amount of deep thinking, reasoning, modelling and validating. Undertaking a major refactor of a legacy system is not something that should be done on the side: it requires a team to fully dedicate itself to it. For us, figuring out how the system worked was highly unpredictable. Any estimates we made were practically useless - the system was simply too unknown. If getting rid of legacy is a priority, it still pays to identify isolated steps to take, but you should anticipate extended timelines for each of them.
We spent significant time deciding on the best way to start this whole process. In hindsight, simply starting somewhere would have been better than deliberating about where to begin. It may cause more rework in the future, but only rebuilding or refactoring will actually give you more understanding. A no-brainer has been to introduce our software engineering standards “from here on out” - strictly enforcing our high standards in new work led to more stability, despite the state of the codebase.
I’ve put the most important points in a flow:
[Figure: flow of the most important points]
Our effort to turn legacy code into vintage is definitely not complete, but I hope that you can benefit from my insights! As mentioned, please share your own strategies for working with such a complex issue.
[1] Calling maintainable legacy “vintage” was introduced to me by Shawna Martell at QCon San Francisco, 2024.