Figuring out how to test complex, distributed systems causes a lot of headaches.
It’s a tough question.
And the thing is, there is no one-size-fits-all answer.
The way you test your distributed system really depends on the nature of your system.
Every distributed system serves a specific business need, so its architecture is built around that need. Chances are, the testing methods vary just as much, and that's before we even get to the number of independent services, which also influences the testing methodology.
Since it's impossible to come up with a universal answer for testing distributed systems, we interviewed six tech leaders to learn how they do it.
In this post, they will talk about how their systems work and the methods they use to test them.
Number of services: There are 7 major subsystems in our platform, totaling around 300 services.
Yroo allows you to search millions of products online and compare the prices of different shops.
Yroo is pretty complex, but I think we can say the main components are clustered. One distributed cluster serves the end user, whether they're on a mobile app, a website or another environment.
The other side is basically the data processing. We’re a big data company, so we process hundreds of gigabytes of data every day.
The cluster that serves end users is pretty traditional: servers running programs that expose web endpoints and API endpoints.
On the data processing side, we have a serverless architecture. We use AWS Lambda to handle data processing because it's cost-effective and easy to manage.
In retail, we can't predict how high the volume of data will spike when Christmas comes around. In a case like that, serverless just makes sense.
The third component is the database, where we store a massive amount of information. We use DynamoDB, which is distributed and managed by AWS.
The final component is machine learning. We moved everything to AWS SageMaker, which is still flexible, but it makes scaling much easier.
We follow a pyramid approach.
We have unit tests at the bottom. It doesn't matter whether you're distributed or not; units are always tested at full coverage.
Then comes the integration layer for the distributed components. We follow a strictly service-oriented interface design; the big components we built (whether it's a Lambda function or a trigger on the database) make integration tests a lot easier here.
System-level testing: when the distributed nature comes in, we typically follow well-established standards.
We essentially outsourced the distributed aspects to the platform provider (AWS Lambda). As long as the Lambda function works, we can trust the distributed aspects of Lambda. We draw a boundary where our tests end and where Lambda starts, and we don't have to worry about scaling up or down.
We use different tools for different environments. Most of our Lambda code is written in Java, so those integration tests are built with a Java framework, while our server layer mostly runs Ruby and uses Ruby's stack. The SageMaker side is a little different, but since it's mostly Python, those tests run in Python.
We use whatever language is appropriate.
We used to write our own distributed code to process the data, but that introduced a lot of headaches, because you have to deal with scaling, failovers and fault tolerance. A year ago, we moved to Lambda.
Switching to Lambda allowed our application developers to focus on business logic while leaving the low-level concerns to the platform provider. It makes testing much easier, since you don't have to test multithreading and failovers.
The beauty of Lambda is that you basically test at the unit level; as long as the function does what it needs to do, we release it to Lambda, and that's it.
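Yroo's Lambda code is in Java, so the following is only a minimal, language-agnostic sketch in TypeScript of what "testing at the unit and trusting Lambda for the rest" can look like: the handler stays a thin wrapper, and the test exercises only the pure business logic. The record shape, function names and event format are hypothetical.

```typescript
import { strict as assert } from "node:assert";
import { test } from "node:test";

// Hypothetical record shape, for illustration only.
interface Offer {
  sku: string;
  priceCents: number;
  shop: string;
}

// Pure business logic: keep the cheapest offer per SKU.
// No AWS imports here, so it can be tested as an ordinary unit.
function cheapestPerSku(offers: Offer[]): Map<string, Offer> {
  const best = new Map<string, Offer>();
  for (const offer of offers) {
    const current = best.get(offer.sku);
    if (!current || offer.priceCents < current.priceCents) {
      best.set(offer.sku, offer);
    }
  }
  return best;
}

// Thin Lambda-style handler: parse the event, run the logic, persist the result.
// Scaling, retries and failover are left to the platform, so there is nothing
// distributed to test at this level.
export async function handler(event: { Records: { body: string }[] }) {
  const offers: Offer[] = event.Records.map((record) => JSON.parse(record.body));
  const best = cheapestPerSku(offers);
  // ...write `best` to the database here...
  return best.size;
}

// The unit test targets the pure function, not the Lambda runtime.
test("keeps the cheapest offer per SKU", () => {
  const best = cheapestPerSku([
    { sku: "A1", priceCents: 999, shop: "shop-x" },
    { sku: "A1", priceCents: 899, shop: "shop-y" },
  ]);
  assert.equal(best.get("A1")?.shop, "shop-y");
});
```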
Since switching to Lambda, our infrastructure stack has become much simpler. It now fully leverages AWS infrastructure.
Number of services: ~60 microservices
Shippable is a CI/CD platform. Most of our work is triggered by webhooks; it's all connected to external systems.
From a macro perspective, there are two parts to the product:
Our testing is focused on 3 different phases.
Phase 1: Pure engineering tests, including code quality checks and static code analysis. These are not customer-focused scenarios; they're basically unit tests. They don't care about business logic, only about coding practices. This happens on an ongoing basis, and nothing gets merged until these basic tests pass.
Phase 2: Business tests, or integration tests. We have a bunch of use-case scenarios. We used to run these as part of the unit tests, but we realized that's impossible with a distributed system, because everything is changing at the same time, which makes it hard to test across the entire system.
If you wanted to run the tests through the entire system, they would run for 8-12 hours. So we segmented our scenarios into core and basic ones.
The first thing to do is to test the core scenarios, checking for fundamental issues. Then we run the basic tests, which cover most of the scenarios but not all of them (for example, not all operating systems). Finally, we run tests on the whole system, which takes 8-12 hours.
Phase 3: We run a bunch of stress tests, but only before a launch.
Most of our integration tests are workflows. Since we're a Node.js platform, we use Mocha to run all our tests. They go through a series of scenarios: switching things around, different operating systems, different images, and so on. We have a list of well-defined, static use cases that never change, and no one is allowed to change them unless a new scenario is added.
If you start changing those, you don't know what affects what, so you can't create a baseline to run tests against. You need predictable results from these tests.
We’re not really testing the UI component with our workflows.
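Shippable's actual suites aren't public, so here is only a rough Mocha sketch of the pattern described above: static, well-defined workflow scenarios, tagged so that core scenarios can be run on their own (for example with `mocha --grep @core`) before the full 8-12 hour matrix. The API endpoint, payload and scenario details are assumptions, not Shippable's real interface.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical helper: kicks off a build through the platform's API and
// returns its final status. The URL and payload shape are made up.
async function triggerBuild(scenario: { repo: string; os: string; language: string }) {
  const res = await fetch("http://localhost:8080/api/builds", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(scenario),
  });
  return (await res.json()) as { status: string };
}

// Static use cases: this list never changes unless a new scenario is added,
// so every run can be compared against the same baseline.
describe("workflow scenarios @core", function () {
  this.timeout(10 * 60 * 1000); // builds are slow; raise Mocha's default timeout

  it("builds a Node.js repo on the default image", async () => {
    const result = await triggerBuild({
      repo: "fixtures/node-hello",
      os: "ubuntu-20.04",
      language: "node_js",
    });
    assert.equal(result.status, "success");
  });
});

describe("workflow scenarios @basic", function () {
  this.timeout(10 * 60 * 1000);

  it("builds the same repo on an alternative OS image", async () => {
    const result = await triggerBuild({
      repo: "fixtures/node-hello",
      os: "windows-server-2019",
      language: "node_js",
    });
    assert.equal(result.status, "success");
  });
});
```

Running only the `@core` scenarios first gives a quick signal on fundamental issues, and the full suite can be reserved for the long run.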
Number of services: 50
We have over 50 microservices, written in different languages, performing different functions. We have APIs, test libs, Spark clusters, ZooKeeper, Kafka and RabbitMQ, and they're all connected to each other.
We also have 7 different UIs in different languages.
The 50 different apps run in containers. It's a web-based system, and everything can be scaled in its own way as well.
There are different levels: unit, integration and full functional tests.
We have every one of these; each microservice has unit tests.
Every git commit push is tested at the level of individual functions. We do continuous deployment, so when we branch off master or push something to GitHub, CI picks it up and runs all our tests (the unit tests inside that container).
After that, we deploy a fully functional integrated system for that microservice.
When we add new code, we build a new container, then test that container, then run that container like we would in production. This enables us to know at the smallest commit whether it functions at the unit level, integrates with other services, and is going to run properly in production.
Then we run an integration test: a broader test of how things work across the microservices. That happens on every commit.
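As a sketch of "run the container like we would in production and test it from the outside", something like the following could run on every commit, after CI has started the freshly built image (for example with `docker run -d -p 3000:3000 my-service:<commit-sha>`). The port, endpoints and response shape are assumptions for illustration.

```typescript
import { strict as assert } from "node:assert";
import { test } from "node:test";

// The container under test is assumed to have been started by CI beforehand
// and to listen on this URL; only its HTTP surface is exercised, exactly as
// it would be in production.
const BASE_URL = process.env.SERVICE_URL ?? "http://localhost:3000";

test("container answers its health check", async () => {
  const res = await fetch(`${BASE_URL}/health`);
  assert.equal(res.status, 200);
});

test("container serves its main API", async () => {
  const res = await fetch(`${BASE_URL}/api/items`, {
    headers: { accept: "application/json" },
  });
  assert.equal(res.status, 200);
  const items = await res.json();
  assert.ok(Array.isArray(items));
});
```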
Finally, there is manual testing as well. This is to make sure the feature does what it's intended to do from a product management point of view.
Our testing philosophy: Once we fix a bug, we should write a test around that so it never comes up again.
Advice:
Defining what you want to test is important. I've made the mistake in the past of writing entire test frameworks, without any input from other people, based on my own assumptions, which could be completely wrong.
Tests need to be prioritized as well. Find the most important scenarios, keep the business logic in mind, and test the things that serve those needs.
Number of services: 40
Codefresh is a CI/CD platform, which means we run hundreds of thousands of customer workflows that compile, build and deploy Docker containers. This is the main functionality our users rely on.
Codefresh is a distributed, multi-cloud platform that includes tens of different services. Some of them handle user management, some handle workflow management, and others cover various pieces of business logic and integrations.
We execute the workflows on cloud services from Google and Amazon.
We're a microservice-based platform with isolated functional units we can develop in parallel. The system should be able to recover on its own, even if some services are not functioning. We push to production on a daily basis.
We have fully automated scenarios for every single service. Most of the effort is on testing the service from a unit test perspective.
Unit tests and API tests.
We run unit tests and end-to-end tests, trying to isolate the service as much as possible from the rest of the system. It's kind of a smoke test in an integration environment; the service is tested there before it's deployed.
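Codefresh's internal test tooling isn't shown in the interview; purely as a sketch of isolating a service from the rest of the system, a downstream dependency can be replaced with a small local stub so the unit under test never calls the real service. The function, endpoint and response shape below are hypothetical.

```typescript
import { createServer, type Server } from "node:http";
import { strict as assert } from "node:assert";
import { test } from "node:test";

// Hypothetical piece of service logic: it needs data from the user-management
// service, but takes that service's URL as a parameter so tests can point it
// at a stub instead of the real thing.
async function isWorkflowAllowed(userId: string, userServiceUrl: string): Promise<boolean> {
  const res = await fetch(`${userServiceUrl}/users/${userId}`);
  const user = (await res.json()) as { plan: string };
  return user.plan !== "suspended";
}

test("workflows are blocked for suspended accounts", async () => {
  // Local stand-in for the real user-management service.
  const stub: Server = createServer((_req, res) => {
    res.setHeader("content-type", "application/json");
    res.end(JSON.stringify({ plan: "suspended" }));
  });
  await new Promise<void>((resolve) => stub.listen(4010, resolve));
  try {
    assert.equal(await isWorkflowAllowed("u-123", "http://localhost:4010"), false);
  } finally {
    stub.close();
  }
});
```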
We also continue testing our system in production. We invest time in validation before deploying, but because of the intensity of the deployment process, it's clear that our production system should be:
Our testing philosophy:
Intensive unit testing and API testing for single services, sanity/smoke tests in an integration environment, and gradual exposure in production. We try to avoid UI tests because they're less effective.
Number of services: ~300
Our system is heavily distributed and built from feature services. Originally, it was a big monolithic system, and we started taking parts out of it, transforming it into microservices.
The complexity is distributed across many small services. The problems that came with the monolithic system were eliminated, but new problems emerged from the distributed setup.
We avoid long call chains in order to keep the architecture clean and simple. Processing a request takes a maximum of 2-3 service calls, and even when you call a service, it calls only one other service.
When you start our application, the frame of the app is served by the monolithic system, which also acts as an integration platform. Every page you load is an iframe that renders content from a service.
It's a hybrid infrastructure; part of it runs on our internal network and the rest in the cloud (Heroku or Compose). In recent years, we've started using Google Cloud's Bigtable and BigQuery.
We use a lot of message queues. In order to make the system more resilient, we make the communication asynchronous using RabbitMQ hosted on Compose.
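For readers unfamiliar with the pattern, here's a minimal sketch of that kind of asynchronous communication using the amqplib client for Node.js; the queue name and message are made up. Because the queue is durable, work waits for the consumer instead of failing a synchronous call, which is where the extra resilience comes from.

```typescript
import amqp from "amqplib";

async function main() {
  const connection = await amqp.connect("amqp://localhost");
  const channel = await connection.createChannel();
  const queue = "campaign.events"; // hypothetical queue name

  // Durable queue: messages survive a broker restart and wait for consumers.
  await channel.assertQueue(queue, { durable: true });

  // Producer side: fire-and-forget publish instead of a blocking service call.
  channel.sendToQueue(
    queue,
    Buffer.from(JSON.stringify({ type: "campaign.sent", id: 42 })),
    { persistent: true },
  );

  // Consumer side: acknowledge only after the work is done, so a crash
  // mid-processing means the message is redelivered.
  await channel.consume(queue, (msg) => {
    if (msg === null) return;
    const event = JSON.parse(msg.content.toString());
    console.log("processing", event.type);
    channel.ack(msg);
  });
}

main().catch(console.error);
```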
We interviewed András Fincza in episode 16 of the Level-up Engineering podcast.
We follow the test pyramid, actively testing the system. We have unit tests, integration tests and functional tests.
In our domain, this means the following:
A service has unit tests, as well as integration tests where the service tests its own components, such as the database. It goes further, up to end-to-end tests where the API is tested.
There are services where functional tests are run using Cypress and Capybara.
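Their Cypress suites aren't public; as a rough sketch, a functional test at this level drives one service's UI through the browser, along these lines (the route and selectors are hypothetical):

```typescript
// Hypothetical Cypress spec: exercises one service's UI end to end,
// without involving the rest of the distributed system.
describe("campaign list page", () => {
  it("lists campaigns and navigates to the creation form", () => {
    cy.visit("/campaigns");
    cy.get("[data-test=campaign-row]").should("have.length.greaterThan", 0);
    cy.contains("New campaign").click();
    cy.url().should("include", "/campaigns/new");
  });
});
```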
We got fed up with tests that exercise many services at the same time, because they increased complexity, were expensive and didn't provide that much value.
Our infrastructure is duplicated. Every service has a staging and production version. We’re planning to create a functional test framework that can be used to run tests on services in production.
The problem with the test pyramid is that as you go up and reach the functional tests, the more of the app is covered, the more rigid the system becomes.
I work with a variety of clients, so I don't have one answer. Generally, they fall into 3 buckets:
First, the monoliths being modernised or rescued. The main one I'm working with here is a financial org, so you see the main monolith, and we've broken out the bits that have scaling and change pressures: the offline reconciliation and the regulatory-exposed pieces. Apart from that, everything stays in the monolith, and we rely on the underlying platform to keep things running well.
Second, "web services lite". I'm working with a large government body in the UK, rolling through a set of projects: 6 months on one, 6 months on the next, as things are developed and then left mostly in maintenance.
They tend to be created as a fairly small set of services (up to around 10) that are functionally decomposed. They essentially look like the small web services many people have talked about before. Not a huge amount of thought goes into the data design, as a deterministic delivery process is more important than absolute velocity. This means that picking a standardised pattern and sticking with it is viewed more favourably by the organisation.
So we've got two types of service: simple HTTP with a Swagger contract, and SEDA-style message-driven services processing work queues. This is very much driven by capex budgets, which leads to the need for deterministic project budgets. It really is more expensive to do it this way, but overall it is less risky.
The last type of project I see is managed differently. These are opex-managed, with a team in place working on the same project for the long term. They tend to invest heavily in newer tech and approaches. Beyond a certain scale, they move to a data-driven approach and look at systems such as Kafka, Kinesis and the like to manage the flows of data around the system. This is my favourite type of system, but it requires a certain level of investment, and therefore scale, to pull off successfully at the moment. As the tech commoditises, I expect this to become more common at the smaller end as well.
It all depends on the team.
For highly experienced teams that have set up effective, well-defined software contracts and have ways of simulating them at test time (mocks, stubs, virtualisation, etc.), testing in isolation gives lots of benefits.
For new teams, full-stack testing is the best way to start. For the government projects above, due to the way budgets are managed, it's always a fairly new team and a new project, so it's always full stack to get things working. Some technical services have been broken out with their own lifecycle, and they are mostly tested independently now, but it's not something I would expect to be generally adopted at that client.
Integration tests
For integration tests, most of the projects I'm working on are in Node.js/TypeScript, so we use those for the majority of the integration tests as well.
Full-stack performance is often missed when testing services in isolation. Issues often appear in the non-deterministic distributed interactions you get at high load. Hammering a system as part of your continuous delivery pipeline is very helpful for gaining confidence in the system as a whole and shaking out those hard-to-find inconsistencies. JMeter is still the most common tool I see used for that.
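JMeter test plans are built in its own tooling rather than in application code, so purely to illustrate the idea of hammering the full stack from a delivery pipeline, here is a tiny TypeScript concurrency probe; the target URL and thresholds are made up, and it is not a substitute for a proper load-testing tool.

```typescript
// Tiny load probe: fires concurrent batches of requests at one endpoint and
// fails the pipeline if errors appear or latency regresses.
const TARGET = process.env.TARGET_URL ?? "http://localhost:3000/api/search?q=socks";
const CONCURRENCY = 50;
const ROUNDS = 20;

async function probeOnce(): Promise<number> {
  const start = Date.now();
  const res = await fetch(TARGET);
  if (!res.ok) throw new Error(`status ${res.status}`);
  return Date.now() - start;
}

async function main() {
  let errors = 0;
  const latencies: number[] = [];
  for (let round = 0; round < ROUNDS; round++) {
    const results = await Promise.allSettled(
      Array.from({ length: CONCURRENCY }, () => probeOnce()),
    );
    for (const result of results) {
      if (result.status === "fulfilled") latencies.push(result.value);
      else errors++;
    }
  }
  latencies.sort((a, b) => a - b);
  const p95 = latencies[Math.floor(latencies.length * 0.95)] ?? 0;
  console.log(`requests: ${latencies.length + errors}, errors: ${errors}, p95: ${p95}ms`);
  // Thresholds are arbitrary examples; tune them to the system's own baseline.
  if (errors > 0 || p95 > 500) process.exit(1);
}

main();
```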
As you can see, different distributed architectures require different approaches to testing. In most cases, you can't test every aspect of your system because of the complexity and cost involved. What you can do is manage the risks and focus on the parts of the system that are critical to the business logic. You can create scenarios and focus your testing efforts on the most critical ones, or you can focus on the most important services in your system, making sure they work as intended.
About the author:
Tamas Torok is a marketer, helping tech companies to grow. He currently leads the marketing operations at Coding Sans and focuses on crafting high-quality, research-based content for engineering leaders. He started publishing the State of Software Development report and supports the growth of the Level-up Engineering podcast, dedicated to engineering leaders.