Top 10 things to do for building better pipelines
CI/CD pipelines are everywhere nowadays. There is a lot you can do, and a lot you should think about. This is a compilation of the top 10 things from my experience of building pipelines professionally for almost 5 years now.
Pin versions
Pinning your versions is the most crucial part of building solid pipelines. This means making each version as specific as required and possible.
The most prominent use case for this is container image tags. Most tools out there use container images to run your pipelines. If you pin them to a very specific version, you make sure the environment used for the build stays the same, no matter whether the build runs for the first time or a few years from now.
The same applies to CLI tools you use in your pipelines. They tend to introduce breaking changes over time, and nothing is worse than a pipeline failing without any change on your side.
Tl;dr
Use lock files and specific versions in installs; avoid dynamic versions.
Examples
- Instead of using the Docker image `python:3`, use `python:3.11.4`
- Instead of running `npx package`, use `npx package@version`
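As a minimal sketch of what this looks like in practice (assuming GitLab CI and a Python project; the job name and commands are illustrative):

```yaml
test:
  image: python:3.11.4    # pinned patch release instead of the floating python:3 tag
  script:
    - pip install -r requirements.txt   # the requirements file should pin exact versions too
    - pytest
```

If you want to go even further, you can reference the image by digest (`python@sha256:…`), which stays stable even if someone re-pushes the tag.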
Use descriptive names
When you define names for your workflows, jobs, steps, and so on, make sure you use a self-explanatory name. You should get a rough idea of what's going on just by looking at the pipeline graph.
This also means avoiding meaningless numbers and catch-all names in your jobs and workflows, like “build2” or “test-all”.
Good names tend to be short and pragmatic. If they give you a rough idea of what's going on, you are on a good path.
Tl;dr
Use self-explanatory names; avoid cryptic numbers and the like.
Examples
- Job names: `build-frontend`, `build-frontend-shared-library`
- Workflow names: `frontend`, `application`
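For illustration, here is a sketch of how descriptive names read in a GitLab CI file (the stage layout and commands are assumptions):

```yaml
stages:
  - build
  - test

build-frontend:        # the pipeline graph alone tells you what runs where
  stage: build
  script:
    - npm ci
    - npm run build

test-frontend:
  stage: test
  script:
    - npm test
```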
Utilize caching
The fastest way to do something is not doing it. Every download you don't have to repeat and every rebuild you don't have to perform saves time. As we all know, time is money: less time spent means lower costs for running your pipelines and less waiting for developers.
Every CI tool out there provides caching. Make sure to only cache what you need, and make sure the caching is actually faster than performing the action. Don't trust your gut feeling but try it out and measure actual timings.
When using caching, also make sure you have proper cache invalidation. Best practice is keying the cache on checksums of the directories or lock files that specify your dependencies. A false negative match (rebuilding unnecessarily) is better than a false positive (reusing a stale cache).
Tl;dr
Use caching to avoid having to perform time-intensive operations. Always measure to make sure caching is faster than performing the action. Make sure your cache gets invalidated when necessary.
Examples
- Cache `node_modules` based on `package-lock.json`
- Cache the `.m2` folder based on `pom.xml`
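A minimal caching sketch in GitLab CI terms, keyed on the lock file (a Node project is assumed):

```yaml
test:
  image: node:20.11.1
  cache:
    key:
      files:
        - package-lock.json   # a changed lock file invalidates the cache
    paths:
      - node_modules/
  script:
    - npm install             # npm ci would wipe node_modules and defeat this cache
    - npm test
```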
Integrate test results early on
Most CI/CD tools out there provide you with test insights, visualizations, and so on. Make sure to integrate them early on to make testing visible to developers. Nothing motivates more than a fancy dashboard with green ticks!
If there are no real tests in the project yet, still make sure to integrate the analysis and visualization of test results, so everything is in place once tests arrive.
Tl;dr
Test results displayed in a fancy way motivate; integrate them early.
Examples
- Use a JUnit-compatible output format and supply it to your CI provider's test-report feature, as most providers use this format as a kind of pseudo-standard (see the sketch below)
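As a sketch, publishing JUnit XML to GitLab's test report feature (a pytest project is assumed; the report path is arbitrary):

```yaml
test:
  image: python:3.11.4
  script:
    - pytest --junitxml=report.xml
  artifacts:
    when: always            # upload the report even when tests fail
    reports:
      junit: report.xml     # shows up as a test summary in the UI
```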
Make it easy to use and maintain
Your pipeline should be easy to use by developers with zero knowledge about it. Whatever branching strategy you chose, the CI should just work with the git workflow your team follows.
It's not always easy to find the balance between making things work out of the box and keeping them easy to maintain. The key is making jobs flexible enough, letting a mix of parameters and environment awareness decide their fate. This keeps the pipeline simple and easy to maintain while still letting you build complex pipelines for your use case that just work.
Tl;dr
Use a mix of parameters and environment awareness to keep jobs flexible while still being able to build complex pipelines with ease.
Examples
- If you use feature branches and a main branch, make sure tests and builds run on all branches, and deploy to production only from the main branch (sketched below)
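Sketched in GitLab CI, using its predefined branch variables (the deploy script is a hypothetical placeholder):

```yaml
test:
  script:
    - make test                  # runs on every branch, no rules needed

deploy-production:
  script:
    - ./deploy.sh production     # hypothetical deploy script
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'   # only on the main branch
```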
Use prebuilt images
If I got a dollar for every wasted minute spent installing packages in every damn job execution, I would be a millionaire by now.
If you know you will need a package for a job running every 5 minutes, think about including these tools in a dedicated build image. You can either build specialized images containing all the tools you will potentially need, or, in other cases, several relatively small ones, each suited to a specific set of use cases. It is always a trade-off between maintenance, network load, and organizational layout.
This keeps execution times low and reduces build failures due to external network conditions, offline mirrors, and whatnot. Save the time otherwise spent investigating temporary problems and staring at the installation progress bar.
Tl;dr
Include the tools you will need in your build environment instead of installing them on demand every time.
Examples
- Instead of installing `curl` and `openjdk` in every build, create a Docker image with `curl` and `openjdk` preinstalled
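A sketch of such a dedicated build image as a Dockerfile (base image and package names are assumptions; on Debian 12 the JDK package is called `openjdk-17-jdk`):

```dockerfile
FROM debian:12.5

# Preinstall everything the jobs need, so no job pays the apt-get cost again.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl openjdk-17-jdk && \
    rm -rf /var/lib/apt/lists/*
```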
Avoid the root user
Running as root, the most privileged user out there on your *nix machine. What could go wrong? It is a bad idea to rely on running as root, and not only on your application servers and in your containerized applications.
Build tools, too, can potentially compromise your system, making your life terrible. Not relying on root and running as a normal user makes you think more about privileges and proper setup, keeps you from using tools in ways they were not designed for, and much more.
Tl;dr
Don't use root in your builds; it helps prevent compromising your systems and encourages best practices.
Examples
- Create a dedicated build user in your Docker image and grant it sudo permissions, so you can still install packages when required without running everything as root by default (sketched below)
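A minimal sketch of that pattern (the user name and base image are illustrative):

```dockerfile
FROM debian:12.5

# sudo stays available as an explicit escape hatch for the rare package install.
RUN apt-get update && \
    apt-get install -y --no-install-recommends sudo && \
    rm -rf /var/lib/apt/lists/* && \
    useradd --create-home builder && \
    echo 'builder ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/builder

USER builder
WORKDIR /home/builder
```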
Keep your workflows lean
Often, the real world out there is really complicated, and that means your pipeline can get very complicated as well. When it gets too much, take a step back, refactor, and make things easier. Keeping things lean makes your life easier: it becomes easy to add new features and jobs, and your developers feel more confident making small changes and working with the pipeline.
Avoid overcomplicating jobs with too many parameters and environment variables. Often, a dedicated job with reusable parts is easier to handle than an environment-variable hell around a single script or job.
Many CI tools provide ways to move reusable parts out of your pipeline, making them includable into multiple projects. With GitLab CI, for example, you can use includes to extend your pipeline with snippets. CircleCI provides entire jobs, steps, and executors with so-called orbs.
Making parts of your pipeline just a call to an include or a command reduces complexity, as you can simply rely on the functionality provided.
Tl;dr
Move common parts out of your pipeline to make them reusable. Avoid parameter hell; sometimes duplication is better.
Examples
- Create a reusable snippet that runs deployments to k8s, and include it in your deployment job with parameters for service name, namespace, and application name
- Add a stage parameter to your deployment job that decides which environment it deploys to; use the same job for all stages, with the stage parameter as the only difference (see the sketch below)
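In GitLab CI terms, the two examples could look like this sketch (the snippet project, file path, and variable names are assumptions):

```yaml
include:
  - project: platform/ci-snippets       # hypothetical shared repository
    file: /templates/deploy-k8s.yml     # defines a hidden base job `.deploy-k8s`

deploy-staging:
  extends: .deploy-k8s
  variables:
    SERVICE_NAME: frontend
    NAMESPACE: frontend-staging
    STAGE: staging                      # the only real difference between the jobs

deploy-production:
  extends: .deploy-k8s
  variables:
    SERVICE_NAME: frontend
    NAMESPACE: frontend-production
    STAGE: production
```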
Make your builds reproducible
You are now pinning your dependencies and bundling tools directly in your build environment. If you've followed the previous 9 tips, your environment is consistent across builds: a pipeline run today has the same environment as on the day it was created.
The only thing missing is the application build itself. The idea is pretty simple: same input, same output. Make sure your build artifacts don't change, regardless of when or how often you execute your builds.
Use buildpacks instead of Dockerfile-based images wherever possible, or use tools like deterministic-zip to get consistent zip files when you want to store a number of loosely coupled files.
Tl;dr
Not only your build environment should be reproducible, but also the artifacts and the compilation itself.
Examples
- Use `deterministic-zip` instead of `zip` (sketched below)
- Remove metadata from files before storing them
- Reference external files with fully qualified paths from your project root
- Avoid pulling dynamic data from external systems
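A packaging sketch for the first bullet; I'm assuming deterministic-zip mirrors zip's command-line flags and is preinstalled in the build image (see the prebuilt-images tip):

```yaml
package:
  script:
    - deterministic-zip -r dist.zip dist/   # assumed zip-compatible flags; check your version's docs
  artifacts:
    paths:
      - dist.zip
```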
Have fun and measure!
These things have worked pretty well for me in the past and will help you build pipelines you will love to maintain and developers will love to use. They reduce failed builds due to external dependencies and keep your pipelines easy to maintain while adapting to real-world scenarios.
Always measure the changes you made and check their real impact; sometimes a speed-up is just a subjective matter that didn't change the actual timings.
If you are using GitLab, I can warmly recommend the gitlab-exporter from Andreas Kunze, one of my colleagues at DeepL. We use the exporter daily, and it has helped us measure a lot and optimize based on facts rather than gut feeling.