Everything as code - Making your life easier
If you work in IT, it is almost inevitable that you will encounter configuration or code at some point. They are at the heart of almost every system we use daily, allowing us to define behavior and configure systems. Still, some of those actions are performed manually over and over again. This is where Everything as Code comes in to save the day.
So what is Everything as code (EaC) exactly?
Everything as Code is a way of building software and managing resources in which the resources themselves are described in code. The code representation of resources makes it easier for developers to:
- Audit changes
- Improve consistency
- Scale resources
- Transfer settings from one environment to another
Taken literally, EaC is an ideal state where every part of the software lifecycle is code.
When we talk about code here, it can be, but does not have to be, actual code in a programming language. It ranges from simple YAML files, through custom DSLs, to existing programming languages in which resources are configured via program code.
There are already plenty of Everything as Code approaches:
... as Code | Examples for implementations |
---|---|
Infrastructure | Terraform, AWS CloudFormation, Pulumi |
Configuration | Ansible, Chef, Puppet |
Security | Open Policy Agent (OPA), HashiCorp Sentinel |
Pipeline | GitLab CI/CD, GitHub Actions, Jenkinsfile |
Monitoring | Grafana, Gatus, Prometheus |
Network | Cilium, Calico |
Documentation | MkDocs, Docusaurus |
Without those approaches, DevOps would be a lot harder and less fun. Chances are you have already used one of them, directly or indirectly.
Benefits
Everything as code comes with a few giant benefits, including:
- Consistency: Your configuration will always be the same, no matter which target you apply it to. This reduces human error, improves the reliability of systems, and lets you roll back quickly to exactly how things were before.
- Scalability: If you want to scale up servers, configuration, or even user permissions or firewall set-ups, the process always stays the same, and you can easily apply the same configuration to any number of targets.
- Portability: You can easily share your configuration with others, share knowledge, or even replicate environments in other contexts.
- Auditability: You can audit your systems easily, with automated compliance checks, peer reviews, automated scans or even staging environments to test how a change will behave.
- Disaster Recovery: If a resource is lost beyond repair and restoring backups is not an option, you no longer have to configure everything manually again. Simply apply the configuration and be up and running again in minutes or hours, not weeks or months.
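To make the consistency and disaster recovery points a bit more concrete, here is a minimal Python sketch of the idea behind declarative tools: compare the desired state with the actual state and derive the required changes. The resource names, the dictionary layout and the `plan` and `apply` functions are assumptions made up for illustration. Real tools such as Terraform or Ansible talk to real APIs and are far more thorough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str  # "create", "update" or "delete"
    name: str


def plan(desired, actual):
    """Compare desired and actual resources and list the changes needed."""
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(Change("create", name))
        elif name not in desired:
            changes.append(Change("delete", name))
        elif desired[name] != actual[name]:
            changes.append(Change("update", name))
    return changes


def apply(desired):
    """Stand-in for real API calls: the new actual state mirrors the desired one."""
    return {name: dict(spec) for name, spec in desired.items()}


# Hypothetical example data
desired = {
    "vm-log-01": {"cpu": 4, "memory_gb": 16},
    "vm-log-02": {"cpu": 4, "memory_gb": 16},
}
actual = {
    "vm-log-01": {"cpu": 2, "memory_gb": 16},
    "vm-old": {"cpu": 1, "memory_gb": 2},
}

changes = plan(desired, actual)
for change in changes:
    print(change.action, change.name)
```

Planning again after `apply(desired)` yields no changes, which is the property that makes repeated runs and full rebuilds safe.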
Real-world examples
I have encountered quite a few real-world examples in my job where Everything as Code has made my life easier. I want to focus on three cases here, without mentioning the company or any confidential details.
Cluster in inconsistent states beyond repair
I was part of the platform team back then, and the logging system cluster went completely wild. The state was so corrupted that we could neither fix it nor apply backups, as restoring them would have taken too long. As basically the entire company depended on it, we had to act fast. Luckily, we could take a quite radical step: we deleted everything.
This meant deleting all VMs, including their configuration, and starting from scratch. The virtual machines and network resources were defined in Terraform, and 10 minutes later they were recreated. For the configuration, we used Ansible and a CI pipeline approach. We clicked on “Run pipeline”, and after roughly 20 minutes the configuration was back. Before we could even log in, logs were already coming in and users had begun to use the Web UI again.
Without infrastructure and configuration as code, we would have had a really hard time setting up 10 VMs in total, plus countless configurations for various sources and teams. It turned out we still had some backups from a week earlier that were not corrupted, and we could simply load them into the (again running) systems. But that could be done with peace of mind, and without blocking developers from using our logging infrastructure.
New instance for a legacy monolith
Everyone loves a good legacy monolith with almost no documentation about the set-up. The servers had been set up ages ago by someone who had since left the company, so I could no longer ask. The requirement was clear: we needed to scale the old monolith so it could serve the workload. Vertically scaling the servers was no longer an option, so we had to scale horizontally: we needed 2 more instances.
Every server was configured completely differently in some aspects, which also led to some funny bugs in production. Getting this into a consistent state required a bit of reverse engineering:
- Log in to each server, check the configuration and diff it to the other instances.
- Find a common configuration that works for all and roll it out to all servers.
- Apply the configuration with Ansible, while keeping the old configurations on the host as backups.
- Use Ansible to roll back if something goes wrong.
This was mostly rinse and repeat for every little configuration detail. Once we had the configuration there, we routed new traffic incrementally to the new instances.
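The first two steps, diffing configurations and finding a common baseline, can be sketched in a few lines of Python. This is only an illustration under assumptions: the host names, the simple `key = value` file format and both helper functions are hypothetical, and the real configuration files were of course more varied.

```python
def parse_settings(text):
    """Parse simple `key = value` lines, skipping blanks and `#` comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def split_common(configs):
    """Split settings into those identical on every host and those that differ."""
    all_keys = set()
    for settings in configs.values():
        all_keys.update(settings)
    common, differing = {}, {}
    for key in sorted(all_keys):
        # A host that lacks the key shows up with the value None.
        per_host = {host: settings.get(key) for host, settings in configs.items()}
        if len(set(per_host.values())) == 1:
            common[key] = next(iter(per_host.values()))
        else:
            differing[key] = per_host
    return common, differing


# Hypothetical configuration files collected from two servers
configs = {
    "web-01": parse_settings("max_connections = 200\ntimeout = 30\n# legacy\ncache = on"),
    "web-02": parse_settings("max_connections = 100\ntimeout = 30"),
}

common, differing = split_common(configs)
print("common:", common)
print("needs a decision:", differing)
```

Everything in `common` can go straight into the shared configuration, while each entry in `differing` needs a human decision before it is rolled out.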
It worked quite well, and it did not take long until we had to add yet another instance. This time it was as easy as ordering a new server from the provider, adding it to our Ansible inventory and simply applying the configuration.
Instead of spending weeks on setup and always being afraid of having forgotten something crucial, the new server was ready almost the same day we got it.
Onboarding new teams to AWS
When we started using AWS, every account was set up manually, following a handy little step-by-step guide in Confluence. Plenty of things were done wrong by various teams, including the naming of buckets that were intended to store Terraform state later on. The result was maddening inconsistency. Some accounts even lacked basic security measures that were required by our internal policies.
Furthermore, this process took way too much time, given that we eventually wanted to give each team a dedicated AWS account inside our organization for each environment.
To overcome this, we introduced a common Terraform module that was applied to every new account, containing the basic set of resources required to get started. As a result, we could bootstrap new accounts easily and quickly. These accounts were not only more consistent, but also ready within a few hours of budget approval.
Ultimately, this allowed each engineering team to take more end-to-end responsibility without overwhelming them with base setups.
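The bucket naming problem is a good example of something a shared module can enforce. The following Python sketch derives a state bucket name from a fixed scheme and rejects anything that does not fit. The `org-team-environment-tfstate` scheme and the function name are assumptions for illustration, not what our Terraform module actually did. The pattern only covers the basic S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```python
import re

# Basic S3 naming rules only; S3 has a few additional restrictions.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def state_bucket_name(org, team, environment):
    """Build a Terraform state bucket name from a fixed (hypothetical) scheme."""
    name = f"{org}-{team}-{environment}-tfstate"
    if not BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


print(state_bucket_name("acme", "payments", "dev"))
```

With a function like this in one place, no team has to remember the convention, and a typo fails loudly instead of creating yet another oddly named bucket.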
I am a small shop or a one-man IT department, this all seems like overkill to me!
I hear this regularly from friends and people who work in smaller companies, who say the initial effort of learning it would simply be too much. At first, it all looks like fancy technologies and new terms. But it is ultimately worth it, because it allows you to move faster while doing less work at the end of the day.
I can only encourage everyone in that situation to take a look at the details before being scared away by the technology and phrases. It really is easier than it seems.
Let's get into a few real-world examples of how Everything as Code can improve your (work) life.
Creating new users in Active Directory
I have witnessed this way too often. The process is:
- Create a Jira ticket with request
- Wait for the completely overloaded IT department to work on it
- Someone just copies a reference user, which has either not enough or, even worse, far too many permissions
- Requester gets feedback that the account is there
- The created user either cannot log in at all or has some issues with permissions
- Rinse and repeat.
This happens for every hire and for every member who switches teams. Even in small companies, it happens more often than you might think.
It adds up to a load of extra work and annoyance for everyone involved. Instead, a process with Everything as Code could look like this:
- Create a Jira ticket with request
- IT gets a notification with the appropriate priority
- They check the existing users, which are already defined in code, so they can either reuse an existing permission set or define a new one and adjust it accordingly
- IT commits this change to their repository
- CI runs and automatically creates the user, updates the ticket and sends out a mail to the requester
The degree of automation here can vary and may increase over time as the need arises. But this removes a lot of the mental load of clicking around in UIs and checking permissions that are sometimes rather cryptic. With the code approach, one can leave comments, copy and paste text, and automate all the things.
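As a rough idea of what "users defined in code" could mean, here is a small Python sketch. All names (`PERMISSION_SETS`, `USERS`, the group names and both functions) are hypothetical, and the step that would actually talk to Active Directory is deliberately left out. The sketch only covers what a CI job could check and compute before anything is created.

```python
# Reusable permission sets: each maps to a set of (hypothetical) group names.
PERMISSION_SETS = {
    "developer": {"git-users", "ci-users"},
    "support": {"ticket-system"},
}

# Users reference a permission set instead of copying another user.
USERS = {
    "alice": "developer",
    "bob": "support",
}


def resolve_groups(users, permission_sets):
    """Map every user to their groups and fail on unknown permission sets."""
    resolved = {}
    for username, set_name in users.items():
        if set_name not in permission_sets:
            raise ValueError(
                f"user {username!r} references unknown permission set {set_name!r}"
            )
        resolved[username] = set(permission_sets[set_name])
    return resolved


def users_to_create(users, existing):
    """Return the users defined in code that do not exist in the directory yet."""
    return sorted(set(users) - set(existing))


print(resolve_groups(USERS, PERMISSION_SETS))
print(users_to_create(USERS, existing={"alice"}))
```

A typo in a permission set name now fails in CI instead of producing an account that cannot log in.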
New firewall rules
I am unable to count the days I have waited for elementary firewall rules. When one wants to create a new network, they have to create a ton of tickets, which are then processed by IT. The process might look like this:
- Create ticket with Network Engineers
- They log in to the Web UI of the firewall and create a new rule set
- They add each rule manually
- They apply and ask the requester to test
- Rinse and repeat for all rules
This takes time and leads to countless human errors.
A more automated process could look like the following. Experienced developers could even contribute to it via merge request, requiring the network engineers' expertise only for verification:
- Create ticket with Network Engineers
- They add the new network to the configuration file for networks, which creates the base set of resources automatically
- They add the rules for the network to a YAML file
- They commit their changes to the network repository
- After the configuration is applied, a network probe automatically verifies that the rules work as intended.
Again, the degree of automation here can vary and increase over time. At some point, developers might even be able to create merge requests directly in the project, while network engineers run a set of automated checks, have a final look and hit Approve.
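One such automated check could validate the rules before they are ever applied. The sketch below assumes the rules have already been loaded from the YAML file into plain dictionaries. The field names (`source`, `destination`, `protocol`, `port`) and the allowed protocols are hypothetical choices for illustration. It does not cover the network probe that runs after applying.

```python
import ipaddress

ALLOWED_PROTOCOLS = {"tcp", "udp"}


def validate_rule(rule):
    """Return a list of problems found in one firewall rule (empty if valid)."""
    errors = []
    for field in ("source", "destination"):
        try:
            # Rejects malformed networks and networks with host bits set.
            ipaddress.ip_network(rule.get(field, ""))
        except ValueError:
            errors.append(f"{field}: invalid network {rule.get(field)!r}")
    if rule.get("protocol") not in ALLOWED_PROTOCOLS:
        errors.append(f"protocol: unsupported value {rule.get('protocol')!r}")
    port = rule.get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append(f"port: must be an integer between 1 and 65535, got {port!r}")
    return errors


# Hypothetical rules as they might look after loading the YAML file
good_rule = {"source": "10.0.1.0/24", "destination": "10.0.2.10/32",
             "protocol": "tcp", "port": 443}
bad_rule = {"source": "10.0.1.5/24", "destination": "10.0.2.10/32",
            "protocol": "icmp", "port": 70000}

for rule in (good_rule, bad_rule):
    print(validate_rule(rule) or "ok")
```

Run in CI on every merge request, a check like this catches the typical typos before a network engineer even has to look at the change.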
So what are you waiting for?
While there is an initial learning curve, the tools and syntax are rather simple for most of them, and the benefits outweigh it by far, considering the amount of time you will save by avoiding human errors, clicking the wrong button or having to do the same things over and over again.
Once you put your resources in code, you get entirely new possibilities for automating things, saving even more time along the way and catching typical errors early on.
How to adopt Everything as Code
- Start Small: Begin with a single component, such as infrastructure or policies.
- Use Version Control: Store configurations and scripts in a version-controlled repository.
- Automate Testing: Validate configurations and processes through automated tests.
- Iterate: Continuously refine and expand your EaC implementation.
Beyond tech
Everything as Code can be applied to many more things, and you might just want to get creative with it.
For example, I manage my recipes using GitHub Pages: I define Markdown files with front matter for ingredients and instructions and add an image, which results in a fancy recipe page.
So the following markdown results in a recipe page:
This is not only a neat thing to have, it also allows me to version control my recipes and see when I made which changes to a recipe. If things go wrong and I only realize it much later, I can simply go back to any previous revision.
Besides easier management, this also keeps recipes consistent, and changes to layout or features are easily applied to all the recipes in my collection.
What I didn't tell you until now is that this is not a one-off solution, but something I forked from clarklab/chowdown. It is used by many people out there who manage their recipes with the same brilliant idea while being able to customize it as they please.
Conclusion
This technology will stay for a while, especially with text-based AI agents taking over more and more tasks. While most of these workflows are currently managed by humans, there is growing potential in automating even more of them. And it is simply easier to automate API calls and interact with files than to click around in UIs.
Everything as Code allows moving faster, making fewer mistakes along the way and focusing on what really matters: the desired outcome. It is not without reason that it is embraced by all the big tech companies out there. With technology becoming easier to learn and integrate by the day, it is no longer a field for big players only.
Embracing Everything as Code helps to tear down silos and frees up specialists for the areas where they really excel: consulting others and troubleshooting issues.
I am curious: what is your experience with Everything as Code? Do you already use it, or what challenges do you face? Feel free to leave a comment here or write me via mail or on LinkedIn.