Container images, commonly referred to as Docker images, are everywhere nowadays, from developers' workstations to the data center and the cloud. They are easy to build, but it's just as easy to build something that's not ideal. Here are a few useful tips for making your container images smaller, more secure, and easier to maintain.
Avoid running as root
This sounds pretty straightforward, right? Root is the user with the most privileges on a Linux system, which is what every container is at the end of the day. At this point you might think: but it's just inside a sandbox, so it's isolated. That's true, but your container will probably talk to the outside world at some point, for example when you mount a volume or access an NFS share.
Almost all base images default to the root user, and many applications out there simply run with superuser privileges, from Node apps and Python scripts to enterprise Spring Boot applications.
Avoiding that is pretty simple: add an unprivileged user and switch to it with the USER instruction. In practice:
RUN addgroup -g 1002 app && \
    adduser -D -u 1002 -G app app
USER app
All following instructions and the entrypoint will run as the user app. No more root, no easy privilege escalation.
Remove unnecessary packages
When you install packages, be careful what you install. Only install the libraries your application actually uses, and avoid utilities you might only need for debugging, like a text editor.
This not only keeps your image small, but also reduces the attack surface. Every package and library you don't include can't be abused. Pretty intuitive, right?
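As an illustration, a hypothetical service that only needs the PostgreSQL client library could install exactly that and nothing more (a sketch, assuming an Alpine base; libpq is Alpine's package name for it):

```dockerfile
FROM alpine:3.20
# Install only the runtime library the app actually links against.
# --no-cache avoids storing the package index in the image.
RUN apk add --no-cache libpq
```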
Use multi-stage builds
Multi-stage builds let you split your build into multiple stages, each based on its own image. This is useful when you don't need the build toolchain to run your actual application, which is usually the case.
Let's take a Go app, for example:
FROM golang:1.22 AS build
WORKDIR /build
COPY app.go ./
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -o app .
FROM alpine:3.20
RUN apk --no-cache add ca-certificates
COPY --from=build /build/app ./
CMD ["./app"]
The go build runs inside the golang image, which provides the whole compiler toolchain. In the end, we use a slim Alpine image to run the result. And yes, this example still misses running as non-root and much more. Furthermore, the container has a shell and a dozen packages you will never need. That's where distroless can help, so read on. :)
Use distroless images where possible
Distroless is a bit of a misnomer, just like serverless: serverless still needs servers, and distroless still builds on a distribution, since you need one to provide the libraries your application runs on. But it is slimmed down so far that you don't notice you're using a full-blown distro. The most prominent example is GoogleContainerTools/distroless. It uses Debian under the hood, but ships without apt or apt-get, so you can't easily install packages inside the container.
Not being able to install packages is already an improvement in itself. On top of that, the image includes only the libraries needed to run the application. There are also prebuilt variants for Node, Python, and other popular ecosystems, so it works out of the box for most common applications.
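Picking up the Go example from above, the runtime stage can be swapped for a distroless base. A sketch using Google's static distroless image, whose :nonroot tag also takes care of not running as root:

```dockerfile
FROM golang:1.22 AS build
WORKDIR /build
COPY app.go ./
RUN CGO_ENABLED=0 go build -o app .

# No shell, no package manager, and an unprivileged default user.
FROM gcr.io/distroless/static:nonroot
COPY --from=build /build/app /app
ENTRYPOINT ["/app"]
```

Since a statically linked Go binary has no library dependencies at all, the static variant suffices here; applications that link against libc would use the base variant instead.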
It doesn't always need to be a Dockerfile
Wait, what? You're telling me not to write a Dockerfile? How do I get an image to run my stuff then?
Cloud Native Buildpacks provide exactly that: container images without writing a Dockerfile, deterministic by default and with sane defaults. They don't run as root, include no shell, and contain only the files your application needs to run. No extra files, no potential bloat, and therefore a smaller attack surface.
For example, building a Go application with pack:
pack build my-app \
  --buildpack paketo-buildpacks/go
Using this command, you can build a reproducible Go image with the minimal set of libraries needed to run the app. No need to fiddle around with privileges, library installs, and whatnot.
Use layers wisely
Layers in Docker can be tricky. Each RUN instruction, for example, creates a new layer. If you don't combine commands sensibly, you end up with a ton of unnecessary layers. And since a file deleted in a later layer still takes up space in the layer that created it, this small detail can make an image several times larger than one built with combined commands.
Admittedly, giant command chains can feel a bit inconvenient. But unfortunately, that's the only way to avoid creating countless layers.
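A classic case is package installation on Debian-based images: running the cleanup in the same RUN as the install means the apt cache never gets baked into a layer of its own (a sketch; curl stands in for whatever your app actually needs):

```dockerfile
FROM debian:bookworm-slim
# One RUN instruction = one layer: the package lists are removed
# in the very layer that created them, so they add no size.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```

Split the same three commands across three RUN instructions and the downloaded lists stay in the image forever, even though the final filesystem looks identical.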
Ordering your instructions wisely also improves cache utilization. In a classic Node app, for example, your code changes far more frequently than your dependencies. It makes sense to structure the layers so the cache reflects exactly that:
COPY package.json package-lock.json ./
RUN npm install --only=production
COPY src/ src/
Every time package.json or the lock file changes, the layer is invalidated and npm install runs again. If only your source code in src/ has changed, Docker copies over just your code and reuses the dependency layer; only the instructions after the changed layer are executed again, saving you a lot of build time.
These are some tiny improvements with a giant effect: your images will be more secure and also smaller. It only takes a few minutes of your time, but saves a lot of time pulling images in production. And attackers will have a harder time abusing your system when fewer privileges and less tooling are available.