Building secure Python container images for production


I love container technology and wrote a while ago about Small Things for Building Better Container Images. Back then I also mentioned distroless images, but since then I have run into use cases where they simply do not work well, Python applications being one of them.

The problem with distroless for Python images

Distroless images are not really distribution-less: they are based on Debian, stripped down to the core libraries and a few commonly used packages.

For static builds that only need glibc, SSL etc., distroless is great. It works like a charm. The relatively old base is an acceptable trade-off for the ease of use.


Unfortunately, distroless is still in an experimental phase for Python and does not support the latest Python versions. This makes it impossible in most cases to use it for modern Python applications.

Wolfi powered by Chainguard — The Perfect Fit

Wolfi is the first community Linux (un)distribution declaratively designed to create a secure base layer for your containers!

That's the first line on their GitHub profile, a bold statement. So how is this different from just using distroless containers?

First, Wolfi ships no kernel by design. None is needed, since containers use the host OS kernel through the container runtime anyway. This not only slims down the images, but also reduces the amount of software in the container.


The biggest selling point for me is that, unlike Alpine, they use glibc instead of musl. While this may not be a big deal for many users out there, it is for native dependencies in Python.

Alpine comes with musl, and pre-built wheels are often only provided for glibc. Many pip installs therefore have to build from source, or fail to build at all. This leads to longer build times, annoying bugs and, in the worst case, packages that are not supported at all. With Wolfi, this problem disappears.
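To make the wheel problem a bit more concrete, here is a minimal sketch that reads the platform tag from a wheel filename and reports which libc it targets. It assumes the standard wheel naming scheme, where glibc wheels carry `manylinux` tags and musl wheels carry `musllinux` tags. The function name `libc_families` and the `examplepkg` filenames are made up for illustration.

```python
def libc_families(wheel_filename: str) -> set[str]:
    """Return the libc families targeted by a wheel's platform tags."""
    # Wheel names end in "-{python tag}-{abi tag}-{platform tag}.whl";
    # the platform part may hold several tags joined by dots.
    platform_part = wheel_filename.removesuffix(".whl").split("-")[-1]
    families = set()
    for tag in platform_part.split("."):
        if tag.startswith("manylinux"):
            families.add("glibc")
        elif tag.startswith("musllinux"):
            families.add("musl")
        elif tag == "any":
            families.add("any")  # pure-Python wheel, no libc dependency
        else:
            families.add("other")  # e.g. Windows or macOS tags
    return families


# Hypothetical filenames, for illustration only.
for name in (
    "examplepkg-1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "examplepkg-1.0-cp312-cp312-musllinux_1_1_x86_64.whl",
    "examplepkg-1.0-py3-none-any.whl",
):
    print(name, "->", sorted(libc_families(name)))
```

If a package only publishes wheels in the glibc family, pip on a musl-based image has to fall back to the source distribution, which is where the extra build time comes from.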

On top of that, Chainguard constantly updates the images, typically fixing security flaws sooner than Alpine or distroless do. So every rebuild gives you a fresh base with the latest available security patches.

Things to consider

Version tags for paying customers only

If you are willing to pay for it, Chainguard provides ready-to-use base images for common ecosystems like Python, Java, etc. in all versions. At work, I would rather not have to convince people to pay for it, and for private projects it does not make sense for me to pay for my little helper services and open-source projects. So I take matters into my own hands and build the images myself, which also gives me more control over the build process. The details follow below, but it is a lot easier than you might think.

apk, but different

apk is the package manager used by Alpine, and many publishers provide packages for it.

The wolfi-base image comes with apk by default, which allows you to install packages. Although it is the same package manager, you do not get the full range of Alpine packages, because Wolfi uses its own package index. apk will only install what Wolfi OS provides.

If a package is missing from their index, you can always go the extra mile and contribute it. However, I have not come across this case so far, because a large number of commonly used packages is already available.


Having a package manager as part of the image also means that it could potentially be abused, and it adds a few megabytes to the image. The alternative would be apko, a tool that only installs what it is told to install, in a declarative and deterministic way. Still, I stick with Dockerfiles because everyone is used to them and they are well supported by build tools, existing CI infrastructure, etc. The price is that the images are not fully deterministic. The benefit is that existing tools can be reused, which makes adoption much easier.

Building the image

Let's say you have a straightforward application whose dependencies are installed with pip and listed in a requirements.txt.

The Dockerfile in this case looks like this:

# Specify the Python version to use here across build & runtime image
ARG python_version=3.12

FROM chainguard/wolfi-base AS build
WORKDIR /build

# Make it explicit python_version arg will be used
ARG python_version 

# Install python and pip in build container
RUN apk add --no-cache python-${python_version} py${python_version}-pip

# Create venv
RUN python -m venv venv

# Copy over dependency relevant files
COPY requirements.txt /requirements.txt 

# Install dependencies into venv
RUN venv/bin/pip install -r /requirements.txt

# This will be our runtime image
FROM chainguard/wolfi-base 

# Make it explicit python_version arg will be used
ARG python_version

# Install python in the runtime image
RUN apk add --no-cache python-${python_version} 
WORKDIR /app 

# Copy over dependencies, this only needs to be done when dependency related configuration changed
COPY --from=build /build/venv venv 

# Copy over app sources
COPY . . 

# Don't run as root: the container runs unprivileged and can't install packages
USER nobody 
ENTRYPOINT ["/app/venv/bin/python", "main.py"]

This results in a small, secure runtime image, with Wolfi as the base, topped by a Python installation of your choice. Thanks to glibc, pre-built wheels install without trouble, and the up-to-date base layer keeps the number of known vulnerabilities low.
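To check that the runtime image behaves as intended, a small self-check can be dropped into the application. The following is only a sketch of what such a check might look like. The function name `runtime_report` is my own, and it assumes the entrypoint from the Dockerfile above, i.e. the interpreter inside the venv started as an unprivileged user.

```python
import os
import platform
import sys


def runtime_report() -> dict:
    """Collect a few facts about the interpreter and its environment."""
    # libc_ver() returns empty strings when the libc cannot be determined.
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        # Inside a venv, sys.prefix points at the venv, not the base install.
        "in_venv": sys.prefix != sys.base_prefix,
        # USER nobody in the Dockerfile should make this False.
        "is_root": os.geteuid() == 0,
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }


if __name__ == "__main__":
    for key, value in runtime_report().items():
        print(f"{key}: {value}")
```

Inside the container I would expect `in_venv` to be true and `is_root` to be false. The `libc` entry should report glibc on a Wolfi base.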

Since the image also includes an sh shell via busybox, os.system calls and the like work out of the box.
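As a quick illustration of that point, here is a minimal sketch that runs a command through `sh` from Python. It uses `subprocess.run` rather than `os.system` so the output can be captured, and the helper name `run_in_shell` is made up for this example.

```python
import subprocess


def run_in_shell(command: str) -> str:
    """Run a command line through sh and return its standard output."""
    result = subprocess.run(
        ["sh", "-c", command],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(run_in_shell("echo hello from sh"), end="")
```

On an image without a shell, such as a distroless one, the same call would fail because there is no `sh` binary to execute.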


Great choice for Python and beyond

Wolfi has become my preferred base image for containerizing Python, PHP, and Node.js applications and tools. It provides a distro-free experience and is slimmer and more up-to-date than Alpine, while offering almost the same level of support for native dependencies as the Ubuntu- and Debian-based images.

Recently, I also introduced it for my php-app base image, which previously used the slim (Debian-based) version of the official PHP image. This reduced the size of my images from ~800 MB to ~100 MB. The services run just as smoothly as before, but the images are a lot more compact and, with the latest build, contain a few CVEs instead of hundreds.