Using webhookd for automated deployments


I recently updated the deployment workflow for my homepages and self-hosted services. In the process, I found webhookd, which now replaces my prior SSH-based deployment workflow. Since this works quite nicely, I decided to share my solution with you.

Why not just use Kubernetes or Docker Swarm?

While platforms like these provide a lot more tooling for this kind of thing, I like to keep my private service deployments simple, as they each run on a single node and instance anyway.

Infrastructure for my services in a nutshell

First, let's talk about the infrastructure itself. All my services are grouped in namespaces (a docker-compose file in a dedicated folder), each containing multiple services that share one docker network. This allows easy separation of networks based on the group the services live in. For example, the monitoring namespace contains Prometheus, Grafana, and a set of exporters that talk to each other via the docker network. The websites run separately and have no access to the monitoring containers.


To put it all together, traefik routes traffic based on docker labels on the containers. For that to work, the public-facing containers also join the gateway network, where the proxy can reach them. The proxy itself is only reachable via a Cloudflare Tunnel, which prevents exposing the HTTP ports directly to the internet.
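To illustrate, routing a container through traefik could look roughly like this; the network name, router name, and hostname are hypothetical placeholders for this sketch, not my actual configuration:

```shell
# Sketch: attach a container to the gateway network and let traefik route to it
# via docker labels (router name and hostname are made-up placeholders)
docker run -d \
  --network gateway \
  --label "traefik.enable=true" \
  --label "traefik.http.routers.blog.rule=Host(\`blog.example.com\`)" \
  --label "traefik.http.services.blog.loadbalancer.server.port=80" \
  nginx:alpine
```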

Old Workflow with SSH

I heavily rely on GitLab CI/CD. Every push to main builds a docker image using kaniko and pushes it to the GitLab Registry of the corresponding project. When that succeeds, an update is triggered via SSH: I simply pull the image using docker compose and replace the container with the newer version.

This has a few downsides:

  • A CI user that can connect to the server via SSH is needed
  • The password / SSH key needs to be stored on gitlab.com
  • SSH connections from gitlab.com must be allowed

I already mitigated some of this with fail2ban, by limiting the CI user as much as possible, and by rotating the secret regularly. But in the end, an SSH connection is mighty, especially as the user needs access to the docker socket via compose; if someone gains that access, the battle is probably already lost.

New Workflow with webhookd

This is where webhookd sparked my interest: in a nutshell, it allows you to have any program executed via a webhook, putting the heavy lifting directly on the server and away from gitlab.com.
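To illustrate the basic idea with a minimal sketch (my understanding of webhookd's defaults, not my actual setup): scripts in the configured script directory become HTTP endpoints, so a script at scripts/echo.sh is triggered via POST /echo.

```shell
# Minimal sketch: a script in the scripts directory becomes a webhook endpoint
mkdir -p scripts
cat > scripts/echo.sh <<'EOF'
#!/bin/bash
echo "Hello from webhookd"
EOF
chmod +x scripts/echo.sh

# With the daemon running (WHD_HOOK_SCRIPTS pointing at ./scripts), the hook
# would be triggered like this (commented out, needs webhookd installed):
# curl -XPOST http://localhost:8080/echo
bash scripts/echo.sh
```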

This has quite a few benefits:

  • heavy lifting and privileges are only required on the server
  • one way communication
  • validation allows narrowing down abuse

Luckily, webhookd comes with a distrib image which, besides the service itself, also contains bash, curl, docker compose, docker, etc. This can be used as a base and is actually quite well made.

Building a custom image based on that

I prefer having a container with batteries included, without the need to mount scripts locally. This allows easy versioning and fits in quite well with my current Continuous Delivery setup.

Dockerfile

I ended up with the following Dockerfile:

FROM ncarlier/webhookd:1.19.0-distrib
USER root
# Make sure webhookd can use the docker socket
RUN delgroup ping \
    && addgroup --gid 999 docker \
    && addgroup webhookd docker \
    && mkdir /.docker \
    && chown webhookd:webhookd /.docker
RUN apk add --no-cache curl file zip
# Copy webhook scripts
ENV WHD_HOOK_SCRIPTS=/opt/webhookd/scripts
WORKDIR /opt/webhookd
COPY --chown=webhookd:webhookd --chmod=555 scripts/ scripts/

# Switch back to webhookd user
USER webhookd
ENTRYPOINT ["webhookd"]

As base image, it uses a pinned distrib version of webhookd. To allow the scripts to use the docker socket, a few adjustments are necessary: the Dockerfile makes sure the docker group exists and adds webhookd's service user to it.

Once that is done, the latest versions of curl, file, and zip, which are used in some scripts, are installed. The scripts are located in the same repository as the Dockerfile, so every build and script change produces a new image that can be used.


Scripting time

I will go step by step through the deployment script and provide the full version at the end of the section in case you also want to use it.

Define a few helpers

The service can set the HTTP status based on the exit code of a script; the logic is HTTP status = exit code + 300.

To make this easier to use, I created a helper that takes care of the conversion:

# @description Exit with a message and HTTP status exit code for webhookd
# @arg $1 HTTP Status to exit with
# @arg $2 Message to print before exiting
# @exit ($1 - 300) HTTP status for webhookd translated to exit code
exit_with_message() {
  local http_status; http_status="$1"
  local message; message="$2"

  echo "$message" >&2
  exit "$((http_status-300))"
}
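As a quick sanity check, the mapping can be observed by running the helper in a subshell:

```shell
# Same helper as above, followed by a quick demo of the mapping
exit_with_message() {
  local http_status; http_status="$1"
  local message; message="$2"

  echo "$message" >&2
  exit "$((http_status-300))"
}

# Run in a subshell so the exit does not end the demo shell itself
( exit_with_message 404 "not found" ) 2>/dev/null
echo "exit code for HTTP 404: $?"   # prints 104 (404 - 300)
```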

All parameters are provided as environment variables, so we can verify them pretty easily:

# @description Require a parameter to be set
# @arg $1 Name of the required environment variable
# @exit 100 If the parameter is missing
# @stderr Error message, if the parameter is missing
require_parameter() {
  local name; name="$1"

  if [ -z "${!name}" ];
  then
    exit_with_message 400 "Missing parameter ${name}"
  fi
}
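The ${!name} indirection is what makes this work: it looks up a variable by its name. A small standalone demo (here the helper returns instead of exiting, so the demo shell keeps running):

```shell
# Demo of the bash indirect expansion used by require_parameter
require_parameter() {
  local name; name="$1"

  if [ -z "${!name}" ]; then
    echo "Missing parameter ${name}" >&2
    return 1
  fi
}

namespace="monitoring"
require_parameter "namespace" && echo "namespace is set"
require_parameter "service" 2>/dev/null || echo "service is missing"
```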

Now we just need some fancy output and we can start with the actual implementation:

# @description Output to stderr for webhookd
# @stderr Message
output() {
  echo "$@" >&2
}

# @description Print spacer between sections
# @stderr Spacer output
print_spacer() {
  output " "
}

# @description Print heading for a logical section
# @stderr Heading for the section
print_section() {
  local heading; heading="$1"
  local heading_length; heading_length=${#heading}
  local pad_count_end; pad_count_end=$((30-heading_length))
  output "$(printf '=%.0s' {1..5}) $(printf "%-10s" "${heading}") $(printf '=%.0s' $(seq "${pad_count_end}"))"
}

Parameter validation

We need to know which service in which namespace to update, which image to use, and the registry authentication:

require_parameter "namespace"
require_parameter "service"
require_parameter "docker_username"
require_parameter "docker_password"
require_parameter "docker_image"

Login

To pull the image from the GitLab Container Registry, we need to authenticate:

print_section "Login using docker CLI"
# shellcheck disable=SC2154
echo "${docker_password}" | docker login registry.gitlab.com --username "${docker_username}" --password-stdin > /dev/null  || exit_with_message 401 "Invalid registry authentication"
print_spacer

Validate the namespace exists

As mentioned before, the namespace is the logical group for services, so we need to validate that it exists first:

print_section "Check Namespace is valid"
# shellcheck disable=SC2154
cd "/opt/containers/${namespace}" 2>/dev/null || exit_with_message 400 "Namespace ${namespace} does not exist"
output "Valid."
print_spacer
💡 Each namespace that is updatable is mounted read-only into the container under the same path as on the host. The path must be identical, as docker compose uses it as metadata and maps the running container to a compose file.
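For completeness, here is a sketch of how the webhookd container itself could be started with such a mount; the image name, namespace, and port are placeholder assumptions:

```shell
# Sketch: run the custom webhookd image with the docker socket and a read-only
# namespace mount at the identical host path (names are placeholders)
docker run -d --name webhookd \
  -p 8080:8080 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /opt/containers/monitoring:/opt/containers/monitoring:ro \
  registry.gitlab.com/example/webhookd-custom:latest
```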

Pull the image

Once we are logged in and certain we can pull the image, we do so:

print_section "Pull image"
# shellcheck disable=SC2154
docker pull "${docker_image}" >&2  || exit_with_message 500 "Could not download image"
print_spacer

Recreate the service

Now that the new version is pulled, we can recreate the container with the latest image:

print_section "Update container"
# shellcheck disable=SC2154
docker compose --progress plain up -d "${service}" || exit_with_message 500 "Could not start up containers"
print_spacer

Putting it all together

Wrap it all up in one script:

#!/usr/bin/env bash
export BUILDKIT_PROGRESS=plain

# @description Exit with a message and HTTP status exit code for webhookd
# @arg $1 HTTP Status to exit with
# @arg $2 Message to print before exiting
# @exit ($1 - 300) HTTP status for webhookd translated to exit code
exit_with_message() {
  local http_status; http_status="$1"
  local message; message="$2"

  echo "$message" >&2
  exit "$((http_status-300))"
}

# @description Require a parameter to be set
# @arg $1 Name of the required environment variable
# @exit 100 If the parameter is missing
# @stderr Error message, if the parameter is missing
require_parameter() {
  local name; name="$1"

  if [ -z "${!name}" ];
  then
    exit_with_message 400 "Missing parameter ${name}"
  fi
}

# @description Output to stderr for webhookd
# @stderr Message
output() {
  echo "$@" >&2
}

# @description Print spacer between sections
# @stderr Spacer output
print_spacer() {
  output " "
}

# @description Print heading for a logical section
# @stderr Heading for the section
print_section() {
  local heading; heading="$1"
  local heading_length; heading_length=${#heading}
  local pad_count_end; pad_count_end=$((30-heading_length))
  output "$(printf '=%.0s' {1..5}) $(printf "%-10s" "${heading}") $(printf '=%.0s' $(seq "${pad_count_end}"))"
}

require_parameter "namespace"
require_parameter "service"
require_parameter "docker_username"
require_parameter "docker_password"
require_parameter "docker_image"

print_section "Login using docker CLI"
# shellcheck disable=SC2154
echo "${docker_password}" | docker login registry.gitlab.com --username "${docker_username}" --password-stdin > /dev/null  || exit_with_message 401 "Invalid registry authentication"
print_spacer

print_section "Check Namespace is valid"
# shellcheck disable=SC2154
cd "/opt/containers/${namespace}" 2>/dev/null || exit_with_message 400 "Namespace ${namespace} does not exist"
output "Valid."
print_spacer

print_section "Pull image"
# shellcheck disable=SC2154
docker pull "${docker_image}" >&2  || exit_with_message 500 "Could not download image"
print_spacer

print_section "Update container"
# shellcheck disable=SC2154
docker compose --progress plain up -d "${service}" || exit_with_message 500 "Could not start up containers"
print_spacer

Triggering a deployment

From within GitLab CI, we can now easily trigger a deployment:

curl <webhookd-instance>/<deployment-script-name> \
    -sS \
    -XPOST \
    --header "X-Hook-Mode: buffered" \
    --header "Authorization: <your auth>" \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --fail \
    -d "namespace=${SERVICE_NAMESPACE}" \
    -d "service=${SERVICE_NAME}" \
    -d "docker_username=${REGISTRY_USERNAME}" \
    -d "docker_password=${REGISTRY_PASSWORD}" \
    -d "docker_image=${REGISTRY_IMAGE}"

Making sure the scripts work fine before deploying them

To verify that all scripts work before deploying a changed version, you can use bats to test them.


For example, this is one of the tests that I use to verify docker compose failures are handled properly:

@test "docker compose up failure is handled properly" {
    docker() {
        if [[ "$1" == "login" ]] || [[ "$1" == "pull" ]];
        then
            return 0
        else
            return 1
        fi
    }
    cd() {
        return 0
    }
    export -f docker
    export -f cd
    export namespace="1foo"
    export service="bar"
    export docker_username="user"
    export docker_password="password"
    export docker_image="image"

    run "${BATS_SCRIPTS_DIR}/deploy-service.sh"

    is_http_exit 500
    output_contains "Could not start up containers"
}
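The is_http_exit and output_contains helpers are not part of bats itself. Here is a sketch of how they could be implemented (my assumption, not necessarily the actual helpers), built on the $status and $output variables that bats' run command populates:

```shell
# Possible implementations of the custom bats assertions used above
is_http_exit() {
  local expected_http; expected_http="$1"
  # Reverse of exit_with_message: exit code = HTTP status - 300
  [ "$status" -eq "$((expected_http - 300))" ]
}

output_contains() {
  # $output holds everything the script printed; match on the joined arguments
  [[ "$output" == *"$*"* ]]
}
```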

Pitfalls with webhookd in the beginning

Automated tests are crucial

Without automated tests for the scripts, rolling out changes quickly becomes an annoying and nerve-wracking procedure. I can only strongly recommend writing bats tests and running them before each webhookd deployment in the pipeline.

Timeouts for longer pulls

By default, scripts have a timeout of 10 seconds; this can be configured via the WHD_HOOK_TIMEOUT environment variable. In my case, 120 seconds turned out to be the longest a complete base image swap for a service takes.

Streamed responses are nice to look at but let executions fail subtly

By default, webhookd streams the script output. Once the headers are sent, the HTTP status can't change anymore, so you might see an "exit with status xxx" message in the body while the response still reports success. Technically this makes a lot of sense, but it is nothing you want in your pipelines, as they won't fail when an error occurs. So make sure to always set the header X-Hook-Mode: buffered. In retrospect it is quite obvious, but it was a hassle to troubleshoot (RTFM would have helped), only to find out it works as intended.


Only support for basic auth

I prefer having a few options for authentication, but I simply worked around this with a traefik middleware that adds authentication in front of the service.

So many possibilities

You can use this not only for deployments but for any other automation task on your personal servers. I would not recommend it for enterprise use though; there I would rather rely on either an immutable host or an Infrastructure as Code tool like Ansible.

It is quite stable and effortless to extend with custom scripts. I have used it for all my deployments and CDN uploads for a few weeks now. It works like a charm and is pretty solid, while not taking more than a few MB of RAM for the service. It retired the CI user and removed the need to store an SSH key on gitlab.com.