Crafting the Perfect Pipeline in GitLab

Introduction

When using a traditional single-server CI, fast/incremental builds are simple. Each time the CI builds your pipeline, it is able to use the same workspace, preserving the state from the previous build. But what if you are using Kubernetes runners to execute your pipelines? These can be spun up or down on demand, and each pipeline execution is not guaranteed to be executed on the same runner, same machine, or even in the same country as the previous execution.

In this case the most basic configuration, which just clones and compiles the code (possibly pulling down build dependencies first), will always have to do a full rebuild. For a small project that compiles quickly this is probably fine, but for a larger project that takes a while to compile, you lose the quick feedback that you might otherwise have. Is it possible to do better using the features of GitLab CI?

Docker

The first step to tackle is the build dependencies: compilers, third-party libraries, and other tools that need to be in place before you can even begin to compile your project itself. We do not want to have to install the dependencies every time we want to build the project, and since we cannot depend on them being installed globally on any runner, the most natural solution is to use a container. Since you might also want to use a container for deployment, the simplest approach is to have a single Docker container that installs the compile-time and run-time dependencies, copies in the code, builds it, and is then used to execute it. The downside is that any single change to the code invalidates the copy layer and every layer after it, resulting in a full rebuild of the code. The resulting deployment container will also be larger than necessary, as it will contain all the compile-time dependencies, which are not required at run-time.

A better solution is to use two separate containers: a builder container, which includes the compile-time dependencies, and a runner container, which only includes the run-time dependencies. Rather than copying the code when the container is built, the builder container mounts the code and build output as a volume when it is launched. This way the state of the build-system, as well as the file-modification times of the source-code, can be preserved between builds, allowing for incremental building. The build process then becomes: build the builder container, run it to build the project, and finally build the runner container, which uses the build output from the previous stage.

Docker & GitLab CI

So, how do we achieve this within GitLab CI? First, while it is not strictly necessary to split the build into several stages, it is generally a good idea to do so, as this provides better feedback about which particular stage has failed. More stages also allow for better load-balancing between runners when multiple pipelines are running simultaneously. So, we will define stages for building the builder container, for building the project itself, and for building the runner container.

GitLab includes its own Docker registry which we can use for storing the images between stages, or you could use an external Docker registry if preferred. Either way we must log in to the chosen registry, and set up our stages along with some variables we will use later to tag the images. All of this goes in the .gitlab-ci.yml file that is committed to version-control along with our source-code:

variables:
  BUILDER_IMAGE_TAG: ${CI_REGISTRY_IMAGE}/builder:${CI_COMMIT_REF_SLUG}
  BUILDER_IMAGE_HASH_TAG: ${CI_REGISTRY_IMAGE}/builder:${CI_COMMIT_SHA}
  IMAGE_TAG: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG}

before_script:
  - docker login -u gitlab-ci-token -p ${CI_JOB_TOKEN} ${CI_REGISTRY}

stages:
  - docker-build
  - build
  - build-container

We can then define the stage which will build the builder container. If we were always building this on the same machine, we could rely on the Docker cache to ensure that we only rebuild layers that have changed, but since this could be executed anywhere, we cannot rely on the cache of previous builds. As such, we must explicitly pull down any previous image and use Docker’s --cache-from option to instruct it to use this image for any cache checks. We tag the image twice: once with a tag specific to this branch (the ref slug), which will be used for caching in the next execution of this pipeline, and once with a tag specific to the hash of this commit, which will be used in the subsequent stages of this pipeline. If we used only the first of these, then multiple pipelines running at the same time could end up interfering with each other.

docker-build:
  stage: docker-build
  script:
    # Build the Docker image, using the previous images as the cache sources
    - docker pull ${BUILDER_IMAGE_TAG} || true
    - docker build --cache-from ${BUILDER_IMAGE_TAG} -t ${BUILDER_IMAGE_TAG} -t ${BUILDER_IMAGE_HASH_TAG} --target project_builder -f ./Docker/Dockerfile.builder ./Docker
    - docker push ${BUILDER_IMAGE_TAG}
    - docker push ${BUILDER_IMAGE_HASH_TAG}

One more thing to keep in mind here is that if your Dockerfile uses a multi-stage build, you must push the image for every stage to the registry, and provide multiple --cache-from arguments, one for each stage. If doing this, you must also ensure that the first layer/command in the Dockerfile for each stage is different, as otherwise Docker may use the wrong image as cache and unnecessarily rebuild later layers.
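As a rough sketch of what this might look like, suppose Dockerfile.builder had one intermediate stage ahead of project_builder. The stage name project_deps and the DEPS_IMAGE_TAG variable are hypothetical, introduced only for illustration; the remaining names are taken from the configuration above.

```yaml
variables:
  # Hypothetical tag for the intermediate stage; add it alongside the existing variables
  DEPS_IMAGE_TAG: ${CI_REGISTRY_IMAGE}/deps:${CI_COMMIT_REF_SLUG}

docker-build:
  stage: docker-build
  script:
    # Pull the previous image for every stage; ignore failures on the very first run
    - docker pull ${DEPS_IMAGE_TAG} || true
    - docker pull ${BUILDER_IMAGE_TAG} || true
    # Build and push the intermediate stage so it can serve as a cache source next time
    - docker build --cache-from ${DEPS_IMAGE_TAG} -t ${DEPS_IMAGE_TAG} --target project_deps -f ./Docker/Dockerfile.builder ./Docker
    - docker push ${DEPS_IMAGE_TAG}
    # Build the final stage, offering the images of both stages as cache sources
    - docker build --cache-from ${DEPS_IMAGE_TAG} --cache-from ${BUILDER_IMAGE_TAG} -t ${BUILDER_IMAGE_TAG} -t ${BUILDER_IMAGE_HASH_TAG} --target project_builder -f ./Docker/Dockerfile.builder ./Docker
    - docker push ${BUILDER_IMAGE_TAG}
    - docker push ${BUILDER_IMAGE_HASH_TAG}
```

Building the intermediate stage as its own tagged image costs one extra push, but without it there would be nothing in the registry for that stage to use as a cache source on the next run.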

GitLab CI Build Cache

The next step is to use the builder container to achieve incremental builds of the project itself, which requires caching of the build output between executions. The default caching will not actually work with Kubernetes runners due to their distributed nature, so you must first configure GitLab to use a central cache, such as S3.

Caching the build-output alone is also not sufficient to achieve incremental builds. Each time the stage is run, the repository is cloned, which resets all the file-modification times to the current time, which will cause everything to be rebuilt anyway. Not only that, but if we extract the source-code over a different Git clone, then doing a git reset --hard will still detect a difference in the file modification times, and reset them all, again causing a full rebuild.

As such, the only way I have found to achieve an incremental build is to configure GitLab to cache the build-output, the source-code itself, and the .git directory as well, and then configure it to not clone the repository itself, so we can instead update the existing clone from the cache.

variables:
  GIT_STRATEGY: none
cache:
  key: ${CI_JOB_NAME}-${CI_COMMIT_REF_SLUG}
  paths:
    # Cache build output between runs so we can build incrementally
    - Docker/build
    # Also cache the code to maintain file modification times
    - src
    # Cache the Git metadata
    - .git

The problem with caching all of this is that we do not actually have the current version of the source-code that we intend to build. Before building, we must therefore make sure we have pulled down the correct commit, and tell Git to reset the repository to ensure it is in the correct state. We also have to manually clone the Git repository if this is the first time this pipeline has been built. Only once all this is done can we pull down the builder image and use it to build the project:

script:
  # Clone the repository if it hasn't been cloned already
  - if [ ! -d .git ] ; then git clone ${CI_REPOSITORY_URL} . ; else echo "Repository already cloned" ; fi
  # Update the Git repository
  - git remote set-url origin ${CI_REPOSITORY_URL}
  - git pull origin ${CI_COMMIT_REF_NAME}
  # Restore any files that may have been overwritten by older versions from the cache
  - git reset --hard ${CI_COMMIT_SHA}
  - git submodule init
  - git submodule update
  - git reset --hard --recurse-submodules HEAD
  # Remove any files from the cache that have been deleted from the repository
  - git clean -f
  - git submodule foreach --recursive git clean -f
  - docker pull ${BUILDER_IMAGE_HASH_TAG}
  - docker run --rm -v ${PWD}:/root/project -w /root/project -e GITLAB_USER=gitlab-ci-token -e GITLAB_PASSWORD=${CI_JOB_TOKEN} ${BUILDER_IMAGE_HASH_TAG}

Artefacts

Next, we need to pass the build artefacts on to the next stage, so that they can be copied into the runner container. I recommend using the install feature of your build system to copy only the necessary binaries into an install folder. For example, with CMake the install() command selects which targets are installed and where they go:

install(TARGETS project RUNTIME DESTINATION bin)

With this in place, your project binary ends up in a bin directory under the specified install directory. You then just need to ensure that the entry-point of your builder container is a script which runs something like the following:

cmake -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR ../..
make
make install

All that remains is then to configure your build stage to treat these files as artefacts:

artifacts:
  name: project-${CI_COMMIT_REF_SLUG}
  paths:
    - Docker/install

Then, when you create the build-container job, add the build job as a dependency, which will cause it to automatically pull across that job's artefacts:

build-container:
  stage: build-container
  dependencies:
    - build
  script:
    - docker pull ${IMAGE_TAG} || true
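    # Note: RUNNER_IMAGE_TAG is not among the variables shown earlier and must be defined alongside them
    - docker pull ${RUNNER_IMAGE_TAG}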
    - docker build --cache-from ${RUNNER_IMAGE_TAG} --cache-from ${IMAGE_TAG} -t ${IMAGE_TAG} --target project_runner -f ./Docker/Dockerfile.runner ./Docker
    - docker push ${IMAGE_TAG}

As you can see, the building of the container itself is then similar to the building of the builder container, using a Dockerfile which copies in the binaries from the install directory.

Running Tests

You will probably also have a suite of tests that you need to run as part of the CI pipeline. The container for these can be built similarly to the runner container except with the test binary as the entry-point, but there are some things to keep in mind when running test containers.

If you are running integration tests then the tests might require several containers, in which case you should use the docker-compose --exit-code-from option to make sure that your CI stage fails if the tests fail. There is also an issue in docker-compose versions older than 1.23: if another container fails before the container supplied to --exit-code-from, you will not receive a failure exit code. You should therefore try to use docker-compose 1.23 or newer; otherwise extra steps are necessary, such as parsing the output of docker-compose ps to determine whether any containers failed.
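A test job along these lines could be sketched as follows. The stage name test, the job name, the compose file name docker-compose.test.yml, and the service name tests are all hypothetical placeholders; only the docker-compose options themselves come from the text above.

```yaml
stages:
  - docker-build
  - build
  - build-container
  # Hypothetical extra stage for running the tests
  - test

integration-test:
  stage: test
  script:
    # Return the exit code of the "tests" service as the exit code of the job.
    # --exit-code-from also implies --abort-on-container-exit, so the other
    # containers are stopped once any container exits.
    - docker-compose -f docker-compose.test.yml up --exit-code-from tests
  after_script:
    # Clean up the containers and networks whether or not the tests passed
    - docker-compose -f docker-compose.test.yml down
```

Putting the clean-up in after_script rather than at the end of script means it still runs when the test command fails and the job is marked as failed.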

Multi-Project Pipelines

In larger projects you might also wish to split the CI build into multiple pipelines. For example, it would be normal to have separate projects for separate components, each of which builds and tests the container for its component, and then an integration project which pulls down the component containers and runs integration tests against them. Since our previous steps have each pushed their containers to a central Docker registry, the downstream pipelines can pull down and launch the containers that they require.

If you want to be able to trigger these downstream pipelines automatically, you will need GitLab's "Multi-Project Pipelines" feature. At the time of writing this was part of GitLab Premium and not available in the Community Edition; GitLab has since moved it to the free tier (in release 12.8).
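For illustration, a component project could trigger the integration project's pipeline with a job like the one below. This is only a sketch: the stage name, the job name, the project path my-group/integration-tests, and the branch are hypothetical and would need to match your own setup.

```yaml
trigger-integration:
  # Hypothetical stage; it must also be added to the stages list, after build-container
  stage: trigger-downstream
  trigger:
    # Hypothetical path of the downstream integration project
    project: my-group/integration-tests
    # Branch of the downstream project whose pipeline should be started
    branch: master
```

Because the job runs in a stage after build-container, the downstream pipeline only starts once the component's image has been pushed to the registry.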

Conclusion

By combining GitLab CI and Kubernetes runners with Docker and the techniques described here, you can achieve the scalability of Kubernetes while maintaining some of the speed of incremental builds. There will inevitably still be some slow-down, as the pushing/pulling of containers takes some additional time, but build times, especially on larger projects, can be improved enough to retain the fast feedback loop that is one of the benefits of using Continuous Integration.
