Experiments in Docker & APT


Purpose

Most of my current work involves a project whose PR test pipeline takes 5 minutes on average, the majority of which is spent building the image for the test container.

The purpose of this experiment:

  • Test my intuition about where time is spent in this image build
  • Find reliable ways to measure and analyse build times
  • Marginally speed up the pipeline without significantly changing its output

Hypothesis

Given an unoptimised Docker image build (i.e. I assume no optimisations have been performed) running on an AWS-based build runner, where the build installs software via apt-get:

If I change the apt-get mirror configuration to use AWS-local mirrors, the build time will consistently improve (by at least 10%).

Tests

Preconditions

To establish a testing “bench”:

# I set up the security group via the console; it allows inbound SSH
$ aws ec2 run-instances \
  --count=1 \
  --key-name=docker-apt-experiment-pair \
  --image-id=ami-0b8b10b5bf11f3a22 \
  --instance-type=m5.large \
  --region=ap-southeast-2 \
  --instance-initiated-shutdown-behavior=terminate \
  --security-group-ids=sg-f797a492

{
    "Groups": [],
    "Instances": [
        {
            "ImageId": "ami-0b8b10b5bf11f3a22",
            "InstanceId": "i-xxxxxxxxxxxxxxxxx",
            "InstanceType": "m5.large",
            "KeyName": "docker-apt-experiment-pair",
            "StateReason": {
                "Code": "pending",
                "Message": "pending"
            },
            ...
        }
    ],
    ...
}

$ sleep 120
$ aws ec2 describe-instances --instance-ids=i-xxxxxxxxxxxxxxxxx --query="Reservations[0].Instances[0].PublicIpAddress"
"xxx.xxx.xxx.xxx"
# access via:
$ ssh ec2-user@xxx.xxx.xxx.xxx
ec2-user$ sudo yum update -y \
  && sudo amazon-linux-extras install docker -y \
  && sudo service docker start \
  && sudo usermod -a -G docker ec2-user \
  && exit  # log out so the docker group membership applies on the next login
$ ssh ec2-user@xxx.xxx.xxx.xxx
ec2-user$ cat <<'DOCKERFILE' >Dockerfile
...
DOCKERFILE
ec2-user$ DOCKER_BUILDKIT=1 docker build . --progress=plain
#5 [2/3] RUN apt-get update &&   apt-get install -y curl gnupg2 &&   curl -...
#5       digest: sha256:882acf9407b8212a91cbf59b1307c5a749b0c267a0f515382761732937768ba2
#5         name: "[2/3] RUN ..."
#5      started: 2020-02-05 03:28:47.354765107 +0000 UTC
#5    completed: 2020-02-05 03:28:47.354765107 +0000 UTC
#5     duration: 0s
#5       cached: true
ec2-user$ docker tag e9fc75e8ae42 base-test-image:latest
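The fixed sleep 120 above can be replaced with the AWS CLI's built-in waiter, which polls until the instance reaches the running state. A minimal sketch, assuming the ap-southeast-2 region used above; get_public_ip is a hypothetical helper name. An instance in the running state may still need a few more seconds before sshd accepts connections.

```shell
# Bash helper (hypothetical name): wait for an instance to be running,
# then print its public IP address as plain text.
get_public_ip() {
  local instance_id=$1
  local region=${2:-ap-southeast-2}

  # Polls DescribeInstances until the instance state is "running" (or the waiter gives up).
  aws ec2 wait instance-running --region="$region" --instance-ids="$instance_id"

  # --output=text prints the bare address instead of a quoted JSON string.
  aws ec2 describe-instances --region="$region" --instance-ids="$instance_id" \
    --query='Reservations[0].Instances[0].PublicIpAddress' --output=text
}
```

Usage: ssh ec2-user@"$(get_public_ip i-xxxxxxxxxxxxxxxxx)"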

Procedure

AWS Local Mirrors

  • Check the initial apt mirror configuration:
ec2-user$ docker run --rm -it base-test-image:latest cat /etc/apt/sources.list
# deb http://snapshot.debian.org/archive/debian/20200130T000000Z buster main
deb http://deb.debian.org/debian buster main
# deb http://snapshot.debian.org/archive/debian-security/20200130T000000Z buster/updates main
deb http://security.debian.org/debian-security buster/updates main
# deb http://snapshot.debian.org/archive/debian/20200130T000000Z buster-updates main
deb http://deb.debian.org/debian buster-updates main
  • Copy the required files from the app (Gemfile, Gemfile.lock and the vendor/cache directory) into the build directory
  • Create the control Dockerfile, a stripped-down copy of the original:
FROM base-test-image:latest

RUN apt-get update -y \
    && apt-get install -y curl gnupg2 ca-certificates \
    && echo "deb http://nginx.org/packages/mainline/debian `awk '/VERSION=/ { print $2 }' /etc/os-release | tr -d '\"()'` nginx" \
       | tee /etc/apt/sources.list.d/nginx.list \
    && curl -fsSL https://nginx.org/keys/nginx_signing.key | apt-key add - \
    && apt-get update -y \
    && apt-get install -y nginx \
    && apt-get clean

RUN gem list --installed bundler -v 1.17.3 || gem install bundler -v 1.17.3 --no-document

COPY Gemfile $APP_HOME/Gemfile
COPY Gemfile.lock $APP_HOME/Gemfile.lock
COPY vendor/cache $APP_HOME/vendor/cache
RUN bundle install --jobs=8 --deployment
ec2-user$ DOCKER_BUILDKIT=1 docker build . --progress=plain 2>&1 | tee log.txt  # BuildKit writes its progress to stderr
#6 [2/7] RUN apt-get update -y     && apt-get install -y curl gnupg2 ca-cer...
#6       digest: sha256:f7310a936dc93e9b447cbcc8e02a5bb3478dc788fca217fcde3a063a9a4ff5b9
#6         name: "[2/7] RUN apt-get update -y     && apt-get install -y curl gnupg2 ca-certificates     && echo \"deb http://nginx.org/packages/mainline/debian `awk '/VERSION=/ { print $2 }' /etc/os-release | tr -d '\"()'` nginx\"        | tee /etc/apt/sources.list.d/nginx.list     && curl -fsSL https://nginx.org/keys/nginx_signing.key | apt-key add -     && apt-get update -y     && apt-get install -y nginx     && apt-get clean"
#6      started: 2020-02-05 04:00:32.401271015 +0000 UTC
#6    completed: 2020-02-05 04:00:43.43985234 +0000 UTC
#6     duration: 12.038887614s
ec2-user$ grep -Ee ' duration: (.*)' log.txt
#5     duration: 666.359µs
#5     duration: 421.233µs
#2     duration: 54.33µs
#2     duration: 11.87551ms
#1     duration: 40.827µs
#1     duration: 16.065335ms
#3     duration: 381.784µs
#4     duration: 216.107µs
#4     duration: 838.391µs
#8     duration: 47.895µs
#8     duration: 354.487512ms
#4     duration: 1.830249432s
#6     duration: 12.038887614s
#7     duration: 1.168140122s
#9     duration: 382.25184ms
#10     duration: 370.907128ms
#11     duration: 449.935383ms
#12     duration: 59.634452659s
#13     duration: 1.837259766s

The apt-get step (#6) took approximately 12 seconds.
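To check how much of the build a single step accounts for, the duration: lines can be normalised to seconds and totalled. A rough sketch in Bash and awk, assuming the plain-progress format shown above (Go-style durations such as 1m2.5s, 12.03s, 354.48ms and 666.35µs); durations_in_seconds and step_share are hypothetical helper names. Because BuildKit can run steps in parallel and some steps report several durations, the total is a sum of step durations rather than wall-clock build time.

```shell
# Bash helpers (hypothetical names) for BuildKit --progress=plain logs.

# Print "<step id> <seconds>" for every "duration:" line in a log file.
durations_in_seconds() {
  awk '$2 == "duration:" {
    v = $3
    if (v ~ /^[0-9.]+m[0-9.]+s$/) {
      # One minute or more, e.g. "1m2.5s": split on "m" and drop the trailing "s".
      split(v, p, "m")
      sub(/s$/, "", p[2])
      secs = p[1] * 60 + p[2]
    } else {
      unit = v
      sub(/^[0-9.]+/, "", unit)        # keep only the unit suffix
      n = v + 0                        # awk converts the numeric prefix
      if (unit == "s")       secs = n
      else if (unit == "ms") secs = n / 1000
      else if (unit == "ns") secs = n / 1000000000
      else if (unit ~ /s$/)  secs = n / 1000000   # microseconds ("µs")
      else next                                    # unrecognised format: skip
    }
    printf "%s %.6f\n", $1, secs
  }' "$1"
}

# Print one step id's summed duration, the summed duration of all steps,
# and the step's share of that total.
step_share() {
  local log=$1 step=$2
  durations_in_seconds "$log" | awk -v step="$step" '
    { total += $2; if ($1 == step) step_total += $2 }
    END { printf "%s: %.2fs of %.2fs (%.1f%%)\n", step, step_total, total, 100 * step_total / total }'
}
```

Run against the first build's log.txt, step_share log.txt '#6' should print #6: 12.04s of 78.10s (15.4%), with bundle install (#12) making up most of the remainder.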

Clean all images except the base image (the stopped placeholder container keeps docker image prune --all from removing base-test-image):

ec2-user$ docker run --name=placeholder base-test-image:latest \
  && docker image prune --all -f \
  && docker builder prune -f \
  && docker rm -f placeholder

Repeat the process with the test Dockerfile:

FROM base-test-image:latest

# use the CloudFront Debian mirrors
RUN printf 'deb http://cloudfront.debian.net/debian buster main\ndeb http://security.debian.org/debian-security buster/updates main\ndeb http://cloudfront.debian.net/debian buster-updates main' > /etc/apt/sources.list

RUN apt-get update -y \
    && apt-get install -y curl gnupg2 ca-certificates \
    && echo "deb http://nginx.org/packages/mainline/debian `awk '/VERSION=/ { print $2 }' /etc/os-release | tr -d '\"()'` nginx" \
       | tee /etc/apt/sources.list.d/nginx.list \
    && curl -fsSL https://nginx.org/keys/nginx_signing.key | apt-key add - \
    && apt-get update -y \
    && apt-get install -y nginx \
    && apt-get clean

RUN gem list --installed bundler -v 1.17.3 || gem install bundler -v 1.17.3 --no-document

COPY Gemfile $APP_HOME/Gemfile
COPY Gemfile.lock $APP_HOME/Gemfile.lock
COPY vendor/cache $APP_HOME/vendor/cache
RUN bundle install --jobs=8 --deployment
ec2-user$ DOCKER_BUILDKIT=1 docker build . --progress=plain 2>&1 | tee log.txt
#7 [3/8] RUN apt-get update -y     && apt-get install -y curl gnupg2 ca-cer...
#7       digest: sha256:3028bb4cbad3496c5efd0bef1a84d077938633daaa0c44e0044ea4c38f1d8ced
#7         name: "[3/8] RUN apt-get update -y     && apt-get install -y curl gnupg2 ca-certificates     && echo \"deb http://nginx.org/packages/mainline/debian `awk '/VERSION=/ { print $2 }' /etc/os-release | tr -d '\"()'` nginx\"        | tee /etc/apt/sources.list.d/nginx.list     && curl -fsSL https://nginx.org/keys/nginx_signing.key | apt-key add -     && apt-get update -y     && apt-get install -y nginx     && apt-get clean"
#7      started: 2020-02-05 04:24:11.935575415 +0000 UTC
#7    completed: 2020-02-05 04:24:24.240061429 +0000 UTC
#7     duration: 12.304486014s
ec2-user$ grep -Ee ' duration: (.*)' log.txt
#2     duration: 11.595249ms
#3     duration: 319.704µs
#5     duration: 818.323µs
#5     duration: 402.762µs
#4     duration: 691.54µs
#4     duration: 34.544µs
#1     duration: 17.267575ms
#9     duration: 65.526µs
#9     duration: 345.609248ms
#6     duration: 739.33537ms
#7     duration: 12.304486014s
#8     duration: 1.055141355s
#10     duration: 513.281193ms
#11     duration: 349.507442ms
#12     duration: 461.50172ms
#13     duration: 59.460601654s
#14     duration: 1.826289245s
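The test Dockerfile replaces sources.list wholesale with printf. An alternative is to rewrite only the deb.debian.org host, which keeps whatever suites and commented snapshot lines the base image ships with. A minimal sketch, assuming GNU sed (as in the Debian-based image); use_cloudfront_mirror is a hypothetical helper name.

```shell
# Bash helper (hypothetical name): point deb.debian.org entries at the
# CloudFront mirror, leaving security.debian.org entries untouched.
# Assumes GNU sed, where a bare -i edits the file in place.
use_cloudfront_mirror() {
  local sources=${1:-/etc/apt/sources.list}
  sed -i 's|http://deb\.debian\.org/debian|http://cloudfront.debian.net/debian|g' "$sources"
}
```

In the Dockerfile, the same sed -i command could replace the printf in its RUN instruction.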

Conclusion

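  • The default Debian mirrors in the base Ruby image perform equivalently to the CloudFront mirrors: the apt-get step took 12.04s with the defaults and 12.30s with CloudFront (a single build each), nowhere near the hypothesised improvement of at least 10%. In terms of total image build time, the difference is imperceptible.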

  • The apt-get installation step takes less than 20% of the entire image build time (about 12s, against about 60s for bundle install): native compilation of Ruby gems takes the majority of the build time, even though the gems are vendored. (Is this because the vendored gems don't include compiled native extensions, or because they were vendored on a macOS machine?)

  • While I deliberately chose not to profile the build before forming and testing a hypothesis, this is a great demonstration of how my intuition can differ from reality!
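The hypothesis calls for a consistent improvement, but each configuration above was built once, so a single pair of runs cannot show whether any difference is consistent. A rough sketch for repeating uncached builds and comparing a step's durations, assuming BuildKit plain-progress output; step_duration, repeat_builds, percent_change and meets_threshold are hypothetical helper names. The step is matched by a substring of its header line (such as "RUN apt-get") because step numbers shift when the Dockerfile changes (#6 in the control build, #7 in the test build).

```shell
# Bash helpers (hypothetical names) for repeated timing runs.

# Print the last reported duration of the step whose header line
# ("#<id> [<n>/<total>] ...") contains the given text.
step_duration() {
  local log=$1 pattern=$2
  awk -v pattern="$pattern" '
    $2 ~ /^\[[0-9]+\/[0-9]+\]$/ && index($0, pattern) > 0 { id = $1 }
    id != "" && $1 == id && $2 == "duration:" { last = $3 }
    END { if (last != "") print last }' "$log"
}

# Build <runs> times without the layer cache, printing the matching step's
# duration for each run. Run it from the directory containing the Dockerfile.
repeat_builds() {
  local runs=$1 pattern=$2 i
  for ((i = 1; i <= runs; i++)); do
    DOCKER_BUILDKIT=1 docker build --no-cache --progress=plain . >build.log 2>&1 || return 1
    step_duration build.log "$pattern"
  done
}

# Print the relative change from control to candidate (both in seconds), e.g. "+2.2%".
percent_change() {
  awk -v control="$1" -v candidate="$2" \
    'BEGIN { printf "%+.1f%%\n", 100 * (candidate - control) / control }'
}

# Succeed only if the candidate is at least <pct> percent faster than the control.
meets_threshold() {
  awk -v control="$1" -v candidate="$2" -v pct="$3" \
    'BEGIN { exit !(100 * (control - candidate) / control >= pct) }'
}
```

For the single runs above, percent_change 12.038887614 12.304486014 prints +2.2% (the CloudFront build was marginally slower), and meets_threshold 12.038887614 12.304486014 10 fails. Running repeat_builds 5 'RUN apt-get' once per Dockerfile would give several samples per configuration to compare instead (strip the trailing s from each duration first).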

Other findings

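  • Setting DOCKER_BUILDKIT=1 (with --progress=plain) makes it much easier to see how a build unfolds, including per-step timings.
    • Unfortunately, docker-compose doesn't use BuildKit by default; recent releases only hand builds to BuildKit when COMPOSE_DOCKER_CLI_BUILD=1 is set alongside DOCKER_BUILDKIT=1.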