Introspect image info from inside docker executor

I’m trying to introspect the image used by the current docker executor from inside that same container. I’d like to derive a unique key from it so I can later recognise when the same docker image is used in future CI jobs; that way I could restore a cache that is guaranteed binary compatible with the container’s image.

Knowing the digest, i.e. the immutable identifier reported under RepoDigests by docker image inspect, would be one such key. A built-in CIRCLECI_* environment variable containing this sha256 would be enough to identify the image.

The build I’m working with is fairly involved, with a wide set of system dependencies. Fetching them is automated, as source dependencies are self-declared, but to reduce setup time in CI jobs we pre-install many of these dependencies in the CI docker image.

I’d like to add build caching that reuses the build workspace shipped in the CI image, or uses ccache between consecutive builds on a branch/PR. However, we update the CI image nightly to keep dependencies fresh, so I’d like to be able to determine which docker image is in use, so I don’t restore a cache built against a different docker image.

I’d like to avoid having to alter the CI image, such as baking in a datetime environment variable or other such nonces via image layers. Many images we use are outside our control or come from external communities, so a method that generalises to any docker image would be preferable.

Could you determine the folders that contain your dependencies, run ls -lR on them, and then feed that output to a hashing program?
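A minimal sketch of what I mean, assuming hypothetical dependency directories (substitute whatever paths your image actually installs into):

```shell
# Hypothetical locations -- replace with the directories your
# CI image installs dependencies into.
DEP_DIRS="/usr/lib /usr/local/lib"

# Recursively list names, sizes, and mtimes, then hash the listing
# to get a single fingerprint of the installed dependency tree.
ls -lR $DEP_DIRS | sha256sum | cut -d' ' -f1 > /tmp/deps.key
cat /tmp/deps.key
```

The resulting hex string could serve as the cache key. One caveat: ls -lR includes modification times, so a rebuilt image with identical package versions would still produce a new key.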

They are a mix of system dependencies installed from debian packages, pip installs, and who knows what else from upstream parent images in the hierarchy. That’s why I’d rather just pin the cache key to the image digest and keep it simple, while still accounting for any environment changes.

You could have a look at your env vars to see if anything is available, such as an image digest. I’ve not tried it.

The other thing you can do - and which I would do myself - is to Dockerise the process, and do your build using Docker in Docker. To speed it up, you could use layer caching.
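With Docker in Docker you control the pull yourself, so the digest is directly available. A sketch, with a hypothetical image name and helper function:

```shell
# With a remote docker engine you would pull and inspect yourself:
#
#   docker pull "$IMAGE"
#   docker image inspect --format '{{index .RepoDigests 0}}' "$IMAGE"
#
# which yields a RepoDigest like "library/ubuntu@sha256:<64 hex chars>".
# A hypothetical helper to turn that into a short cache-key prefix:
repo_digest_to_key() {
  cut -d@ -f2 | cut -d: -f2 | cut -c1-12
}

# Demo on a sample RepoDigest string:
echo 'library/ubuntu@sha256:d1d454df0f579c6be4d8161d227462d69e163a8ff9d20a847533989cf0c94d90' \
  | repo_digest_to_key
```

On CircleCI that would mean setup_remote_docker plus docker layer caching to keep it fast.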

I looked, but there doesn’t seem to be a built-in environment variable specific to docker executors:

The docs do mention the benefits of pinning docker images by digest, but do not offer a method for programmatically determining the digest from inside the container:

That’s largely what we already do, apart from restoring the workspace first built by DockerHub as a cache, since we can’t determine the image’s immutable identifier to guarantee binary compatibility.


Any suggestions here?
Would it be possible to retrieve the printout of the Spin up Environment step from within the same job? The output of that first step does include the Digest: sha256:... I am after, which would let me determine whether the image pulled for the executor is binary compatible with the cache I could restore.

From the comment section of a related idea:

@ndintenfass mentions pipeline-level variables:

Would it be possible to use pipeline-level variables to convey the immutable digest of the docker image?

<< pipeline.executor.docker.image.digest >>

Another issue I’m encountering: I’m separating my build and test stages into sequential jobs in my workflow, so that I can enable parallelism for only the testing stage. It is then possible for the test job to pull a different docker image than the build job it depends on, since the image tag on the docker registry could have been updated while the build job ran. This of course breaks the caching strategy, as the docker executors are then not the same across jobs. Would it be possible for CircleCI to query the registry for the immutable identifier once, upon config expansion at the start of the workflow, so that all jobs pull the exact same docker image?

For example, an affix parameter could be added to the docker config, expressing that the image reference should be affixed with the discovered digest when the config is interpreted for the workflow. E.g.:

      - image: library/ubuntu:bionic
+         affix: true
    working_directory: /foo/bar

This would expand the tag to include the digest, so that later jobs, whether later in the same workflow or from a re-run of failed jobs, would pull the same repeatable image. The above would expand to:

+     - image: library/ubuntu:bionic@sha256:d1d454df0f579c6be4d8161d227462d69e163a8ff9d20a847533989cf0c94d90
    working_directory: /foo/bar

I think this expansion should only be done once at the start of the workflow, and Rerun from failed should keep the expansion from the workflow it originated from.

I was looking into the API and discovered that it does in fact expose a steps → actions → output_url field for a single job, allowing a job to introspect its own startup and acquire the sha:

GET Request: Returns the full details for a single job. The response includes all of the fields from the job summary.
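A sketch of the extraction step, assuming you’ve fetched the Spin up Environment step’s log via that output_url (the curl line below shows the hypothetical v1.1 call; names like ORG, REPO, and TOKEN are placeholders):

```shell
# Fetch the job details (requires a personal API token), then follow
# steps -> actions -> output_url for the "Spin up environment" step:
#
#   curl -s "https://circleci.com/api/v1.1/project/github/$ORG/$REPO/$CIRCLE_BUILD_NUM?circle-token=$TOKEN"
#
# The extraction from the step log is then just a grep:
extract_digest() {
  grep -o 'Digest: sha256:[0-9a-f]\{64\}' | head -n1 | cut -d' ' -f2
}

# Demo on the log line format printed by "Spin up environment":
sample='Digest: sha256:d1d454df0f579c6be4d8161d227462d69e163a8ff9d20a847533989cf0c94d90'
echo "$sample" | extract_digest
```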

This is a bit of a roundabout hack, so I’d still prefer a simpler method using pipeline variables.