The Standards Behind the Modern Container Images


In the previous lesson, we briefly looked at the way Docker stores its images. We also used Skopeo to copy those images between different types of image stores. Now it’s time to talk about the standards behind modern container images.
The OCI Image Format Specification, or image-spec for short, is the part of OCI that defines what a container image is. Examining this specification is the way to understand how container images are structured and how they work. Our plan for this lesson is to take a real container image and look inside it while following along with the specification. This time, instead of an alpine image, let’s use an httpd image.
First, let’s get an image in the OCI format. Skopeo supports two OCI storage types: an archive with the image inside (the oci-archive: transport), or just the image as a plain directory (the oci: transport). We will use the latter:
skopeo copy docker://docker.io/httpd:2.4.48 oci:httpd:2.4.48
This runs skopeo copy with a docker:// source (a registry) and an oci: destination (a local directory in the OCI layout).
To better understand the structure, let’s copy the previous version as well:
skopeo copy docker://docker.io/httpd:2.4.47 oci:httpd:2.4.47
Both commands copy the image contents into the httpd directory.
The first file we are going to examine is called index.json, which lies at the root of the httpd directory:
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:a2edc9fb0bf7a443594a249fe8d63ad50aa8b2f8e6099429fb5bed3177d8d3d2",
      "size": 975,
      "annotations": {
        "org.opencontainers.image.ref.name": "2.4.48"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:0830fa69c5f7cf4ab523ebe5ca1e7c6e5f09ed0298051e7f4819f1208effb9e0",
      "size": 975,
      "annotations": {
        "org.opencontainers.image.ref.name": "2.4.47"
      }
    }
  ]
}
The index file points to one or more image manifests, as indicated by the “mediaType” field. The most interesting part here is the “annotations” block: this is what image tags look like inside image-spec images. Note that the word “tag” does not even appear. The tag is just an annotation holding a reference name. When you work with tags in Docker or any other tool, keep in mind that in the underlying image format tags do not exist as a separate concept; they are just metadata that we are used to having.
In the case of the httpd image we have here, the only other fields are the digest and the size. If you look at the specification, you will see that the index file can also specify the platform each manifest is valid for, so the same image could be published for both Linux and Windows.
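To make the "tags are just annotations" point concrete, here is a minimal Python sketch that reads the index shown above and builds a tag-to-digest mapping. The function name tags_to_digests is my own choice for illustration; the JSON is copied from the index.json above.

```python
import json

# Annotation key that image-spec uses for the reference name (what we call a tag)
REF_NAME = "org.opencontainers.image.ref.name"

# Copy of the index.json shown above
INDEX_JSON = """
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:a2edc9fb0bf7a443594a249fe8d63ad50aa8b2f8e6099429fb5bed3177d8d3d2",
      "size": 975,
      "annotations": {"org.opencontainers.image.ref.name": "2.4.48"}
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:0830fa69c5f7cf4ab523ebe5ca1e7c6e5f09ed0298051e7f4819f1208effb9e0",
      "size": 975,
      "annotations": {"org.opencontainers.image.ref.name": "2.4.47"}
    }
  ]
}
"""


def tags_to_digests(index: dict) -> dict:
    """Map each annotated reference name (the "tag") to its manifest digest."""
    result = {}
    for descriptor in index.get("manifests", []):
        name = descriptor.get("annotations", {}).get(REF_NAME)
        if name is not None:  # entries without the annotation simply have no tag
            result[name] = descriptor["digest"]
    return result


tags = tags_to_digests(json.loads(INDEX_JSON))
for tag, digest in tags.items():
    print(tag, "->", digest)
```

Nothing in this lookup is special to tags: it is an ordinary search through annotations, which is why an entry without the annotation is still a perfectly valid manifest reference.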
The digest is how image-spec handles dependencies and addressability of different content types. A digest consists of the name of the algorithm used to compute it and the resulting hash value, separated by a colon. In theory, these digests allow tools that implement image-spec to verify the content itself, as long as the digest was received in a secure way.
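As a rough illustration of that verification step, the following Python sketch splits a digest into its algorithm and hash value and checks some bytes against it. The function name verify_digest and the sample content are assumptions made for this example; sha256 and sha512 are the algorithms registered by image-spec.

```python
import hashlib


def verify_digest(content: bytes, digest: str) -> bool:
    """Return True if `content` hashes to `digest` (format: "<algorithm>:<hex>")."""
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm not in ("sha256", "sha512"):
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    actual_hex = hashlib.new(algorithm, content).hexdigest()
    return actual_hex == expected_hex


# Hypothetical blob content, used only to demonstrate the check
blob = b'{"hello": "world"}'
good = "sha256:" + hashlib.sha256(blob).hexdigest()

print(verify_digest(blob, good))         # True
print(verify_digest(blob + b" ", good))  # False: one extra byte changes the hash
```

The check is only as trustworthy as the digest itself, which is why the digest has to arrive over a channel you trust.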
We have now seen the first content type an image contains: the image manifest. We can find the manifest itself inside the blobs directory. This directory has one sub-directory per algorithm used to compute the digests, in our case sha256. Inside the sha256 directory we find a list of files with hardly readable names. Each file name in this directory is the digest of the content of the file itself.
For example, let’s take the digest of the second image manifest in the index file (the one annotated 2.4.47) and find the file using only its digest:
cat blobs/sha256/0830fa69c5f7cf4ab523ebe5ca1e7c6e5f09ed0298051e7f4819f1208effb9e0 | jq
We got the contents of an image manifest! If we now compute the sha256 checksum of this file, we get its filename back:
cat blobs/sha256/0830fa69c5f7cf4ab523ebe5ca1e7c6e5f09ed0298051e7f4819f1208effb9e0 | sha256sum
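The same check can be sketched in Python for a whole blobs directory. This is only a sketch under assumptions: the helper names blob_path and mismatched_blobs are made up for this example, and it runs against a throwaway directory rather than the real httpd layout. It reads each blob fully into memory, which is fine for a demo but not for large layers.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_path(layout: Path, digest: str) -> Path:
    """Translate "<algorithm>:<hex>" into <layout>/blobs/<algorithm>/<hex>."""
    algorithm, _, hex_value = digest.partition(":")
    return layout / "blobs" / algorithm / hex_value


def mismatched_blobs(layout: Path) -> list:
    """Return names under blobs/sha256 that are not the hash of their content."""
    bad = []
    for path in sorted((layout / "blobs" / "sha256").iterdir()):
        if hashlib.sha256(path.read_bytes()).hexdigest() != path.name:
            bad.append(path.name)
    return bad


# Throwaway layout with one correctly named blob and one deliberately wrong one
with tempfile.TemporaryDirectory() as tmp:
    layout = Path(tmp)
    (layout / "blobs" / "sha256").mkdir(parents=True)

    content = b"example blob"
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    blob_path(layout, digest).write_bytes(content)
    (layout / "blobs" / "sha256" / "not-a-real-digest").write_bytes(b"tampered")

    found = blob_path(layout, digest).read_bytes()  # lookup needs only the digest
    bad = mismatched_blobs(layout)

print(bad)  # ['not-a-real-digest']
```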
These digests, of course, are not meant to be read by humans. The convention is handy for container tools that are compatible with image-spec, because it lets them verify checksums as well as discover the files they need by digest. Let’s now examine the image manifest for 2.4.48:
cat blobs/sha256/a2edc9fb0bf7a443594a249fe8d63ad50aa8b2f8e6099429fb5bed3177d8d3d2 | jq
I cat the file and pipe it to jq for nicer output. The two main sections of the image manifest are config, which points to the image config file, and layers, which lists all the image layers for this particular manifest. Notice how each block has a size, a mediaType and a digest: image-spec uses the same way to address content via digests regardless of whether it’s a config file or an image layer.
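Because the config and the layers are all described by the same three fields, a tool can walk them with one loop. Here is a small Python sketch of that idea. The values are copied from the httpd manifest shown later in this lesson, trimmed to the config and the first two layers; the helper name descriptors is an assumption for illustration.

```python
# Trimmed copy of the httpd 2.4.48 manifest: the config plus the first two layers
manifest = {
    "schemaVersion": 2,
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:12e3add88bd376fd99da12fcd8c3b35303affea4dce27255fe49bf65c529ecad",
        "size": 7580,
    },
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:69692152171afee1fd341febc390747cfca2ff302f2881d8b394e786af605696",
            "size": 27145915,
        },
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:7284b4e0cc7b197edc206f815c5b24e67b9ed29abd9bbd8ae4bfdd5540bec6ec",
            "size": 176,
        },
    ],
}


def descriptors(manifest: dict):
    """Yield (role, descriptor) for the config and then each layer, in order."""
    yield "config", manifest["config"]
    for position, layer in enumerate(manifest["layers"]):
        yield f"layer[{position}]", layer


for role, d in descriptors(manifest):
    print(f"{role:9} {d['mediaType']:45} {d['size']:>9} {d['digest'][:19]}")

# Total number of bytes a tool would need to fetch for this trimmed manifest
total = sum(d["size"] for _, d in descriptors(manifest))
print(total)  # 27153671
```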
Let’s look at the config - we need to grab the digest and simply look inside the file:
{
"created": "2021-05-26T00:23:59.233513625Z",
"architecture": "amd64",
"os": "linux",
"config": {
"ExposedPorts": {
"80/tcp": {}
},
"Env": [
"PATH=/usr/local/apache2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"HTTPD_PREFIX=/usr/local/apache2",
"HTTPD_VERSION=2.4.48",
"HTTPD_SHA256=1bc826e7b2e88108c7e4bf43c026636f77a41d849cfb667aa7b5c0b86dbf966c",
"HTTPD_PATCHES="
],
"Cmd": [
"httpd-foreground"
],
"WorkingDir": "/usr/local/apache2",
"StopSignal": "SIGWINCH"
},
"rootfs": {
"type": "layers",
"diff_ids": [
"sha256:02c055ef67f5904019f43a41ea5f099996d8e7633749b6e606c400526b2c4b33",
"sha256:15fd28211cd0b81fe82199f8949e4f7c911c7b27e3bd15ec146e0c217c1f5f5d",
"sha256:33c6c92714e0ddf7c4797f75c79ef89291ec02df482fa1513d355cb2b2a6bd23",
"sha256:33de34a890b77461e0b431de1979bf21ee5d30060ed0b6f183f03c46593e0210",
"sha256:98d580c48609d84e2dd6d04beed7fd0815c7b04998f623e42c3383174fedb851"
]
},
"history": [
{
"created": "2021-05-12T01:21:22.128649612Z",
"created_by": "/bin/sh -c #(nop) ADD file:7362e0e50f30ff45463ea38bb265cb8f6b7cd422eb2d09de7384efa0b59614be in / "
},
{
"created": "2021-05-12T01:21:22.552826465Z",
"created_by": "/bin/sh -c #(nop) CMD [\"bash\"]",
"empty_layer": true
},
{
"created": "2021-05-12T07:45:53.511522639Z",
"created_by": "/bin/sh -c #(nop) ENV HTTPD_PREFIX=/usr/local/apache2",
"empty_layer": true
},
{
"created": "2021-05-12T07:45:53.732983951Z",
"created_by": "/bin/sh -c #(nop) ENV PATH=/usr/local/apache2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"empty_layer": true
},
{
"created": "2021-05-12T07:45:54.982144308Z",
"created_by": "/bin/sh -c mkdir -p \"$HTTPD_PREFIX\" \t&& chown www-data:www-data \"$HTTPD_PREFIX\""
},
{
"created": "2021-05-12T07:45:55.199499185Z",
"created_by": "/bin/sh -c #(nop) WORKDIR /usr/local/apache2",
"empty_layer": true
},
{
"created": "2021-05-12T07:46:00.749230874Z",
"created_by": "/bin/sh -c set -eux; \tapt-get update; \tapt-get install -y --no-install-recommends \t\tlibaprutil1-ldap \t; \trm -rf /var/lib/apt/lists/*"
},
{
"created": "2021-05-26T00:21:41.949257923Z",
"created_by": "/bin/sh -c #(nop) ENV HTTPD_VERSION=2.4.48",
"empty_layer": true
},
{
"created": "2021-05-26T00:21:42.135829965Z",
"created_by": "/bin/sh -c #(nop) ENV HTTPD_SHA256=1bc826e7b2e88108c7e4bf43c026636f77a41d849cfb667aa7b5c0b86dbf966c",
"empty_layer": true
},
{
"created": "2021-05-26T00:21:42.336246549Z",
"created_by": "/bin/sh -c #(nop) ENV HTTPD_PATCHES=",
"empty_layer": true
},
{
"created": "2021-05-26T00:23:58.249211771Z",
"created_by": "/bin/sh -c set -eux; \t\tsavedAptMark=\"$(apt-mark showmanual)\"; \tapt-get update; \tapt-get install -y --no-install-recommends \t\tbzip2 \t\tca-certificates \t\tdirmngr \t\tdpkg-dev \t\tgcc \t\tgnupg \t\tlibapr1-dev \t\tlibaprutil1-dev \t\tlibbrotli-dev \t\tlibcurl4-openssl-dev \t\tlibjansson-dev \t\tliblua5.2-dev \t\tlibnghttp2-dev \t\tlibpcre3-dev \t\tlibssl-dev \t\tlibxml2-dev \t\tmake \t\twget \t\tzlib1g-dev \t; \trm -r /var/lib/apt/lists/*; \t\tddist() { \t\tlocal f=\"$1\"; shift; \t\tlocal distFile=\"$1\"; shift; \t\tlocal success=; \t\tlocal distUrl=; \t\tfor distUrl in \t\t\t'https://www.apache.org/dyn/closer.cgi?action=download&filename=' \t\t\thttps://www-us.apache.org/dist/ \t\t\thttps://www.apache.org/dist/ \t\t\thttps://archive.apache.org/dist/ \t\t; do \t\t\tif wget -O \"$f\" \"$distUrl$distFile\" && [ -s \"$f\" ]; then \t\t\t\tsuccess=1; \t\t\t\tbreak; \t\t\tfi; \t\tdone; \t\t[ -n \"$success\" ]; \t}; \t\tddist 'httpd.tar.bz2' \"httpd/httpd-$HTTPD_VERSION.tar.bz2\"; \techo \"$HTTPD_SHA256 *httpd.tar.bz2\" | sha256sum -c -; \t\tddist 'httpd.tar.bz2.asc' \"httpd/httpd-$HTTPD_VERSION.tar.bz2.asc\"; \texport GNUPGHOME=\"$(mktemp -d)\"; \tfor key in \t\tDE29FB3971E71543FD2DC049508EAEC5302DA568 \t\t13155B0E9E634F42BF6C163FDDBA64BA2C312D2F \t\t8B39757B1D8A994DF2433ED58B3A601F08C975E5 \t\t31EE1A81B8D066548156D37B7D6DBFD1F08E012A \t\tA10208FEC3152DD7C0C9B59B361522D782AB7BD1 \t\t3DE024AFDA7A4B15CB6C14410F81AA8AB0D5F771 \t\tEB138C6AF0FC691001B16D93344A844D751D7F27 \t\tCBA5A7C21EC143314C41393E5B968010E04F9A89 \t\t3C016F2B764621BB549C66B516A96495E2226795 \t\t937FB3994A242BA9BF49E93021454AF0CC8B0F7E \t\tEAD1359A4C0F2D37472AAF28F55DF0293A4E7AC9 \t\t4C1EADADB4EF5007579C919C6635B6C0DE885DD3 \t\t01E475360FCCF1D0F24B9D145D414AE1E005C9CB \t\t92CCEF0AA7DD46AC3A0F498BCA6939748103A37E \t\tD395C7573A68B9796D38C258153FA0CD75A67692 \t\tFA39B617B61493FD283503E7EED1EA392261D073 \t\t984FB3350C1D5C7A3282255BB31B213D208F5064 
\t\tFE7A49DAA875E890B4167F76CCB2EB46E76CF6D0 \t\t39F6691A0ECF0C50E8BB849CF78875F642721F00 \t\t29A2BA848177B73878277FA475CAA2A3F39B3750 \t\t120A8667241AEDD4A78B46104C042818311A3DE5 \t\t453510BDA6C5855624E009236D0BC73A40581837 \t\t0DE5C55C6BF3B2352DABB89E13249B4FEC88A0BF \t\t7CDBED100806552182F98844E8E7E00B4DAA1988 \t\tA8BA9617EF3BCCAC3B29B869EDB105896F9522D8 \t\t3E6AC004854F3A7F03566B592FF06894E55B0D0E \t\t5B5181C2C0AB13E59DA3F7A3EC582EB639FF092C \t\tA93D62ECC3C8EA12DB220EC934EA76E6791485A8 \t\t65B2D44FE74BD5E3DE3AC3F082781DE46D5954FA \t\t8935926745E1CE7E3ED748F6EC99EE267EB5F61A \t\tE3480043595621FE56105F112AB12A7ADC55C003 \t\t93525CFCF6FDFFB3FD9700DD5A4B10AE43B56A27 \t\tC55AB7B9139EB2263CD1AABC19B033D1760C227B \t; do \t\tgpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys \"$key\"; \tdone; \tgpg --batch --verify httpd.tar.bz2.asc httpd.tar.bz2; \tcommand -v gpgconf && gpgconf --kill all || :; \trm -rf \"$GNUPGHOME\" httpd.tar.bz2.asc; \t\tmkdir -p src; \ttar -xf httpd.tar.bz2 -C src --strip-components=1; \trm httpd.tar.bz2; \tcd src; \t\tpatches() { \t\twhile [ \"$#\" -gt 0 ]; do \t\t\tlocal patchFile=\"$1\"; shift; \t\t\tlocal patchSha256=\"$1\"; shift; \t\t\tddist \"$patchFile\" \"httpd/patches/apply_to_$HTTPD_VERSION/$patchFile\"; \t\t\techo \"$patchSha256 *$patchFile\" | sha256sum -c -; \t\t\tpatch -p0 < \"$patchFile\"; \t\t\trm -f \"$patchFile\"; \t\tdone; \t}; \tpatches $HTTPD_PATCHES; \t\tgnuArch=\"$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)\"; \tCFLAGS=\"$(dpkg-buildflags --get CFLAGS)\"; \tCPPFLAGS=\"$(dpkg-buildflags --get CPPFLAGS)\"; \tLDFLAGS=\"$(dpkg-buildflags --get LDFLAGS)\"; \t./configure \t\t--build=\"$gnuArch\" \t\t--prefix=\"$HTTPD_PREFIX\" \t\t--enable-mods-shared=reallyall \t\t--enable-mpms-shared=all \t\t--enable-pie \t\tCFLAGS=\"-pipe $CFLAGS\" \t\tCPPFLAGS=\"$CPPFLAGS\" \t\tLDFLAGS=\"-Wl,--as-needed $LDFLAGS\" \t; \tmake -j \"$(nproc)\"; \tmake install; \t\tcd ..; \trm -r src man manual; \t\tsed -ri \t\t-e 
's!^(\\s*CustomLog)\\s+\\S+!\\1 /proc/self/fd/1!g' \t\t-e 's!^(\\s*ErrorLog)\\s+\\S+!\\1 /proc/self/fd/2!g' \t\t-e 's!^(\\s*TransferLog)\\s+\\S+!\\1 /proc/self/fd/1!g' \t\t\"$HTTPD_PREFIX/conf/httpd.conf\" \t\t\"$HTTPD_PREFIX/conf/extra/httpd-ssl.conf\" \t; \t\tapt-mark auto '.*' > /dev/null; \t[ -z \"$savedAptMark\" ] || apt-mark manual $savedAptMark; \tfind /usr/local -type f -executable -exec ldd '{}' ';' \t\t| awk '/=>/ { print $(NF-1) }' \t\t| sort -u \t\t| xargs -r dpkg-query --search \t\t| cut -d: -f1 \t\t| sort -u \t\t| xargs -r apt-mark manual \t; \tapt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \t\thttpd -v"
},
{
"created": "2021-05-26T00:23:58.611332938Z",
"created_by": "/bin/sh -c #(nop) STOPSIGNAL SIGWINCH",
"empty_layer": true
},
{
"created": "2021-05-26T00:23:58.846051936Z",
"created_by": "/bin/sh -c #(nop) COPY file:c432ff61c4993ecdef4786f48d91a96f8f0707f6179816ccb98db661bfb96b90 in /usr/local/bin/ "
},
{
"created": "2021-05-26T00:23:59.032759591Z",
"created_by": "/bin/sh -c #(nop) EXPOSE 80",
"empty_layer": true
},
{
"created": "2021-05-26T00:23:59.233513625Z",
"created_by": "/bin/sh -c #(nop) CMD [\"httpd-foreground\"]",
"empty_layer": true
}
]
}
If you remember the previous lesson, where we looked at how Docker stores an image, this file will feel oddly familiar. The config specifies which platform the image is for, when it was created, and the default options passed to the container, such as environment variables, the command to execute, the working directory and exposed ports. It also has a history of changes to the image. image-spec was largely inspired by the Docker Image Specification: it took its best parts, improved upon them, and made them more generic and applicable to more use cases.
Every field in the image config is defined by image-spec; what we see in this file is just a subset of what you can use here. When you build an image from a Dockerfile or a Containerfile, many of the options you specify there end up in this config file.
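One relationship in this config is easy to miss: history entries marked "empty_layer": true did not produce a filesystem layer, and the remaining entries line up, in order, with the entries of rootfs.diff_ids. A minimal Python sketch of that pairing follows. The sample config holds the first two diff_ids and the first five history entries from the file above, with the created_by strings shortened for readability; the function name layer_history is my own.

```python
# First five history entries and first two diff_ids of the httpd config above;
# created_by strings are shortened here for readability.
config = {
    "rootfs": {
        "type": "layers",
        "diff_ids": [
            "sha256:02c055ef67f5904019f43a41ea5f099996d8e7633749b6e606c400526b2c4b33",
            "sha256:15fd28211cd0b81fe82199f8949e4f7c911c7b27e3bd15ec146e0c217c1f5f5d",
        ],
    },
    "history": [
        {"created_by": "ADD file (base filesystem) in /"},
        {"created_by": "CMD [\"bash\"]", "empty_layer": True},
        {"created_by": "ENV HTTPD_PREFIX=/usr/local/apache2", "empty_layer": True},
        {"created_by": "ENV PATH=/usr/local/apache2/bin:(rest shortened)", "empty_layer": True},
        {"created_by": "mkdir -p \"$HTTPD_PREFIX\" && chown www-data:www-data \"$HTTPD_PREFIX\""},
    ],
}


def layer_history(config: dict) -> list:
    """Pair each diff_id with the history entry that produced that layer."""
    producing = [h for h in config["history"] if not h.get("empty_layer", False)]
    diff_ids = config["rootfs"]["diff_ids"]
    if len(producing) != len(diff_ids):
        raise ValueError("history and diff_ids disagree on the number of layers")
    return list(zip(diff_ids, producing))


pairs = layer_history(config)
for diff_id, entry in pairs:
    print(diff_id[:19], "<-", entry["created_by"])
```

In the full config the same rule leaves five layer-producing history entries, which matches the five diff_ids.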
Finally, let’s look at the image layers. Let’s get back to the manifest:
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:12e3add88bd376fd99da12fcd8c3b35303affea4dce27255fe49bf65c529ecad",
    "size": 7580
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:69692152171afee1fd341febc390747cfca2ff302f2881d8b394e786af605696",
      "size": 27145915
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:7284b4e0cc7b197edc206f815c5b24e67b9ed29abd9bbd8ae4bfdd5540bec6ec",
      "size": 176
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:3678b2d55ccdc6dcbe11cf1ea518ab7426ab37656d94213f637bd843dc6b6ca4",
      "size": 2794733
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:aeb67982a725b5d6e8a2b3114d1bc8ca4aaadb6b6797614b6831cd6703260768",
      "size": 24583127
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:06954f8169fdab8e97f3e61ee1090df58c3362b8a4d00160d3da53ef6577b131",
      "size": 300
    }
  ]
}
Layers come in order: the first layer is the base one, and every following layer is applied on top. The mediaType field tells us that each layer file is a gzipped tar archive, so to look at it we need to unpack it. Let’s do this for the base layer first:
tar -xvf ../blobs/sha256/69692152171afee1fd341febc390747cfca2ff302f2881d8b394e786af605696
This looks like a complete filesystem - which seems about right, because we do need a complete filesystem to run a container. Now let’s look at the next layer:
tar -xvf ../blobs/sha256/7284b4e0cc7b197edc206f815c5b24e67b9ed29abd9bbd8ae4bfdd5540bec6ec
This layer has only some changes to the usr directory. The most interesting part here is that there are no actual files except a strangely named .wh..wh..opq. This is a so-called “whiteout” file, which is also defined in image-spec.
Whiteout files are needed to remove files from the image. When a container image tool builds an image, it works with layers. But if you just layer things one on top of another, how do you remove files? The answer of image-spec is whiteout files. An empty file whose name is .wh. followed by the name of another file marks that other file, inherited from the layers below, as removed. And if a directory contains a file called .wh..wh..opq, then everything that the layers below placed in that directory is removed.
If I simply unpack each layer one after another with the tar command, none of the files will be removed. What is important to keep in mind is that image-spec only defines what whiteout files are and how to process them. You need an OCI-compliant tool to process those whiteout files and to properly apply the layers on top of each other.
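To show what "processing whiteout files" means in practice, here is a deliberately simplified Python sketch. It is not how any particular tool implements this: the root filesystem is modelled as a plain dictionary of path to content, only regular files are handled (no directories, symlinks, ownership or permissions), and the names make_layer and apply_layer as well as the sample paths are invented for the example.

```python
import io
import tarfile

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def make_layer(files: dict) -> bytes:
    """Build an uncompressed tar layer in memory from {path: content}."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def apply_layer(rootfs: dict, layer: bytes) -> None:
    """Apply one layer to `rootfs` ({path: content}), honouring whiteouts."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:*") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        contents = {m.name: tar.extractfile(m).read() for m in members}

    # Pass 1: whiteouts only hide paths that came from lower layers
    for path in contents:
        directory, _, base = path.rpartition("/")
        prefix = directory + "/" if directory else ""
        if base == OPAQUE_MARKER:
            # Opaque whiteout: hide everything lower layers put in this directory
            for existing in [p for p in rootfs if p.startswith(prefix)]:
                del rootfs[existing]
        elif base.startswith(WHITEOUT_PREFIX):
            # Plain whiteout: hide the named path (and anything below it)
            target = prefix + base[len(WHITEOUT_PREFIX):]
            for existing in [p for p in rootfs if p == target or p.startswith(target + "/")]:
                del rootfs[existing]

    # Pass 2: add or replace the regular files of this layer, skipping the markers
    for path, content in contents.items():
        if not path.rpartition("/")[2].startswith(WHITEOUT_PREFIX):
            rootfs[path] = content


base = make_layer({
    "etc/motd": b"hello",
    "tmp/cache/a": b"1",
    "tmp/cache/b": b"2",
    "usr/bin/tool": b"v1",
})
upper = make_layer({
    "etc/.wh.motd": b"",            # removes etc/motd
    "tmp/cache/.wh..wh..opq": b"",  # hides everything lower layers put in tmp/cache
    "tmp/cache/c": b"3",            # same-layer file, not affected by the marker
    "usr/bin/tool": b"v2",          # plain replacement, no whiteout needed
})

rootfs = {}
for layer in (base, upper):
    apply_layer(rootfs, layer)

print(sorted(rootfs))  # ['tmp/cache/c', 'usr/bin/tool']
```

The two passes matter: whiteouts are applied before the layer's own files are added, so a file shipped in the same layer as a whiteout marker survives.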
Besides whiteout files, image-spec defines the complete process: what layers should look like and how a container image tool should work with them.
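One detail of that process explains why the layer digests in the manifest differ from the diff_ids in the config: the manifest digest is computed over the compressed blob as stored, while a diff_id is computed over the uncompressed tar archive. The following Python sketch demonstrates the difference on a tiny in-memory archive; the file name and content are made up for the example.

```python
import gzip
import hashlib
import io
import tarfile

# Build a tiny uncompressed tar archive in memory (hypothetical content)
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"hello from a layer\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

uncompressed = buffer.getvalue()
compressed = gzip.compress(uncompressed)

# What the config would list as a diff_id: hash of the uncompressed tar
diff_id = "sha256:" + hashlib.sha256(uncompressed).hexdigest()
# What the manifest would list for a tar+gzip layer: hash of the compressed blob
layer_digest = "sha256:" + hashlib.sha256(compressed).hexdigest()
# Decompressing the blob and hashing again gets us back to the diff_id
roundtrip = "sha256:" + hashlib.sha256(gzip.decompress(compressed)).hexdigest()

print(diff_id != layer_digest)  # True
print(roundtrip == diff_id)     # True
```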
Let’s quickly recap the most important things about image-spec.
image-spec defines how container images are structured and what the different config files inside the image contain.
At the top level, you have an index file that points to one or more image manifests. Each manifest can be annotated with what we would call an image tag.
The image manifest points to an image config file and to the image layers.
The image config file, in turn, defines how to run the image by specifying which command to run, with which environment, and so on.
Image layers are just tar archives that need to be applied on top of each other by a container tool.
Finally, all the files inside the image are referenced by digests and named after their digests, except the index file.
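The whole chain from this recap can be followed in a few lines of Python. This is a sketch under assumptions: it builds a tiny throwaway layout instead of using the real httpd directory, the layer blob is a placeholder rather than a real tar archive, other files a real layout directory may contain are omitted, and the helper names put_blob, get_blob and resolve are invented for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

REF_NAME = "org.opencontainers.image.ref.name"


def put_blob(layout: Path, content: bytes, media_type: str) -> dict:
    """Store content under blobs/sha256/<hex> and return a descriptor for it."""
    hex_value = hashlib.sha256(content).hexdigest()
    directory = layout / "blobs" / "sha256"
    directory.mkdir(parents=True, exist_ok=True)
    (directory / hex_value).write_bytes(content)
    return {"mediaType": media_type, "digest": "sha256:" + hex_value, "size": len(content)}


def get_blob(layout: Path, descriptor: dict) -> bytes:
    """Read the blob that a descriptor points to."""
    algorithm, _, hex_value = descriptor["digest"].partition(":")
    return (layout / "blobs" / algorithm / hex_value).read_bytes()


def resolve(layout: Path, tag: str):
    """Follow index.json -> manifest -> config and layers for one tag."""
    index = json.loads((layout / "index.json").read_text())
    for descriptor in index["manifests"]:
        if descriptor.get("annotations", {}).get(REF_NAME) == tag:
            manifest = json.loads(get_blob(layout, descriptor))
            config = json.loads(get_blob(layout, manifest["config"]))
            return config, manifest["layers"]
    raise KeyError(tag)


with tempfile.TemporaryDirectory() as tmp:
    layout = Path(tmp)
    layer = put_blob(layout, b"placeholder, not a real tar archive",
                     "application/vnd.oci.image.layer.v1.tar+gzip")
    config = put_blob(layout, json.dumps({"config": {"Cmd": ["httpd-foreground"]}}).encode(),
                      "application/vnd.oci.image.config.v1+json")
    manifest = put_blob(layout,
                        json.dumps({"schemaVersion": 2, "config": config, "layers": [layer]}).encode(),
                        "application/vnd.oci.image.manifest.v1+json")
    # The index entry is the manifest descriptor plus the tag annotation
    manifest["annotations"] = {REF_NAME: "demo"}
    (layout / "index.json").write_text(json.dumps({"schemaVersion": 2, "manifests": [manifest]}))

    resolved_config, resolved_layers = resolve(layout, "demo")

print(resolved_config["config"]["Cmd"], len(resolved_layers))
```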
This structure is at the core of how container images are built, updated and configured, at least in OCI-compliant tools. But the truth is that to run a container, you don’t need a container image in the first place. Instead, you need a container bundle, which will be the topic of our next lesson.