A tale of two Go container libraries
Eric Stroczynski
Apr 29, 2022 · 11 min read

The Slim.AI engineering team writes lots of container and image plumbing code for our SaaS platform, slim.ai, to advance the current state of container ops. Our core competency is docker-slim, a tool that analyzes and optimizes container images via static and dynamic analysis; slim.ai wraps core docker-slim features with automation and image meta-registry capabilities to solve pain points for users of managed image registries such as Docker Hub. As heavy container users ourselves, we design these capabilities around features distilled from our own experiences and desires. Two such features currently in development are a dashboard presenting a view of all images across your registry accounts, and the ability to easily copy images between registries. In this post I will give a quick introduction to how the meta-registry works, dive into image replication, and compare several Go libraries for implementing efficient image copying between registries.

Meta-registry intro

Many of you probably read "meta-registry" and asked yourself, what is this technobabble the author is spewing? Fair question! Simply put, a meta-registry is a registry of registries. Imagine one view of all images across your Docker Hub, ECR, and Azure Container Registry (ACR) accounts! This idea is powerful because a single view over multiple registries normalizes their nuances and implementation details into one interface, allowing for single-source administration and automation.

Under the hood, slim.ai implements a Connector interface for each registry, which supports a set of common managed-registry methods. These methods are not limited to those supported by the OCI distribution spec, which are useful but already provided by CLIs like docker. Search on an image reference or namespace/organization across one or more registries is an example of a non-standard method. Another is Copy, which this blog post covers.
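
To make that concrete, here is a hypothetical sketch of what such an interface could look like. slim.ai's actual Connector is internal, so the method set, signatures, and the ImageSummary type below are illustrative assumptions rather than its real API.

package connector

import "context"

// ImageSummary is a hypothetical search result type.
type ImageSummary struct {
    Ref    string // fully-qualified image reference
    Digest string // manifest content digest
}

// Connector abstracts a single managed registry behind a common interface.
type Connector interface {
    // Search finds images matching a reference or namespace/organization query.
    Search(ctx context.Context, query string) ([]ImageSummary, error)
    // Copy replicates the image at srcRef in this registry to dstRef in the
    // registry behind dst.
    Copy(ctx context.Context, dst Connector, srcRef, dstRef string) error
}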

Why copy images?

Let's start with the fundamentals of a V2 image registry. An image registry is effectively a file server serving content-addressed files that define container images. These files can be image layers, configurations, manifests, etc., and are accessible at various endpoints depending on their media type. Each registry instance knows only about the files in its underlying data store and the images they comprise. If you are unfamiliar with anything I just said, take a look at this awesome video for an intro, or jump straight into the OCI distribution spec.
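
To make the file-server model concrete, the sketch below fetches a manifest from a V2 registry's manifests endpoint. The host, repository, and tag are placeholders, and the token-auth handshake most registries require is elided.

package main

import (
    "io"
    "net/http"
    "os"
)

func main() {
    // Manifests live at /v2/<name>/manifests/<reference>; the host and
    // repository below are placeholders, not real.
    url := "https://registry.example.com/v2/foo/bar/manifests/latest"
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        panic(err)
    }
    // The Accept header selects which manifest media type the server returns.
    req.Header.Set("Accept", "application/vnd.oci.image.manifest.v1+json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    io.Copy(os.Stdout, resp.Body) // prints the manifest JSON
}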

Copying an image in this context conceptually means replicating its manifest, configuration, and all layers specified by the manifest between two registries. For those familiar with the docker CLI, copying is effectively:

docker pull my.reg/foo/bar:latest
docker tag my.reg/foo/bar:latest myother.reg/foo/bar:latest
docker push myother.reg/foo/bar:latest

This is a form of replication that provides resiliency, and possibly reduced pull latency if the source and destination registries are not geographically colocated. Depending on the implementation, the registry server itself, its underlying data store, or both are distributed to provide these properties, as either can fail.

In cloud parlance, a registry can be distributed across multiple availability zones (AZs). While this is not a feature you can assume any given managed registry provider supports, many do: ECR added support for cross-region replication in late 2020; GCR lets you replicate across up to four regions; ACR supports automatic geo-replication. Unfortunately Docker Hub does not support multi-AZ configuration as far as I have observed across several online searches. Copying an image across AZs may be as "simple" as configuring the data store for replication, e.g. read-replicating a SQL database. This is not copying an image as conceptually outlined above, but it shares one underlying goal with image copying: publishing high-availability images.

Copying an image between registries is a form of replication that adds provider-based resiliency to the mix; if one managed registry provider has an outage, the image is still available at another. The docker CLI example above is how many typically accomplish this goal, whether manually or in CI. A major drawback of that pull/tag/push flow is speed: pulling will HTTP GET and commit all image data to disk, then pushing will read that data back from disk and POST/PUT it. Additionally, many CLI tools like docker and nerdctl proxy image operations through a daemon, which adds network overhead (small if on the same machine). Of course this explanation elides caching and file I/O optimizations on modern hardware, but suffice it to say that reading and writing with as little file I/O as possible would be much better. Another drawback of this approach, at least for my particular use case, is the lack of multi-platform capabilities in many image APIs, which only permit interaction with images annotated for the current platform. Libraries providing these APIs tend to commingle distribution and runtime functionality, so single-platform support is enforced by design. A tool like docker-buildx can potentially support build and multi-registry push in one command, but that feature is not enabled as of writing this post.

So, how do we get a nicer, faster image copying experience than the pull/tag/push flow? Read on!

Disclaimer: components of this analysis are highly subjective. My opinions are not hard truths.

Choosing an image library

Enter container image libraries, of which there are many. I like to categorize them into two groups based on how they interact with an image registry, briefly discussed above: direct, which make HTTP requests to an image registry, and indirect, which proxy registry requests via a daemon listening on a Unix socket. Since this post discusses efficient copying, I focused my search on libraries with three characteristics: direct, which tend to be more efficient for certain tasks; those that expose some sort of in-memory Copy function, or lazy/buffered Pull and Push functions; and those written in Go¹, since slim.ai's backend is written in Go.

Using these search criteria, I chose to assess:

  1. containers/image, a feature-rich library used by the skopeo image tool.
  2. google/go-containerregistry, a library with a concise API and design philosophy that nicely fits an image copying use case.

My evaluation criteria for each library, ordered from most to least important, are:

  1. Efficiency: first and foremost, can I copy many images at once relatively quickly with low resource usage?
  2. Ergonomics: how many boilerplate LoC do I need to get the thing working? Are there lots of confusing options to configure that will never benefit my use case? Is the code I am going to write maintainable?
  3. Library size: how much will the new dependency tree bloat slim.ai's backend binary? How many new dependencies will I have to manage?
  4. Extra functionality: there are tons of nice-to-haves when managing images like custom auth integrations, but since slim.ai already handles a lot of this, these are not needed.

Additionally, I have included containerd in the benchmark and library size analyses, since it is one of the most well-known and widely-used container runtime libraries out there. containerd is an indirect library, does not offer a Copy function, and does not purport to be tuned for my use case. I do think it is worth quantitatively analyzing due to its ubiquity, and to show how slow a full pull+push can be. Due to these constraints it would not be fair to compare containerd to the other libraries on qualitative criteria, much like one does not conflate bananas with plantains. I treat Pull+Push as Copy in the benchmark, and refer to it as CopyContainerd.
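
For a feel of what CopyContainerd entails, here is a hedged sketch of the pull-then-push shape, assuming a containerd daemon on its default socket; the benchmark repo's actual code may differ.

package main

import (
    "context"

    "github.com/containerd/containerd"
    "github.com/containerd/containerd/namespaces"
)

// CopyContainerd approximates Copy as Pull+Push via the containerd daemon.
func CopyContainerd(ctx context.Context, srcRef, dstRef string) error {
    client, err := containerd.New("/run/containerd/containerd.sock")
    if err != nil {
        return err
    }
    defer client.Close()

    // containerd scopes all operations to a namespace.
    ctx = namespaces.WithNamespace(ctx, "default")

    // Pull commits all image content to containerd's local store...
    img, err := client.Pull(ctx, srcRef)
    if err != nil {
        return err
    }
    // ...and Push reads that content back out to the destination registry.
    return client.Push(ctx, dstRef, img.Target())
}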

¹ I considered shelling out to a binary to be language-agnostic, but this adds environmental dependencies and operational complexity, which in my opinion is a major hit to ergonomics.

Efficiency

Time to run some benchmarks! I chose to use two in-memory registries, one (source) pre-loaded with N randomized images and the other (destination) empty, to assess efficiency of each library's Copy function. Note that go-containerregistry's benchmark gains no benefit from this choice of test harness.
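
Here is a minimal sketch of such a harness using go-containerregistry's registry and random packages; the benchmark repo linked below is the source of truth, and the image size and count here are arbitrary.

package main

import (
    "fmt"
    "net/http/httptest"
    "net/url"

    "github.com/google/go-containerregistry/pkg/crane"
    "github.com/google/go-containerregistry/pkg/registry"
    "github.com/google/go-containerregistry/pkg/v1/random"
)

func main() {
    // Two in-memory registries: a source to pre-load and an empty destination.
    src := httptest.NewServer(registry.New())
    defer src.Close()
    dst := httptest.NewServer(registry.New())
    defer dst.Close()
    srcURL, _ := url.Parse(src.URL)
    dstURL, _ := url.Parse(dst.URL)

    // Seed the source with one randomized image (a single 1 KiB layer).
    img, err := random.Image(1024, 1)
    if err != nil {
        panic(err)
    }
    srcRef := fmt.Sprintf("%s/foo/bar:latest", srcURL.Host)
    if err := crane.Push(img, srcRef); err != nil {
        panic(err)
    }

    // The operation under benchmark: copy between the two registries.
    dstRef := fmt.Sprintf("%s/foo/bar:latest", dstURL.Host)
    if err := crane.Copy(srcRef, dstRef); err != nil {
        panic(err)
    }
}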

My hardware setup (8 CPU cores, 32 GB DDR4):

goos: linux
goarch: amd64
pkg: github.com/slim-ai/go-image-lib-benchmark
cpu: 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz

The following benchmarks run on 1, 4, and 8 max CPUs, with max 8 goroutines per Copy call, over 100 iterations:

BenchmarkCopyContainersImage              100          14055206 ns/op         1045418 B/op       6487 allocs/op
BenchmarkCopyContainersImage-4            100          10218015 ns/op         1099314 B/op       6615 allocs/op
BenchmarkCopyContainersImage-8            100          10393303 ns/op         1241455 B/op       6673 allocs/op
BenchmarkCopyGoContainerregistry          100           2160080 ns/op          413823 B/op       2987 allocs/op
BenchmarkCopyGoContainerregistry-4        100           1356623 ns/op          407963 B/op       2936 allocs/op
BenchmarkCopyGoContainerregistry-8        100           1477929 ns/op          476178 B/op       2960 allocs/op
BenchmarkCopyContainerd                   100          24969750 ns/op          605232 B/op       7355 allocs/op
BenchmarkCopyContainerd-4                 100          24587615 ns/op          753558 B/op       7449 allocs/op
BenchmarkCopyContainerd-8                 100          24783194 ns/op          990529 B/op       7448 allocs/op

Clearly go-containerregistry's Copy wins here, across all categories. Note the order-of-magnitude difference in speed between go-containerregistry and containerd, which uses the pull+push pattern.

You can run these benchmarks yourself from this repo.
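
For a sense of the benchmark shape without cloning, each benchmark is roughly a loop around a Copy call, as in this illustrative reconstruction (not the repo's exact code; srcRefs and dstRef are assumed to come from a setup like the sketch above). The -4 and -8 suffixes in the output come from go test's -cpu flag, which varies GOMAXPROCS.

import (
    "context"
    "testing"
)

// srcRefs, dstRef, and CopyGoContainerregistry (shown in the next section)
// are assumed to be provided by a harness like the sketch above.
func BenchmarkCopyGoContainerregistry(b *testing.B) {
    ctx := context.Background()
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // 8 matches the per-Copy goroutine cap noted above.
        if err := CopyGoContainerregistry(ctx, srcRefs[i%len(srcRefs)], dstRef, 8); err != nil {
            b.Fatal(err)
        }
    }
}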

Ergonomics

For the ergonomics qualitative analysis I'll refer back to the benchmark repo's Copy functions, inlined below.

containers/image:

import (
    "context"
    "io"

    "github.com/containers/image/v5/copy"
    "github.com/containers/image/v5/signature"
    "github.com/containers/image/v5/transports/alltransports"
    "github.com/containers/image/v5/types"
)

func CopyContainersImage(ctx context.Context, srcRef, dstRef string, parallelism int) error {
    // Parse the transport-qualified references (e.g. "docker://...").
    srcIRef, err := alltransports.ParseImageName(srcRef)
    if err != nil {
        return err
    }
    dstIRef, err := alltransports.ParseImageName(dstRef)
    if err != nil {
        return err
    }

    // Skip TLS verification; the benchmark registries are insecure.
    srcCtx := &types.SystemContext{
        OCIInsecureSkipTLSVerify: true,
        DockerAuthConfig:         &types.DockerAuthConfig{},
    }
    dstCtx := &types.SystemContext{
        OCIInsecureSkipTLSVerify: true,
        DockerAuthConfig:         &types.DockerAuthConfig{},
    }

    // Accept any image; no signature verification is performed.
    policy := &signature.Policy{
        Default: []signature.PolicyRequirement{signature.NewPRInsecureAcceptAnything()},
    }
    policyCtx, err := signature.NewPolicyContext(policy)
    if err != nil {
        return err
    }

    // Copy all platforms' images and cap parallel layer downloads.
    opts := &copy.Options{
        SourceCtx:            srcCtx,
        DestinationCtx:       dstCtx,
        ImageListSelection:   copy.CopyAllImages,
        ReportWriter:         io.Discard,
        MaxParallelDownloads: uint(parallelism),
    }
    _, err = copy.Image(ctx, policyCtx, dstIRef, srcIRef, opts)
    return err
}

go-containerregistry:

import (
    "context"

    "github.com/google/go-containerregistry/pkg/crane"
    "github.com/google/go-containerregistry/pkg/v1/remote"
)

func CopyGoContainerregistry(ctx context.Context, srcRef, dstRef string, parallelism int) error {
    opts := []crane.Option{
        crane.WithContext(ctx),
        crane.Insecure,
        // Append a remote-level option to cap concurrent layer operations.
        func(o *crane.Options) {
            o.Remote = append(o.Remote, remote.WithJobs(parallelism))
        },
    }
    return crane.Copy(srcRef, dstRef, opts...)
}

The first function's setup and invocation has some boilerplate, which can be framed as an artifact of containers/image's feature-richness. The second function's setup and invocation is quite a bit simpler. We can see that this comparison holds true in the internals of each Copy function (containers/image, go-containerregistry). I would go further and say that while containers/image's copy.Image API is fine, its internal complexity reveals how difficult containers/image's base APIs are to compose.

By expanding our horizons a bit, we can see that go-containerregistry has taken a structured, layered approach to package organization: the top-level crane package is quite straightforward, while v1 subpackages expose a ton of primitives for handling various image components and states. containers/image's package structure is intuitive for Copy (the copy package) and basic registry operations (the docker package), but otherwise does not appear as coherent.

Library size

Another objective measurement! Here we look at the sizes of binaries compiled with each library, using du, with and without symbols (the -nosymbols binaries strip the symbol table and debug info, e.g. via go build -ldflags "-s -w").

These binaries are compiled with only the eponymous dependency's Copy benchmark function:

26M     ./bin/containers-image-copy
19M     ./bin/containers-image-copy-nosymbols
7.7M    ./bin/go-containerregistry-copy
5.5M    ./bin/go-containerregistry-copy-nosymbols
20M     ./bin/containerd-copy
15M     ./bin/containerd-copy-nosymbols

Since most larger projects will have a dense dependency graph, the above only establishes an upper bound for compiled library size. Below are binaries compiled from slim.ai's backend codebase with the benchmark functions imported; prior to this analysis the codebase did not directly depend on any of these libraries, though it does depend on docker/docker and docker/distribution:

42M     ./bin/sai-backend
38M     ./bin/sai-backend-nosymbols
49M     ./bin/sai-backend-containers-image-copy
45M     ./bin/sai-backend-containers-image-copy-nosymbols
42M     ./bin/sai-backend-go-containerregistry-copy
38M     ./bin/sai-backend-go-containerregistry-copy-nosymbols
44M     ./bin/sai-backend-containerd-copy
42M     ./bin/sai-backend-containerd-copy-nosymbols

Across both size comparisons, go-containerregistry wins with a <1% increase (38460 KB to 38588 KB, per du) in binary size with symbols stripped. This impressively small size difference is likely due to the library's relatively small module footprint, and because its largest dependencies (on docker repos) are already depended upon by the slim.ai backend binary. It is worth noting that containers/image depends indirectly on containerd, so the latter's size win doesn't mean much here.

Extra functionality

This final qualitative analysis looks at what other features a library has outside of the core registry APIs.

Common features between go-containerregistry and containers/image:

  • Registry authentication with challenge protocol handlers.
  • Ephemeral image copying.
  • Image tar archive import and export (see the sketch after this list).
  • Image (HTTP) transport configuration.
  • Blob caching.
    • containers/image supports boltdb caching for durability if you need that.
  • Read images from and write images to disk.
  • Basic interoperability with the docker daemon.
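
As one example of the tar archive support from go-containerregistry's side (containers/image has its own equivalents), this sketch pulls an image, saves it to a tarball, and loads it back; the reference and path are placeholders.

package main

import "github.com/google/go-containerregistry/pkg/crane"

func main() {
    // Pull the image manifest and layers (placeholder reference).
    img, err := crane.Pull("registry.example.com/foo/bar:latest")
    if err != nil {
        panic(err)
    }
    // Export to a docker-save-compatible tarball...
    if err := crane.Save(img, "foo/bar:latest", "/tmp/bar.tar"); err != nil {
        panic(err)
    }
    // ...and import it back as an in-process image.
    if _, err := crane.Load("/tmp/bar.tar"); err != nil {
        panic(err)
    }
}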

Features unique to containers/image:

  • GPG image signing.
    • This image verification strategy appears to be mainly used by Red Hat (disclosure: I am a former employee), and precedes Docker Content Trust (DCT) (obviously PGP itself is much older than DCT). Without diving into details, GPG image signing is nice because it uses an existing, established trust system present on many machines by default; DCT uses a different asymmetric key cryptosystem that has advantages over simple GPG signing.
  • OpenShift transport integration.

Features unique to go-containerregistry:

  • In-memory registry package for testing, optionally with TLS.
  • Random image and index image generation for testing.
  • Image mutation, e.g. adding/removing layers to/from an existing image (a sketch follows this list).
  • Google Cloud (gcloud) integration.
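
Here is a small sketch combining the mutation and test-image primitives, appending a randomized layer to the empty base image:

package main

import (
    "fmt"

    "github.com/google/go-containerregistry/pkg/v1/empty"
    "github.com/google/go-containerregistry/pkg/v1/mutate"
    "github.com/google/go-containerregistry/pkg/v1/random"
    "github.com/google/go-containerregistry/pkg/v1/types"
)

func main() {
    // Generate a small randomized layer...
    layer, err := random.Layer(1024, types.OCILayer)
    if err != nil {
        panic(err)
    }
    // ...and append it to the empty base image, yielding a new image.
    img, err := mutate.AppendLayers(empty.Image, layer)
    if err != nil {
        panic(err)
    }
    digest, err := img.Digest()
    if err != nil {
        panic(err)
    }
    fmt.Println("new image digest:", digest)
}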

I do not see a clear winner here. The common extra features supported by both are more than enough to do a lot of image-related things. I will note that the testing features of go-containerregistry are quite nice; unfortunately they do not integrate directly with containers/image's types so you would have to implement some shim interfaces to use both in concert.

Conclusion

I ended up choosing go-containerregistry for my use case due to its high performance, small compiled footprint, and ergonomics. While this choice suits my use case, there is no absolute winner here. These criteria should be re-evaluated for different use cases; in fact, the stuff I threw into the "extra functionality" section might be essential to yours! My recommendation, and what I ended up doing, is to write an ImageCopier interface that wraps library calls, use your chosen library until you need features that another provides, then swap implementations.
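
Here is a minimal sketch of that wrapper; the names are illustrative, not slim.ai's actual code.

package imagecopy

import (
    "context"

    "github.com/google/go-containerregistry/pkg/crane"
)

// ImageCopier is the seam between business logic and whichever image library
// backs it.
type ImageCopier interface {
    Copy(ctx context.Context, srcRef, dstRef string) error
}

// craneCopier backs the interface with go-containerregistry today; swapping
// libraries later means writing a new implementation, not touching callers.
type craneCopier struct{}

func (craneCopier) Copy(ctx context.Context, srcRef, dstRef string) error {
    return crane.Copy(srcRef, dstRef, crane.WithContext(ctx))
}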

I hope you enjoyed the read. This is the first of many technical blog posts to come from the engineering team at Slim.AI, so stay tuned!

P.S. we're hiring.

 