Part 1 of 4: From weekend project to product - how we evolved a minimal container registry into something worth charging for
When we started building Molnett, we needed a container registry to store the runnable artefacts our customers produce. Our first priority was to integrate well with GitHub, but that still left the question of where images should live: if we wanted to run containers, we either had to store them ourselves or recommend yet another service before people could even start using us. For most people who just want to run some code, that felt like unnecessary complexity - they would have to set up accounts and manage credentials, and we would need per-tenant credential management to pull their private images from that third-party registry. Building our own registry turned out to be simpler - roughly 400 lines of code:
// services/register/cmd/main.go - the entire original main function
func main() {
	// A thin wrapper: Distribution does the heavy lifting, our wiring adds
	// Hydra-based auth and ACL checks on top.
	reg := register.NewRegistry(cfg, hydraClient, aclClient)
	handler := reg.Start(ctx)

	server := &http.Server{
		Addr:    ":8000",
		Handler: handlers.LoggingHandler(os.Stdout, handler),
	}
	log.Fatal(server.ListenAndServe())
}
All we built was a thin wrapper around CNCF Distribution that kept each customer's registry private while still letting our platform services pull from it. Customers could simply push images to us instead of managing third-party credentials.
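The wrapper really is thin. The sketch below is illustrative rather than our actual code (withAuth, authorize, and the token realm are made-up names for this post): the whole idea is to run our own authorisation check before handing the request to the embedded Distribution handler.
package wrapper

import "net/http"

// withAuth gates every registry request behind our own authorisation check
// before passing it to the embedded Distribution handler. The authorize
// function stands in for the Hydra/ACL checks and is purely illustrative.
func withAuth(authorize func(*http.Request) error, registry http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := authorize(r); err != nil {
			// Tell Docker-style clients how to authenticate; the realm is a placeholder.
			w.Header().Set("WWW-Authenticate", `Bearer realm="https://registry.example.com/token"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		// Authorised: Distribution serves the actual registry API.
		registry.ServeHTTP(w, r)
	})
}
Everything else - blob storage, manifests, the registry API itself - stays Distribution's problem.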
Around 2000 container tags and a year later, we still had a working registry that served its purpose perfectly. Customers could push their images, our platform could pull them, and everyone's containers ran as expected. No complaints, no issues.
But as we prepared for our public launch, we realised this was an MVP, not a product. In our conversations, customers never objected to eventually paying for registry usage, but we weren't ready to charge for something that was essentially a black box: we couldn't answer basic questions about what was actually happening inside our registry.
The Technical Foundation: Why CNCF Distribution Works
Before diving into our specific challenges, it's worth understanding why CNCF Distribution has become the de facto standard not just for container images, but for storing all kinds of binary artefacts. You'll find registries storing WASM binaries, Helm charts, and other artefacts using the same protocols.
The secret sauce is content-addressable storage (CAS): instead of storing files by name or path, you store them by the hash of their contents. Every piece of content (layers, manifests, configs) gets a SHA256 hash that becomes its address. If you have the same layer in ten different images, it's stored exactly once. When you pull an image, you're really pulling a manifest that lists the SHA256 addresses of all the layers you need.
// A simplified view of what CNCF Distribution stores
type Manifest struct {
	Layers []Layer `json:"layers"`
}

type Layer struct {
	Digest    string `json:"digest"` // sha256:abc123...
	Size      int64  `json:"size"`
	MediaType string `json:"mediaType"`
}
Even better, CAS allows deduplication on push. When you push a layer that already exists, the registry can immediately recognise it by hash and skip the upload entirely. This makes pushes incredibly efficient - you only transfer the layers that are actually new.
The result is a design that is efficient for both storage and bandwidth.
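On the wire, that deduplication is just an existence check before upload. The OCI distribution API lets a client HEAD a blob by digest; the function below is a hedged sketch of that check (real clients also handle auth, retries, and cross-repository mounts):
package pushdemo

import (
	"fmt"
	"net/http"
)

// blobExists asks the registry whether it already has a blob with the
// given digest. A 200 response means the layer can be skipped on push.
func blobExists(registry, repo, digest string) (bool, error) {
	url := fmt.Sprintf("https://%s/v2/%s/blobs/%s", registry, repo, digest)
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}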
Of course, nobody wants to remember sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4. That's why you need tags - human-readable names like api:latest or api:v1.2.3 that point to specific content in the CAS system.
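Resolving a tag is a single request against the registry API. The sketch below is illustrative (real code would negotiate auth first): it asks which digest a tag such as api:v1.2.3 currently points at by reading the Docker-Content-Digest header that Distribution-based registries return for a manifest.
package tagdemo

import (
	"fmt"
	"net/http"
)

// resolveTag asks the registry which content address a tag points to by
// fetching its manifest and reading the digest header.
func resolveTag(registry, repo, tag string) (string, error) {
	url := fmt.Sprintf("https://%s/v2/%s/manifests/%s", registry, repo, tag)
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return "", err
	}
	// Ask for modern manifest formats.
	req.Header.Set("Accept", "application/vnd.oci.image.manifest.v1+json")
	req.Header.Add("Accept", "application/vnd.docker.distribution.manifest.v2+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Header.Get("Docker-Content-Digest"), nil
}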
This is exactly like Git's architecture. Want to see it in action? Look in your code's .git folder right now:
objects/ contains all the binary blobs (like layers in a registry)
refs/ contains pointers to specific commits - refs/heads/main points to the latest commit on main, refs/tags/v1.0 points to tagged releases
The brilliant insight Docker Hub had was applying this same pattern to binary distribution - content-addressable blobs with human-readable pointers. This content-addressable pattern is why registries can deduplicate so efficiently across images, and why the same storage format works for everything from container images to WASM modules.
The Visibility Challenge
Our simple wrapper worked perfectly for its initial purpose, but we had almost no visibility into what it actually stored. To build a registry we could confidently charge for, we needed to solve some fundamental problems:
Problem 1: No inventory We couldn't efficiently list what images a customer had without parsing S3 bucket contents. For 2000 tags across multiple customers, that meant over 30 seconds just to show a basic repository list (see the sketch at the end of this section).
Problem 2: No metadata indexing Want to know if two images share layers? We'd have to download and compare their manifests. Want to see storage usage by project? Parse every blob reference across every manifest.
Problem 3: Multi-architecture complexity The OCI spec does have a way to expose supported platforms directly in the manifest, the OCI Image Index, but people rarely create and push one. Without it, we couldn't tell customers which platforms their images supported without parsing the config blob of each pushed image.
Problem 4: Authentication mismatch The Docker CLI uses one authentication flow, but a web dashboard needs another. We needed to support both without forcing customers to manage separate credentials.
None of this is impossible, but solving it all meant building far more than a weekend project. The registry we had built worked, but to let users manage their inventory the way they're used to with other providers, we needed to address each of these problems.
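To make Problem 1 concrete: with Distribution's default S3 layout, even listing the tags of a single repository means walking key prefixes such as docker/registry/v2/repositories/<repo>/_manifests/tags/<tag>/current/link. The code below is a hedged sketch of that walk, not our implementation:
package inventory

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// listTags walks the bucket keys that Distribution's S3 driver writes for
// one repository. Pagination and retries are trimmed for brevity.
func listTags(ctx context.Context, client *s3.Client, bucket, repo string) ([]string, error) {
	prefix := "docker/registry/v2/repositories/" + repo + "/_manifests/tags/"
	out, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
		Bucket:    aws.String(bucket),
		Prefix:    aws.String(prefix),
		Delimiter: aws.String("/"), // one "directory" per tag
	})
	if err != nil {
		return nil, err
	}
	var tags []string
	for _, p := range out.CommonPrefixes {
		tags = append(tags, strings.TrimSuffix(strings.TrimPrefix(*p.Prefix, prefix), "/"))
	}
	return tags, nil
}
Multiply this by every repository and every customer, add pagination, and a 30-second inventory stops being surprising.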
The Solution Path
This series shows how we transformed our black-box registry into something customers can understand and we can confidently price:
Part 2: The OCI Revolution - Understanding multi-architecture images through OCI Image Indexes and Docker Manifest Lists, and why we chose one format for better tooling.
Part 3: Metadata Matters - Building a queryable metadata service on S3 that can answer usage questions instantly instead of parsing bucket contents.
Part 4: A Tale of Two Tokens - Supporting both Docker CLI and web dashboard authentication with a dual-token JWT system.