Nix: optimizing the size of Haskell-based Docker images

A few months ago, I joined the Haskell Security Response Team.

I've been involved in hsec-tools, which aims, in the medium term, to support the whole security advisory infrastructure.

One of our distribution channels is Nix, so when I had to create a Docker image, I started with:

pkgs.dockerTools.buildImage {
  name = "haskell/hsec-tools";
  tag = "latest";

  copyToRoot = pkgs.buildEnv {
    name = "image-root";
    paths = [
      self.packages.${system}.hsec-tools # is a haskellPackages.callCabal2nix
      pkgs.git.out
    ];
    pathsToLink = [ "/bin" "/" ];
  };
  config = {
    Cmd = [ "/bin/hsec-tools" ];
    Env = [
      "LOCALE_ARCHIVE=${pkgs.glibcLocales}/lib/locale/locale-archive"
      "LC_TIME=en_US.UTF-8"
      "LANG=en_US.UTF-8"
      "LANGUAGE=en"
      "LC_ALL=en_US.UTF-8"
      "GIT_DISCOVERY_ACROSS_FILESYSTEM=1"
    ];
    Volumes = {
      "/advisories" = { };
    };
    WorkingDir = "/";
  };
};

This gives a 671 MB (compressed) / 4.17 GB (loaded) Docker image.

Terr... i... ble...

Using dive to inspect the image shows that it contains useless stuff, such as GHC and the documentation of many packages.

...
 28 MB  │       ├── 0cddajm56ssb2y2k9his4sqa7z0v0n8h-pandoc-types-1.22.2.1  
 28 MB  │       │   ├── lib                                                 
 28 MB  │       │   │   └── ghc-9.2.5                                       
3.2 kB  │       │   │       ├── package.conf.d                              
3.2 kB  │       │   │       │   └── pandoc-types-1.22.2.1-1PXzD7fZz22Kk4Zvr 
 28 MB  │       │   │       └── x86_64-linux-ghc-9.2.5                      
2.9 MB  │       │   │           ├── libHSpandoc-types-1.22.2.1-1PXzD7fZz22K 
 25 MB  │       │   │           └── pandoc-types-1.22.2.1-1PXzD7fZz22Kk4Zvr 
 13 kB  │       │   │               ├── Paths_pandoc_types.dyn_hi           
 13 kB  │       │   │               ├── Paths_pandoc_types.hi               
 18 kB  │       │   │               ├── Paths_pandoc_types.p_hi             
4.2 MB  │       │   │               ├── Text                                
4.2 MB  │       │   │               │   └── Pandoc                          
 72 kB  │       │   │               │       ├── Arbitrary.dyn_hi            
 72 kB  │       │   │               │       ├── Arbitrary.hi                
 70 kB  │       │   │               │       ├── Arbitrary.p_hi              
140 kB  │       │   │               │       ├── Builder.dyn_hi              
140 kB  │       │   │               │       ├── Builder.hi                  
 92 kB  │       │   │               │       ├── Builder.p_hi                
1.0 MB  │       │   │               │       ├── Definition.dyn_hi           
1.0 MB  │       │   │               │       ├── Definition.hi               
1.0 MB  │       │   │               │       ├── Definition.p_hi             
 12 kB  │       │   │               │       ├── Generic.dyn_hi              
 12 kB  │       │   │               │       ├── Generic.hi                  
 11 kB  │       │   │               │       ├── Generic.p_hi                
 23 kB  │       │   │               │       ├── JSON.dyn_hi                 
 23 kB  │       │   │               │       ├── JSON.hi                     
 23 kB  │       │   │               │       ├── JSON.p_hi                   
148 kB  │       │   │               │       ├── Walk.dyn_hi                 
148 kB  │       │   │               │       ├── Walk.hi                     
150 kB  │       │   │               │       └── Walk.p_hi                   
7.4 MB  │       │   │               ├── libHSpandoc-types-1.22.2.1-1PXzD7fZ 
 13 MB  │       │   │               └── libHSpandoc-types-1.22.2.1-1PXzD7fZ 
 176 B  │       │   └── nix-support                                         
 176 B  │       │       └── propagated-build-inputs
...
679 MB  │       ├─⊕ 0jdbwn0ixmyk2irpiiygphjyrnzxyxnk-ghc-9.2.5-doc
...
1.7 GB  │       ├─⊕ 3abmvcz8b064a4l1k9vgbbqwaw3qxp7y-ghc-9.2.5
...

Fortunately, haskell.lib provides justStaticExecutables:

(pkgs.haskell.lib.justStaticExecutables self.packages.${system}.hsec-tools)
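
The line above shows only the wrapped expression. Presumably it replaces the hsec-tools entry in the paths list of the image root. A minimal sketch under that assumption, with everything else left as in the initial expression:

```nix
copyToRoot = pkgs.buildEnv {
  name = "image-root";
  paths = [
    # Assumption: the wrapped package replaces the plain hsec-tools entry,
    # so only its executables end up in the image root.
    (pkgs.haskell.lib.justStaticExecutables self.packages.${system}.hsec-tools)
    pkgs.git.out
  ];
  pathsToLink = [ "/bin" "/" ];
};
```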

This gets us down to 174 MB / 574 MB.

We can do better, especially with:

224 MB  │       ├─⊕ zbaycxgvv3iaa18p889dagcfhinasvcx-glibc-locales-2.37-8

We need it to be able to read UTF-8 content, so let's try another, smaller package:

"LOCALE_ARCHIVE=${pkgs.glibcLocalesUtf8}/lib/locale/locale-archive"

125 MB / 368 MB, great!

Let's keep going. What strikes me next is the size of git: 285 MB (it pulls in, among other things, python3), whereas the Alpine version is 21 MB.

It is used to inject creation/edit dates during advisory parsing.

We can use an alternative package:

pkgs.gitMinimal.out

Now we're at 63 MB / 175 MB.

One last useless thing is the /share directory:

 13 MB  └── share
   0 B      ├── bash-completion
   0 B      │   └── completions
   0 B      │       ├── git → ../../git/contrib/completion/git-completion.b
   0 B      │       └── git-prompt.sh → ../../git/contrib/completion/git-pr
567 kB      ├─⊕ git
 69 kB      ├─⊕ git-core
1.3 MB      ├─⊕ git-gui
429 kB      ├─⊕ gitk
 11 MB      └── locale
896 kB          ├── bg
896 kB          │   └── LC_MESSAGES
896 kB          │       └── git.mo
662 kB          ├── ca
662 kB          │   └── LC_MESSAGES
662 kB          │       └── git.mo

which is not particularly useful when git is not used interactively:

runAsRoot = "rm -Rf /share";

The final image is 59 MB / 159 MB.
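
Putting the four changes together, the image expression would look roughly like the sketch below. It is a reconstruction from the snippets in this post rather than the exact file: pkgs, self and system are assumed to come from the surrounding flake as before, runAsRoot is assumed to be passed as a top-level buildImage argument, and the working directory uses the WorkingDir key from the Docker image configuration.

```nix
pkgs.dockerTools.buildImage {
  name = "haskell/hsec-tools";
  tag = "latest";

  copyToRoot = pkgs.buildEnv {
    name = "image-root";
    paths = [
      # Change 1: keep only the executables of hsec-tools
      (pkgs.haskell.lib.justStaticExecutables self.packages.${system}.hsec-tools)
      # Change 3: smaller git package
      pkgs.gitMinimal.out
    ];
    pathsToLink = [ "/bin" "/" ];
  };

  # Change 4: /share is not needed for non-interactive use
  # (assumption: passed as a top-level buildImage argument)
  runAsRoot = "rm -Rf /share";

  config = {
    Cmd = [ "/bin/hsec-tools" ];
    Env = [
      # Change 2: smaller locale archive
      "LOCALE_ARCHIVE=${pkgs.glibcLocalesUtf8}/lib/locale/locale-archive"
      "LC_TIME=en_US.UTF-8"
      "LANG=en_US.UTF-8"
      "LANGUAGE=en"
      "LC_ALL=en_US.UTF-8"
      "GIT_DISCOVERY_ACROSS_FILESYSTEM=1"
    ];
    Volumes = {
      "/advisories" = { };
    };
    WorkingDir = "/";
  };
};
```

Each change is independent of the others, so they can be applied one at a time while checking the resulting image with dive.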

Compared to our initial 671 MB / 4.17 GB image, the final one is 8.8% / 3.7% of the original size.