Posted on March 20, 2020

How Nix Derivation Instantiation Works

The Nix package manager broadly proceeds in three[1] phases:

  1. Evaluation of Nix expressions to produce (sets of) derivations
  2. Instantiation of derivations
  3. Building derivations to produce components (outputs)

Of these, instantiation is the most mysterious and is the subject of this post. We’ll look at some of the same examples described in the Nix pills, but in painstaking detail.

Fair warning: This might be a long, tedious read, but so is the Nix source code… Alternatively, you can consult Section 5 of Dolstra’s PhD thesis. Some information here may be buggy, particularly the pseudocode. I’d appreciate hearing about any bugs you find.

First, recall that a Nix derivation is a set of data describing how a component is built (not the final product itself, which is called the output). These data mostly contain pointers to raw data loaded in the Nix store (e.g. build scripts) and pointers to other derivations which form dependencies (e.g., the derivation for Bash so we can execute our build script).

A Nix object is anything stored in the Nix store via instantiation, and instantiation for our purposes mostly consists of calculating two things:

  1. A unique descriptor string which identifies the object
  2. A unique store path under which the thing will be stored

The descriptor is an ephemeral object used to calculate store paths and other descriptors. Both things are computed from SHA256 hashes. Frequently these hashes are computed from other hashes appended with various bits of metadata to ensure the right amount of uniqueness. In particular, the (hash component of a) store path is a hash of the descriptor appended with metadata, and the descriptor is frequently just a hash of the object mixed together with information about its inputs. Remember that the inputs of an object are considered to be a part of its identity, so the descriptor needs to reflect this.

Finally, you should be aware that a derivation and its output form two separate objects. The first is a simple text file with extension .drv containing build data. The second is usually a directory containing executables, tar files, or anything else we might want a build system to produce. Therefore each derivation yields an output descriptor, output store path, derivation descriptor, derivation store path—four separate things computed in that order.

Now let’s jump in.

Basic Derivations

First, get set up with our sample default.nix and sample file myfile. Copy these into your local directory.

./default.nix

rec {
    foo = derivation { system = "x86_64-linux"; builder = ./myfile; name = "foo"; };
    bar = derivation { name = "bar"; system = "x86_64-linux"; builder = "none";
        outputHashMode = "flat"; outputHashAlgo = "sha256"; outputHash =
        "f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb"; } ;
    baz = derivation { system = "x86_64-linux"; builder = "${foo}/bin/bazbuilder";
        args = [ "${bar}/var/bazargs" ]; name = "baz"; };
    zap = derivation { system = "x86_64-linux"; builder = "${baz}/bin/zapbuilder";
        args = [ ./myfile "${foo}/arg1" "${bar}/arg2" ]; name = "zap"; };

}

./myfile

mycontent

None of these derivations actually build anything and will generate errors if you try. That doesn’t matter because instantiation is totally independent of building.

Now instantiate package foo with nix-instantiate in the current directory. Nix will pick up on our ./default.nix and determine which foo we are talking about.

$ nix-instantiate -A foo
/nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv

Nix has produced a fully-evaluated derivation and put it into derivation path /nix/store/y4h73bmr...-foo.drv as a side effect.[2] Let’s talk about the structure of such a file. Generally a .drv will mention a few kinds of things:

  • other .drv paths in the Nix store
  • raw data (e.g., source files) in the Nix store
  • output paths in the Nix store (i.e., paths whose contents are generated from derivation files)

When I say “in the Nix store” I mean the paths begin with /nix/store, but not all such paths will necessarily exist yet. As an invariant, if derivation A points (refers) to derivation B, then the derivation path of B is also present in the Nix store. This is also true of any input source files A points to. However, if A points to output path C, C doesn’t have to exist yet—it may be one of the outputs built from B for example. In that case, derivation B will contain an attribute identifying C as one of its output paths (typically, its only output path).

So now that we instantiated foo, we should be able to find both foo and myfile in the Nix store, since the former mentions the latter.

$ ls -l /nix/store/ | grep "foo\|myfile"
-r--r--r--  1 lawrence lawrence       10 Dec 31  1969 xv2iccirbrvklck36f1g7vldn5v58vck-myfile
-r--r--r--  1 lawrence lawrence      368 Dec 31  1969 y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv

Homework: Note that the foo derivation mentions myfile as a path (an unquoted string involving a / character), which is how Nix figured out it should load the file into the store. What if we had written a quoted string? What if we had left it unquoted but removed the leading ./? Experiment. Use the garbage collector to reset.

Now why did Nix choose these two filenames? First we will examine the story of myfile, since this file is put into the store first (being a dependency of foo). Since this file is just static data, its story isn’t very complicated.

Loading ./myfile[3]

Because myfile is a mere data file with no dependencies, its descriptor is simply the SHA256 hash of its contents. The only special thing is that the file is first converted into the Nix Archive (NAR) format, which is a simple representation, as you can see here. The motivations and specification for NAR archives are described in Section 5.2 of Dolstra’s thesis, but essentially it is a canonical serialization, unlike a TAR or ZIP archive.

$ nix-store --dump ./myfile | xxd
00000000: 0d00 0000 0000 0000 6e69 782d 6172 6368  ........nix-arch
00000010: 6976 652d 3100 0000 0100 0000 0000 0000  ive-1...........
00000020: 2800 0000 0000 0000 0400 0000 0000 0000  (...............
00000030: 7479 7065 0000 0000 0700 0000 0000 0000  type............
00000040: 7265 6775 6c61 7200 0800 0000 0000 0000  regular.........
00000050: 636f 6e74 656e 7473 0a00 0000 0000 0000  contents........
00000060: 6d79 636f 6e74 656e 740a 0000 0000 0000  mycontent.......
00000070: 0100 0000 0000 0000 2900 0000 0000 0000  ........).......
$ nix-store --dump ./myfile | sha256sum
2bfef67de873c54551d884fdab3055d84d573e654efa79db3c0d7b98883f9ee3  -
$ nix-hash --type sha256 myfile
2bfef67de873c54551d884fdab3055d84d573e654efa79db3c0d7b98883f9ee3
$ nix-store --dump ./myfile > tmp; nix-hash --type sha256 --flat tmp
2bfef67de873c54551d884fdab3055d84d573e654efa79db3c0d7b98883f9ee3

I used sha256sum here just to show there are no tricks up my sleeve. Notice that nix-hash will convert the file to an archive for us unless we use --flat. Unfortunately nix-hash can’t read from standard input, hence redirecting to tmp in the third example. sha256sum is also inconvenient because it prints the filename it computed the hash from (above, - means “standard input”). None of these are great for the scripting we show below. Pick your poison.
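
If you’d rather see the NAR layout as code than as a hex dump, here is a minimal Python sketch. It is my own illustration, not Nix’s implementation: the helper names nar_string and nar_regular_file are made up, and it only handles a single regular, non-executable file (no directories, symlinks, or executables). The file contents are the 10 bytes visible in the dump above.

```python
import hashlib
import struct

def nar_string(data: bytes) -> bytes:
    """One NAR string: 8-byte little-endian length, the bytes, zero padding to a multiple of 8."""
    padding = (8 - len(data) % 8) % 8
    return struct.pack("<Q", len(data)) + data + b"\0" * padding

def nar_regular_file(contents: bytes) -> bytes:
    """Serialize one non-executable regular file (hypothetical helper, nothing else supported)."""
    tokens = [b"nix-archive-1", b"(", b"type", b"regular", b"contents", contents, b")"]
    return b"".join(nar_string(token) for token in tokens)

nar = nar_regular_file(b"mycontent\n")  # the 10 bytes shown in the hex dump
print(len(nar))  # 128, the same size as the dump (0x80 bytes)
# If the sketch is faithful, this is the descriptor that nix-hash printed above.
print(hashlib.sha256(nar).hexdigest())
```

Every token is length-prefixed and padded the same way, which is what makes the format canonical: there are no timestamps, owners, or ordering choices to vary between machines.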

On to calculating the store path. Nix store paths are calculated from SHA256 hashes, but truncated to 160 bits and expressed in a base-32 format.[4] The input to the hash function is the descriptor of the object appended with metadata describing such things as:

  • The type of the thing whose store path we are computing (a source file, a .drv object, an output, etc.)
  • Our choice of hashing function
  • The location of the Nix store
  • The name of the thing

In our case, myfile is a source file for derivation foo. The store path hash is the truncated hash of

source:sha256:$MYFILE_DESC:/nix/store:myfile

where $MYFILE_DESC is the descriptor. In detail,

$ MYFILE_DESC=$(nix-hash --type sha256 myfile)
$ echo -n "source:sha256:$MYFILE_DESC:/nix/store:myfile" > tmp
$ MYFILE_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-myfile
$ echo $MYFILE_PATH
/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile
$ if test -f $MYFILE_PATH; then echo "It worked"; else echo "It failed"; fi
It worked

That’s all there is to myfile. By the way, notice the -n argument to echo, which is necessary to prevent a trailing newline character. Don’t forget this!

If you want to formalize the algorithm it would look something like this.

loadSource:
    doc:
        Load a source file into the Nix store
    inputs:
        a file f
    outputs:
        the store path for f, load source file into Nix store as a side effect
    procedure:
        name <- filename f
        desc <- sha256 (makeNAR f)
        storehash <- truncatedHash("source:sha256:<desc>:/nix/store:<name>")
        storepath <- /nix/store/<storehash>-<name>
        load_in_store(f, <storepath>)
        return <storepath>
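
The same procedure can be sketched in Python without calling nix-hash at all. Treat this as an illustration under assumptions rather than a reference: the function names are mine, the store directory is hard-coded to /nix/store, and the descriptor is passed in as the hex string we computed above instead of being recomputed from the file. The “truncation” is an XOR fold down to 20 bytes, and the base-32 encoding uses Nix’s own alphabet and bit order.

```python
import hashlib

NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"  # no e, o, t, u

def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """Fold a digest down to `size` bytes by XOR-ing byte i into slot i % size."""
    folded = bytearray(size)
    for i, byte in enumerate(digest):
        folded[i % size] ^= byte
    return bytes(folded)

def nix_base32(data: bytes) -> str:
    """Encode 5 bits per character, starting from the highest-numbered 5-bit group."""
    n_chars = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(n_chars - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32_ALPHABET[c & 0x1F])
    return "".join(chars)

def store_path(fingerprint: str, name: str) -> str:
    """Truncated base-32 SHA256 of the fingerprint string, turned into a store path."""
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"/nix/store/{nix_base32(compress_hash(digest))}-{name}"

def source_store_path(name: str, nar_sha256_hex: str) -> str:
    return store_path(f"source:sha256:{nar_sha256_hex}:/nix/store:{name}", name)

MYFILE_DESC = "2bfef67de873c54551d884fdab3055d84d573e654efa79db3c0d7b98883f9ee3"
# If the sketch is faithful, this prints the myfile store path from the transcript above.
print(source_store_path("myfile", MYFILE_DESC))
```

The fold means every byte of the SHA256 digest still influences the result, which is why this is more than just dropping the last 12 bytes.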

Instantiating foo

Now let’s look at foo. Here’s the original derivation expression again.

foo = derivation { system = "x86_64-linux"; builder = ./myfile; name = "foo"; };

Now let’s see what the final derivation file actually contains:

$ nix show-derivation /nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv 
{
  "/nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo"
      }
    },
    "inputSrcs": [
      "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"
    ],
    "inputDrvs": {},
    "platform": "x86_64-linux",
    "builder": "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile",
    "args": [],
    "env": {
      "builder": "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile",
      "name": "foo",
      "out": "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo",
      "system": "x86_64-linux"
    }
  }
}

This is a pretty-printed version. We can also look at the raw file (formatted for readability):

$ cat /nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv 
Derive(
    [ ("out","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo","","")]
    , []
    , ["/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"]
    , "x86_64-linux"
    , "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"
    , []
    , [ ("builder","/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile")
      , ("name","foo")
      , ("out","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo")
      , ("system","x86_64-linux")
    ]
)

It’s worth spending a minute examining these. Notice that the pretty-printed version is just labelling the fields of the list in the raw .drv file (although the order of the inputSrcs and inputDrvs fields is swapped from the raw file). Also, the pretty-printer prints the store path of the derivation, but this is not actually present in the derivation file. However, the output path of foo is present. This implies that Nix must be able to determine the output path without having the fully-instantiated contents of the .drv file (since the .drv doesn’t have the output path yet). On the other hand, the derivation store path is calculated from the full contents shown above, as we will see. This implies something that may seem odd at first: the derivation store path depends on the output store path, even though the output will later be computed based on the derivation. Time for a blockquote.

The derivation store path depends on the output store path, even though the output will later be computed based on the derivation.

Anyway, let’s call the derivation file foo.drv. When Nix is evaluating the foo expression during instantiation, it notices the reference to ./myfile. This triggers the process we described earlier, meaning myfile gets put inside the Nix store and we learn its store path. That store path is then inserted where appropriate (here, the builder attribute, the list of input sources, and the builder environment variable). Now the output path of foo is built from a version of foo.drv in which the output string is simply left empty, like so:

# partial foo.drv used to calculate the foo output path
Derive(
    [ ("out","","","")]
    , []
    , ["/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"]
    , "x86_64-linux"
    , "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"
    , []
    , [ ("builder","/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile")
      , ("name","foo")
      , ("out","")
      , ("system","x86_64-linux")
    ]
)

This file is what I mean by “partial” derivation. The descriptor for the foo output is the SHA256 hash of this file (without extra whitespace). To reproduce this, copy foo.drv to your current directory and delete the output fields.

$ cp /nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv foo.drv
$ chmod +w foo.drv
$ vim -b foo.drv # modify the file by deleting the output path

Caution: The derivation file does not contain a trailing newline character. Many text editors will automatically add one if you open the file and save it again. Here I’m using vim’s -b option to open the file as a binary file, and then I would use :set noeol before saving to prevent this behavior.
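
If hand-editing feels error-prone, the partial file can also be produced programmatically. Here is a minimal Python sketch, assuming the on-disk foo.drv is the raw Derive(...) text shown earlier with all the readability whitespace removed (the reconstructed string is 368 bytes, which matches the size in the ls listing). mask_output_path is a hypothetical helper of mine, not something Nix provides.

```python
import hashlib

def mask_output_path(drv_text: str, out_path: str) -> str:
    """Replace every quoted occurrence of the output path with an empty string."""
    return drv_text.replace(f'"{out_path}"', '""')

MYFILE = "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"
FOO_OUT = "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo"

# Assumed on-disk layout: a single line, no spaces, no trailing newline.
foo_drv = (
    f'Derive([("out","{FOO_OUT}","","")],[],["{MYFILE}"],"x86_64-linux","{MYFILE}",[],'
    f'[("builder","{MYFILE}"),("name","foo"),("out","{FOO_OUT}"),("system","x86_64-linux")])'
)

partial = mask_output_path(foo_drv, FOO_OUT)
print(partial)
# If the layout assumption holds, this is the descriptor used for the foo output path.
print(hashlib.sha256(partial.encode()).hexdigest())
```

Working on a string in memory sidesteps the trailing-newline problem entirely, since nothing is ever written back through an editor.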

Check that you have modified foo.drv appropriately. You should end up with a file with a SHA256 hash that matches this one:

$ sha256sum foo.drv
1bdc41b9649a0d59f270a92d69ce6b5af0bc82b46cb9d9441ebc6620665f40b5  foo.drv

This is the foo descriptor string used to calculate the output store path. (If your hash begins with 9a7fc2e..., you have a trailing newline. Boo!) The output path is the truncated hash of this string:

output:out:sha256:<hash>:/nix/store:foo

The reason we add these metadata fields is because we want the output path to reflect all of these things. In particular, if you have some other foo that differs in any of the data shown so far, it shouldn’t be mixed up with our foo. Here’s the summary:

$ FOO_DESC=$(nix-hash --type sha256 foo.drv --flat)
$ echo -n "output:out:sha256:$FOO_DESC:/nix/store:foo" > tmp
$ FOO_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-foo
$ echo $FOO_PATH
/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo
$ if [ $(grep $FOO_PATH /nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv -c) -eq 1 ]; \
    then echo "It worked"; else echo "It failed"; fi
It worked

Now that we have calculated the out path of package foo, we have enough information to complete foo.drv as we looked at earlier. Either insert the output path back into your modified foo.drv or recopy the original foo.drv into our local directory for experimentation.

$ cp /nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv foo.drv
$ chmod +w foo.drv

The descriptor for foo.drv is simply the SHA256 hash of foo.drv. As we saw with myfile and the foo output, our store path for foo.drv is computed from this descriptor and some metadata. Here the metadata consists of

  • The type of the thing (here, a simple text blob)
  • The store paths of the dependencies of the object (i.e. myfile)
  • The descriptor of the object, i.e. the hash of foo.drv
  • The derivation name

Specifically, the store path of foo.drv is calculated from the string

text:<path of myfile>:sha256:<hash of foo.drv>:/nix/store:foo.drv

Try it yourself:

$ FOO_DRV_DESC=$(nix-hash --type sha256 foo.drv --flat)
$ echo -n "text:$MYFILE_PATH:sha256:$FOO_DRV_DESC:/nix/store:foo.drv" > tmp
$ FOO_DRV_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-foo.drv
$ echo $FOO_DRV_PATH
/nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv
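
The only part of this step that differs from the earlier ones is how the string being hashed is assembled, so here is that part alone as a small Python sketch. The function name text_fingerprint is mine, the contents passed in below are a stand-in rather than the real foo.drv, and sorting the references is an assumption that matches the order used in the examples in this post.

```python
import hashlib

def text_fingerprint(name: str, contents: bytes, references=()) -> str:
    """Build the string whose truncated hash names a text object such as a .drv file."""
    digest = hashlib.sha256(contents).hexdigest()
    refs = "".join(f":{ref}" for ref in sorted(references))
    return f"text{refs}:sha256:{digest}:/nix/store:{name}"

MYFILE_PATH = "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"
placeholder_contents = b"(stand-in for the bytes of foo.drv)"  # not the real file
print(text_fingerprint("foo.drv", placeholder_contents, [MYFILE_PATH]))
```

With no references the prefix collapses to a bare text:sha256:, and with several the store paths are simply chained with colons.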

By the way, notice something. The descriptor of the derivation file, FOO_DRV_DESC, was calculated exactly like the output descriptor FOO_DESC, except that the former reflects the output path and the latter doesn’t (since it’s actually an input to the output path). This is something of a coincidence, but it plays a role later. Generally, the derivation descriptor is precisely the hash of the final derivation file, but the output descriptor is different. It is calculated from the derivation with an empty output path and by recursively replacing all input derivations (inputDrvs) with the descriptor of their outputs, but calculated with the output path present. Since foo has no input derivations, there was nothing to replace, which is why the two calculations looked so alike. We’ll walk through an example later, but keep this in the back of your mind.

Fixed Output Derivations

You probably know that Nix supports a type of derivation whose output is known in advance, such as a derivation that downloads a known tarball. Our bar derivation is an example of this. The output store path (but not the derivation store path) needs to be computed differently for such objects, since this time we do not want the output path to depend on the derivation: changes to the derivation shouldn’t matter as long as the output remains the same. In particular, derivations that depend on bar shouldn’t need to be rebuilt in this case, which would happen if the output path of bar changed. See the Nix manual for more information, or wait until our baz example in the next section.

The bar derivation is simulating a derivation that simply outputs myfile—notice the pre-known hash in the bar expression is equal to sha256sum myfile. Of course, like foo, the derivation can’t actually build anything because we didn’t define a real builder, but this doesn’t matter.

$ nix-instantiate -A bar
/nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv
$ nix show-derivation /nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv 
{
  "/nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar",
        "hashAlgo": "sha256",
        "hash": "f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb"
      }
    },
    "inputSrcs": [],
    "inputDrvs": {},
    "platform": "x86_64-linux",
    "builder": "none",
    "args": [],
    "env": {
      "builder": "none",
      "name": "bar",
      "out": "/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar",
      "outputHash": "f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb",
      "outputHashAlgo": "sha256",
      "outputHashMode": "flat",
      "system": "x86_64-linux"
    }
  }
}

Above we’re looking at the pretty-printed version of the bar derivation. Like last time, we’re trying to understand how Nix came up with the output path of /nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar and the derivation path of /nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv. And like last time, our starting point is the derivation shown above but with the output path (outputs.out.path and env.out) set to an empty string, waiting to be filled in. The only thing different for bar is how we calculate the descriptor for the output. Last time we hashed the partially-filled derivation file, but this time we’ll compute it from the known hash, appending a bit of extra metadata first. Namely we hash this string:

fixed:out:sha256:<known hash>:

Then we’ll compute the output path for bar just like normal, appending some more metadata to the descriptor and taking the truncated SHA256 hash of that.

$ BAR_HASH=f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb # known constant
$ echo -n "fixed:out:sha256:$BAR_HASH:" > tmp
$ BAR_DESC=$(nix-hash --type sha256 --flat tmp)
$ # everything after this is just like normal
$ echo -n "output:out:sha256:$BAR_DESC:/nix/store:bar" > tmp
$ BAR_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-bar
$ echo $BAR_PATH
/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar
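
For comparison, here are the same two hashing steps as a Python sketch. It is only an illustration under assumptions: it covers the flat sha256 case used by bar and nothing else, the function names are mine, and the truncation and base-32 steps follow the XOR-fold scheme that nix-hash --truncate --base32 performs.

```python
import hashlib

ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"  # Nix base-32 alphabet (no e, o, t, u)

def truncated_base32_sha256(text: str, size: int = 20) -> str:
    """SHA256 the text, XOR-fold the digest to `size` bytes, then base-32 encode it."""
    digest = hashlib.sha256(text.encode()).digest()
    folded = bytearray(size)
    for i, byte in enumerate(digest):
        folded[i % size] ^= byte
    chars = []
    for n in range((size * 8 - 1) // 5, -1, -1):
        i, j = divmod(n * 5, 8)
        c = folded[i] >> j
        if i + 1 < size:
            c |= folded[i + 1] << (8 - j)
        chars.append(ALPHABET[c & 0x1F])
    return "".join(chars)

def fixed_output_path(name: str, output_hash_hex: str) -> str:
    """Output path of a flat sha256 fixed-output derivation (other modes not handled)."""
    inner = f"fixed:out:sha256:{output_hash_hex}:"  # output path field left empty
    desc = hashlib.sha256(inner.encode()).hexdigest()
    outer = f"output:out:sha256:{desc}:/nix/store:{name}"
    return f"/nix/store/{truncated_base32_sha256(outer)}-{name}"

BAR_HASH = "f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb"
# If the sketch is faithful, this prints the bar output path from the transcript above.
print(fixed_output_path("bar", BAR_HASH))
```

Notice that fixed_output_path takes only a name and a hash. Nothing about the builder, arguments, or inputs can reach the result, which is the whole point of a fixed-output derivation.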

Finally, we compute the store path of the derivation itself. This also works like last time—hash the derivation (with output path), then append some metadata and take the (truncated) hash of that to compute the store path.

Yes, it’s hashes of hashes of hashes of hashes of hashes. Hashes all the way down.

$ # copy the derivation so we can reproduce calculating the derivation store path
$ cp /nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv bar.drv
$ BAR_DRV_DESC=$(nix-hash --type sha256 bar.drv --flat)
$ echo -n "text:sha256:$BAR_DRV_DESC:/nix/store:bar.drv" > tmp
$ BAR_DRV_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-bar.drv
$ echo $BAR_DRV_PATH
/nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv

So besides the calculation of BAR_DESC, instantiating bar is just like instantiating foo.

You might have noticed that BAR_DESC is calculated from a string with a weird ‘:’ at the end:

fixed:out:sha256:f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb:

What’s up with the trailing colon? This is a field which would store the output path of bar. We don’t have that information while computing BAR_DESC, since the descriptor is one of the inputs used to calculate it. We mentioned above that when output descriptors are calculated recursively, we replace the list of input derivations with their output descriptors, but on the recursive calls the descriptor is calculated with the output path. That means derivations depending on bar will take the hash of fixed:out:sha256:f3f3c476...:/nix/store/a00d5f71...-bar, but BAR_DESC leaves the output field empty. Try to keep this in the back of your mind.
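
To make the two variants concrete, here is a tiny Python sketch that computes both descriptors from the same known hash. The function name fixed_descriptor is mine, and it only models the flat sha256 case.

```python
import hashlib

def fixed_descriptor(output_hash_hex: str, out_path: str = "") -> str:
    """SHA256 of the fixed-output string, with or without the output path filled in."""
    text = f"fixed:out:sha256:{output_hash_hex}:{out_path}"
    return hashlib.sha256(text.encode()).hexdigest()

BAR_HASH = "f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb"
BAR_PATH = "/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar"

bar_desc = fixed_descriptor(BAR_HASH)                   # empty field: used to derive bar's own path
bar_desc_output = fixed_descriptor(BAR_HASH, BAR_PATH)  # filled field: used by dependents of bar
print(bar_desc)
print(bar_desc_output)
```

The first value can be computed before the output path exists. The second needs the path, so it only becomes available after the first has been used.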

Recursion

So now we know how to instantiate basic derivations of both the normal and fixed-output type, as well as how to load data files into the Nix store. However our examples have been limited by the fact that foo and bar have no dependencies on other derivations. You should have a few questions:

  • How does this algorithm handle dependencies?
  • Why are these strings calculated the way they are?
  • How does the fixed-output optimization actually work?

Instantiating baz

We’ll try to answer these questions by looking at a more complex example. Let’s instantiate baz, which you can see depends on both foo and bar. We’ll pretty-print the derivation and then retrace Nix’s steps to calculate it. For quick reference, here’s the baz derivation expression again:

baz = derivation { system = "x86_64-linux"; builder = "${foo}/bin/bazbuilder"
    ; args = [ "${bar}/var/bazargs" ]; name = "baz";};

And here’s the derivation, printed pretty.

$ nix-instantiate -A baz
/nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv
$ nix show-derivation /nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv
{
  "/nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz"
      }
    },
    "inputSrcs": [],
    "inputDrvs": {
      "/nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv": [
        "out"
      ],
      "/nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv": [
        "out"
      ]
    },
    "platform": "x86_64-linux",
    "builder": "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder",
    "args": [
      "/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar/var/bazargs"
    ],
    "env": {
      "builder": "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder",
      "name": "baz",
      "out": "/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz",
      "system": "x86_64-linux"
    }
  }
}
$ # Make yourself a local copy of the final product
$ cp /nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv baz.drv
$ chmod +w baz.drv

Remember that the store paths of the dependencies were calculated recursively. We only need to explain where the w3lg0fab... (output path hash) and sn57y8p4... (.drv path hash) came from. As in the past, we first calculate the descriptor and store path of the output before using this to compute the final derivation and derivation store path. Like with foo, our starting point will be a partial baz.drv with the output path left blank. Normally this is what we’d hash to get the output descriptor. Instead we hash this:

# Data used to calculate baz output descriptor
Derive(
    [ ("out","","","") ]
    , [ ("ddc42b2d75b1f211d43d085ccd932b35a8dfcea9cd766cf4595a5b4bc73735da",["out"])
    ,   ("dee6f3f1877f934ebb02f67890c5a6283e5f9a6598c5bf53d14e32f35586a7a9",["out"])]
    , []
    , "x86_64-linux"
    , "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder"
    , [ "/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar/var/bazargs"]
    , [ ("builder","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder")
      , ("name","baz")
      , ("out","")
      , ("system","x86_64-linux")
      ]
)

What’s notable here is that instead of the inputDrvs content, we see two hashes beginning with ddc42b and dee6f3 respectively. What are these? The first happens to be equal to FOO_DRV_DESC.

$ echo $FOO_DRV_DESC
ddc42b2d75b1f211d43d085ccd932b35a8dfcea9cd766cf4595a5b4bc73735da

The second is very similar to BAR_DESC, but calculated with the output path included.

$ echo -n "fixed:out:sha256:$BAR_HASH:$BAR_PATH" > tmp
$ BAR_DESC_OUTPUT=$(nix-hash --type sha256 --flat tmp)
$ echo $BAR_DESC_OUTPUT
dee6f3f1877f934ebb02f67890c5a6283e5f9a6598c5bf53d14e32f35586a7a9

The first of these is essentially a coincidence. The field was calculated exactly as if we were calculating FOO_DESC, but with the output field included. As it happens, this gives us something equal to FOO_DRV_DESC, but this only happens because foo has no derivation dependencies. If it had any, Nix would have recursively replaced foo’s input derivations with their output descriptors (calculated with output path included), and that’s what we would hash and place in the data structure above.

This all sounds pretty complicated. Let’s try to formalize the algorithms here. One subroutine will accept a derivation, possibly with an output path, and calculate its descriptor by recursively calculating the descriptors of its input derivations. The main routine will accept a partial derivation (with empty output paths) and instantiate it.

computeDescription:
    doc:
        Compute the description of a Nix derivation
    inputs:
        a derivation d
    outputs:
        a descriptor of d
    procedure:
        if d is a fixed-output derivation:
            hash <- d.known_hash
            out <- d.output_path
             # out may be empty when called from instantiateDerivation!
            desc <- sha256(fixed:out:sha256:<hash>:<out>)
            return <desc>
        else:
            for each input_drv_path in (d.inputDrvs):
                # Read the contents of the input derivation
                input_drv <- readFile(input_drv_path)
                input_drv_desc <- computeDescription(input_drv)
                 # replace all references to input with its description
                d <- d {input_drv_path = input_drv_desc}
            desc <- sha256(d)
            return <desc>

instantiateDerivation:
    doc:
        Load a derivation file into the Nix store
    inputs:
        a partially computed derivation d, i.e.
        - output path is left empty
        - some fields require interpolating output paths of our dependencies
    outputs: 
        the store path of d with outpath filled in
        load finished d into Nix store as a side effect
    procedure:
        recursively_instantiate_dependencies()
        # Some fields, e.g. builder, may need information about dependency outputs
        interpolate_dependency_output_paths()
        desc <- computeDescription(d)
        out_hash <- truncatedHash(output:out:sha256:<desc>:/nix/store:<d.name>)
        # Fill in the output path
        d.output_path <- /nix/store/<out_hash>-<d.name>
        deps <- d.inputSrcs union d.inputDrvs
        drv_desc <- sha256(d)
        drv_path_hash <- truncatedHash(text:<deps>:sha256:<drv_desc>:/nix/store:<d.name>.drv)
        drv_path <- /nix/store/<drv_path_hash>-<d.name>.drv
        load_in_store(d, <drv_path>)
        return <drv_path>
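
Here is the descriptor subroutine as a runnable Python sketch. It is a toy model under several assumptions: the Drv class and its field names are mine, every derivation has a single output called out, only flat sha256 fixed outputs are modelled, strings are not escaped, and dependencies are held as in-memory objects instead of being read back from .drv files. The rendering follows the Derive(...) layout shown in this post.

```python
import hashlib
from dataclasses import dataclass, field

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

@dataclass
class Drv:
    """Single-output derivation reduced to the fields used in this post (illustrative only)."""
    name: str
    builder: str
    out_path: str = ""
    args: list = field(default_factory=list)
    input_srcs: list = field(default_factory=list)
    input_drvs: dict = field(default_factory=dict)  # .drv store path -> Drv
    fixed_hash: str = ""  # non-empty for a flat sha256 fixed-output derivation
    system: str = "x86_64-linux"

def quote_list(items) -> str:
    return "[" + ",".join(f'"{s}"' for s in items) + "]"

def unparse(drv: Drv, input_keys, out_path: str) -> str:
    """Render the Derive(...) text with the given input keys and output path."""
    algo = "sha256" if drv.fixed_hash else ""
    env = {"builder": drv.builder, "name": drv.name, "out": out_path, "system": drv.system}
    if drv.fixed_hash:
        env.update(outputHash=drv.fixed_hash, outputHashAlgo="sha256", outputHashMode="flat")
    inputs = ",".join(f'("{key}",["out"])' for key in sorted(input_keys))
    env_text = ",".join(f'("{k}","{v}")' for k, v in sorted(env.items()))
    return (
        f'Derive([("out","{out_path}","{algo}","{drv.fixed_hash}")],[{inputs}],'
        f'{quote_list(sorted(drv.input_srcs))},"{drv.system}","{drv.builder}",'
        f'{quote_list(drv.args)},[{env_text}])'
    )

def compute_descriptor(drv: Drv, mask_output: bool = False) -> str:
    out_path = "" if mask_output else drv.out_path
    if drv.fixed_hash:
        return sha256_hex(f"fixed:out:sha256:{drv.fixed_hash}:{out_path}")
    # Recursive calls keep each dependency's output path (mask_output stays False).
    input_keys = [compute_descriptor(dep) for dep in drv.input_drvs.values()]
    return sha256_hex(unparse(drv, input_keys, out_path))

MYFILE = "/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"
FOO_OUT = "/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo"
BAR_OUT = "/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar"
BAR_HASH = "f3f3c4763037e059b4d834eaf68595bbc02ba19f6d2a500dce06d124e2cd99bb"

foo = Drv("foo", MYFILE, out_path=FOO_OUT, input_srcs=[MYFILE])
bar = Drv("bar", "none", out_path=BAR_OUT, fixed_hash=BAR_HASH)
baz = Drv("baz", f"{FOO_OUT}/bin/bazbuilder", args=[f"{BAR_OUT}/var/bazargs"],
          input_drvs={"/nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv": foo,
                      "/nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv": bar})

before = compute_descriptor(baz, mask_output=True)
bar.builder = "some-other-fetcher"  # edit the fixed-output dependency's derivation
print(before == compute_descriptor(baz, mask_output=True))  # True: baz is unaffected
```

The last two lines preview the fixed-output optimization: bar’s descriptor never looks at its builder, so editing it leaves the descriptor of baz, and therefore its output path, unchanged.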

For now, let’s finish instantiating baz like normal.

$ echo -n 'Derive([("out","","","")],[("ddc42b2d75b1f211d43d085ccd932b35a8dfcea9cd766cf4595a5b4bc73735da",["out"]),("dee6f3f1877f934ebb02f67890c5a6283e5f9a6598c5bf53d14e32f35586a7a9",["out"])],[],"x86_64-linux","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder",["/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar/var/bazargs"],[("builder","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder"),("name","baz"),("out",""),("system","x86_64-linux")])' > baz_partial.drv
$ BAZ_DESC=$(nix-hash --type sha256 baz_partial.drv --flat)
$ echo -n "output:out:sha256:$BAZ_DESC:/nix/store:baz" > tmp
$ BAZ_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-baz
$ echo $BAZ_PATH
/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz
$ cp baz_partial.drv baz.drv # now modify baz.drv to include the output string and .drv paths OR copy it
$ cp /nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv baz.drv
$ BAZ_DRV_DESC=$(nix-hash --type sha256 baz.drv --flat)
$ echo -n "text:$FOO_DRV_PATH:$BAR_DRV_PATH:sha256:$BAZ_DRV_DESC:/nix/store:baz.drv" > tmp
$ BAZ_DRV_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-baz.drv
$ echo $BAZ_DRV_PATH
/nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv

Instantiating zap

From here, this should be enough information to compute the store paths of zap. I’ll just dump enough information to reproduce what Nix does, and you can try to see why everything makes sense.

$ nix-instantiate -A zap
/nix/store/9m038wks299zzr1padmra96xnyiqcaxq-zap.drv
$ echo -n 'Derive([("out","/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz","","")],[("ddc42b2d75b1f211d43d085ccd932b35a8dfcea9cd766cf4595a5b4bc73735da",["out"]),("dee6f3f1877f934ebb02f67890c5a6283e5f9a6598c5bf53d14e32f35586a7a9",["out"])],[],"x86_64-linux","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder",["/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar/var/bazargs"],[("builder","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/bin/bazbuilder"),("name","baz"),("out","/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz"),("system","x86_64-linux")])' > tmp
$ BAZ_DESC_OUTPUT=$(nix-hash --type sha256 --flat tmp)
$ echo $BAZ_DESC_OUTPUT 
7a9606da57892b43a1bde881fa190c85027e13dd58de321472195d6a784355c6
$ echo -n 'Derive([("out","","","")],[("7a9606da57892b43a1bde881fa190c85027e13dd58de321472195d6a784355c6",["out"]),("ddc42b2d75b1f211d43d085ccd932b35a8dfcea9cd766cf4595a5b4bc73735da",["out"]),("dee6f3f1877f934ebb02f67890c5a6283e5f9a6598c5bf53d14e32f35586a7a9",["out"])],["/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"],"x86_64-linux","/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz/bin/zapbuilder",["/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/arg1","/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar/arg2"],[("builder","/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz/bin/zapbuilder"),("name","zap"),("out",""),("system","x86_64-linux")])' > tmp
$ ZAP_DESC=$(nix-hash --type sha256 --flat tmp)
$ echo -n "output:out:sha256:$ZAP_DESC:/nix/store:zap" > tmp
$ ZAP_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-zap
$ echo $ZAP_PATH
/nix/store/c8frqbckra241rkj2l075z2481wb9pvf-zap
$ echo -n 'Derive([("out","/nix/store/c8frqbckra241rkj2l075z2481wb9pvf-zap","","")],[("/nix/store/sn57y8p4b19d389gf8n4n06pmamr2wvv-baz.drv",["out"]),("/nix/store/y4h73bmrc9ii5bxg6i7ck6hsf5gqv8ck-foo.drv",["out"]),("/nix/store/ymsf5zcqr9wlkkqdjwhqllgwa97rff5i-bar.drv",["out"])],["/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile"],"x86_64-linux","/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz/bin/zapbuilder",["/nix/store/xv2iccirbrvklck36f1g7vldn5v58vck-myfile","/nix/store/hs0yi5n5nw6micqhy8l1igkbhqdkzqa1-foo/arg1","/nix/store/a00d5f71k0vp5a6klkls0mvr1f7sx6ch-bar/arg2"],[("builder","/nix/store/w3lg0fablf6qkw0hsmznsdajkc1ws631-baz/bin/zapbuilder"),("name","zap"),("out","/nix/store/c8frqbckra241rkj2l075z2481wb9pvf-zap"),("system","x86_64-linux")])' > tmp
$ ZAP_DRV_DESC=$(nix-hash --type sha256 --flat tmp)
$ echo $ZAP_DRV_DESC 
41eb6445f62621e29d38b3207c63423a78feccd79c670e40f16d310ee0215948
$ echo -n "text:$BAZ_DRV_PATH:$MYFILE_PATH:$FOO_DRV_PATH:$BAR_DRV_PATH:sha256:$ZAP_DRV_DESC:/nix/store:zap.drv" > tmp
$ ZAP_DRV_PATH=/nix/store/$(nix-hash --type sha256 --truncate --base32 --flat tmp)-zap.drv
$ echo $ZAP_DRV_PATH
/nix/store/9m038wks299zzr1padmra96xnyiqcaxq-zap.drv

Final thoughts

Q: How does the fixed-output optimization work?

A: The output path of a fixed-output derivation A is not dependent on the derivation file, only the known hash of the output. The output path of a downstream derivation B is calculated from the A descriptor calculated with the A output path, neither of which depends on the derivation for A. Therefore B’s output path is not affected by a change in A either (provided the fixed output is constant). Likewise another downstream derivation C’s output path is not affected. Notice that the contents and store paths of downstream derivations (.drv files) are affected by a change in A’s derivation, but these are not expensive to recompute.

Here’s a table with some of the hashes that are involved in these computations.

| Origin | Description | Value | Calculated From | Hash Algorithm |
|---|---|---|---|---|
| myfile | sha256 | f3f3c476 | simple hash | SHA256 |
| myfile | descriptor | 2bfef67d | hash as NAR archive | SHA256 |
| myfile | store path | xv2iccir | source:sha256:2bfef6…:/nix/store:myfile | Truncated Base32 SHA256 |
| foo | descriptor | 1bdc41b9 | hash of derivation file without foo out path | SHA256 |
| foo | out path | hs0yi5n5 | output:out:sha256:1bdc41b9…:/nix/store:foo | Truncated Base32 SHA256 |
| foo | .drv descriptor | ddc42b2d | hash of final derivation file | SHA256 |
| foo | .drv path hash | y4h73bmr | text:/nix/store/xv2iccir…-myfile:sha256:ddc42b2d…:/nix/store:foo.drv | Truncated Base32 SHA256 |
| foo | descriptor w/ output | ddc42b2d | output descriptor calculated with output field included (equal to .drv descriptor by coincidence) | SHA256 |
| bar | descriptor | 423e6fde | fixed:out:sha256:f3f3c476…: | SHA256 |
| bar | path | a00d5f71 | output:out:sha256:423e6fde…:/nix/store:bar | Truncated Base32 SHA256 |
| bar | .drv descriptor | dbc6984b | hash of final derivation file | SHA256 |
| bar | .drv path hash | ymsf5zcq | text:sha256:dbc6984b…:/nix/store:bar.drv | Truncated Base32 SHA256 |
| bar | descriptor w/ output | dee6f3f18 | fixed:out:sha256:f3f3c476…:/nix/store/a00d5f71…-bar | SHA256 |
| baz | descriptor | 74714a18d | hash of .drv with empty out path, foo descriptor w/ output, bar descriptor w/ output | SHA256 |
| baz | out path | w3lg0fabl | output:out:sha256:74714a18d…:/nix/store:baz | Truncated Base32 SHA256 |
| baz | .drv descriptor | 8183fd963 | hash of final derivation file | SHA256 |
| baz | .drv path hash | sn57y8p4b | text:/nix/store/y4h73bmr…-foo.drv:/nix/store/ymsf5zcq…-bar.drv:sha256:8183fd963…:/nix/store:baz.drv | Truncated Base32 SHA256 |
| baz | descriptor w/ output | 7a9606da5 | output descriptor calculated with output field included | SHA256 |

  1. Acshually, evaluation and instantiation are intertwined. Evaluating a derivation triggers its instantiation as a side effect. In fact, a derivation cannot be fully evaluated until after performing the calculations necessary to instantiate its dependencies, such as computing their store paths.

  2. If at any point you want to start over with these examples, just run nix-collect-garbage and these will be removed from the store (since we never register these derivations as gc roots).

  3. Nix doesn’t use the term “instantiating” to describe loading source files into the store, but the process here is roughly the same as the instantiations we examine.

  4. This “truncation” is more complicated than simply dropping bytes. For details on truncation and the particular base-32 scheme, see Section 5.1 of Dolstra’s PhD thesis.