> Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs. We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners.
I think people in this thread are focusing on the wrong thing. Sure, not all packages are reproducible, but the project is systematically increasing the percentage of projects that are reproducible while ALSO adding new projects and demonstrating conclusively that what was considered infeasible is actually readily achievable.
> The interesting aspect of these causes is that they show that even if nixpkgs already achieves great reproducibility rates, there still exists some low hanging fruits towards improving reproducibility that could be tackled by the Nix community and the whole FOSS ecosystem.
This work is helpful, I think, for the community to tackle the sources of unreproducible builds and push the percentage up even further. I think it also highlights the need for automation to validate that there aren't systematic regressions, or regressions in particularly popular packages (checking every package individually is a futile effort unless a lot of people volunteer to be part of a distributed checking effort).
What's even crazier is that Nix builds are this reproducible for free. Like, joe random developer can:
nix build nixpkgs#vim
nix build nixpkgs#vim --rebuild
The first invocation will substitute binaries, and the second will rebuild those locally and validate the bit-for-bit reproducibility of the results. In Debian there is significant ceremony and special tools/wrappers required to set up the reproducible environment, so no one would bother to use it unless they were specifically working on the https://wiki.debian.org/ReproducibleBuilds initiative.
Some interesting related stats from Debian also show good reproducibility progress
https://tests.reproducible-builds.org/debian/reproducible.ht...
I think this debate comes down to exactly what "reproducible" means. Nix doesn't give bit-exact reproducibility, but it does give reproducible environments, by ensuring that the inputs are always bit-exact. It is closer to being fully reproducible than most other build systems (including Bazel) -- but because it can only reasonably ensure that the inputs are exact, it's still necessary for the build processes themselves to be fully deterministic to get end-to-end bit-exactness.
Nix on its own doesn't fully resolve supply chain concerns about binaries, but it can provide answers to a myriad of other problems. I think most people like Nix reproducibility, and it is marketed as such, for the sake of development: life is much easier when you know for sure you have the exact same version of each dependency, in the exact same configuration. A build on one machine may not be bit-exact to a build on another machine, but it will be exactly the same source code all the way down.
The quest to get every build process to be deterministic is definitely a bigger problem and it will never be solved for all of Nixpkgs. NixOS does have a reproducibility project[1], and some non-trivial amount of NixOS actually is properly reproducible, but the observation that Nixpkgs is too vast is definitely spot-on, especially because in most cases the real issues lie upstream. (and carrying patches for reproducibility is possible, but it adds even more maintainer burden.)
> The quest to get every build process to be deterministic [...] will never be solved for all of Nixpkgs.
Not least because of unfree and/or binary-blob packages that can't be reproducible because they don't even build anything. As much as Guix' strict FOSS and build-from-source policy can be an annoyance, it is a necessary precondition to achieve full reproducibility from source, i.e. the full-source bootstrap.
Nixpkgs provides license[1] and source provenance[2] information. For legal reasons, Nix also defaults to not evaluating unfree packages. Not packaging them at all, though, doesn't seem useful from any technical standpoint; I think that is purely ideological.
In any case, it's all a bit imperfect anyway, since it's from the perspective of the package manager, which can't be absolutely sure there's no blobs. Anyone who follows Linux-libre releases can see how hard it really is to find all of those needles in the haystack. (And yeah, it would be fantastic if we could have machines with zero unfree code and no blobs, but the majority of computers sold today can't meaningfully operate like that.)
I actually believe there's plenty of value in the builds still being reproducible even when blobs are present: you can still verify that the supply chain is not compromised outside of the blobs. For practical reasons, most users will need to stick to limiting the amount of blobs rather than fully eliminating them.
[1]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-license
[2]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-sourceProv...
you can slap a hash on a binary distribution and it becomes "reproducible" in the same trivial sense as any source tarball. after that, the reproducibility of whatever "build process" takes place to extract archives and shuffle assets around is no more or less fraught than any other package (probably less considering how much compilers have historically had to be brought to heel, especially before reproducibility was fashionable enough for it to enter much into compiler authors' consideration!!)
I'm curious, why couldn't packages that are fully reproducible be marked with metadata, and in your config you set a flag to only allow reproducible packages? Similar to the nonfree tag.
Then you'd have a 100% reproducible OS if you have the flag set (assuming that the required base packages are reproducible).
You could definitely do that, I think the main thing stopping anyone is simply lack of demand for that specific feature. That, and also it might be hard to keep track of what things are properly reproducible; you can kind of only ever prove for sure that a package is not reproducible. It could be non-deterministic but only produce differences on different CPUs or an infinitesimally small percentage of times. Actually being able to assure determinism would be pretty amazing although I don't know how that could be achieved.
I assume it would be somewhat of a judgement call. I mean that is the case with nonfree packages as well - licenses and whatnot have to be evaluated. I assume that there are no cases of non-trivially large software packages in the wild that have been formally proven to be reproducible, but I could be wrong.
> It is closer to being fully reproducible than most other build systems (including Bazel).
How so? Bazel produces the same results for the same inputs.
Bazel doesn't guarantee bit-exact outputs, but also Bazel doesn't guarantee pure builds. It does have a sandbox that prevents some impurities, but for example it doesn't prevent things from going out to the network, or even accessing files from anywhere in the filesystem, if you use absolute paths. (Although, on Linux at least, Bazel does prevent you from modifying files outside of the sandbox directory.)
The Nix sandbox completely hides the host filesystem and only grants network access to fixed-output derivations, whose outputs are verified bit-for-bit.
(Bazel also obviously uses the system compilers and headers. Nix does not.)
> Bazel also obviously uses the system compilers and headers. Nix does not.
Bazel allows hermetic toolchains, and uses them for most languages: Java, Python, Go, Rust, Node.js, etc. You can do the same for C++, but Bazel doesn't provide that out of the box. [1]
Bazel sandboxing can restrict system access on Linux with --experimental_use_hermetic_linux_sandbox and --sandbox_add_mount_pair. [2]
Every "reproducible builds" discussion requires an understand of what is permitted to vary. E.g. Neither Nix nor Bazel attempts to make build products the same for x86 host environments vs ARM host environments. Bazel is less aggressive than Nix in that it does not (by default) attempt to make build products the same for different host C++ compilers.
[1] https://github.com/bazelbuild/bazel/discussions/18332
[2] https://bazel.build/reference/command-line-reference#flag--e...
Uh, either my understanding of Bazel is wrong, or everything you wrote is wrong.
Bazel absolutely prevents network access and filesystem access (reads) from builds. (only permitting explicit network includes from the WORKSPACE file, and access to files explicitly depended on in the BUILD files).
Maybe you can write some “rules_” for languages that violate this, but it is designed purposely to be hermetic and bit-perfect reproducible.
EDIT:
From the FAQ[0]:
> Will Bazel make my builds reproducible automatically?
> For Java and C++ binaries, yes, assuming you do not change the toolchain.
The issues with Docker's style of "reproducible" (meaning a consistent environment) are also outlined in the same FAQ[1]:
> Doesn’t Docker solve the reproducibility problems?
> Docker does not address reproducibility with regard to changes in the source code. Running Make with an imperfectly written Makefile inside a Docker container can still yield unpredictable results.
[0]: https://bazel.build/about/faq#will_bazel_make_my_builds_repr...
[1]: https://bazel.build/about/faq#doesn’t_docker_solve_the_repro...
I think you're both right in a sense. Bazel doesn't (in general) prevent filesystem access, e.g. to library headers in /usr/include. If those headers change (maybe because a Debian package got upgraded or whatever), Bazel won't know it has to invalidate the build cache. I think the FAQ is still technically correct because upgrading the Debian package for a random library dependency counts as "chang[ing] the toolchain" in this context. But I don't think you'd call it hermetic by default.
Check out the previous discussion at https://news.ycombinator.com/item?id=23184843 and below:
> Under the hood there's a default auto-configured toolchain that finds whatever is installed locally in the system. Since it has no way of knowing what files an arbitrary "cc" might depend on, you lose hermeticity by using it.
I believe your understanding of Bazel is wrong. I don't see any documentation that suggests the Bazel sandbox prevents the toolchain from accessing the network.
https://bazel.build/docs/sandboxing
(Actually, it can: that documentation suggests it's optionally supported, at least on the Linux sandbox. That said, it's optional. There's definitely actions that use the network on purpose and can't participate in this.)
This may seem pointless, because in many situations this would only matter in somewhat convoluted cases. In C++ the toolchain probably won't connect to the network. This isn't the case for e.g. Rust, where proc macros can access the network. (In practical terms, I believe the sqlx crate does this, connecting to a local Postgres instance to do type inference.) Likewise, you could do an absolute file inclusion, but that would be very much on purpose and not an accident. So it's reasonable to say that you get a level of reproducibility when you use Bazel for C++ builds...
Kind of. It's not bit-for-bit because it uses the system toolchain, which is just an arbitrary choice. On Darwin it's even more annoying: with Xcode installed via the Mac App Store, the Xcode version can change transparently under Bazel in the background, entirely breaking hermeticity and requiring you to purge the Bazel cache (because the dependency graph will be wrong and break the build. Usually.)
Nix is different. The toolchain is built by Nix and undergoes the same sandboxed build process with cryptographically verified inputs. Bazel does not do that.
It does.
There are mechanisms for opting out/breaking that, just as with Nix or any other system.
> macOS
What does nix do on these systems?
Opt-out would be one thing, but it's actually opt-in for network isolation, and a project can disable all sandboxing with just a .bazelrc. Nix does have ways to opt out of sandboxing, but you can't do it inside a Nix expression: if you run Nix with sandbox = true, anything being able to escape or bypass the sandbox restrictions would be a security vulnerability and assigned a CVE. Disabling the sandbox can only be done by a trusted user, and it's entirely out-of-band from the builder. For Bazel, the sandbox is mostly just there to prevent accidental impurities, but it's not watertight by any means.
Ultimately, I still think that Nix provides a greater degree of isolation and reproducibility than Bazel overall, and especially out of the box, but I was definitely incorrect when I said that Bazel's sandbox doesn't/can't block the network. I did dive a little deeper into the nuances in another comment.[1]
> What does nix do on these systems?
On macOS, Nix is not exactly as solid as it is on Linux. It uses sandbox-exec for sandboxing, which achieves most of what the Nix sandbox does on Linux, except it disallows all networking rather than just isolated networking. (Some derivations need local network access, so you can opt-in to having local network access per-derivation. This still doesn't give Internet access, though: internet access still requires a fixed-output derivation.) There's definitely some room for improvement there but it will be hard to do too much better since xnu doesn't have anything similar to network namespaces afaik.
As for the toolchain, I'm not sure how the Nix bootstrap works on macOS. It seems like a lot of effort went into making it work, and it can function without Xcode installed. (Can't find a source for this, but I was using it on a Mac Mini that I'm pretty sure didn't have Xcode installed. So it clearly has its own hermetic toolchain setup just like Linux.)
> it's actually opt-in for network isolation
Bazel enables sandboxing by default, including network isolation. [1] [2]
The exception would be in environments that don't support it (Windows, unprivileged Docker container, etc.)
My assertion that network isolation is opt-in is based on the fact that the --sandbox_default_allow_network defaults to true[1]. That suggests actions will have networking unless they are dispatched with `block-network`[2].
(It's hard to figure out exactly what's going on based on the documentation and some crawling around, but I wouldn't be surprised if specifically tests defaulted to blocking the network.)
[1]: https://bazel.build/reference/command-line-reference#flag--s...
[2]: https://bazel.build/reference/be/common-definitions#common.t...
AFAIK Bazel does not use the sandbox by default. Last time I experimented with it, the sandbox had some problematic holes, but I don’t remember exactly what, and it’s been a few years.
The very doc you link hints at that, while also giving many caveats where the build will become non-reproducible. So it boils down to “yes, but only if you configure it correctly and do things right”.
Yeah, I think you are right: by default, there is no OS-level sandboxing going on. According to documentation, the default spawn strategy is `local`[1], whereas it would need to be `sandboxed` for sandboxing to take effect.
Meanwhile, if you want to forcibly block network access for a specific action, you can pass `block-network` as an execution requirement[2]. You can also explicitly block network access with flags, using --nosandbox_default_allow_network[3]. Interestingly though, an action can also `require-network` to bypass this, and I don't think there's any way to account for that.
Maybe more importantly, Bazel lacks the concept of a fixed-output action, so when an impure action needs `require-network`, the potentially-impure results could impact downstream dependent actions.
I was still ultimately incorrect to say that Bazel's sandbox can't sandbox the network. The actual reality is that it can. If you do enable the sandbox, while it's not exactly pervasive through the entire ecosystem, it does look like a fair number of projects at least set the `block-network` tag--about 700 as of writing this[4]. I think the broader point I was making (that Nix adheres to a stronger standard of "hermetic" than Bazel) is ultimately true, but I did miss on a bit of nuance initially.
[1]: https://bazel.build/docs/user-manual#spawn-strategy
[2]: https://bazel.build/reference/be/common-definitions#common.t...
[3]: https://bazel.build/reference/command-line-reference#flag--s...
[4]: https://github.com/search?q=language%3Abzl+%22block-network%...
I remember that a system nagged about non-reproducible outputs; Blaze (not Bazel, but the internal thing) allowed looking into the outside world through bad Starlark rules, and compile-time tricks could get you questioning why there's so much evil in the world.
Maybe Bazel forbade these things right away, and Googlers talking about Blaze are inadvertently misleading, thinking the two are similar enough.
I'm not familiar with Bazel at all so this might be obvious, but does Bazel check that the files listed in the BUILD file are the "right ones" (ex. through a checksum), and if so, is this always enforced (that is, this behavior cannot be disabled)?
The contents of files are hashed: if the contents of a file listed for a target don't change, then no rebuild will happen, even if you modify the file's metadata (like the last-modified time via `touch` and so on).
Bazel is really sophisticated and I'd be lying if I said I understood it well, but I have spent time looking at it.
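To illustrate the content-hash idea (not how Bazel is implemented internally, just a sketch of keying rebuild decisions on a digest of the declared inputs rather than on mtimes):

    import hashlib
    import json
    from pathlib import Path

    def digest(paths):
        """Hash the contents of the declared input files, in a stable order."""
        h = hashlib.sha256()
        for p in sorted(paths):
            h.update(p.encode())
            h.update(Path(p).read_bytes())
        return h.hexdigest()

    def needs_rebuild(target, inputs, cache_file="action_cache.json"):
        """Rebuild only if the content digest changed; touching mtimes alone changes nothing."""
        cache = json.loads(Path(cache_file).read_text()) if Path(cache_file).exists() else {}
        current = digest(inputs)
        if cache.get(target) == current:
            return False
        cache[target] = current
        Path(cache_file).write_text(json.dumps(cache))
        return True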
I think talking about sandboxes is missing a point a bit.
It's an important constituent, but only complete OS-emulation with deterministic scheduling could (at a huge overhead) actually result in bit-by-bit reproducible artifacts with arbitrary build steps.
There are endless sources of impurities/randomness and most compilers haven't historically cared much about this.
The point I'm making is that neither Bazel nor Nix do that. However, sandboxing is still relevant, because if you still have impurities leaking from outside the closure of the build, you have bigger fish to fry than non-deterministic builds.
That all said, in practice, many of the cases where Nixpkgs builds are not deterministic are actually fairly trivial. Despite not being a specific goal necessarily, compilers are more deterministic than not, and in practice the sources of non-determinism are fewer than you'd think. Case in point, I'm pretty sure the vast majority of Nixpkgs packages that are bit-for-bit reproducible just kind of are by accident, because nothing in the build is actually non-deterministic. Many of the cases of non-deterministic builds are fairly trivial, such as things just linking in different orders depending on scheduling.
Running everything under a deterministic VM would probably be too slow and/or cumbersome, so I think Nix is the best it's going to get.
Sandboxing is relevant, but nix does that by default, so no difference here.
Nonetheless, I agree that Nix does the optimum here, full-on emulation would be prohibitively expensive.
You know, though, it would probably be possible to develop a pretty fast "deterministic" execution environment if you just limit execution to a single thread, still not resorting to full emulation. You'd still have to deal with differences between CPUs, but it would probably not be nearly as big of an issue. And it would slow down builds, but on the other hand, you can do a lot of them in parallel. This could be pretty interesting if you combined it with trying to integrate build system output directly into the Nix DAG, because then you could get back some of the intra-build parallelism, too. Wouldn't be applicable for Nixpkgs since it would require ugly IFD hacks, but might be interesting for a development setup.
Perhaps it's an area worth researching.
I don't know, concurrency is still on the table, especially since the OS also has timing events.
Say a compiler uses multiple threads: even if you assign some fixed amount of fuel to each thread, and mandate that after n instructions thread 2 must follow for another n, how would that work with, say, a kernel interrupt? Would that be emulated, delivered only at given fixed times?
But I do like the idea of running multiple builds in parallel to not take as big of a hit from single-threaded builds, though it would only increase throughput not latency.
> There are endless sources of impurities/randomness and most compilers haven't historically cared much about this.
The point is that Nix will catch a lot more of them than Bazel does, since Nix manages the toolchain used to build, whereas Bazel just runs the host system cc.
> It's an important constituent, but only complete OS-emulation with deterministic scheduling could (at a huge overhead)
This does actually exist; check out antithesis's product. I'm not sure how much is public information but their main service is a deterministic (...I'm not sure to what extent this is true, but that was the claim I heard) many-core vm on which otherwise difficult testing scenarios can be reproduced (clusters, databases, video games, maybe even kernels?) to observe bugs that only arise in extremely difficult to reproduce circumstances.
It does seem like overkill just to get a marginally more reproducible build system, though.
No, most compilers are not themselves reproducible, even within very restrictive sandboxes (e.g. they may do some work concurrently and collect the results based on when each piece completes, then build on top of that; if they don't add a timing-insensitive sorting step, the resulting binary will (assuming no bugs) be functionally equivalent, but may not be bit-by-bit equal), and a build tool can only do so much.
What are the common issues besides timestamps?
Compiler executing internal work concurrently and merging at the end. Thread scheduling changes will cause a different output ordering.
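A toy illustration of that pattern in Python (a generic sketch, not any particular compiler): parallel results collected in completion order, so the "link order" can differ from run to run:

    import concurrent.futures
    import random
    import time

    def compile_unit(name):
        # Simulate a translation unit whose compile time varies run to run.
        time.sleep(random.uniform(0, 0.01))
        return name

    units = ["a.o", "b.o", "c.o", "d.o"]
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(compile_unit, u) for u in units]
        # Collecting in completion order makes the output ordering non-deterministic.
        link_order = [f.result() for f in concurrent.futures.as_completed(futures)]

    print(link_order)          # differs between runs
    print(sorted(link_order))  # the usual fix: impose a stable order before emitting output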
That this article discusses reproducibility in NixOS and declines to even mention the intensional model, or efforts to implement it, is surprising to me, since it appears they have done a lot of research into the matter.
If you don't know, the intensional model is an alternative way to structure the Nix store so that components are content-addressable (the store hash is based on the built outputs) as opposed to being addressed based on the build instructions and dependencies. IIUC, the entire purpose of the intensional model is to make Nix stores shareable so that you could just depend on Cachix and such without the worry of a supply-chain attack. This approach was an entire chapter in the Nix thesis paper (chapter 6) and has been worked on recently (see https://github.com/NixOS/rfcs/pull/62 and https://github.com/NixOS/rfcs/pull/17 for current progress).
I think it would have been a good thing to mention, but difficult to do well in more than a quick reference or sidenote, and it could easily turn into an extensive detour. I'm saying this as someone who's working on exactly that topic. There is a little bit of overlap between the kind of quantitative work that they do and this design aspect: the extensional model leaves the identity of direct dependencies not entirely certain. In practice that means we don't know if they built direct dependencies from source or substituted them from cache.nixos.org, but this exact concern also applies to cache.nixos.org itself.
The intensional store makes the store shareable without also sharing trust relationships ('kind of trustless' in that sense), but only because it moves trust relationships out of the store, not because it gets rid of them. You still need to trust signatures which map a hash of inputs to a hash of the output, just like in the extensional model. You can however get really powerful properties for supply chain security from the intensional store model (and a few extra things). You can read about that in this recent paper of mine: https://dl.acm.org/doi/10.1145/3689944.3696169. I'm still working on this stuff and trying to find ways to get that work funded (see https://groundry.org/).
You still need to trust something though. It's just that instead of trusting the signing of the binaries themselves, you trust the metadata that maps input hashes (computed locally) to content hashes (unknown until a build occurs).
The real win with content addressing in Nix is being able to proactively dedupe the store and also cut off rebuild cascades, like if you have dependency chain A -> B -> C, and A changes, but you can demonstrate that the result of B is identical, then there's no longer a need to also rebuild C. With input addressing, you have to rebuild everything downtree of A when it changes, no exceptions.
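A toy sketch of that early cutoff in Python (illustrative only, not how content-addressed derivations are actually implemented in Nix):

    import hashlib

    def sha(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def maybe_rebuild_c(new_b_output: bytes, cache: dict) -> bytes:
        """A changed, so B was rebuilt; decide whether C needs rebuilding too."""
        new_b_hash = sha(new_b_output)
        if cache.get("b_hash") == new_b_hash:
            # Content-addressed early cutoff: B's output is bit-identical,
            # so the cached C is still valid. Input addressing can't do this,
            # because C's key would include A's changed inputs.
            return cache["c_output"]
        cache["b_hash"] = new_b_hash
        cache["c_output"] = b"C rebuilt against " + new_b_hash[:8].encode()
        return cache["c_output"]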
Is B remaining the same something that happens often enough for it to matter?
I haven’t studied it, but yes I would imagine so. For example if a python build macro changes but the sphinx output remains unchanged, you get out of rebuilding thousands of packages that throw off sphinx docs as part of their build.
> Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs.
That's one way to read the statistic. Another way you could read the graph is that they still have about the same number (~5k) of non-reproducible builds, which has been pretty constant over the time period. Adding a bunch of easily reproducible additional builds maybe doesn't make me believe it's solving the original issues.
> We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners.
Maybe I miss some nuance here, but why is Debian written off as being so much smaller scale? The top end of the graph here suggests a bit over 70k packages, Debian apparently also currently has 74k packages available (https://www.debian.org/doc/manuals/debian-reference/ch02.en....); I guess there's maybe a bit of time lag here but I'm not sure that is enough to claim Debian is somehow not "at scale".
According to https://tests.reproducible-builds.org/debian/reproducible.ht... (which is what the article links to btw) there are ~37k packages tracked for reproducible builds which is ~2.7x smaller than Nix's 100k packages.
This is not really a Nix-issue to begin with.
It's a bit like asking what percentage of Nix-packaged programs have Hungarian translation -- if Nix packages some more stuff the rate might decrease, but it's not Nix's task to add that to the programs that lack it.
Nix does everything in its power to provide a sandboxed environment in which builds can happen. Given the way hardware works, there are still sources of non-determinism that are impossible to prevent, most importantly timing. Most programs depend on it, even compilers, and extra care should be taken by them to change that. The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
> The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
How so?
For ref I used a Gentoo distcc chroot inside a devuan VM to bootstrap gentoo on a 2009 netbook. It worked fine. I did this around Halloween.
A compiler invoked twice on the same source file is not mandated to produce the same binary, but it should produce a binary with the same functionality.
There are an infinite number of binaries that do the same thing (e.g. just padding zeros in certain places wouldn't cause a functional problem).
Nix is very good at doing functionally reproducible builds, that's its whole thing. But there are build steps which are simply not deterministic, and they might produce correct, but not always the same outputs.
OS scheduling is non-deterministic, and there are quite a few things that are sensitive to the order of operations (simplest example: floating point addition). If you want to guarantee determinism, not just provide it on a best-effort basis for things that are willing to cooperate, the only way to do that is to put everything into a fully deterministic emulator, which is terribly slow.
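The floating-point part is easy to see concretely: addition isn't associative, so a reduction whose accumulation order depends on scheduling can come out with different bits. A quick Python demonstration:

    xs = [0.1] * 10 + [1e16, -1e16]

    print(sum(xs))            # 0.0: the ten 0.1s are absorbed by 1e16 before it cancels
    print(sum(reversed(xs)))  # 0.9999999999999999: same numbers, different order, different bits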
Is SCHED_FIFO or SCHED_RR (or the newer incarnations) viable? Is it possible to make QEMU deterministic? A quick glance at the realtime schedulers suggests their source code would need to take that into consideration.
I know it seems like I don't know what I am talking about, often, on here. The reason is I don't live on HN, so I type a comment and rarely remember the words I want to use before the edit/delete window closes.
The idea that I can't control when things occur during a compilation seems suspect. Is there a certain "code size" or other bellwether where this non-determinism starts to crop up? I ask because I get the feeling that if I start compiling trivial stuff it will all be bit-perfect, and if I say so I'll get lambasted with "well obviously trivial stuff can be reproducible," so I am heading that off at the pass, first.
Are they mostly the same 5k packages as 2017?
That seems to be the crux of it.
Although I'm aware many distros care somewhat about reproducible builds these days, I tend to associate it primarily with Guix System; I never really considered it a feature of NixOS, having used both (though I've spent much more time on Guix System now).
For the record, even in the land of Guix I semi-regularly see reports on the bug-guix mailing list that some package isn't reproducible. It seems to get treated as a bug and fixed then. With that in mind, and personally considering Guix kind of the flagship of these efforts, it doesn't surprise me that others don't have perfectly reproducible builds yet either. Especially Nix, with the huge number of things in nixpkgs. It's probably easier for stuff to fall through the cracks with that many packages to manage.
I'll repeat my comment from last time this came up.[0]
I could be wrong (and I probably am) but I feel like the term "reproducible build" has shifted/solidified since 2006 when Dolstra's thesis was first written (which itself doesn't really use that term all that much). As evidence, the first Wikipedia page on "Reproducible builds" seems to have appeared in 2016, a decade after Dolstra's thesis, and even that stub from 2016 appears to prefer the term "deterministic compilation".
Anyhow, when the Nix project originally spoke about "reproducible builds", what I understood was meant by that term was "being able to repeat the same build steps with the same inputs". Because of the lack of deterministic compilation, this doesn't always yield bit-by-bit identical outputs; they are simply presumed to be "functionally identical". There is, of course, no reason to believe that they will necessarily be functionally identical, but it is what developers take for granted every day, and anything otherwise would be considered a bug somewhere in the package.
With Nix, when some software "doesn't work for me, but works for you", we can indeed recursively compare the Nix derivation files, locating and eliminating potential differences, a debugging process I have used on occasion.
I agree that "reproducible builds" now means something different, but that isn't exactly the fault of Nix advocates. I guess a new term for "being able to repeat the same build steps with the same inputs" is needed.
> There is, of course, no reason to believe that they will necessarily be functionally identical, but it is what developers take for granted every day, and if otherwise would be considered a bug somewhere in the package.
Yes, the only possible differences result from either a compiler bug or a program bug that depends on undefined behaviour, in which case "anything can happen" as they say. As others have noted, parallel compilation depends on non-deterministic thread-scheduling, so this non-determinism can't be solved unless you restrict all compilation to be single-threaded. It's still not the only possible source of non-determinism though.
> I agree that "reproducible builds" now means something different, but that isn't exactly the fault of Nix advocates. I guess a new term for "being able to repeat the same build steps with the same inputs" is needed.
I've usually seen "repeatable builds" used for that.
I think you want to link to https://news.ycombinator.com/item?id=41956044.
It's massively closer than any other solution in this regard (nods to other Nix-inspired distros like Guix, Lix, etc.)
Honestly, I believe every software developer owes it to themselves to read the original Nix paper. It's quite digestible and lays out a lot of what it brings to the table. I came away from it wondering why it took so long to realize it... which is a property I've found true of every new important discovery.
https://edolstra.github.io/pubs/nspfssd-lisa2004-final.pdf
If you want, you can even ask an LLM to sum up its main points for you. Or to sell it to you. =)
https://chatgpt.com/share/67ae1a08-7354-8004-8200-e956cb6b59...
I would like to say one thing about using Docker to "solve" this problem though: Once you think of builds in terms of functions, you realize that a Docker image is basically just the cached artifacts of a build that "just so happened" to work correctly. Consider a function that only occasionally produces a correct value: A Docker image is one of those values.
Thanks for the paper, will check it out! (Still skeptical that we should encourage LLM summarization though, suspect people would gain more and actually learn things from reading papers.)
By all means, read the paper then! It's quite readable, and one of the best papers in software development IMHO
I work on a matching decomp project that has tooling to recompile C into binaries matching a 28 year old game.
In the final binaries, compiled with gcc 2.6.3 and assembled with a custom assembler, there appears to be unused, uninitialized data that is whatever was in RAM when whoever compiled the game created the release build.
Since the goal is a matching (reproducible) binary, we have tools to restore that random data at specific offsets. Fortunately our targets are fixed
What even causes this to happen? ie. what dev tool would add random data from RAM to a binary? Is this likely a bug or is there some reason for it like needing to reach a specific file size somewhere?
Simply calling write() on a C struct can do that, if there is any padding in the struct. Then, of course, there are bugs.
By accidentally writing uninitialized memory contents out to the file, with the game still working. It's even worse in the DOS era, where there is no memory protection, so uninitialized memory can contain data used by other processes; for example, parts of source code can get leaked that way. There's a big list of those at https://tcrf.net/Category:Games_with_uncompiled_source_code
Yeah, this was originally all DOS and Windows 3.1 utilities for writing programs that would run on MIPS. The data is small enough that it isn’t relevant, just not reproducible through standard build tools because it was never meant to be bitwise reproducible.
please do write more about it.
We use a tool named dirt-patcher[0], which was written for the project. It lets you write arbitrary bytes at specified offsets[1].
As far as we know at this time, they’re just uninitialized bytes that would have been padding for alignment or other reasons anyway. Maybe if we move to an official build tool chain we’ll find they are deterministic, but for now, we believe they are garbage that happened to make it into the final binary.
0 - https://github.com/Xeeynamo/sotn-decomp/blob/master/tools/di...
1 - https://github.com/Xeeynamo/sotn-decomp/blob/master/config/d...
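(For anyone curious about the general technique, as opposed to the actual dirt-patcher code: it boils down to seeking to known offsets in the output and writing the recorded bytes back. A hypothetical Python sketch, with made-up offsets and bytes:)

    # Illustrative only; the offsets, bytes, and filename here are invented.
    patches = [
        (0x1f40, bytes.fromhex("deadbeef")),
        (0x2a00, b"\x00\x13\x37"),
    ]

    with open("build/output.bin", "r+b") as f:
        for offset, data in patches:
            f.seek(offset)   # jump to the recorded offset
            f.write(data)    # restore the original (non-reproducible) bytes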
I would note that stagex is 100% reproducible, and full source bootstrapped.
Every artifact is reproduced and signed by multiple maintainers on independently controlled hardware and this has been the case since our first release around this time last year.
Is anyone actually implementing the concept of checking hashes with trusted builders? This is all wasted effort if that isn't needed.
I've seen it pointed out (by mjg59, perhaps?) that if you have a trusted builder, why don't you just use their build? That seems to be the actual model in practice.
Reproducibility seems only to be useful if you have a pool of mostly trustworthy builders and somehow want to build a consensus out of that. Which I suppose is useful for a distributed community but does seem like a stretch for the amount of work going in to reproducible builds.
The superior distro Arch Linux does it: https://reproducible.archlinux.org/
maintainers build the packages, other people check: https://wiki.archlinux.org/title/Rebuilderd#Package_rebuilde...
> if you have a trusted builder, why don't you just use their build
Pardon my tinfoil hat, but doing this would make them a high-value target. If I like them enough to trust their builds, I probably also like them enough to avoid focusing the attentions of the bad guys on them.
Better would be to have a lot of trusted builders all comparing hashes... like, every NixOS user you know (and also the ones they know) so that there's nobody in particular to target.
That's no different from how NixOS does it. You are still comparing hashes from the first build done by the distribution. A more pure approach would be to use the source code files (simple sha256sum will suffice) as the first independent variable in the chain of trust.
I'm not sure what you mean. It's your machine that calculates the hashes when it encounters the code.
If you build the directed graph made by the symlinks in the nix store, and walk it backwards, a sha256 of the source files is what you'll find, both in the form of a nix store path and possibly in a derivation that relies on a remote resource but provides a hash of that resource so we can know it's unchanged when downloaded later.
The missing piece is that they're not gossipped between users. So if I find some code in a dark alley somewhere and it has a nix flake to make building it easy, I've got no way to take the hashes and determine who else has experience with the same code and can help me decide if it's trustworthy.
If your builder is compromised, it can be co-opted to sign and verify the "source code" files with any values. The risk of placing this trust in the builder or the nix store is an easy one to avoid. Getting the authenticity of the code from the source code independently ought to be the correct way of verifying reproducible builds.
You mean like, as a signature made by the code's author?
Hmm that feels a bit too much like a root of trust, those make me uncomfortable. I'm more interested in tooling for gathering metadata re: the trustworthiness of some code without the author's participation. If the author wants to be involved, all the better.
The code author could make a signature on every release which would be the strongest guarantee of authenticity. But at a rudimentary level, we could have code hosting repositories simply publish/advertise the sha256 values of the hosted code files.
The root of trust has to lie at the source code origin, for a pure implementation of reproducible builds and for the security reasons I mentioned earlier.
In general it doesn't help much IMO to have distributions take a silo view of the problem. But those are just my ideas and thoughts on the matter.
There are also some other gaps left to close to implement this vision, mentioned in this post and my reply to it:
I've opened a tab to your paper and I'll be reading it, thanks for the link
That's great. Feel free to reach out if you want to, I'm happy to answer any questions. It's basically my job, that I really love. :)
> is useful for a distributed community but does seem like a stretch for the amount of work going in to reproducible builds
Good point, but even in the case of a larger monolithic system you want to be sure it is possible to forensically analyze your source, to audit it. Once you can trust that one hash relates to this specific thing, you can sign it, etc. This can then be "sold" with some added value of trust downstream. Tracking of hashes also becomes easier once they are reproducible, because they mean much more than just a "version".
There is also an additional benefit to reproducible builds, where getting the same output every time could help avoid certain regressions. For instance, if GitHub Actions performs extensive testing on a particular executable, then you want to be able to get the exact same executable in the future, not one that is slightly different.
Yes. Reproducibility also makes it possible to aggregate information about the links in dependency trees and distribute trust on that basis.
That stuff is useful to humans, but it is also really useful for cold hard automated logical reasoning about dependency trees.
> is anyone actually implementing [..]
Not for NixOS as far as I can tell. You only have this for source derivations where a hash is (usually in a PR) submitted and must be reproducable in CI. This specific example however has the problem that linkrot can be hard to detect unless you regularly check upstream sources.
You also couldn't feasibly do that for derivations that actually build packages, instead of fixed-output derivations only, because if you update the package set to include a newer version of the compiler, which would often produce a different output, in addition to having to rebuild everything, you would have to update all of the affected hashes.
What you should be able to do in the future with a system like nix plus a few changes is use nix as a common underlying mechanism for precisely describing build steps, and then use whatever policy you like to determine who you trust.
One policy can be about having an attestation for every build step, another one can be about two different builders being in agreement about the output of a specific build step.
That way you can construct a policy that expresses reproducibility, and reproducibility strengthens any other verification mechanism you have, because it makes it so that you can aggregate evidence from different sources and from different build hosts.
> You also couldn't feasibly do that for derivations that actually build packages, [..] you would have to update all of the affected hashes.
You can actually; changes to stdenv are possible and "just" a lot of work. You will regularly see them between releases or on unstable, and they cause mass rebuilds. This doesn't just affect a compiler but also all stdenv tooling, as these changes tend to cause rebuilds across nixpkgs. This would be verifiable, but it obviously multiplies the amount of compute spent.
Hint: If you look at PRs for nixpkgs you will notice labels indicating the required amounts of rebuilds, e.g., rebuild-darwin:1-10. See for example https://github.com/NixOS/nixpkgs/pull/377186 with the rebuild-darwin:5001+ label.
I know about mass rebuilds, but in the parent comment you were talking about fixed output derivations, and committing the hashes for a mass rebuild to version control is technically possible, but not a reasonable workflow, because it makes all changes that are mass rebuilds conflict.
What works better is to keep track of those hashes as part of the signatures, which is already happening. There's a lot of interesting things that can be done with that kind of information; I'm one of the people working on that kind of stuff.
Basically I have a paper out about how verifiability and reproducibility can come together like that in Nix:
Note that NixOS's "build" step often doesn't actually do any compilation. Often it just downloads a binary from GitHub releases and runs Nix-specific binary tools on it to make it look for libraries in the right places.
So if that process is reproducible, it's a different statement from a Debian package being reproducible, which requires build inputs in the preferred form of modification (source code).
You are right, but I would not agree with this appearing "often". I get the impression that the nixpkgs community tries quite hard to truly compile from source even quite complex projects like Firefox and LibreOffice.
Completely false. Building from the actual source code is strongly preferred and usually easier than patching a binary that wasn't built for such an environment.
I've genuinely never really understood the appeal of Nix. I even attempted to use this to build and "maintain" the machines we used at an offsite factory and even then with just a basic electron app, a python installation and very basic mac configs Nix proved to be a complete nightmare.
I guess this is a tangent, but Nix to me feels like the right idea with the wrong abstraction. I can't explain / it would take a serious bit of genius to come up with an alternative worth switching to.
Has anyone done any better?
I agree, I feel like Nix is kind of a hack to work around the fact that many build systems (especially for C and C++) aren't pure by default, so it tries to wrap them in a sandboxed environment that eliminates as many opportunities for impurity as it reasonably can.
It's not solving the underlying problem: that build systems are often impure and sometimes nondeterministic. It also tries to solve a bunch of adjacent problems, like providing a common interface for building many different types of package, providing a common configuration language for builds as well as system services and user applications in the case of NixOS and home-manager, and providing end-user CLI tools to manage the packages built with it. It's trying to be a build wrapper, a package manager, a package repository, a configuration language, and more.
> It's not solving the underlying problem: that build systems are often impure and sometimes nondeterministic
It's not Nix's job, imo. Those compilers should be fixed.
And all the other "features" come for free from Nix's fundamental abstractions, I don't feel it would overstep its boundaries anywhere.
Purity becomes a hard goal whenever you hit the real world at build or runtime. By definition, you have to bridge two domains.
Imagine constant-time compute and constant-memory constraints, required in cryptography, being applied to the nix ecosystem.
Yes, this is an artificial example, but it shows that purity is harder to define and come by than some people think. Maybe someday these constraints actually do apply to nix's goal of reproducibility.
With ever-changing hardware, that purity is a moving target, so nix imo will always be an approach to purity, and bundling so much tooling is to be expected. Still, you can legitimately call it a hack :)
You can use the low level stuff without the language to forge your own journey.
https://github.com/NixOS/nix/blob/master/doc/manual/source/s...
I am working on the docs for this as we speak.
I have never checked whether my C compilers are deterministic, but Gentoo has tinderbox, and since everything built with emake (or whatever) has a SHA hash, if I use the exact same SHA-hashed source as the tinderbox binary, I should get a bitwise-equal binary output myself. I'm of course assuming all of the toolchain is built from SHA-verified source as well.
In Gentoo
`emerge -e <package name>` will do it, add binpkgs if you know what you're doing (I do, and I do).
IIRC any package that uses Java isn't reproducible because of system timestamps, and fixing them to the epoch permanently causes issues in some application builds.
* There are Maven and Gradle plugins to make builds reproducible.
Can you force it to some time other than 0? Ex. I've seen some packages force timestamps to the git commit timestamp, which is nice but still fixed.
This is an approach you can use when building Docker images in Nix flakes: https://github.com/aksiksi/ncdmv/blob/aa108a1c1e2c14a13dfbc0...
The standard usually isn't 0s since epoch:
https://bugs.openjdk.org/browse/JDK-8264449 https://reproducible-builds.org/docs/source-date-epoch/
IME Erlang was like this ~8 years ago (the last time I touched it) but things may have changed since then.
What issues? I'm not aware of any Java build process that checks timestamps.
JARs are archives, and archives have timestamps.
You can remove those with some extra work.
Just add a post-process step that sets the output artifacts' timestamps (including its content)?
Wouldn't that work?
Yes, just add that.
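A minimal sketch of such a post-processing step, assuming Python's stdlib zipfile module and the SOURCE_DATE_EPOCH convention mentioned above (illustrative only, not how the Maven/Gradle plugins do it):

    import os
    import time
    import zipfile

    def normalize_jar(src, dst):
        """Rewrite a jar/zip so every entry carries the same fixed timestamp."""
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "315532800"))
        epoch = max(epoch, 315532800)   # the zip format can't store timestamps before 1980
        stamp = time.gmtime(epoch)[:6]  # zip wants a (year, month, day, hour, minute, second) tuple
        with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
            for info in sorted(zin.infolist(), key=lambda i: i.filename):  # stable entry order too
                data = zin.read(info.filename)
                new_info = zipfile.ZipInfo(info.filename, date_time=stamp)
                new_info.compress_type = info.compress_type
                new_info.external_attr = info.external_attr  # keep unix permission bits
                zout.writestr(new_info, data)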
Can you elaborate on the root causes?
Aside from this being a great article with lots of interesting details, it's also a rare example of a headline that does NOT follow "Betteridge's law of headlines"
I got scared and then I was unexpectedly relieved!
(-- A Nix maintainer)
So it looks like the reproducibility rate of NixOS is not that high, roughly similar to Debian?
At this point almost all major package managers try not to introduce any non-reproducibility themselves, so it comes down to whether the package itself builds deterministically. Nix just has architectural sandboxing and isolation to enforce this, which is maybe a bit better. But the moment e.g. Arch Linux devs fix determinism in a certain package and upstream it, it will become deterministic in NixOS and others, and vice versa, so there are not going to be a lot of differences between distros at this point. Everyone agrees that deterministic builds are a good thing.
> there exist no reproducibility monitoring at the scale of the Nix package set (nixpkgs)
I think it would be fairly easy to do this monitoring with a bit of community participation. At least I'd enable telemetry ¯\_(ツ)_/¯.
By default the pkz57 in /nix/store/pkz57...-nushell-0.97.1 is a hash of the build inputs for that package. If you hash the contents of that dir, you get an identifier for the build output.
If we then make a big list of such pairs as built by different people on different machines at different times, and capture the frequency of each pair we'll either see this:
/nix/store/pkz5..., sha256:0drrxa..., 121 users
or we'll see this:
/nix/store/pkz5..., sha256:0drrxa..., 100 users
/nix/store/pkz5..., sha256:gvpbk5..., 20 users
/nix/store/pkz5..., sha256:1fwwfe..., 1 user
The former being an indicator that the build is reproducible, and the latter giving hints about why not (supposing these users are willing to share a bit more about the circumstances of that build). I'd call it a "build chromatograph". I expect that knowing whether you're one of the 100 or the odd-man-out could be relevant for certain scenarios.
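To make that concrete, here's a rough sketch of how a client could compute the output-hash half of each pair (real Nix hashes the NAR serialization of the store path; this just walks the output directory in a stable order, so treat it as illustrative):

    import hashlib
    from pathlib import Path

    def output_hash(store_path):
        """Stable content hash of a build output directory (a simplified stand-in for a NAR hash)."""
        h = hashlib.sha256()
        root = Path(store_path)
        for p in sorted(root.rglob("*")):
            h.update(str(p.relative_to(root)).encode())
            if p.is_symlink():
                h.update(str(p.readlink()).encode())
            elif p.is_file():
                h.update(p.read_bytes())
        return h.hexdigest()

    # Report (input hash from the store path name, locally observed output hash) pairs, e.g.:
    # ("pkz57...-nushell-0.97.1", output_hash("/nix/store/pkz57...-nushell-0.97.1"))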
I'm not sure the "distribution" would be all that helpful. A single compiler that does some parallel work and collects the results of that work in a list in order of completion (and similar) is probably the most common cause of non-determinism.
Given that, your chromatograph would be mostly "determined" by the source code's peculiarities, instead of the offending compiler itself. (E.g. I have n classes a compiler would process in parallel, so given a single point of timing non-determinism, n! different combinations could possibly exist (assuming they all cause different outputs). The only information I could conclude from such a distribution is that there is a common sequence of completing tasks.)
But your idea is cool, and simply reporting back non-matching local builds would be helpful (I believe the binary property of whether a different output could be built is the only relevant fact) -- also, if we were to mark a package as non-reproducible, we could recursively mark everything else that has it as a (transitive) input.
I think there are only a few cases where it would be helpful:
- one hash to rule them all, perfectly reproducible
- a big mess, consider avoiding it
- just two or four cohorts, package maintainer may want to investigate
- everybody agrees on the output hash except for you, something local is compromised
I don't anticipate peering into the mess and coming up with many useful conclusions.
> if we were to mark a package as non-reproducible, we could recursively mark everything else that has it as a (transitive) input.
I like that idea, to sort of carve out a space within the already-pretty-reliable nixpkgs which can be relied upon to be perfectly reproducible. I'd strive to get my packages included in that set, and to select my dependencies from it.
Is it somewhat reproducible? Yes.
But what did it cost? Usability.
Let me guess... No?
>As part of my PhD, and under the supervision of Théo Zimmermann and Stefano Zacchiroli, I have empirically studied bitwise build reproducibility in nixpkgs over a time period of 6 years.
Why spend only 6 years on the most interesting topic of all mankind? I spent 10 years analyzing this.
In my case, I define "reproducible" to mean "immutable." After a few days of testing, I broke NixOS. A simple test of swapping different Desktop Environments eventually broke Nix, thus I'm not at the point where I'd agree with Nix being truly reproducible, at least not in that context :(
One problem is that the applications themselves are impure.
Just running KDE litters a bunch of dotfiles into your user folder, even for settings you didn't adjust. This is true for many applications.
If you had an empty home folder and passively tried a handful of desktops, you'd no longer have an empty home folder. Hopefully your environment is resilient to clutter being leaked into your home folder, but if your filesystem isn't truly immutable, rolling back to a particular Nix config might not get you the exact state your system was in when you first built that.
There's a project that wipes all local changes when you restart your machine, with the goal of making Nix systems more reproducible. I think it's called Impermanence.
I do all my stuff in temporary docker containers, and when I’m done, the container gets blown away.
If the point of Nix is to keep a filesystem immutable as long as every app sticks to certain rules, is it actually the right tool for the job?
Sorry… I actually don’t know much about Nix given I’ve been using VMs and now containers for over a decade, so just trying to understand the problem that nix actually solves
I do something similar with Nix Flakes for a lot of my applications. I get my stuff working in a Flake, then I execute it with `nix run`; this is an ephemeral thing; once I kill the app then it's unlinked and can be garbage collected eventually.
It can still write to folders, so it's not completely silo'd off like a full-on Docker container, but I still really like it.
Just ChatGPT'd it. I see… what I was thinking about more was NixOS. OK, I think I see how it could work, but if apps aren't really isolated, couldn't a system still get into a broken state if something spills out?
At the moment I'm using Ansible for the host and Docker for guests, but I see NixOS as combining these two layers so everything just runs on the host? Is that a fair description of how NixOS works? If I have it wrong, maybe I should check it out; maybe I've been sleeping on Nix all this time.
A "normal" NixOS system will only give you a full sandboxed isolation for apps at build time and not a runtime. nixpkgs (the thing packaging the stuff for NixOS) provides packages for apps similar to Debian afterwards and not flatpak in terms of runtime isolation (if I understand your use case).
My recommendation would be to test it out and look at how it does things. Maybe checkout the live installer with a gui to get a feel for a desktop system.
Ah! OK, that makes more sense now… lol, I was wondering how they went about that. Even the table about /etc being immutable but still editable seems a bit weird. But I kind of get the picture now.
Hmm.. nix-shell and “nix develop” do look interesting!
Edit: OK, I HAVE been sleeping on NixOS! I couldn't understand how isolation worked with /etc files, but it turns out /etc is not modified directly; you do it all by modifying the Nix config and rebuilding the system, which generates /etc! OK, super interesting.
Those things are not the same though. Reproducible just means it will break again if you configure your system in the same way.