Artifact Binary Provenance
Control Code: KBC2
TLDR: Every piece of software running in a production system has a known provenance.
Rationale: High-security environments require a tamper-proof identity scheme. Using content-addressable storage mechanisms ensures that any change to the software results in a different identity.
Background
We define software identity by the cryptographic hash of the software itself: specifically, the SHA256 digest of the software binary.
This means that if a single byte of the software changes, its identity changes too. This ensures we cannot qualify one software artifact and then deploy a different one. It also lets us create a provable chain of custody from commit to build to production.
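As an illustration of the identity scheme described above, here is a minimal Python sketch that computes the SHA256 digest of a file artifact and shows that a one-byte change produces a different identity. The function name `artifact_digest` and the file names are hypothetical. This covers file artifacts such as a zip; container images are usually identified by the sha256 digest their registry reports, which this sketch does not compute.

```python
import hashlib
import tempfile
from pathlib import Path


def artifact_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`.

    The file is read in chunks so large artifacts do not have to fit in memory.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = Path(d) / "app.zip"
        tampered = Path(d) / "app-tampered.zip"
        original.write_bytes(b"example artifact contents")
        tampered.write_bytes(b"example artifact contentz")  # last byte differs
        print(artifact_digest(original))
        # A one-byte change yields a completely different identity.
        print(artifact_digest(original) == artifact_digest(tampered))  # False
```

Because the digest depends on every byte, the chunk size only affects memory use, never the resulting identity.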
What evidence to record
- The SHA256 digest of the artifact (Docker image, zip file, …)
- A human-readable name
- The Git commit that produced it
- The state of the Git repository at build time (clean, unstaged files, …)
- A URL to the build log
- Information about the build environment
- The software bill of materials (SBOM)
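To make the list above concrete, the following Python sketch bundles that evidence into one record per artifact. The schema (`ProvenanceRecord`, its field names) and the helpers `git_porcelain` and `build_record` are hypothetical illustrations, not Kosli's data model. It assumes the repository state is captured with `git status --porcelain`, whose output is empty when the working tree is clean, and it records only the Python version and OS as a stand-in for real build-environment details.

```python
import hashlib
import json
import platform
import subprocess
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ProvenanceRecord:
    """One entry of provenance evidence (hypothetical schema)."""
    sha256: str          # identity: SHA256 digest of the artifact
    name: str            # human-readable name, e.g. "web-app"
    git_commit: str      # commit that produced the artifact
    repo_clean: bool     # False if the working tree had uncommitted or untracked files
    build_log_url: str   # link to the CI build log
    build_env: dict = field(default_factory=dict)  # build environment details
    sbom: dict = field(default_factory=dict)       # e.g. a CycloneDX or SPDX document


def git_porcelain(repo: Path) -> str:
    """Capture `git status --porcelain`; its output is empty for a clean tree."""
    return subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout


def build_record(artifact: Path, name: str, git_commit: str, porcelain: str,
                 build_log_url: str, sbom: Optional[dict] = None) -> ProvenanceRecord:
    # Reads the whole file for brevity; stream it in chunks for large artifacts.
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return ProvenanceRecord(
        sha256=digest,
        name=name,
        git_commit=git_commit,
        repo_clean=porcelain.strip() == "",
        build_log_url=build_log_url,
        build_env={"python": platform.python_version(), "os": platform.system()},
        sbom=sbom or {},
    )
```

In a CI job one might call `build_record(Path("dist/app.zip"), "app", commit, git_porcelain(Path(".")), log_url)`, with `commit` taken from `git rev-parse HEAD`, and serialize the result with `json.dumps(asdict(record))` before sending it to whatever system stores the audit trail.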
Alternative identification of artifacts
It can be helpful to use human-friendly identities in CI displays, filenames, and Docker image tags, for example semantic version numbers (semver) or Git commit SHAs.
These labels are very useful for humans navigating versions in version control and CI systems. However, they are fallible: a Docker tag can be moved to point at a different image, and the same commit can produce different binaries depending on the build environment. They therefore cannot be used to identify software for security and compliance purposes.
Use labels for humans and SHAs for machines.
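The rule "labels for humans, SHAs for machines" can be sketched as a deployment gate that looks artifacts up by their own digest and uses the label only in messages. This is a minimal Python illustration; the allow-list `APPROVED`, the function `check_deploy`, and the example byte strings are all hypothetical.

```python
import hashlib

# Hypothetical allow-list: digests of qualified artifacts, mapped to the
# human-readable label they were qualified under. The key is the identity;
# the label is for display only.
APPROVED = {
    hashlib.sha256(b"release build 1.4.2").hexdigest(): "web-app:1.4.2",
}


def check_deploy(label: str, artifact_bytes: bytes, approved: dict[str, str]) -> bool:
    """Allow a deployment only if the artifact's own SHA256 is approved.

    The label never influences the decision, so a tag such as "1.4.2" that
    has been re-pointed at different bytes is rejected.
    """
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    if digest in approved:
        print(f"OK   {label} -> sha256:{digest[:12]} (qualified as {approved[digest]})")
        return True
    print(f"DENY {label} -> sha256:{digest[:12]} is not a qualified artifact")
    return False


if __name__ == "__main__":
    check_deploy("web-app:1.4.2", b"release build 1.4.2", APPROVED)  # returns True
    # Same label, different bytes: rejected.
    check_deploy("web-app:1.4.2", b"patched build 1.4.2", APPROVED)  # returns False
```

Keying the allow-list on the digest rather than the label is the design choice that matters: relabeling an artifact cannot change whether it passes.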
How we implement this control
We use Kosli to record every official build in our CI system. The audit trails for our binary provenance can be found here: https://app.kosli.com/kosli/flows/