If I were to recommend you use a piece of cryptography-relevant software that I created, how would you actually know if it was any good?
Trust is, first and foremost, a social problem. If I told you a furry designed a core piece of Internet infrastructure, the reception to this would be mixed, to say the least.
If you're familiar with the deep lore about furries and the Internet, this probably wouldn't surprise you.
(We all need hobbies, after all.)
But if you weren't in the know, you might recoil in horror at the thought. "Hell nah, those cringey people on TikTok?"
Software assurance is chiefly concerned with reliability, safety, and security, all of which are deeply connected to how trustworthy a piece of software is.
(Trustworthy doesn't necessarily mean trusted. Everyone is free to choose what they trust, and what they don't.)
But if you were to approach my work with an open mind, how might you make an informed decision about whether or not the software I'm writing is trustworthy enough for your use cases?
Cryptography Audits and Other Thought-Terminating Clichés
If you werenât around for Is TrueCrypt Audited Yet?, you might think this subheading is clickbait.
Cryptography audits and penetration testing in general are valuable work. Audits are essential in avoiding worse outcomes for security in multiple domains (network, software, hardware, physical, etc.).
However, they're not a panacea, and an audit status should not be viewed as a simple binary "YES / NO" matter.
Some "audit reports" are utterly worthless.
Unfortunately, you basically have to be qualified to do the same kind of work to accurately distinguish between the trash and treasure.
However, this is usually one of the first questions I get from some technologists: "Was it audited? And is the report public?"
Audits are not cheap. Hiring a team of qualified and reputable experts to review the entirety of a nontrivial project can easily run into six figures.
Let's be real: I don't have that kind of money just lying around, and random gay furries aren't exactly a good fit for a non-profit like OSTIF.
So, no, I donât have an audit for any of my projects.
If not having an audit is a dealbreaker for you, that's fine. I would rather you be disappointed by an unchecked box than go out of my way to pay an unqualified company for a rubber-stamp audit report just to satisfy your request (an act I consider dishonest).
Audits aside, I'd like to talk about some of the mechanisms I'm employing to make my projects as trustworthy as any mere mortal can hope to achieve.
Towards Furry-Grade Assurance
Let's not mince words: The rest of this blog post is me nerding out about software testing methodologies, how I'm applying them, and the gap between practice and theory (i.e., formal methods).
The project in scope for this blog post is my Public Key Directory project, which offers an essential building block for making E2EE possible on the Fediverse.
If you want more background about this project, I recommend starting here.
Specification-First Development
Before any code was ever written, I started writing a specification. You can read it online at fedi-e2ee/public-key-directory-specification on GitHub.
This document covers:
- The motivation for its own existence
- Important concepts relevant to the design
- A threat model which includes:
- Assumptions that must be true in order for the software to be secure. These might seem obvious, but it's better to be explicit.
- Assets in scope for the threat model.
- Actors with specific roles, privileges, and motivations.
- The actual risks being considered, including many that are not mitigated by design.
- The actual definitions, algorithms, steps, etc.
- Security considerations for implementors of the spec
I started writing this in June 2024, and I continue to make revisions as I develop the reference implementations of the specification. This process will continue until the first major version has been tagged.
By starting with a specification, every party can agree on what the software does, what it doesn't do, what its security goals are, and why specific trade-offs were chosen over others. Unlike source code, specifications do not carry the quirks of particular programming language runtimes or package ecosystems.
However, put a pin in this, because there's more to talk about later.
Unit Testing: Table Stakes
Once you have a reference implementation, you need it to pass unit tests in CI. If you're writing software in 2026 and don't have working unit tests, your life is harder than it needs to be.
Whenever you bring up unit testing, the next thing that usually comes up is code coverage.
I do not use code coverage reports in any of my projects. To understand why, you should be familiar with Goodhart's Law: When a measure becomes a target, it ceases to be a good measure.
Code coverage reports can lie to you with pretty green squares, and this leads to overconfidence.
Instead, I actually rely on two distinct additional testing methodologies.
Mutation Testing: Code Coverage That Cannot Lie
Mutation testing is simple enough to understand.
First, run your full unit test suite and keep track of which unit tests touch which parts of the source code. Then, mutate the source code and study how the test suite behaves.
If the tests still pass after a line of code is changed, then either the change was benign (to the point of being meaningless), or your unit tests don't really cover it.
The ratio of mutants your test suite kills to the total number of mutants created is often expressed as a percentage. The framework I'm using (Infection) calls this percentage the MSI (Mutation Score Indicator).
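To make this concrete, here's a hypothetical example (the function and both mutants are mine, not from pkd-crypto) of what Infection-style mutators do to a trivial byte-clamping function:

```php
<?php
declare(strict_types=1);

// Hypothetical function under test.
function clampByte(int $x): int
{
    if ($x < 0) {
        return 0;
    }
    if ($x > 255) {
        return 255;
    }
    return $x;
}

// Mutant 1: Infection's DecrementInteger mutator turns 255 into 254
// in the return statement. A test asserting clampByte(300) === 255
// fails against this mutant, so the mutant is "killed".
function clampByteMutant1(int $x): int
{
    if ($x < 0) {
        return 0;
    }
    if ($x > 255) {
        return 254; // mutated
    }
    return $x;
}

// Mutant 2: the GreaterThan mutator turns `>` into `>=`. Here the
// mutant is behaviorally identical to the original (both return 255
// for an input of 255), so it "escapes" no matter how thorough the
// test suite is. This is one reason a 100% MSI is not always sensible.
function clampByteMutant2(int $x): int
{
    if ($x < 0) {
        return 0;
    }
    if ($x >= 255) { // mutated, but benign
        return 255;
    }
    return $x;
}
```

A boundary-value test (input exactly 255, input 300, input -1) kills mutant 1 but can never distinguish mutant 2 from the original.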
How I'm Using Mutation Testing
The cryptography code present in the Public Key Directory project lives in its own repository: pkd-crypto. As I write this, I have an open pull request that kills 89% of mutants in the codebase. I aim for over 90% before I merge it, and over 95% before I tag a stable v1.0.0 release.
The cryptography code is used by both the client and server-side reference software. The server-side software currently has a minimum MSI of 80%, though I will be ratcheting that up higher as well.
"Why not aim for a 100% MSI?"
Not all mutants are bugs. For example, some of the code I write is meant to avoid side-channels. So the original code, which works on integers in the range of 0 to 255, might look like:
$isCR = ((($char ^ 0x0d) - 1) >> 8) & 1;
But then the mutation testing framework will try:
$isCR = ((($char ^ 0x0d) - 1) >> 9) & 1;
Does this mutant escape? Yes.
But is there any possible value in the range for $char that will lead to an incorrect output? No. For CR, the intermediate value is -1, and -1 >> 8 == -1 >> 9. For every other byte, the intermediate value fits in the range 0 to 254, so shifting right by 8 or 9 bits both produce 0.
Now, I could extract these bitwise operations out into functions that I can test in isolation (rather than as part of a larger algorithm). That would let me get a higher MSI out of this library.
However, the PHP language doesn't support macros (at least as of 8.5). Aiming for a 100% score would introduce function call overhead in a critical loop that could negatively impact performance.
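As a sanity check, the branchless expression can be exhaustively verified against a naive branching version over the whole input range (this harness is mine, purely illustrative; it is not how pkd-crypto tests the code):

```php
<?php
declare(strict_types=1);

// Branchless carriage-return check: returns 1 if $char === 0x0d, else 0.
// ($char ^ 0x0d) is zero only for CR; subtracting 1 underflows to -1 in
// that case, and PHP's arithmetic right shift smears the sign bit.
function isCRBranchless(int $char): int
{
    return ((($char ^ 0x0d) - 1) >> 8) & 1;
}

// Naive version with a branch (the data-dependent control flow we are
// trying to avoid in side-channel-sensitive code).
function isCRNaive(int $char): int
{
    return $char === 0x0d ? 1 : 0;
}

// Exhaustively check agreement over the full byte range 0..255.
for ($c = 0; $c <= 255; $c++) {
    if (isCRBranchless($c) !== isCRNaive($c)) {
        throw new RuntimeException("Mismatch at byte value $c");
    }
}
```

Because the domain is only 256 values, exhaustive checking here is cheap and stronger than any sampled test.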
Fuzz Testing
(Sorry, I couldn't decide on a "furry" pun for this subheading.)
Fuzz testing (or "fuzzing") is an automated testing technique that involves generating many invalid inputs in order to study a program's behavior.
There's a little more to fuzz testing than "throw everything at the wall and see what sticks". A sophisticated fuzzer such as afl (or afl++, its successor) can reliably learn to invent JPEGs as part of its generation algorithm.
Y'know those jokes about QA engineers?
A software QA engineer walks into a bar.
He orders a beer. Orders 0 beers. Orders 99999999999 beers. Orders a lizard. Orders -1 beers. Orders a ueicbksjdhd.
First real customer walks in and asks where the bathroom is. The bar bursts into flames, killing everyone.
I do not know who originally told this joke. If anyone knows, I will update this with a link to credit the original source. Too many joke thieves online.
That's basically fuzzing.
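The core idea can be sketched in a few lines of plain PHP (this toy harness and its invariant are mine, purely illustrative; real fuzzers like afl++ and PHP-Fuzzer use coverage feedback to evolve inputs rather than sampling blindly):

```php
<?php
declare(strict_types=1);

// Toy target: an invariant that should hold for ALL inputs. Here:
// if json_decode() returns null without JSON_THROW_ON_ERROR, it must
// also report an error -- unless the input was literally "null".
function fuzzOne(string $input): void
{
    $value = json_decode($input);
    $error = json_last_error();
    if ($value === null && $error === JSON_ERROR_NONE && trim($input) !== 'null') {
        throw new RuntimeException('Invariant violated for input: ' . bin2hex($input));
    }
}

// Dumb random generator: throw random byte strings at the target.
mt_srand(1234); // deterministic seed for reproducibility
for ($i = 0; $i < 10000; $i++) {
    $len = mt_rand(0, 64);
    $bytes = '';
    for ($j = 0; $j < $len; $j++) {
        $bytes .= chr(mt_rand(0, 255));
    }
    fuzzOne($bytes);
}
```

The fuzzer never asserts a specific output; it only checks that an invariant survives arbitrary garbage, which is exactly the bar-joke scenario above.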
Currently, for my projects, PHP-Fuzzer runs on every push and pull request against the main branch for some reasonable number of runs, and then a lot more whenever a release is tagged.
Separately (though this is less visible from the CI config), I run it in the background without limits to see if any interesting behaviors emerge.
The combination of fuzzing and mutation testing provides a deeper and broader level of assurance than any PHPUnit-derived code coverage report ever could on its own. But why stop there?
Static Analysis
I actually employ multiple tools here. Psalm and PHPStan are both useful for identifying a wide range of problems in PHP code, ranging from "what the hell even is this variable's type?" linting to full-on "yeah, this is a SQL injection vuln" taint analysis.
However, I also added a security-focused static analysis tool called Semgrep to the mix. Semgrep runs with rules specifically intended to identify unsafe PHP code.
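To give a flavor of what taint analysis catches, here's a contrived example (the schema, variable names, and payload are mine) showing the unsafe flow these tools flag versus the parameterized fix, using an in-memory SQLite database:

```php
<?php
declare(strict_types=1);

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('alice')");

$input = "alice' OR '1'='1"; // attacker-controlled in a real app

// UNSAFE: user input interpolated directly into the query string.
// Taint analysis flags this flow from a tainted source into a SQL sink:
// $rows = $db->query("SELECT name FROM users WHERE name = '$input'");

// SAFE: a prepared statement keeps data out of the query structure,
// so the injection payload is treated as an ordinary (non-matching) string.
$stmt = $db->prepare('SELECT name FROM users WHERE name = ?');
$stmt->execute([$input]);
$rows = $stmt->fetchAll(PDO::FETCH_COLUMN);
```

With the prepared statement, `$rows` is empty: no user is literally named `alice' OR '1'='1`.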
This all runs in CI, on every commit, and complains loudly if anything is amiss.
Property-Based Testing
This comes up a lot in discussions about testing methodologies, so it bears mentioning here.
Property-based testing is a complementary approach to fuzzing that emphasizes logical correctness rather than security properties.
This can all sound kind of vague and abstract, so here's a clear example:
If you encrypt a field, then decrypt the corresponding ciphertext, you should always get the original input back.
This property should hold true for random inputs and random keys, as long as they're used correctly (i.e., the same key is used for both operations).
In addition to what was mentioned above, the pkd-crypto project uses a PHPUnit-compatible tool called Eris for property-based testing.
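Here's what that round-trip property looks like as a hand-rolled check using libsodium's secretbox (Eris would generate and shrink the random cases for you; this standalone sketch of mine just loops):

```php
<?php
declare(strict_types=1);

// Property: decrypt(encrypt($m, $k), $k) === $m, for random $m and $k.
for ($i = 0; $i < 100; $i++) {
    $key     = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);
    $nonce   = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $message = random_bytes(random_int(0, 256));

    $ciphertext = sodium_crypto_secretbox($message, $nonce, $key);
    $decrypted  = sodium_crypto_secretbox_open($ciphertext, $nonce, $key);

    if ($decrypted !== $message) {
        throw new RuntimeException('Round-trip property violated');
    }

    // Complementary property: a WRONG key must fail to decrypt
    // (sodium_crypto_secretbox_open() returns false on failure).
    $wrongKey = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);
    if (sodium_crypto_secretbox_open($ciphertext, $nonce, $wrongKey) !== false) {
        throw new RuntimeException('Decryption succeeded with the wrong key');
    }
}
```

The value of a framework like Eris over this loop is automatic shrinking: when a property fails, it reduces the failing input to a minimal counterexample instead of handing you 256 random bytes.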
Integration Testing
I've talked a lot about pkd-crypto (the common cryptography components for both client- and server-side code) as well as pkd-server-php (the actual server-side software implementation of the spec).
If you're wondering, "What about the client-side software?", you've activated my nerd-snipe card.
The client-side software I'm developing is deliberately very dumb in scope: It's mostly a thin HTTP client wrapper that talks to pkd-crypto to generate/validate things.
But thatâs also where a deeper integration test resides. The client-side software does the following with each build:
- Clones the pkd-server-php repository and configures it to run locally.
- Clones a miniature Fediverse server implementation I wrote for the purpose of test orchestration.
- Tests the client against the local PKD server and mini-fedi-server.
The client library is still being developed, but this integration testing setup will allow me to create contrived scenarios and test their behavior thoroughly in a reproducible way.
What Is Left To Do
As cool as all this stuff is, there are still gaps left to fill that could undermine the security or reliability of this project. Of course, I'm actively working on addressing these.
What if I could give you a machine-verifiable mathematical proof that my design is secure?
That's what formal verification is all about.
My current plan is to write ProVerif models of the PKD specification. I intend to cover the core algorithms, the state machine that runs atop the Transparency Log, and the risks covered by the threat model.
There are other tools Iâm considering as well. Using hacspec for a future Rust implementation is a tantalizing prospect.
Requirements Traceability
Towards the top of this blog post, I said to put a pin in the specification. It's time to unpin that.
Requirements traceability tools allow you to tie your implementation and testing framework into the requirements from a formal specification.
AWS released a tool for this purpose called Duvet, which supports parsing RFC-style specification documents to obtain a corpus of requirements. You can then instrument your code and tests to cover these requirements (e.g., foo MUST be set to bar), and Duvet will report on any gaps you still have.
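To give a flavor of how Duvet annotations work (the spec URL and requirement text below are placeholders of mine, not real citations from the PKD specification), you mark up your implementation and tests with comments that cite and quote the normative text:

```php
// Implementation annotation: cites the spec section and quotes the
// requirement this code implements.
//= https://example.com/pkd-specification#protocol-messages
//# The server MUST reject a message with an unknown protocol version.

// Test annotation: same citation, marked as a test, so Duvet can
// report requirements that are implemented but never tested.
//= https://example.com/pkd-specification#protocol-messages
//= type=test
//# The server MUST reject a message with an unknown protocol version.
```

Duvet then cross-references the extracted requirements against these annotations and reports which MUSTs and SHOULDs are uncovered.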
Requirements traceability is the critical last mile of providing high assurance for this project:
Formally verifying the specification is almost useless without a mechanism to ensure the actual implementation adheres to the spec.
Unfortunately, Duvet doesn't currently support the ProVerif comment syntax, so I cannot even get started on this work.
On that note!
From what I understand talking with ex-employees, Amazon has a culture of so-called "customer obsession".
What this means in practice is, unless you have an enterprise customer literally asking for a specific feature, the demand for it might as well not even exist.
They don't pay much attention to reactji on GitHub issues.
They do pay attention to what their big customers are asking their technical account managers about when planning to spend tens of millions of dollars on compute.
If anyone reading this works for a company that spends a lot of money on cloud services, ask about their investment into assurance tools like Duvet.
The more interest there is, the better these tools will become.
Isochronic Verification
Put simply: How do you know if your code is actually constant-time?
Many cryptographers and security engineers consider this property a losing battle against compilers and hardware optimizations.
Others are developing standardized constant-time support into mainstream compilers in order to ensure the mitigations never get optimized away.
This might not matter much for the reference implementation in PHP (which cannot guarantee its own runtime is side-channel free). But, as that matures and I look towards implementations in Go and/or Rust, I do plan to identify mechanisms for ensuring the compiled code is free of side-channels.
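For intuition about the property being verified: PHP at least ships hash_equals(), whose running time depends only on the lengths involved, unlike === comparisons that can bail out at the first differing byte. A branchless reimplementation (my sketch, for illustration only; use the built-in in real code) looks like:

```php
<?php
declare(strict_types=1);

// Constant-time string comparison sketch. Differences are accumulated
// with XOR/OR instead of returning early, so the loop's timing does not
// leak the position of the first mismatch. Production code should call
// hash_equals() rather than this.
function ctEquals(string $a, string $b): bool
{
    if (strlen($a) !== strlen($b)) {
        return false;
    }
    $diff = 0;
    $len = strlen($a);
    for ($i = 0; $i < $len; $i++) {
        $diff |= ord($a[$i]) ^ ord($b[$i]);
    }
    return $diff === 0;
}
```

Isochronic verification asks the harder question the section above raises: after the interpreter, JIT, or compiler is done with code like this, is the timing behavior still independent of the secret data?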
Closing Thoughts
I hope anyone that reads this blog post walks away with a good understanding of the measures I've taken to hedge against my own mortal fallibility as I build the Public Key Directory project, as well as the additional work I plan to do as this project matures.
While there will always be hold-outs who do not trust pseudonymous nerds for whatever personal reason, I do hope that anyone with an open mind walks away feeling like their friend just gave them a big hug in their first freshly-unboxed fursuit.
Header art: Harubaki and CMYKat
Original post written by Soatok