I’m Tim Nosco, a cyber security researcher here at RII. I’m also a husband, a father, and a U.S. Army veteran. I’ve always enjoyed the intersection of computer science and mathematics, specifically topics like satisfiability solving, symbolic execution, and cryptography.
My team recently ran into an interesting exploitation scenario. The device we were testing had a read-out-of-bounds flaw, allowing a routine to access uninitialized memory outside of its own stack frame. It turns out, if an attacker carefully crafts an input so that the use of this uninitialized data still results in a valid message, they can learn some information about the uninitialized data. We turned this property of the system into a side-channel information leak.
A key component to successfully taking advantage of our target’s weakness was the ability to manipulate a data integrity check. A few times in my career, a researcher needed to modify a firmware object, but make changes in such a way that the new firmware maintains the same integrity-checker values as the original. In one example, the target employed a complex custom integrity-checking routine; even in this case, I could use the skills presented here to generate a specific return value.
My hope is that after reading this article, you’ll have the necessary tools to collide popular integrity-checking routines such as the different variants of CRC.
What can you do with this knowledge?
I’d like to touch on a few recent public vulnerability disclosures to emphasize why improper processing of sensitive data is important. In USENIX Security ‘24, Chen et al. presented GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers. Their principal discovery was that some chips do additional processing of data that looks like a pointer to other data. The clever attackers then crafted inputs to popular cryptography libraries such that the intermediate product of their input mixed with sensitive data would look like a data pointer. By carefully measuring the time it took for the library to return a response, the attackers learned the server’s sensitive data.
Consider how much software the average person uses in the broader scope of our day-to-day lives. You may accept an update to your browser, or a new firmware for your router. You might like to have some new graphics card drivers or maybe even some new classifiers from Windows Defender. All of these distribution mechanisms rely on cryptographic signing, and modern cryptographic signing relies on the secrecy of a private key. The implication of GoFetch might be a compromise of a software vendor’s private keys used to sign any number of the tools we use daily.
Of course, side-channel information leaks are one step removed from running malicious code on your device, as opposed to running malicious code directly. In comparison, the recent XZ-utils back door depended on our trust in the authors of the software instead of the security of their private key. But that doesn’t belittle the impact of sensitive information disclosure on our security. Let us get into the nuts and bolts of our problem, the linear function.
What is a linear system?
In summary, a binary linear system is a situation where each result bit can be represented as a linear equation of the input bits.
Figure 1: An example linear system of equations. You have an input X and an output Y. Both are bit vectors with 2 bits.
Inside each equation, we can do boolean algebra with the operations AND ∩, OR ∪, and NOT ∼. There is a convenience operation named XOR ⊕ that is short for ∼ (A ∩ B) ∩ (A ∪ B). XOR is particularly convenient because it operates the same way as the integer ADD operation in the modulo two finite field. See the truth table in Figure 2 for more information.
Figure 2: The matching truth tables for ADD, SUB, and XOR operations in (mod 2).
This article will have a recurring focus on a popular linear system, the Cyclic Redundancy Check, or CRC for short. CRC functions take in a bit vector of arbitrary size and “calculate a short, fixed-length binary sequence known as the check value” (Wikipedia). The CRC Wikipedia page even has a small section dedicated to explaining the linear property of CRC, as recreated in Figure 3.
Figure 3: A security-relevant property of CRC used in Wired Equivalent Privacy (WEP) cryptographic attacks.
CRC has this linear property because it is calculated using a moving window of its polynomial over the input bit vector. This polynomial is XOR’d into an ever-shortening accumulator until it fits in the fixed-length result. Because XOR and SHIFT are the only operations used here, we can represent every output bit as an XOR of the input bits, as shown in Figure 4.
How do I do the thing?
Our problem is that we have some string of bytes going into our linear function, and we want to modify those bytes while maintaining the same linear function output. To facilitate discussion, let’s make some variables:
The function under test—presumably CRC—will be f.
We’ll call the bit-length of f’s output: n.
The string of bytes going into f will be s.
Within s, there will be an n-length sequence of contiguous bits inconsequential to the system that calls f. We’ll name these bits the free bits or x.
The result of running f on s will be f(s), or y.
In the case where s is static except for the x substring we’re manipulating, we may also say y=f(x), but remember that we’re running f on all of s, not just the x substring.
Building Equations
So we have this n-bit result, y, and this n-bit sequence of free-bit inputs, x. We can represent every bit of y as a function of the input bits in x, as shown in Figure 4.
Figure 4: A linear system of equations where n=2.
In our equation, we introduce this set of a constants that act as a switch to include or not include the corresponding bit of x. Similar to how XOR, ADD, and SUB have the same truth table in (mod 2), MUL and AND have the same truth table. With that in mind, the equations in Figure 4 begin to look very familiar to the average algebra enjoyer who’s seen:
Linear Algebra
A long time ago, some smart people built this model for how to think about systems of linear equations and named it Linear Algebra. More or less, let’s consider it a way to organize our equations in Figure 4 so we don’t get confused. We’ll take all the a constants and put them in rows and columns corresponding to their position in y and x, respectively. Figure 5 shows the result.
Figure 5: A more sane way to organize the equations from Figure 4.
For those merely acquainted with the darkness (of linear algebra), Figure 4 and Figure 5 are the same systems when considering the equivalency between AND and MUL and between ADD and XOR while applying linear algebra multiply and addition.
Back To The Basis
Remember that x in our problem is our free bits with y as the result of CRC’ing our string with x inserted. We’ll first carefully choose an x so that we can learn the value of b, our basis vector. The zero vector is a good option.
Figure 6: y = Ax + b where x is the zero vector results in y = b, revealing the value of our basis vector, b.
As shown in Figure 6, setting the free bits to zeros and then conducting the CRC of our whole string gives us the value of the b vector. For the sake of our adventure, let’s assume you ran this example with n=2 and found a basis vector of [0 0].
The Value of A
The next step in our adventure is to reveal the value of A in our system. For this, we will actually need a few inputs to f: n many, to be precise. As shown in Figure 7, x and y will actually be a matrix of the same shape as A.
Figure 7: x and y will have the same dimensions as A. Each row of x and y corresponds to an input, x, and output, f(x).
Now, what x should we pick as inputs into our system? To make our lives easier, let’s use the identity matrix. Identity matrices have this nice property similar to multiplying by 1 in the real number system. Any matrix multiplied by the identity matrix is itself, thus simplifying as in Figure 8.
Figure 8: Selecting the identity matrix as our x inputs allows us to solve for A.
In Figure 8, we set the x inputs to sequences like “10,” “01,” etc., so that each input to f(x) is a row of the identity matrix. Assume for demonstration, x = [1 0] resulted in y = [1 1], and x = [0 1] resulted in y = [0 1]. Then, we can simplify our equation and solve for each cell of A, as shown in Figure 9.
Figure 9: We work through a concrete set of inputs and outputs to reveal the A matrix, thus completing our model y = Ax + b.
Selecting an x to get a desired f(x)
Now that we’ve found our values for A and b, we’re ready to answer our original question, “How do I collide CRC?” Or… Almost ready. We still need to solve y=Ax + b for x. But how?
The first step is easy; we subtract both sides of the equation by b, leaving the Ax term on the right-hand side by itself. Now, how do I move A over? The answer is that you must multiply both sides of the equation by the inverse of A, otherwise written as A⁻¹. If we can do this, we end up with the equation in Figure 10, our biggest take away from this whole article.
Figure 10: The crown jewel. The keys to the kingdom. The ancient texts: deciphered. How to collide CRC functions: finally answered.
There is a strategy for finding the inverse of a matrix named Gaussian Elimination that works for binary matrices, too. If you care to learn about it, consider reading the paper, A Fast Algorithm for Gaussian Elimination over GF(2) and Its Implementation on the GAPP by Koç et al. If you’ve made it this far, but Gaussian Elimination is a step too far, then you concatenate A with the identity matrix, I, run popcornell/gf2elim.py, and then take the right-hand part of the resulting matrix as in Figure 11.
It’s worth noting that not every matrix is invertible. If that happens, change up another part of your inputs to f and re-find b and A.
Figure 11: Finding the inverse of A in Python using Gaussian Elimination in the Galois Field of two elements.
Now, if I wanted to determine the free bits that give me the CRC result of [1 0], I would plug in y = [1 0] and solve for x in the crown jewel equation of Figure 10, as shown in Figure 12.
Figure 12: Using math for fun and profit.
Conclusion
In this article, we've shown you a step-by-step method to determine the basis vector b and the transformation matrix A. These might sound fancy, but they're basically the secret sauce that helps us play around with linear functions. We've used a cool technique called Gaussian Elimination to do some math magic and find the last important piece of the puzzle, A⁻¹. If you’re still unsatisfied, maybe check out this article by Project Nayuki.
But why does all this matter? Well, in the context of patching a system that must maintain a data integrity check, we can now generate some bytes that will result in a passing check. In the context of running an integrity check on our data mixed with sensitive data, well, you’ll just have to figure that bit out on your own.
Good luck, and happy hacking!
留言