\myheading{Cyclic redundancy check} I've always been wondering, which input bit affects which bit in the final CRC32 value. From the \ac{CRC} theory (good and concise introduction: \url{http://web.archive.org/web/20161220015646/http://www.hackersdelight.org/crc.pdf} ) we know that \ac{CRC} is shifting register with taps. We will track each bit rather than byte or word, which is highly inefficient, but serves our purpose better: \lstinputlisting[style=custompy]{\CURPATH/4_CRC/1.py} Here are expressions for each CRC32 bit for 1-byte buffer: \lstinputlisting{\CURPATH/4_CRC/1byte.txt} For larger buffer, expressions gets increasing exponentially. This is 0th bit of the final state for 4-byte buffer: \begin{lstlisting} state 0=((((((((((((((in_0_0^1)^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_7^(1^(in_0_1^1))))^ (in_1_0^(1^(in_0_2^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))^(in_1_3^(((1^(in_0_1^1))^ (in_0_2^1))^(in_0_5^1))))^(in_1_4^(((1^(in_0_2^1))^(in_0_3^1))^(in_0_6^(1^(in_0_0^1))))))^(in_2_0^((((1^ (in_0_0^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^ 1))))))^(in_2_6^(((((((1^(in_0_0^1))^(in_0_1^1))^(in_0_2^1))^(in_0_6^(1^(in_0_0^1))))^(in_1_4^(((1^(in_0_2^1))^ (in_0_3^1))^(in_0_6^(1^(in_0_0^1))))))^(in_1_5^(((1^(in_0_3^1))^(in_0_4^1))^(in_0_7^(1^(in_0_1^1))))))^ (in_2_0^((((1^(in_0_0^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^ (in_0_4^1))))))))^(in_2_7^(((((((1^(in_0_1^1))^(in_0_2^1))^(in_0_3^1))^(in_0_7^(1^(in_0_1^1))))^(in_1_5^(((1^ (in_0_3^1))^(in_0_4^1))^(in_0_7^(1^(in_0_1^1))))))^(in_1_6^(((1^(in_0_4^1))^(in_0_5^1))^(in_1_0^(1^(in_0_2^ 1))))))^(in_2_1^((((1^(in_0_1^1))^(in_0_7^(1^(in_0_1^1))))^(in_1_0^(1^(in_0_2^1))))^(in_1_3^(((1^(in_0_1^1))^ (in_0_2^1))^(in_0_5^1))))))))^(in_3_2^(((((((((1^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_6^(1^ (in_0_0^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))^(in_2_0^((((1^(in_0_0^1))^(in_0_6^(1^(in_0_0^ 1))))^(in_0_7^(1^(in_0_1^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))))^(in_2_1^((((1^(in_0_1^1))^ (in_0_7^(1^(in_0_1^1))))^(in_1_0^(1^(in_0_2^1))))^(in_1_3^(((1^(in_0_1^1))^(in_0_2^1))^(in_0_5^1))))))^(in_2_4^ (((((1^(in_0_0^1))^(in_0_4^1))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))^(in_1_3^(((1^(in_0_1^1))^ (in_0_2^1))^(in_0_5^1))))^(in_1_6^(((1^(in_0_4^1))^(in_0_5^1))^(in_1_0^(1^(in_0_2^1)))))))))) \end{lstlisting} Expression for the 0th bit of the final state for 8-byte buffer has length of $\approx 350KiB$, which is, of course, can be reduced significantly (because this expression is basically XOR tree), but you can feel the weight of it. Now we can process this expressions somehow to get a smaller picture on what is affecting what. Let's say, if we can find ``in\_2\_3'' substring in expression, this means that 3rd bit of 2nd byte of input affects this expression. But even more than that: since this is XOR tree (i.e., expression consisting only of XOR operations), if some input variable is occurring twice, it's \emph{annihilated}, since $x \oplus x=0$. More than that: if a variable occurred even number of times (2, 4, 8, etc), it's annihilated, but left if it's occurred odd number of times (1, 3, 5, etc). \begin{lstlisting} for i in range(32): #print "state %d=%s" % (i, state[31-i]) sys.stdout.write ("state %02d: " % i) for byte in range(BYTES): for bit in range(8): s="in_%d_%d" % (byte, bit) if str(state[31-i]).count(s) & 1: sys.stdout.write ("*") else: sys.stdout.write (" ") sys.stdout.write ("\n") \end{lstlisting} ( \url{\RepoURL/\CURPATH/4_CRC/2.py} ) Now this how each bit of 1-byte input buffer affects each bit of the final CRC32 state: \lstinputlisting{\CURPATH/4_CRC/1byte_tbl.txt} This is 8*8=64 bits of 8-byte input buffer: \lstinputlisting{\CURPATH/4_CRC/8byte_tbl.txt}