\myheading{My other implementations of toy decompiler} \leveldown{} When I made attempt to write it in C++, of course, node in expression was represented using class. There is also implementation in pure C\footnote{\url{\RepoURL/\CURPATH/files/C}}, node is represented using structure. Matchers in both C++ and C versions doesn't return any dictionary, but instead, \TT{bind\_value()} functions takes pointer to a variable which will contain value after successful matching. \TT{bind\_expr()} takes pointer to a pointer, which will points to the part of expression, again, in case of success. I took this idea from LLVM. Here are two pieces of code from LLVM source code with couple of reducing rules: \begin{lstlisting} // (X >> A) << A -> X Value *X; if (match(Op0, m_Exact(m_Shr(m_Value(X), m_Specific(Op1))))) return X; \end{lstlisting} ( \href{http://llvm.org/docs/doxygen/html/InstructionSimplify_8cpp_source.html}{lib/Analysis/InstructionSimplify.cpp} ) \begin{lstlisting} // (A | B) | C and A | (B | C) -> bswap if possible. // (A >> B) | (C << D) and (A << B) | (B >> C) -> bswap if possible. if (match(Op0, m_Or(m_Value(), m_Value())) || match(Op1, m_Or(m_Value(), m_Value())) || (match(Op0, m_LogicalShift(m_Value(), m_Value())) && match(Op1, m_LogicalShift(m_Value(), m_Value())))) { if (Instruction *BSwap = MatchBSwap(I)) return BSwap; \end{lstlisting} ( \href{https://github.com/numba/llvm-mirror/blob/master/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp}{lib/Transforms/InstCombine/InstCombineAndOrXor.cpp} ) As you can see, my matcher tries to mimic LLVM. What I call \emph{reduction} is called \emph{folding} in LLVM. Both terms are popular. I have also a blog post about LLVM obfuscator, in which LLVM matcher is mentioned: \url{https://yurichev.com/blog/llvm/}. Python version of toy decompiler uses strings in place where enumerate data type is used in C version (like \emph{OP\_AND}, \emph{OP\_MUL}, etc) and symbols used in Racket version\footnote{Racket is Scheme (which is, in turn, LISP dialect) dialect. \url{\RepoURL/\CURPATH/files/Racket}} (like \emph{'OP\_DIV}, etc). This may be seen as inefficient, nevertheless, thanks to strings interning, only address of strings are compared in Python version, not strings as a whole. So strings in Python can be seen as possible replacement for LISP symbols. \myheading{Even simpler toy decompiler} Knowledge of LISP makes you understand all these things naturally, without significant effort. But when I had no knowledge of it, but still tried to make a simple toy decompiler, I made it using usual text strings which hold expressions for each registers (and even memory). So when MOV instruction copies value from one register to another, we just copy string. When arithmetical instruction occurred, we do string concatenation: \begin{lstlisting} std::string registers[TOTAL]; ... // all 3 arguments are strings switch (ins, op1, op2) { ... case ADD: registers[op1]="(" + registers[op1] + " + " + registers[op2] + ")"; break; ... case MUL: registers[op1]="(" + registers[op1] + " / " + registers[op2] + ")"; break; ... } \end{lstlisting} Now you'll have long expressions for each register, represented as strings. For reducing them, you can use plain simple regular expression matcher. For example, for the rule \TT{(X*n)+(X*m) -> X*(n+m)}, you can match (sub)string using the following regular expression: \\ \TT{((.*)*(.*))+((.*)*(.*))} \footnote{This regular expression string hasn't been properly escaped, for the reason of easier readability and understanding.}. If the string is matched, you're getting 4 groups (or substrings). You then just compare 1st and 3rd using string comparison function, then you check if the 2nd and 4th are numbers, you convert them to numbers, sum them and you make new string, consisting of 1st group and sum, like this: \TT{(" + X + "*" + (int(n) + int(m)) + ")}. It was naïve, clumsy, it was source of great embarrassment, but it worked correctly. \levelup{}