\myheading{Dealing with compiler optimizations}
\leveldown{}

The following piece of code \dots

\begin{lstlisting}
mov rax, rdi
add rax, rax
\end{lstlisting}

\dots will be transformed into the \emph{(arg1 + arg1)} expression. It can be reduced to \emph{(arg1 * 2)}. Our toy decompiler can identify such patterns and rewrite them.

\begin{lstlisting}
# X+X -> X*2
def reduce_ADD1 (expr):
    if is_expr_op(expr) and get_op (expr)=="+" and get_op1 (expr)==get_op2 (expr):
        return dbg_print_reduced_expr ("reduce_ADD1", expr,
            create_binary_expr ("*", get_op1 (expr), create_val_expr (2)))
    return expr # no match
\end{lstlisting}

This function just tests whether the current node has the \emph{EXPR\_OP} type, the operation is ``+'' and both children are equal to each other. By the way, since our data structure is just a tuple of tuples, Python can compare them using the plain ``=='' operation. If the test succeeds, the current node is replaced with a new expression: we take one of the children, construct a node of \emph{EXPR\_VALUE} type holding the number 2, and then construct a node of \emph{EXPR\_OP} type with the ``*'' operation. \TT{dbg\_print\_reduced\_expr()} serves solely debugging purposes: it just prints the old and the new (reduced) expressions.

The decompiler then traverses the expression tree recursively, in \emph{depth-first search} fashion.

\begin{lstlisting}
def reduce_step (e):
    if is_expr_op (e)==False:
        return e # expr isn't EXPR_OP, nothing to reduce (we don't reduce EXPR_SYMBOL and EXPR_VAL)

    if is_unary_op(get_op(e)):
        # recreate expr with reduced operand:
        return reducers(create_unary_expr (get_op(e), reduce_step (get_op1 (e))))
    else:
        # recreate expr with both reduced operands:
        return reducers(create_binary_expr (get_op(e), reduce_step (get_op1 (e)), reduce_step (get_op2 (e))))

...

# same as "return ...(reduce_MUL1 (reduce_ADD1 (reduce_ADD2 (... expr))))"
reducers=compose([
    ...
    reduce_ADD1,
    ...
    ...])

def reduce (e):
    print "going to reduce " + expr_to_string (e)
    new_expr=reduce_step(e)
    if new_expr==e:
        return new_expr # we are done here, expression can't be reduced further
    else:
        return reduce(new_expr) # reduced expr has been changed, so try to reduce it again
\end{lstlisting}

The reduction functions are called again and again, as long as the expression keeps changing. Now we run it:

\begin{lstlisting}
python td.py tests/add1.s
...
going to reduce (arg1 + arg1)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
going to reduce (arg1 * 2)
result=(arg1 * 2)
\end{lstlisting}

So far so good; now what if we try this piece of code?

\begin{lstlisting}
mov rax, rdi
add rax, rax
add rax, rax
add rax, rax
\end{lstlisting}

\begin{lstlisting}
python td.py tests/add2.s
...
working out tests/add2.s
going to reduce (((arg1 + arg1) + (arg1 + arg1)) + ((arg1 + arg1) + (arg1 + arg1)))
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
reduction in reduce_ADD1() (((arg1 * 2) * 2) + ((arg1 * 2) * 2)) -> (((arg1 * 2) * 2) * 2)
going to reduce (((arg1 * 2) * 2) * 2)
result=(((arg1 * 2) * 2) * 2)
\end{lstlisting}

The result is correct, but too verbose. We would like to rewrite the \emph{(X*n)*m} expression to \emph{X*(n*m)}, where $n$ and $m$ are numbers.
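The listings above and below rely on small helper functions (\TT{create\_binary\_expr()}, \TT{get\_op()}, \TT{is\_expr\_op()} and so on) which the toy decompiler defines elsewhere. Their exact definitions are not shown in this section; here is a minimal sketch of how they could look on top of the tuple-of-tuples representation (the details, including the \TT{create\_symbol\_expr()} name, are assumptions made for illustration):

\begin{lstlisting}
# a sketch of the assumed node helpers (tuples with the node type first);
# the actual definitions in the toy decompiler may differ
def create_val_expr (val):              return ("EXPR_VALUE", val)
def create_symbol_expr (name):          return ("EXPR_SYMBOL", name)
def create_unary_expr (op, op1):        return ("EXPR_OP", op, op1)
def create_binary_expr (op, op1, op2):  return ("EXPR_OP", op, op1, op2)

def get_expr_type (e):                  return e[0]
def is_expr_op (e):                     return get_expr_type(e)=="EXPR_OP"
def get_val (e):                        return e[1] # for EXPR_VALUE nodes
def get_op (e):                         return e[1] # for EXPR_OP nodes
def get_op1 (e):                        return e[2]
def get_op2 (e):                        return e[3]
\end{lstlisting}

The debug output shown later (for example, \TT{('EXPR\_SYMBOL', 'arg1')}) is consistent with this layout.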
We could handle \emph{(X*n)*m} by adding another function like \TT{reduce\_ADD1()}, but there is a much better option: we can write a matcher for trees. You can think of it as a regular expression matcher, but one that works over trees.

\begin{lstlisting}
def bind_expr (key):
    return ("EXPR_WILDCARD", key)

def bind_value (key):
    return ("EXPR_WILDCARD_VALUE", key)

def match_EXPR_WILDCARD (expr, pattern):
    return {pattern[1] : expr} # return {key : expr}

def match_EXPR_WILDCARD_VALUE (expr, pattern):
    if get_expr_type (expr)!="EXPR_VALUE":
        return None
    return {pattern[1] : get_val(expr)} # return {key : expr}

def is_commutative (op):
    return op in ["+", "*", "&", "|", "^"]

def match_two_ops (op1_expr, op1_pattern, op2_expr, op2_pattern):
    m1=match (op1_expr, op1_pattern)
    m2=match (op2_expr, op2_pattern)
    if m1==None or m2==None:
        return None # one of the operand matches returned None, so we do the same

    # join the two dicts from both operands:
    rt={}
    rt.update(m1)
    rt.update(m2)
    return rt

def match_EXPR_OP (expr, pattern):
    if get_expr_type(expr)!=get_expr_type(pattern): # be sure both are EXPR_OP.
        return None
    if get_op (expr)!=get_op (pattern): # be sure the operations are the same.
        return None

    if (is_unary_op(get_op(expr))):
        # match unary expression.
        return match (get_op1 (expr), get_op1 (pattern))
    else:
        # match binary expression.

        # first try matching the operands as is.
        m=match_two_ops (get_op1 (expr), get_op1 (pattern), get_op2 (expr), get_op2 (pattern))
        if m!=None:
            return m

        # if matching was unsuccessful, and the operation is commutative, also try swapped operands.
        if is_commutative (get_op (expr))==False:
            return None
        return match_two_ops (get_op1 (expr), get_op2 (pattern), get_op2 (expr), get_op1 (pattern))

# returns a dict in case of success, or None
def match (expr, pattern):
    t=get_expr_type(pattern)
    if t=="EXPR_WILDCARD":
        return match_EXPR_WILDCARD (expr, pattern)
    elif t=="EXPR_WILDCARD_VALUE":
        return match_EXPR_WILDCARD_VALUE (expr, pattern)
    elif t=="EXPR_SYMBOL":
        if expr==pattern:
            return {}
        else:
            return None
    elif t=="EXPR_VALUE":
        if expr==pattern:
            return {}
        else:
            return None
    elif t=="EXPR_OP":
        return match_EXPR_OP (expr, pattern)
    else:
        raise AssertionError
\end{lstlisting}

Now here is how we use it:

\begin{lstlisting}
# (X*A)*B -> X*(A*B)
def reduce_MUL1 (expr):
    m=match (expr, create_binary_expr ("*",
        (create_binary_expr ("*", bind_expr ("X"), bind_value ("A"))),
        bind_value ("B")))
    if m==None:
        return expr # no match

    return dbg_print_reduced_expr ("reduce_MUL1", expr, create_binary_expr ("*",
        m["X"], # new op1
        create_val_expr (m["A"] * m["B"]))) # new op2
\end{lstlisting}

We take the input expression and construct a pattern to match it against. The matcher walks both expressions recursively and synchronously. The pattern is also an expression, but it can use two additional node types: \emph{EXPR\_WILDCARD} and \emph{EXPR\_WILDCARD\_VALUE}. These nodes carry keys (stored as strings). When the matcher encounters \emph{EXPR\_WILDCARD} in the pattern, it just stashes the current expression and returns it. If the matcher encounters \emph{EXPR\_WILDCARD\_VALUE}, it does the same, but only if the current node has the \emph{EXPR\_VALUE} type. \TT{bind\_expr()} and \TT{bind\_value()} are the functions which create nodes of these two types.

All this means that \TT{reduce\_MUL1()} will search for an expression of the form \emph{(X*A)*B}, where $A$ and $B$ are numbers. In all other cases the input expression is returned untouched, so these reducing functions can be chained.
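As a quick illustration of the commutativity handling (using the hypothetical \TT{create\_symbol\_expr()} helper from the sketch earlier), the pattern \emph{X+1} matches the expression \emph{1+Q} only thanks to the swapped-operand retry:

\begin{lstlisting}
# a hypothetical usage example, not taken from the toy decompiler itself:
expr    = create_binary_expr ("+", create_val_expr (1), create_symbol_expr ("Q"))
pattern = create_binary_expr ("+", bind_expr ("X"), create_val_expr (1))
print match (expr, pattern) # {'X': ('EXPR_SYMBOL', 'Q')}
\end{lstlisting}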
Now when \TT{reduce\_MUL1()} encounters a (sub)expression we are interested in, \TT{match()} returns a dictionary with the keys and the captured expressions. Let's add a \TT{print m} call somewhere before the return and rerun:

\begin{lstlisting}
python td.py tests/add2.s
...
going to reduce (((arg1 + arg1) + (arg1 + arg1)) + ((arg1 + arg1) + (arg1 + arg1)))
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
{'A': 2, 'X': ('EXPR_SYMBOL', 'arg1'), 'B': 2}
reduction in reduce_MUL1() ((arg1 * 2) * 2) -> (arg1 * 4)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
{'A': 2, 'X': ('EXPR_SYMBOL', 'arg1'), 'B': 2}
reduction in reduce_MUL1() ((arg1 * 2) * 2) -> (arg1 * 4)
reduction in reduce_ADD1() ((arg1 * 4) + (arg1 * 4)) -> ((arg1 * 4) * 2)
{'A': 4, 'X': ('EXPR_SYMBOL', 'arg1'), 'B': 2}
reduction in reduce_MUL1() ((arg1 * 4) * 2) -> (arg1 * 8)
going to reduce (arg1 * 8)
...
result=(arg1 * 8)
\end{lstlisting}

The dictionary has the keys we supplied plus the expressions the matcher found. We can then use them to create a new expression and return it. The numbers are multiplied while forming the second operand of the ``*'' operation.

Now a real-world optimization technique: optimizing GCC replaces multiplication by 31 with shift and subtraction operations:

\begin{lstlisting}
mov rax, rdi
sal rax, 5
sub rax, rdi
\end{lstlisting}

Without reduction functions, our decompiler would translate this into \emph{((arg1 << 5) - arg1)}. We can replace the left shift by multiplication:

\begin{lstlisting}
# X<<n -> X*(2^n)
def reduce_SHL1 (expr):
    m=match (expr, create_binary_expr ("<<", bind_expr ("X"), bind_value ("Y")))
    if m==None:
        return expr # no match

    return dbg_print_reduced_expr ("reduce_SHL1", expr, create_binary_expr ("*", m["X"],
        create_val_expr (1<<m["Y"])))

# X*n - X -> X*(n-1)
def reduce_SUB3 (expr):
    m=match (expr, create_binary_expr ("-",
        create_binary_expr ("*", bind_expr("X1"), bind_value ("N")),
        bind_expr("X2")))
    if m!=None and match (m["X1"], m["X2"])!=None:
        return dbg_print_reduced_expr ("reduce_SUB3", expr, create_binary_expr ("*", m["X1"],
            create_val_expr (m["N"]-1)))
    else:
        return expr # no match
\end{lstlisting}

The matcher returns two X's, and we must make sure they are equal. In fact, in previous versions of this toy decompiler, I did the comparison with plain ``=='', and it worked. But we can reuse the \TT{match()} function for the same purpose, because it handles commutative operations better. For example, if X1 is ``Q+1'' and X2 is ``1+Q'', the expressions are equal, but plain ``=='' will not see this. On the other hand, when the \TT{match()} function encounters a ``+'' operation (or another commutative operation) and the comparison fails, it also tries the swapped operands and compares again. However, to make things easier to understand, you can imagine for a moment that there is ``=='' instead of the second \TT{match()}.

Anyway, here is what we've got:

\begin{lstlisting}
working out tests/mul31_GCC.s
going to reduce ((arg1 << 5) - arg1)
reduction in reduce_SHL1() (arg1 << 5) -> (arg1 * 32)
reduction in reduce_SUB3() ((arg1 * 32) - arg1) -> (arg1 * 31)
going to reduce (arg1 * 31)
...
result=(arg1 * 31)
\end{lstlisting}

Another optimization technique is often seen in ARM Thumb code: AND-ing a value with a constant like 0xFFFFFFF0 is implemented using shifts:

\begin{lstlisting}
mov rax, rdi
shr rax, 4
shl rax, 4
\end{lstlisting}

This is quite common in ARM Thumb code, because it's a headache to encode a 32-bit constant using a couple of 16-bit Thumb instructions, while a single 16-bit instruction can shift left or right by 4 bits. Also, the expression \emph{(x>>4)<<4} can jokingly be called a ``twitching operator'': I've heard the ``-{}-i++'' expression was called this in Russian-speaking social networks, as some kind of meme (``operator podergivaniya''). Anyway, these reduction functions will be used:

\begin{lstlisting}
# X>>n -> X / (2^n)
...
def reduce_SHR2 (expr):
    m=match(expr, create_binary_expr(">>", bind_expr("X"), bind_value("Y")))
    if m==None or m["Y"]>=64:
        return expr # no match

    return dbg_print_reduced_expr ("reduce_SHR2", expr, create_binary_expr ("/", m["X"],
        create_val_expr (1<<m["Y"])))

# X<<n -> X*(2^n)
def reduce_SHL1 (expr):
    m=match (expr, create_binary_expr ("<<", bind_expr ("X"), bind_value ("Y")))
    if m==None:
        return expr # no match

    return dbg_print_reduced_expr ("reduce_SHL1", expr, create_binary_expr ("*", m["X"],
        create_val_expr (1<<m["Y"])))

# (X/(2^n))*(2^n) -> X&(~((2^n)-1))
def reduce_MUL2 (expr):
    m=match(expr, create_binary_expr ("*",
        create_binary_expr ("/", bind_expr("X"), bind_value("N1")),
        bind_value("N2")))
    if m==None or m["N1"]!=m["N2"] or is_2n(m["N1"])==False: # short-circuit expression
        return expr # no match

    return dbg_print_reduced_expr("reduce_MUL2", expr, create_binary_expr ("&", m["X"],
        create_val_expr(~(m["N1"]-1)&0xffffffffffffffff)))
\end{lstlisting}

Now the result:

\begin{lstlisting}
working out tests/AND_by_shifts2.s
going to reduce ((arg1 >> 4) << 4)
reduction in reduce_SHR2() (arg1 >> 4) -> (arg1 / 16)
reduction in reduce_SHL1() ((arg1 / 16) << 4) -> ((arg1 / 16) * 16)
reduction in reduce_MUL2() ((arg1 / 16) * 16) -> (arg1 & 0xfffffffffffffff0)
going to reduce (arg1 & 0xfffffffffffffff0)
...
result=(arg1 & 0xfffffffffffffff0)
\end{lstlisting}

\myheading{Division using multiplication}

Division is often replaced by multiplication for performance reasons. From school-level arithmetic, we may remember that division by 3 can be replaced by multiplication by $\frac{1}{3}$. In fact, compilers sometimes do so for floating-point arithmetic; for example, the FDIV instruction in x86 code can be replaced by FMUL. At least MSVC 6.0 replaces division by 3 with multiplication by $\frac{1}{3}$, and sometimes it's hard to be sure which operation was in the original source code.

But when we operate over integer values and CPU registers, we can't use fractions. However, we can rework the fraction:

\begin{center}
{\large $result = \frac{x}{3} = x \cdot \frac{1}{3} = x \cdot \frac{1 \cdot MagicNumber}{3 \cdot MagicNumber}$}
\end{center}

Given the fact that division by $2^n$ is very fast, we now have to find a $MagicNumber$ for which the following equation holds: $2^n = 3 \cdot MagicNumber$.

This code performs division by 10:

\begin{lstlisting}
mov rax, rdi
movabs rdx, 0cccccccccccccccdh
mul rdx
shr rdx, 3
mov rax, rdx
\end{lstlisting}

The division by $2^{64}$ is somewhat hidden: the lower 64 bits of the product (in RAX) are not used (dropped); only the higher 64 bits of the product (in RDX) are used, and they are then shifted by an additional 3 bits.
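A side note on where this constant comes from (simple arithmetic, not part of the decompiler): the exact equation $2^{64+3} = 10 \cdot MagicNumber$ has no integer solution, so the nearest integer from above is used instead:

\begin{center}
{\large $MagicNumber = \lceil 2^{67}/10 \rceil = 14757395258967641293 = 0cccccccccccccccdh$}
\end{center}

The rounding error is small enough that the final result is still exactly $\lfloor arg1/10 \rfloor$ for any 64-bit argument.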
The RDX register is set during processing of MUL/IMUL like this:

\begin{lstlisting}
def handle_unary_MUL_IMUL (registers, op1):
    op1_expr=register_or_number_in_string_to_expr (registers, op1)
    result=create_binary_expr ("*", registers["rax"], op1_expr)
    registers["rax"]=result
    registers["rdx"]=create_binary_expr (">>", result, create_val_expr(64))
\end{lstlisting}

In other words, the assembly code we have just seen multiplies by {\Large $\frac{0cccccccccccccccdh}{2^{64+3}}$}, or divides by {\Large $\frac{2^{64+3}}{0cccccccccccccccdh}$}. To find the divisor, we just have to divide the numerator by the denominator.

\begin{lstlisting}
# n = magic number
# m = shifting coefficient
# return = 1 / (n / 2^m) = 2^m / n
def get_divisor (n, m):
    return (2**float(m))/float(n)

# (X*n)>>m, where m>=64 -> X/...
def reduce_div_by_MUL (expr):
    m=match (expr, create_binary_expr(">>",
        create_binary_expr ("*", bind_expr("X"), bind_value("N")),
        bind_value("M")))
    if m==None:
        return expr # no match

    divisor=get_divisor(m["N"], m["M"])
    return dbg_print_reduced_expr ("reduce_div_by_MUL", expr,
        create_binary_expr ("/", m["X"], create_val_expr (int(divisor))))
\end{lstlisting}

This works, but we have a problem: this rule takes the \emph{(arg1 * 0xcccccccccccccccd) >> 64} expression first and finds the divisor to be $1.25$. This is correct: the result is shifted right by 3 more bits afterwards (i.e., divided by 8), and $1.25 \cdot 8 = 10$. But our toy decompiler doesn't support real numbers. We can solve this problem in the following way: if the divisor has a fractional part, we postpone the reduction, in the hope that the two subsequent right shifts will be reduced into a single one:

\begin{lstlisting}
# (X*n)>>m, where m>=64 -> X/...
def reduce_div_by_MUL (expr):
    m=match (expr, create_binary_expr(">>",
        create_binary_expr ("*", bind_expr("X"), bind_value("N")),
        bind_value("M")))
    if m==None:
        return expr # no match

    divisor=get_divisor(m["N"], m["M"])
    if math.floor(divisor)==divisor:
        return dbg_print_reduced_expr ("reduce_div_by_MUL", expr,
            create_binary_expr ("/", m["X"], create_val_expr (int(divisor))))
    else:
        print "reduce_div_by_MUL(): postponing reduction, because divisor=", divisor
        return expr
\end{lstlisting}

That works:

\begin{lstlisting}
working out tests/div_by_mult10_unsigned.s
going to reduce (((arg1 * 0xcccccccccccccccd) >> 64) >> 3)
reduce_div_by_MUL(): postponing reduction, because divisor= 1.25
reduction in reduce_SHR1() (((arg1 * 0xcccccccccccccccd) >> 64) >> 3) -> ((arg1 * 0xcccccccccccccccd) >> 67)
going to reduce ((arg1 * 0xcccccccccccccccd) >> 67)
reduction in reduce_div_by_MUL() ((arg1 * 0xcccccccccccccccd) >> 67) -> (arg1 / 10)
going to reduce (arg1 / 10)
result=(arg1 / 10)
\end{lstlisting}

I don't know if this is the best solution. An early version of this decompiler processed the input expression in two passes: the first pass for everything except division by multiplication, and the second pass for the latter. I don't know which way is better. Or maybe we could support real numbers in expressions?

A couple of words to better understand division by multiplication. Many people miss the ``hidden'' division by $2^{32}$ or $2^{64}$ when the lower 32-bit (or 64-bit) part of the product is not used (just dropped). Also, there is a misconception that the modular inverse is used here. It is close, but not the same thing. The extended Euclidean algorithm is usually used to find the \emph{magic coefficient}, but in fact, this algorithm is rather used here to solve the equation; you can solve it using any other method.
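As one possible ``other method'', here is a small brute-force search for a magic number and shift amount for unsigned division; this is a sketch of my own for illustration, not part of the toy decompiler, and the names are hypothetical. It uses a standard sufficient condition: take $magic = \lceil 2^{64+s}/d \rceil$; if $magic \cdot d - 2^{64+s} \le 2^s$, then multiplying any 64-bit $x$ by $magic$, taking the upper 64 bits and shifting them right by $s$ gives exactly $\lfloor x/d \rfloor$.

\begin{lstlisting}
# a sketch (not the decompiler's code): find (magic, shift) for unsigned
# division by the constant d on 64-bit registers
def find_magic_unsigned (d):
    for s in range(64):
        magic=(2**(64+s)+d-1)//d # ceil(2^(64+s) / d)
        # the magic number must fit in 64 bits and the rounding error must be small enough:
        if magic < 2**64 and magic*d - 2**(64+s) <= 2**s:
            return magic, s
    return None # some divisors (e.g. 7) need a longer sequence with a 65-bit magic number

print find_magic_unsigned(10) # -> magic 0xcccccccccccccccd, shift 3, as in the listing above
\end{lstlisting}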
Also, needless to say, the equation is unsolvable for some divisors, because it is a Diophantine equation (i.e., an equation that allows only integer results), since we work on integer CPU registers, after all.

\levelup{}