\myheading{Path constraint} \leveldown{} \myheading{Example 1} How to get weekday from UNIX timestamp? \begin{lstlisting} #!/usr/bin/env python input=... SECS_DAY=24*60*60 dayno = input / SECS_DAY wday = (dayno + 4) % 7 if wday==5: print "Thanks God, it's Friday!" \end{lstlisting} Let's say, we should find a way to run the block with print() call in it. What input value should be? First, let's build expression of $wday$ variable: \lstinputlisting[style=custompy]{\CURPATH/6_TGIF/TGIF1.py} \begin{lstlisting} cond: ((((input/86400)+4)%7)==5) \end{lstlisting} In order to execute the block, we should solve this equation: $((\frac{input}{86400}+4) \equiv 5 \mod 7$. So far, this is easy task for Z3: \lstinputlisting[style=custompy]{\CURPATH/6_TGIF/Z3_solve1.py} \begin{lstlisting} [x = 86400, dayno = 1] \end{lstlisting} This is indeed correct UNIX timestamp for Friday: \begin{lstlisting} % date -d @86400 -u "+%a %Y-%m-%d %H:%M:%S" Fri 1970-01-02 00:00:00 \end{lstlisting} Though the date back in year 1970, but it's still correct! This is also called ``path constraint'', i.e., what constraint must be satisfied to execute specific block? Several tools has ``path'' in their names, like ``pathgrind'', \href{http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-symbc}{Symbolic PathFinder}, CodeSurfer Path Inspector, etc. Like the shell game, this task is also often encounters in practice. You can see that something dangerous can be executed inside some basic block and you're trying to deduce, what input values can cause execution of it. It may be buffer overflow, etc. Such input values are sometimes also called ``inputs of death''. Many crackmes are solved in this way, all you need is find a path into block which prints ``key is correct'' or something like that. We can extend this tiny example: \begin{lstlisting} input=... SECS_DAY=24*60*60 dayno = input / SECS_DAY wday = (dayno + 4) % 7 print wday if wday==5: print "Thanks God, it's Friday!" else: print "Got to wait a little" \end{lstlisting} Now we have two blocks: for the first we should solve this equation: $((\frac{input}{86400}+4) \equiv 5 \mod 7$. But for the second we should solve inverted equation: $((\frac{input}{86400}+4) \not\equiv 5 \mod 7$. By solving these equations, we will find two paths into both blocks. KLEE (or similar tool) tries to find path to each [basic] block and produces ``ideal'' unit test. Hence, KLEE can find a path into the block which crashes everything, or reporting about correctness of the input key/license, etc. Surprisingly, KLEE can find backdoors in the very same manner. KLEE is also called ``KLEE Symbolic Virtual Machine'' -- by that its creators mean that the KLEE is \ac{VM} which executes a code symbolically rather than numerically (like usual \ac{CPU}). % \myheading{KLEE} How KLEE would handle this example? \lstinputlisting[style=customc]{\CURPATH/6_TGIF/TGIF_KLEE1.c} KLEE found two inputs, the one for 'return 1' and the second for 'return 0': \lstinputlisting{\CURPATH/6_TGIF/TGIF_KLEE1.txt} So two inputs: 0 and 86400. One for not-Friday, another is for Friday: \begin{lstlisting} % date -d @86400 -u "+%a %Y-%m-%d %H:%M:%S" Fri 1970-01-02 00:00:00 % date -d @0 -u "+%a %Y-%m-%d %H:%M:%S" Thu 1970-01-01 00:00:00 \end{lstlisting} KLEE can produce SMT-LIB file for SMT-solver, if the '--use-query-log=solver:smt2' option supplied: 'solver-queries.smt2'. This is excerpt: \begin{lstlisting} ; Query 1 -- Type: InitialValues, Instructions: 11658 (set-option :produce-models true) (set-logic QF_AUFBV ) (declare-fun input () (Array (_ BitVec 32) (_ BitVec 8) ) ) (assert (= (_ bv5 32) (bvurem (bvadd (_ bv4 32) (bvudiv (concat (select input (_ bv3 32) ) (concat (select input (_ bv2 32) ) (concat (select input (_ bv1 32) ) (select input (_ bv0 32) ) ) ) ) (_ bv86400 32) ) ) (_ bv7 32) ) ) ) (check-sat) (get-value ( (select input (_ bv0 32) ) ) ) (get-value ( (select input (_ bv1 32) ) ) ) (get-value ( (select input (_ bv2 32) ) ) ) (get-value ( (select input (_ bv3 32) ) ) ) \end{lstlisting} It divided 32-bit uint to 4 bytes. Let's run this in Z3: \begin{lstlisting} z3 -smt2 solver-queries.smt2_excerpt sat (((select input (_ bv0 32) ) #x80)) (((select input (_ bv1 32) ) #xc6)) (((select input (_ bv2 32) ) #x41)) (((select input (_ bv3 32) ) #xc8)) \end{lstlisting} 0xc841c680=3359753856. It's still friday, but in future: \begin{lstlisting} % date -d @3359753856 -u "+%a %Y-%m-%d %H:%M:%S" Fri 2076-06-19 00:57:36 \end{lstlisting} % \myheading{CBMC} And CBMC? \lstinputlisting[style=customc]{\CURPATH/6_TGIF/TGIF_CBMC1.c} I'll ask CBMC to find such an input so that assert() would trigger: \begin{lstlisting} % cbmc --trace --function check TGIF_CBMC1.c ... State 21 file TGIF_CBMC1.c function __CPROVER__start line 15 thread 0 ---------------------------------------------------- INPUT input: 86401ul (00000000 00000000 00000000 00000000 00000000 00000001 01010001 10000001) ... Violated property: file TGIF_CBMC1.c function check line 17 thread 0 assertion is_it_friday(input)==0 return_value_is_it_friday == 0 \end{lstlisting} This value is for Friday again: \begin{lstlisting} % date -d @86401 -u "+%a %Y-%m-%d %H:%M:%S" Fri 1970-01-02 00:00:01 \end{lstlisting} CBMC can also produce an SMT-LIB 2.x file for a SMT solver, with this problem converted to SMT-LIB 2.x: \begin{lstlisting} % cbmc --smt2 --outfile TGIF_CBMC1_dest.smt --trace --function check TGIF_CBMC1.c \end{lstlisting} It's not very verbose, thanks to the tiny size of our problem: \begin{lstlisting}[caption=Excerpt] ; set_to true (equal) (define-fun |is_it_friday::1::SECS_DAY!0@1#2| () (_ BitVec 32) (_ bv86400 32)) ; set_to true (equal) (define-fun |is_it_friday::1::dayno!0@1#2| () (_ BitVec 32) (bvudiv |is_it_friday::input!0@1#1| (_ bv86400 32))) ; set_to true (equal) (define-fun |is_it_friday::1::wday!0@1#2| () (_ BitVec 32) (bvurem (bvadd (_ bv4 32) |is_it_friday::1::dayno!0@1#2|) (_ bv7 32))) \end{lstlisting} Full version: \url{\RepoURL/\CURPATH/6_TGIF/TGIF_CBMC1_dest.smt}. I can't stand the urge and I'll execute this using Z3: \begin{lstlisting} % z3 -smt2 TGIF_CBMC1_dest.smt \end{lstlisting} ( \url{\RepoURL/\CURPATH/6_TGIF/TGIF_CBMC1_dest.smt.result} ) This is the computed $input$ value: \begin{lstlisting} ... ((|__CPROVER__start::input!0@1#2| #x00000000b1796100)) ... \end{lstlisting} 0xb1796100=2977521920 Still, this is Friday, but the Year is 2064: \begin{lstlisting} % date -d @2977521920 -u "+%a %Y-%m-%d %H:%M:%S" Fri 2064-05-09 01:25:20 \end{lstlisting} By the way, if the '--z3' option used with CBMC, it will spawn Z3 with that file. \myheading{Example 2} Let's make our example more realistic. We can go to beer pub within some time range, say, between 18:00 and 00:00. \lstinputlisting[style=custompy]{\CURPATH/6_TGIF/TGIF2.py} Now we have 3 conditions in system of equations: \begin{lstlisting} cond1: ((((input/86400)+4)%7)==5) cond2: (((input%86400)/3600)>=18) cond3: (((input%86400)/3600)<=23) \end{lstlisting} It's an easy problem for Z3: \lstinputlisting[style=custompy]{\CURPATH/6_TGIF/Z3_solve2.py} \begin{lstlisting} [x = 154799, hour = 18, dayno = 1] \end{lstlisting} UTC time, but correct: \begin{lstlisting} % date -d @154799 -u "+%a %Y-%m-%d %H:%M:%S" Fri 1970-01-02 18:59:59 \end{lstlisting} Reworked KLEE example: \lstinputlisting[style=customc]{\CURPATH/6_TGIF/TGIF_KLEE2.c} KLEE finds 3 paths: \begin{lstlisting} % time klee --libc=uclibc --optimize --use-query-log=solver:smt2 TGIF_KLEE2.bc ... KLEE: done: total instructions = 11820 KLEE: done: completed paths = 2 KLEE: done: partially completed paths = 1 KLEE: done: generated tests = 3 ... % ls klee-last/* klee-last/assembly.ll klee-last/run.stats klee-last/test000001.ktest klee-last/info klee-last/solver-queries.smt2 klee-last/test000002.ktest klee-last/messages.txt klee-last/test000001.external.err klee-last/test000003.ktest klee-last/run.istats klee-last/test000001.kquery klee-last/warnings.txt % ktest-tool klee-last/test000001.ktest | grep uint object 0: uint: 159744 % ktest-tool klee-last/test000002.ktest | grep uint object 0: uint: 0 % ktest-tool klee-last/test000003.ktest | grep uint object 0: uint: 86400 \end{lstlisting} So there are 3 values KLEE suggests: \begin{itemize} \item 0 (not friday) \item 86400 (just friday) \item 159744 (friday plus hour in needed range) \end{itemize} \begin{lstlisting} % date -d @159744 -u "+%a %Y-%m-%d %H:%M:%S" Fri 1970-01-02 20:22:24 \end{lstlisting} What SMT file KLEE prepares? Now it consists of two SMT queries. First for first 'if' statement, second is for second: \lstinputlisting{\CURPATH/6_TGIF/TGIF_KLEE2_solver-queries.smt2_excerpt} Now reworked example for CBMC: \lstinputlisting[style=customc]{\CURPATH/6_TGIF/TGIF_CBMC1.c} \begin{lstlisting} % cbmc --trace --function check TGIF_CBMC2.c ... ** Results: TGIF_CBMC2.c function check [check.assertion.1] line 27 assertion lets_party(input)==0: FAILURE Trace for check.assertion.1: State 21 file TGIF_CBMC2.c function __CPROVER__start line 25 thread 0 ---------------------------------------------------- INPUT input: 165600ul (00000000 00000000 00000000 00000000 00000000 00000010 10000110 11100000) ... \end{lstlisting} \begin{lstlisting} % date -d @165600 -u "+%a %Y-%m-%d %H:%M:%S" Fri 1970-01-02 22:00:00 \end{lstlisting} The SMT-LIB file generated by CBMC: \begin{lstlisting}[caption=Excerpt] ; set_to true (equal) (define-fun |lets_party::1::SECS_HOUR!0@1#2| () (_ BitVec 32) (_ bv3600 32)) ; set_to true (equal) (define-fun |lets_party::1::SECS_DAY!0@1#2| () (_ BitVec 32) (_ bv86400 32)) ; set_to true (equal) (define-fun |lets_party::1::dayno!0@1#2| () (_ BitVec 32) (bvudiv |lets_party::input!0@1#1| (_ bv86400 32))) ; set_to true (equal) (define-fun |lets_party::1::wday!0@1#2| () (_ BitVec 32) (bvurem (bvadd (_ bv4 32) |lets_party::1::dayno!0@1#2|) (_ bv7 32))) ; set_to true (equal) (define-fun |lets_party::1::1::hour!0@1#2| () (_ BitVec 32) (bvudiv (bvurem |lets_party::input!0@1#1| (_ bv86400 32)) (_ bv3600 32))) \end{lstlisting} \myheading{Summary} In short, this is how KLEE and CBMC work. They construct a system of equations for each basic block and then ask SMT solver to find correct inputs. \levelup{}