\myheading{Integer overflow}

\leveldown{}

\renewcommand{\CURPATH}{verif/int_over}

This is a classic bug:

\begin{lstlisting}[style=customc]
void allocate_data_for_some_chunks(int num)
{
#define MAX_CHUNKS 10

	if (num>MAX_CHUNKS)
		// throw error

	void* chunks=malloc(num*sizeof(CHUNK));

	...
};
\end{lstlisting}

Seems innocent? However, if a (remote) attacker can put a negative value into $num$, no exception is to be thrown,
and \verb|malloc()| will crash on too big input value, because \verb|malloc()| takes unsigned \verb|size_t| on input.
\textit{unsigned int} should be used instead of \textit{int} for $num$,
but many programmers use \textit{int} as a generic type for everything.

\myheading{Signed addition}

First, let's start with addition.
$a+b$ also seems innocent, but it is producing incorrect result if a sum doesn't fit into 32/64-bit register.

This is what we will do: evaluate an expression on two \ac{ALU}s: 32-bit one and 64-bit one:

%                                          +------------+
%a ---+----------------------------------> |            |
%     |                                    | 32-bit ALU | -- [sign extend to 64-bit] --> |\
%b -- | -+-------------------------------> |            |                                | \
%     |  |                                 +------------+                                |  \
%     |  |                                                                               |cmp| -> true
%     |  |                                 +------------+                                |  /
%     +- | -- [sign extend to 64 bits] --> |            |                                | /
%        |                                 | 64-bit ALU | -----------------------------> |/
%        +--- [sign extend to 64 bits] --> |            |
%                                          +------------+

\iffalse
\begin{figure}[H]
\centering
\begin{tikzpicture}[node distance=2cm,scale=0.8,every node/.append style={transform shape}]
  \tikzstyle{block} = [draw, rectangle, minimum height=3em, minimum width=6em]
  \tikzstyle{bigblock} = [draw, rectangle, minimum height=6em, minimum width=6em]
  \node [bigblock] (ALU32) {32-bit ALU};
  \node [bigblock, below of=ALU32, node distance=3cm] (ALU64) {64-bit ALU};
  \node [block, right of=ALU32, node distance=4cm] (SE641) {sign-extend to 64 bits};
  \node [block] (SE642) [above left=-1cm and 1cm of ALU64] {sign-extend to 64 bits};
  \node [block] (SE643) [below left=-1cm and 1cm of ALU64] {sign-extend to 64 bits};
  \draw [->] (SE642) -- ([yshift=-0.5cm]ALU64.north west);
  \draw [->] (SE643) -- ([yshift=0.5cm]ALU64.south west);
  \node [bigblock] (EQ) [below right=0cm and 1cm of SE641] {EQ};
  \node [block] (A) [above left=-1cm and 7cm of ALU32] {X};
  \node [block] (B) [below left=-1cm and 7cm of ALU32] {Y};
  \draw [->] (A) -- ([yshift=-0.5cm]ALU32.north west);
  \draw [->] (B) -- ([yshift=0.5cm]ALU32.south west);
  \draw [->] (ALU32) -- (SE641);
  \draw [->] (SE641) -- ([yshift=-0.5cm]EQ.north west);
  \draw [->] (ALU64) -- ([yshift=+0.5cm]EQ.south west);
  \node [bigblock] (true) [right=1cm of EQ] {must be true};
  \draw [->] (EQ) -- (true);
  \draw [->] (A.east) to [out=0,in=180] (SE642.west);
  \draw [->] (B.east) to [out=0,in=180] (SE643.west);
\end{tikzpicture}
\end{figure}
\fi

\begin{figure}[H]
\includesvg[width=\textwidth]{\CURPATH/4.svg}
\end{figure}

In other words, you want your expression to be evaluated on both \ac{ALU}s correctly, for all possible inputs, right?
Like if the result of 32-bit \ac{ALU} is always fit into 32-bit register.

And now we can ask Z3 SMT solver to find such an a/b inputs, for which the final comparison will fail.

Needless to say, the default operations (+, -, comparisons, etc) in Z3's Python API are signed, you can see this
here\footnote{\url{https://github.com/Z3Prover/z3/blob/master/src/api/python/z3/z3.py}}.

Also, we can find the lower bound, or minimal possible inputs, using \verb|minimize()|:

\lstinputlisting[style=custompy]{verif/int_over/1.py}

\begin{lstlisting}
a32=0x1 or 1
b32=0x7fffffff or 2147483647
out32=0x80000000 or -2147483648
out32_extended=0xffffffff80000000 or -2147483648
a64=0x1 or 1
b64=0x7fffffff or 2147483647
out64=0x80000000 or 2147483648
\end{lstlisting}

Right, \verb|1+0x7fffffff = 0x80000000|.
But the \verb|0x80000000| value is negative already, because \ac{MSB} is 1.
However, add this on 64-bit \ac{ALU} and the result will fit in 64-bit register.

How would we fix this problem?
We can devise a special function with \textit{wrapped} addition:

\begin{lstlisting}[style=customc]
/* Returns: a + b */

/* Effects: aborts if a + b overflows */

COMPILER_RT_ABI si_int
__addvsi3(si_int a, si_int b)
{
    si_int s = (su_int) a + (su_int) b;
    if (b >= 0)
    {
        if (s < a)
            compilerrt_abort();
    }
    else
    {
        if (s >= a)
            compilerrt_abort();
    }
    return s;
}
\end{lstlisting}

( \url{https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/addvsi3.c} )

Now I can simulate this function using Z3Py.
I'm telling it: ``find a solution, where this expression will be false'':

\begin{lstlisting}
s.add(Not(If(b32>=0, a32+b32<a32, a32+b32>a32)))
\end{lstlisting}

And it gives \emph{unsat}, meaning, there is no counterexample, so the expression can be evaluated safely on both \ac{ALU}s.

But is there a bug in my statement?
Let's check.
Find inputs for which this piece of LLVM code will call \verb|compilerrt_abort()|:

\begin{lstlisting}
s.add(If(b32>=0, a32+b32<a32, a32+b32>a32))
\end{lstlisting}

\begin{lstlisting}
a32=0x1 or 1
b32=0x7fffffff or 2147483647
out32=0x80000000 or -2147483648
out32_extended=0xffffffff80000000 or -2147483648
a64=0x1 or 1
b64=0x7fffffff or 2147483647
out64=0x80000000 or 2147483648
\end{lstlisting}

Safe implementations of other operations:
\url{https://wiki.sei.cmu.edu/confluence/display/java/NUM00-J.+Detect+or+prevent+integer+overflow}.
A popular library:
\url{https://github.com/dcleblanc/SafeInt}.

\myheading{Arithmetic mean}

Another classic bug. This is the famous bug in binary search algorithms
\footnote{
  \url{https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html},
  \url{https://thebittheories.com/the-curious-case-of-binary-search-the-famous-bug-that-remained-undetected-for-20-years-973e89fc212},
  also, see the ``Beautiful Code'' book (2007), Chapter 7, \textit{That Pesky Binary Search}.}.
The bug itself not in binary search algorithm, but in calculating arithmetic mean:

\begin{lstlisting}[style=custompy]
def func(a,b):
    return (a+b)/2
\end{lstlisting}

\begin{lstlisting}
a32=0x1 or 1
b32=0x7fffffff or 2147483647
out32=0xc0000000 or -1073741824
out32_extended=0xffffffffc0000000 or -1073741824
a64=0x1 or 1
b64=0x7fffffff or 2147483647
out64=0x40000000 or 1073741824
\end{lstlisting}

We can fix this function using a seemingly esoteric Dietz formula, used to do the same, but without integer overflow:

\begin{lstlisting}[style=custompy]
def func(a,b):
    return ((a^b)>>1) + (a&b)
\end{lstlisting}

( Its internal workings is described here: \ref{Dietz} )

Z3 gives \emph{unsat} for this function, because it can't find counterexample.

\myheading{Allocate memory for some chunks}

Let's return to the \verb|allocate_data_for_some_chunks()| function at the beginning of this section.

\lstinputlisting[style=custompy]{verif/int_over/2.py}

For which $a$ values will fail the \verb|a*1024| expression?
This is a smallest $a$ input:

\begin{lstlisting}
a32=0x200000 or 2097152
out32=0x80000000 or -2147483648
out32_extended=0xffffffff80000000 or -2147483648
a64=0x200000 or 2097152
out64=0x80000000 or 2147483648
\end{lstlisting}

OK, let's pretend we inserted a \verb|assert (a<100)| before \verb|malloc()| call:

\begin{lstlisting}
s.add(a32<100)
\end{lstlisting}

\begin{lstlisting}
a32=0x80000000 or -2147483648
out32=0x0 or 0
out32_extended=0x0 or 0
a64=0xffffffff80000000 or -2147483648
out64=0xfffffe0000000000 or -2199023255552
\end{lstlisting}

Still, an attacker can pass a negative $a=-2147483648$, and \verb|malloc()| will fail.

Let's pretend, we added a \verb|assert (a>0)| before calling \verb|malloc()|:

\begin{lstlisting}
s.add(a32>0)
\end{lstlisting}

Now Z3 can't find any counterexample.

Some people say you should use functions like \verb|reallocarray()| to be protected from integer overflows:
\url{http://lteo.net/blog/2014/10/28/reallocarray-in-openbsd-integer-overflow-detection-for-free/}.

\myheading{abs()}

Also seemingly innocent function:

\begin{lstlisting}[style=custompy]
def func(a):
    return If(a<0, -a, a)
\end{lstlisting}

\begin{lstlisting}
a32=0x80000000 or -2147483648
out32=0x80000000 or -2147483648
out32_extended=0xffffffff80000000 or -2147483648
a64=0xffffffff80000000 or -2147483648
out64=0x80000000 or 2147483648
\end{lstlisting}

This is an artifact of two's complement system.
This is \verb|INT_MIN|, and \verb|-INT_MIN == INT_MIN|.
It can lead to nasty bugs, and classic one is a naive implementations of \verb|itoa()| or \verb|printf()|.

Take a look at this implementation of \verb|itoa()| function from [\KRBook]:

\begin{lstlisting}[style=customc]
void itoa(int n, char s[])
{
        int i, sign;
        if ((sign = n) < 0) /* record sign */
                n = -n; /* make n positive */
        i = 0;
        do { /* generate digits in reverse order */
                s[i++] = n % 10 + '0'; /* get next digit */
        } while ((n /= 10) > 0); /* delete it */
        if (sign < 0)
                s[i++] = '-';
        s[i] = '\0';
        strrev(s);
}
\end{lstlisting}

( The full source code: \url{\RepoURL/\CURPATH/itoa_KR.c} )

From [\KRBook]:

\begin{framed}
\begin{quotation}
Exercise 3-4. In a two's complement number representation, our version of \emph{itoa}
does not handle the largest negative number, that is, the value
of \emph{n} equal to $-(2^{wordsize-1})$. Explain why not. Modify it to
print that value correctly, regardless of the machine on which
it runs.
\end{quotation}
\end{framed}

The correct answer is: the function cannot process largest negative number (\verb|INT_MIN| or 0x80000000 or -2147483648) correctly.

So, suppose, you print a signed value.
And you write something like:

\begin{lstlisting}[style=customc]
if (input<0)
{
	input=-input;
	printf ("-"); // print leading minus
};

// print digits in (positive) input:
...
\end{lstlisting}

If a \verb|INT_MIN| value (\verb|0x80000000|) is passed, minus sign is printed,
but the \verb|input| variable still contain negative value.
An additional check for \verb|INT_MIN| is to be added to fix this.

This is also called \emph{undefined behaviour} in C/C++.
The problem is that C language itself is old enough to be a witness of \emph{old iron} -- computers which could
represent signed numbers in other ways than two's complement representation:
\url{ttps://en.wikipedia.org/wiki/Signed_number_representations}.

For this reason, C standard doesn't guarantee that $-1$ will be \verb|0xffffffff| (all bits set) on 32-bit registers,
because the standard can't guarantee you will run on a hardware with two's complement representation of signed numbers.

However, almost all hardware you can currently use and buy uses two's complement.

More about the \verb|abs()| problem:

\begin{lstlisting}
This can become a security issue. I have seen one instance in the vasprintf implementation of libiberty,
which is part of gcc, binutils and some other GNU software. vasprintf walks over the format string and 
tries to estimate how much space it will need to hold the formatted result string. In format strings, 
there is a construct %.*s or %*s, which means that the actual value should be taken from the stack. 
The libiberty code did it like this:

          if (*p == '*')
            {
              ++p;
              total_width += abs (va_arg (ap, int));
            }

This is actually two issues in one. The first issue is that total_width can overflow. The second issue 
is the one that is interesting in this context: abs can return a negative number, causing the code 
to allocate not enough space and thus cause a buffer overflow. 
\end{lstlisting}

( \url{http://www.fefe.de/intof.html} )

\myheading{Games}

A lot of video games are prone to integer overflow.
Which are exploited actively by gamers.
As of NetHack:
\url{https://nethackwiki.com/wiki/Integer_overflow},
\url{https://nethackwiki.com/wiki/Negative_gold}.

\myheading{Year 2038}

And of course, the number of seconds since the year 1970 wouldn't fit in 32-bit variable after 2038.

\myheading{Summary}

What we did here, is we checked, if a result of an expression can fit in 32-bit register.
Probably, you can use a narrower second \ac{ALU}, than a 64-bit one.

\myheading{Further work}

If you want to catch overflows on unsigned variables, use unsigned Z3 operations instead of signed,
and do zero extend instead of sign extend.

\myheading{Some discussion}

\url{https://news.ycombinator.com/item?id=18521769}

\myheading{Further reading}

\begin{itemize}
\item \href{https://www.cs.utah.edu/~regehr/papers/tosem15.pdf}{Understanding Integer Overflow in C/C++}.

\item \href{https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/z3prefix.pdf}{Modular Bug-finding for Integer Overflows in the Large: Sound, Efficient, Bit-precise Static Analysis}.

\item blexim -- Basic Integer Overflows\footnote{\url{http://phrack.org/issues/60/10.html}}

\item \href{http://fmv.jku.at/c32sat/}{C32SAT}.
\end{itemize}

\levelup{}