\myheading{\lbrack{}\textvisiblespace{}\textbackslash{}t\rbrack{}*\lbrack{}a-z\rbrack{}+\lbrack{}\textvisiblespace{}\textbackslash{}t\rbrack{}*=\lbrack{}\textvisiblespace{}\textbackslash{}t\rbrack{}*\lbrack{}0-9\rbrack{}+\lbrack{}\textvisiblespace{}\textbackslash{}t\rbrack{}*} \leveldown{} This is the most practically useful example here. It can be used to parse config files in form \textit{option=number}, but whitespaces are also handled -- around input string and around the 'equal' sign. According to my utility, these are (random) sample input strings that can be accepted by the \ac{RE}: \begin{lstlisting} string matched: [zzzzzzzzzwt=09] string matched: [zzzzzzzzztkj=3] string matched: [zzzzzzzzzuj=5 ] string matched: [zzzzzzzzzy=16 ] string matched: [zzzzzzzzzzcr=9] string matched: [zzzzzzzzzzxz=2] string matched: [zzzzzzzzzuca=2] string matched: [zzzzzzzzzyd=44] string matched: [zzzzzzzzzuwj=5] string matched: [zzzzzzzzzvfj=3] string matched: [zzzzzzzzzs =13] string matched: [zzzzzzzzzt = 8] string matched: [zzzzzzzzzuu =5] string matched: [zzzzzzzzzwhi=1] \end{lstlisting} The \ac{DFA} is complicated by first sight, but it's not: \begin{figure}[H] \centering % FIXME. can't include svg %\includesvg[width=0.9\textwidth]{\CURPATH/files/libfsm/config.svg} \includegraphics[width=0.8\textwidth]{\CURPATH/files/config_edited.png} \end{figure} \myheading{Devising my own parser} Now the fun begins. Can we implement our own parser of such config files without the aid of \ac{RE}? It turns out, a very good programming exercise. The first version. Using CBMC I \textit{compare} my version with the pure C \ac{DFA} simulator produced by \textit{libfsm}. (I did that already with other algorithms: \ref{KMP}, \ref{BoyerMoore}.) \lstinputlisting[style=customc]{\CURPATH/files/CBMC/1.c} \lstinputlisting[style=customc]{\CURPATH/files/CBMC/config.c} What can go wrong? Running CBMC: \begin{lstlisting} Trace for check.assertion.1: State 17 file 1.c line 63 function check thread 0 ---------------------------------------------------- s={ ' ', ' ', 'y', '=', 0 } ({ 00100000, 00100000, 01111001, 00111101, 00000000 }) \end{lstlisting} Ouch, my naive parser can't parse the string " y=" correctly. It has no second part (a number) after the 'equality' sign. Probably I can fix it like that: \begin{lstlisting} else break; }; + // option must be non-empty + if (tmp==s) + return -1; // skip spaces while (*s==' ') @@ -48,6 +51,9 @@ else break; }; + // option must be non-empty + if (tmp==s) + return -1; // skip spaces while (*s==' ') \end{lstlisting} But it also has problems: \begin{lstlisting} Trace for check.assertion.1: State 17 file 2.c line 69 function check thread 0 ---------------------------------------------------- s={ '\t', 'p', '=', '0', 0 } ({ 00001001, 01110000, 00111101, 00110000, 00000000 }) \end{lstlisting} Ouch, I forgot about tabs! By code doesn't handle it. \begin{lstlisting} int naive(char* s) { // skip spaces - while (*s==' ') + while ((*s==' ') || (*s=='\t')) s++; char *tmp=s; @@ -27,7 +27,7 @@ return -1; // skip spaces - while (*s==' ') + while ((*s==' ') || (*s=='\t')) s++; // match '=' @@ -36,7 +36,7 @@ s++; // skip spaces - while (*s==' ') + while ((*s==' ') || (*s=='\t')) s++; tmp=s; @@ -56,7 +56,7 @@ return -1; // skip spaces - while (*s==' ') + while ((*s==' ') || (*s=='\t')) s++; \end{lstlisting} Still problems: \begin{lstlisting} Trace for check.assertion.1: State 17 file 3.c line 69 function check thread 0 ---------------------------------------------------- s={ 'j', '=', '9', -72, 0 } ({ 01101010, 00111101, 00111001, 10111000, 00000000 })

-72 is 0xffffffffffffffb8 or 0xb8, it is non-printable character, anyway. Fixing it:

         if (tmp==s)
                 return -1;

-        // skip spaces
-        while ((*s==' ') || (*s=='\t'))
+        // at this point, only spaces and whitespaces are allowed
+        while (*s)
+        {
+                if ((*s!=' ') && (*s!='\t'))
+                {
+                        return -1;
+                };
                 s++;
+        };

         // all done!
         return 1;
\end{lstlisting}

All cool, up to strings of length 15. It should be noted: 'unwind' parameter must be bigger by at least 1.
But not too big, otherwise it will verify your code too slow.

\begin{lstlisting}
 % cbmc --trace --function check 4.c --unwind 16 --unwinding-assertions -DCBMC -DLEN=15

...

Solving with MiniSAT 2.2.1 with simplifier
55862 variables, 238864 clauses
SAT checker: instance is UNSATISFIABLE
Runtime decision procedure: 4.23306s

** Results:
[check.assertion.1] assert: SUCCESS
[naive.unwind.0] unwinding assertion loop 0: SUCCESS
[naive.unwind.1] unwinding assertion loop 1: SUCCESS
[naive.unwind.2] unwinding assertion loop 2: SUCCESS
[naive.unwind.3] unwinding assertion loop 3: SUCCESS
[naive.unwind.4] unwinding assertion loop 4: SUCCESS
[naive.unwind.5] unwinding assertion loop 5: SUCCESS
[fsm_main.unwind.0] unwinding assertion loop 0: SUCCESS

** 0 of 8 failed (1 iteration)
VERIFICATION SUCCESSFUL
\end{lstlisting}

The final source code:

\lstinputlisting[style=customc]{\CURPATH/files/CBMC/4.c}

Obviously, without tools like CBMC, you have a lots of unchecked vulnerable code.

\levelup{}