\myheading{The \ac{DFA}-less version} \leveldown{} DFAs generated by the KMP algorithms are \textit{sparse} and have regularities we can observe easily. One popular optimization is not using DFA, but rather a small "partial match" table: \begin{lstlisting}[style=customc] // Knuth-Morris-Pratt algorithm // copypasted from http://cprogramming.com/snippets/source-code/knuthmorrispratt-kmp-string-search-algorithm char *kmp_search(char *haystack, size_t haystack_size, char *needle, size_t needle_size) { int *T; int i, j; char *result = NULL; if (needle_size==0) return haystack; /* Construct the lookup table */ T = (int*) malloc((needle_size+1) * sizeof(int)); T[0] = -1; for (i=0; i 0 && needle[i] != needle[T[i+1]-1]) T[i+1] = T[T[i+1]-1] + 1; } printf ("restarts table:\n"); for (i=0; i=0) print_compare (haystack, i, needle, j); if (j < 0 || haystack[i] == needle[j]) { ++i, ++j; if (j == needle_size) { result = haystack+i-j; break; } } else { j = T[j]; if (j==-1) printf ("Restarting needle at the beginning\n"); else print_state_needle(needle, j); }; } free(T); return result; } \end{lstlisting} Now let's search for 'eel' in the 'eex eel' string: \begin{lstlisting} restarts table: T[0]=-1 T[1]=0 T[2]=1 T[3]=0 going to compare: eex eel eel ^ going to compare: eex eel eel ^ going to compare: eex eel eel ^ Restarting needle at: eel ^ going to compare: eex eel eel ^ Restarting needle at: eel ^ going to compare: eex eel eel ^ Restarting needle at the beginning going to compare: eex eel eel ^ Restarting needle at the beginning going to compare: eex eel eel ^ going to compare: eex eel eel ^ going to compare: eex eel eel ^ \end{lstlisting} The T[] table is small, which is good for cache memory. But please note: a character is to be loaded twice in case of restart. But this is fine, it's cached anyway at that moment. Now the interesting thing: if the 'partial match' table is empty (all zeroes), the search function can be implemented naively, like I tried in the first part \ref{KMP1}. These are the cases of words like 'tesa', 'lee', 'banana'. But a non-empty table must be constructed for words like 'eel', 'oops', 'test', 'elee' -- these are words where a prefix (1-character minimum) is repeated in the word. But this is not true for 'banana' -- there is no repeating prefix. \myheading{All the files used} \url{\RepoURL/\CURPATH/files_3} \levelup{}