Algorithms_in_C++ 1.0.0
Set of algorithms implemented in C++.
|
The Boyer–Moore algorithm searches for occurrences of pattern P in text T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are n - m + 1), Boyer–Moore uses information gained by preprocessing P to skip as many alignments as possible. More...
#include <cassert>
#include <climits>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
Classes | |
struct | strings::boyer_moore::pattern |
A structure representing all the data we need to search the preprocessed pattern in text. More... | |
Namespaces | |
namespace | strings |
String algorithms. | |
namespace | strings::boyer_moore |
Functions for the Boyer Moore algorithm implementation. | |
Macros | |
#define | APLHABET_SIZE CHAR_MAX |
number of symbols in the alphabet we use | |
Functions | |
void | strings::boyer_moore::init_good_suffix (const std::string &str, std::vector< size_t > &arg) |
A function that preprocess the good suffix thable. | |
void | strings::boyer_moore::init_bad_char (const std::string &str, std::vector< size_t > &arg) |
A function that preprocess the bad char table. | |
void | strings::boyer_moore::init_pattern (const std::string &str, pattern &arg) |
A function that initializes pattern. | |
std::vector< size_t > | strings::boyer_moore::search (const std::string &str, const pattern &arg) |
A function that implements Boyer-Moore's algorithm. | |
bool | strings::boyer_moore::is_prefix (const char *str, const char *pat, size_t len) |
Check if pat is prefix of str. | |
void | and_test (const char *text) |
A test case in which we search for every appearance of the word 'and'. | |
void | pat_test (const char *text) |
A test case in which we search for every appearance of the word 'pat'. | |
static void | tests () |
Self-test implementations. | |
int | main () |
Main function. | |
The Boyer–Moore algorithm searches for occurrences of pattern P in text T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are n - m + 1), Boyer–Moore uses information gained by preprocessing P to skip as many alignments as possible.
The key insight in this algorithm is that if the end of the pattern is compared to the text, then jumps along the text can be made rather than checking every character of the text. The reason that this works is that in lining up the pattern against the text, the last character of the pattern is compared to the character in the text.
If the characters do not match, there is no need to continue searching backwards along the text. This leaves us with two cases.
Case 1: If the character in the text does not match any of the characters in the pattern, then the next character in the text to check is located m characters farther along the text, where m is the length of the pattern.
Case 2: If the character in the text is in the pattern, then a partial shift of the pattern along the text is done to line up along the matching character and the process is repeated.
There are two shift rules:
[The bad character rule] (https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm#The_bad_character_rule)
[The good suffix rule] (https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm#The_good_suffix_rule)
The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.
#define APLHABET_SIZE CHAR_MAX |
number of symbols in the alphabet we use
for assert for CHAR_MAX macro for strlen for IO operations for std::string for std::vector
void and_test | ( | const char * | text | ) |
A test case in which we search for every appearance of the word 'and'.
text | The text in which we search for appearance of the word 'and' |
int main | ( | void | ) |
void pat_test | ( | const char * | text | ) |
A test case in which we search for every appearance of the word 'pat'.
text | The text in which we search for appearance of the word 'pat' |
|
static |
Self-test implementations.