TheAlgorithms/C++ 1.0.0
All the algorithms implemented in C++
boyer_moore.cpp File Reference

The Boyer–Moore algorithm searches for occurrences of pattern P in text T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are n - m + 1), Boyer–Moore uses information gained by preprocessing P to skip as many alignments as possible.

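#include <cassert>   // for assert
#include <climits>   // for CHAR_MAX macro
#include <cstring>   // for strlen
#include <iostream>  // for IO operations
#include <string>    // for std::string
#include <vector>    // for std::vector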


Classes

struct strings::boyer_moore::pattern
 A structure representing all the data needed to search for the preprocessed pattern in a text.

Namespaces

namespace strings
 String algorithms.

namespace strings::boyer_moore
 Functions for the Boyer–Moore algorithm implementation.

Macros

#define APLHABET_SIZE   CHAR_MAX
 number of symbols in the alphabet we use

Functions

void strings::boyer_moore::init_good_suffix (const std::string &str, std::vector< size_t > &arg)
 A function that preprocesses the good suffix table.

void strings::boyer_moore::init_bad_char (const std::string &str, std::vector< size_t > &arg)
 A function that preprocesses the bad character table.

void strings::boyer_moore::init_pattern (const std::string &str, pattern &arg)
 A function that initializes a pattern.

std::vector< size_t > strings::boyer_moore::search (const std::string &str, const pattern &arg)
 A function that implements the Boyer–Moore algorithm.

bool strings::boyer_moore::is_prefix (const char *str, const char *pat, size_t len)
 Check whether pat is a prefix of str.

void and_test (const char *text)
 A test case in which we search for every appearance of the word 'and'.

void pat_test (const char *text)
 A test case in which we search for every appearance of the word 'pat'.

static void tests ()
 Self-test implementations.

int main ()
 Main function.

Detailed Description

The Boyer–Moore algorithm searches for occurrences of a pattern P in a text T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are n - m + 1, where n is the length of T and m is the length of P), Boyer–Moore uses information gained by preprocessing P to skip as many alignments as possible.
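For contrast, the brute-force search that Boyer–Moore improves on can be sketched as follows. This is a minimal illustration, not code from boyer_moore.cpp, and the name `naive_search` is hypothetical. It tries every one of the n - m + 1 alignments and compares P against T from left to right, so it can perform up to (n - m + 1) * m character comparisons.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical brute-force search: try every alignment s = 0 .. n - m of
// pattern p against text t and compare character by character.
std::vector<std::size_t> naive_search(const std::string &t,
                                      const std::string &p) {
    std::vector<std::size_t> matches;
    const std::size_t n = t.size();
    const std::size_t m = p.size();
    if (m == 0 || m > n) {
        return matches;  // no valid alignment (empty pattern treated as no match)
    }
    for (std::size_t s = 0; s + m <= n; ++s) {  // n - m + 1 alignments
        std::size_t j = 0;
        while (j < m && t[s + j] == p[j]) {
            ++j;
        }
        if (j == m) {  // all m characters matched at alignment s
            matches.push_back(s);
        }
    }
    return matches;
}
```

Every alignment is visited exactly once, whatever the text contains; the shift rules below are what let Boyer–Moore skip most of them.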

The key insight in this algorithm is that if the end of the pattern is compared to the text first, jumps along the text can be made rather than checking every character of the text. This works because, each time the pattern is lined up against the text, the comparison starts with the last character of the pattern and the text character aligned with it.

If these characters do not match, there is no need to continue comparing backwards at this alignment. This leaves two cases.

Case 1: If the character in the text does not match any of the characters in the pattern, then the next character in the text to check is located m characters farther along the text, where m is the length of the pattern.

Case 2: If the character in the text occurs in the pattern, the pattern is shifted only partially along the text, so that an occurrence of that character in the pattern lines up with it, and the process is repeated.
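The two cases can be illustrated with the Boyer–Moore–Horspool variant, a simplified relative of Boyer–Moore that keeps only a bad-character-style shift keyed on the text character under the last position of the pattern. The sketch below is an illustration under that assumption, not the implementation in this file; `horspool_search` is a hypothetical name, and the table is indexed through `unsigned char` so that every byte value has a slot.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Boyer–Moore–Horspool search: returns every starting index of p in t.
std::vector<std::size_t> horspool_search(const std::string &t,
                                         const std::string &p) {
    std::vector<std::size_t> matches;
    const std::size_t n = t.size();
    const std::size_t m = p.size();
    if (m == 0 || m > n) {
        return matches;
    }
    // Case 1 default: a character absent from the pattern allows a full shift of m.
    std::array<std::size_t, UCHAR_MAX + 1> shift;
    shift.fill(m);
    // Case 2: a character present in p[0..m-2] shifts the pattern just far
    // enough to line up its rightmost such occurrence with the text character.
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(p[i])] = m - 1 - i;
    }
    std::size_t s = 0;
    while (s + m <= n) {
        // Compare right to left, starting with the last pattern character.
        std::size_t j = m;
        while (j > 0 && p[j - 1] == t[s + j - 1]) {
            --j;
        }
        if (j == 0) {
            matches.push_back(s);
        }
        // Shift according to the text character under the pattern's last position.
        s += shift[static_cast<unsigned char>(t[s + m - 1])];
    }
    return matches;
}
```

The table deliberately skips the last pattern character when it is built, so no entry is ever 0 and the search always makes progress.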

There are two shift rules:

[The bad character rule](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm#The_bad_character_rule)

[The good suffix rule](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm#The_good_suffix_rule)

The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.
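A compact sketch of how both rules can be combined follows. It is a hypothetical stand-alone version: `boyer_moore_search` is not a function of this file, and its table layout need not match `init_bad_char` or `init_good_suffix`. The bad character table stores the last index of each byte in P, the good suffix table follows the common border-based construction of the strong good suffix rule, and each step shifts by the larger of the two suggested shifts.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <string>
#include <vector>

// Hypothetical Boyer–Moore search combining the bad character rule and the
// strong good suffix rule. Returns every starting index of p in t.
std::vector<int> boyer_moore_search(const std::string &t, const std::string &p) {
    std::vector<int> matches;
    const int n = static_cast<int>(t.size());
    const int m = static_cast<int>(p.size());
    if (m == 0 || m > n) {
        return matches;
    }

    // Bad character table: last index of each byte value in p, or -1 if absent.
    std::array<int, UCHAR_MAX + 1> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i) {
        last[static_cast<unsigned char>(p[i])] = i;
    }

    // Good suffix table: shift[j] is used once the suffix p[j..m-1] matched and
    // p[j-1] mismatched (shift[0] is used after a full match).
    // border[i] is the start of the widest border of the suffix p[i..m-1].
    std::vector<int> shift(m + 1, 0);
    std::vector<int> border(m + 1, 0);
    int i = m;
    int j = m + 1;
    border[i] = j;
    while (i > 0) {
        while (j <= m && p[i - 1] != p[j - 1]) {
            if (shift[j] == 0) {
                shift[j] = j - i;  // suffix reoccurs with a different preceding char
            }
            j = border[j];
        }
        --i;
        --j;
        border[i] = j;
    }
    // Remaining entries: line up a prefix of p with the end of the matched suffix.
    j = border[0];
    for (i = 0; i <= m; ++i) {
        if (shift[i] == 0) {
            shift[i] = j;
        }
        if (i == j) {
            j = border[j];
        }
    }

    // Search: compare right to left, then take the larger of the two shifts.
    int s = 0;
    while (s <= n - m) {
        int k = m - 1;
        while (k >= 0 && p[k] == t[s + k]) {
            --k;
        }
        if (k < 0) {
            matches.push_back(s);
            s += shift[0];
        } else {
            const int bad_char_shift = k - last[static_cast<unsigned char>(t[s + k])];
            s += std::max(shift[k + 1], bad_char_shift);
        }
    }
    return matches;
}
```

Taking the maximum is safe because neither rule on its own skips past a possible occurrence; the bad character shift may be zero or negative, but every good suffix entry is at least 1, so the search always advances.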

Author
Stoycho Kyosev

Definition in file boyer_moore.cpp.

Macro Definition Documentation

◆ APLHABET_SIZE

#define APLHABET_SIZE   CHAR_MAX

number of symbols in the alphabet we use


Definition at line 52 of file boyer_moore.cpp.
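`CHAR_MAX` is the largest value of plain `char`, which is typically 127 where `char` is signed. Where a lookup table has to cover every possible byte, including non-ASCII ones, a common alternative is to size it `UCHAR_MAX + 1` and index it through `unsigned char`, as in this sketch. `last_occurrence` is a hypothetical helper, not part of boyer_moore.cpp.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical byte-indexed table with one slot per unsigned char value.
// Converting through unsigned char keeps the index non-negative even on
// platforms where plain char is signed.
std::array<int, UCHAR_MAX + 1> last_occurrence(const std::string &p) {
    std::array<int, UCHAR_MAX + 1> last;
    last.fill(-1);  // -1 marks a byte that does not occur in p
    for (std::size_t i = 0; i < p.size(); ++i) {
        last[static_cast<unsigned char>(p[i])] = static_cast<int>(i);
    }
    return last;
}
```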

Function Documentation

◆ and_test()

void and_test ( const char * text)

A test case in which we search for every appearance of the word 'and'.

Parameters
text: The text in which we search for appearances of the word 'and'
Returns
void

Definition at line 220 of file boyer_moore.cpp.

{
    strings::boyer_moore::pattern ands;
    strings::boyer_moore::init_pattern("and", ands);
    std::vector<size_t> indexes = strings::boyer_moore::search(text, ands);

    assert(indexes.size() == 2);
    assert(strings::boyer_moore::is_prefix(text + indexes[0], "and", 3));
    assert(strings::boyer_moore::is_prefix(text + indexes[1], "and", 3));
}
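These assertions rely on `is_prefix` to confirm that each reported index really starts the word. Its body is not shown on this page; a minimal sketch with the documented signature, assuming it compares the first `len` characters of `str` and `pat`, could use `std::strncmp`. This is an illustration, not the file's implementation.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: true when the first len characters of str equal the
// first len characters of pat. std::strncmp stops at a terminating '\0', so a
// str shorter than len compares unequal instead of being read past its end.
bool is_prefix(const char *str, const char *pat, std::size_t len) {
    return std::strncmp(str, pat, len) == 0;
}
```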

◆ main()

int main ( void )

Main function.

Returns
0 on exit

Definition at line 269 of file boyer_moore.cpp.

{
    tests();  // run self-test implementations
    return 0;
}

◆ pat_test()

void pat_test ( const char * text)

A test case in which we search for every appearance of the word 'pat'.

Parameters
text: The text in which we search for appearances of the word 'pat'
Returns
void

Definition at line 235 of file boyer_moore.cpp.

{
    strings::boyer_moore::pattern pat;
    strings::boyer_moore::init_pattern("pat", pat);
    std::vector<size_t> indexes = strings::boyer_moore::search(text, pat);

    assert(indexes.size() == 6);

    for (const auto& currentIndex : indexes) {
        assert(strings::boyer_moore::is_prefix(text + currentIndex, "pat", 3));
    }
}

◆ tests()

static void tests ( )

Self-test implementations.

Returns
void

Definition at line 250 of file boyer_moore.cpp.

{
    const char* text =
        "When pat Mr. and Mrs. pat Dursley woke up on the dull, gray \
        Tuesday our story starts, \
        there was nothing about pat the cloudy sky outside to pat suggest that\
        strange and \
        mysterious things would pat soon be happening all pat over the \
        country.";

    and_test(text);
    pat_test(text);

    std::cout << "All tests have successfully passed!\n";
}
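The expected counts in the two tests (two occurrences of 'and', six of 'pat') can be cross-checked independently of the Boyer–Moore code with `std::string::find`. In this sketch, `count_occurrences` and `kSampleText` are hypothetical names, and the sample sentence is written on one line with single spaces; its whitespace may differ from the spliced literal above, which does not affect either count.

```cpp
#include <cstddef>
#include <string>

// Hypothetical copy of the test sentence, written on a single line.
const std::string kSampleText =
    "When pat Mr. and Mrs. pat Dursley woke up on the dull, gray Tuesday our "
    "story starts, there was nothing about pat the cloudy sky outside to pat "
    "suggest that strange and mysterious things would pat soon be happening "
    "all pat over the country.";

// Count (possibly overlapping) occurrences of word in text.
std::size_t count_occurrences(const std::string &text, const std::string &word) {
    if (word.empty()) {
        return 0;
    }
    std::size_t count = 0;
    for (std::size_t pos = text.find(word); pos != std::string::npos;
         pos = text.find(word, pos + 1)) {
        ++count;
    }
    return count;
}
```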