Monday, July 27, 2015

More on the Simplest Pattern Matching Problem

Now that we have the general pattern matching problem defined: we are looking at some text character-by-character, we are going to describe that problem more rigorously by building up from the simplest versions of the problem to more complicated ones. That’s right we are taking this one general problem and turn it into lots of different problems. In the process, we will build up our vocabulary of ways to describe the problem, introduce notations to express the more complicated problems, and develop different techniques that can actually solve these problems.

We have already looked at the simplest problem, the single fixed string recognition problem. That’s the problem where we are given a piece of text that might match a word, and the (single) word we want to determine whether it matches, and we want back a simple yes or no answer. Is the text the word we are asking about or not?

The name of the problem includes two special terms: “fixed string” and “recognition”.

Fixed String means the problem is dealing with a word that has only one spelling and we write that spelling out as part of the problem. Another name for these words is literals, which means we literally write the characters of the word to represent the word. In our case, we had a single fixed string. That is we are looking only for one word. Because we know what the word is (and the reason we know what the word is that the characters are literally spelled out one-by-one as part of the problem and none of the characters can vary from the spelling if we are to have a match), we call it a fixed string problem. Any pattern where the items of the pattern spell out the characters to match exactly (and literally) is called a fixed string pattern.

Recognition means we are simply looking at the text and trying to recognize if it matches the pattern. We are not doing more complicated analysis. We simply want a yes or no, match or not match, answer. We will eventually get to more complicated problems where we want much more than a yes-or-no answer, but for a recognition problem that is all we want.

Implicit in this is the fact that we are looking at the whole text and only if whole text exactly matches the word in question do we return a yes. So, although we didn’t mention it above, this is also a “match” problem.

Match means we want the entire text to match the pattern and to exactly correspond character by character to the pattern. If there are characters left over in the text that we didn’t line up to the pattern, then the answer is no and the text is not a match. Similarly, if there are characters in the pattern that weren’t lined up to the text, the answer is also not a match.

Now, this may seem like a lot of detail to describe a simple problem, but it gives us the start of a rich vocabulary that allows us to precisely distinguish the different pattern matching problems separating the simple ones from those that are complex.

prev: Pattern Matching Is Easy
next: Literals and Quoting

No comments: