Looking at the simplest problem, one might be tempted to
think pattern matching is easy.
The more
cynical might assume that it is just deceptively easy and there are real
complexities lurking.
At some level both
points of view are right.
Pattern matching is intrinsically easy.
Even as we build up more and more complicated
notions of what kinds of patterns we wish to match. These notions will all be
relatively simple steps.
Moreover, we
will generally be able to sole them with essentially the same simple problem solving
methods with a few extensions.
Pattern matching is also deceptively easy.
There are definite places where ones
intuition leads one astray.
Places where
after making three right turns, one finds ourselves unavoidably going
left.
However, it isn’t that
counter-intuitive aspect that is the real issue.
One can learn the places where common sense
fails one and 1 + 1 ≠ 2.
Pattern matching being easy is part of the problem itself.
Along with the sense that much of pattern matching is
well-known and well-understood and little is left to discover is the fact that
it is fairly easy to write a pattern matching solution.
Many people have.
Don’t be surprised when you learn that you
can implement some of the solutions mentioned here in a few hours.
Moreover, it is easy to implement a solution
that incorporates some novel aspect.
Again, many people have.
In fact,
there is a room full of papers showing one novel aspect or another implemented
by someone and never carried forward in my basement.
The hard part of writing a pattern matching solution is the
engineering part, the user-interface part, the scaling part, the combinatorial
complexity part, the keeping simple things simple, but not too simple part.
There are lots of little choices to make in a
solution that impact its performance or the ability of some user to describe
the pattern they want.
Be aware that
even for a one-time throw-away solution, you may have to re-write and change
parts as you become aware of the real needs and the real issues. And, if you
are trying to make something that you aren’t going to throw away, you need to
be all the more conscious of what you choose and why.
Because of that, you will find many points in
this blog where the trade-offs between different solutions are mentioned.
Along those lines, you will find me particularly intolerant
of people who write a hand-written parser or implement their own DFA
interpreter without considering using a well-proven tool to do that job. The
problem isn’t that their solution won’t work.
Pattern matching is easy and it surely will work.
What a solution like that won’t do is
scale.
It will be hard for it to survive
multiple rounds of enhancement without either being entirely re-written or
becoming a nest of convoluted hacks.
It
also won’t easily be made parallel or otherwise optimized, as our optimization
technology works well on stylized code and high-level abstractions, but not so
well when given an arbitrary program that may have assumptions that are hard to
prove or transform.
Long ago I heard a
great quote, which I paraphrase, about a compiler versus an assembly language
programmer that makes this point well, “compiler generated code can generally
reach about 75% the performance of a hand-optimized assembly program, but the
compiler will continue to reach that rate even as the program is modified and
all the careful register assignments in the hand optimized program have been
rendered incorrect”.
This is particularly
true in pattern matching.
You can
quickly write a program that exploits the ideas contained in this blog.
It is much harder and more tedious, thus best
left to a tireless program, to line up all the details of the ideas in the
right order to express an approximation to the optimal solution for a
non-trivial set of patterns.
This is why even if you have what seems to be an easy pattern matching problem, you want to use a well written tool to solve that problem, so that as the problem becomes more complex the solution continues to evolve with it, rather than becoming a rats nest of ad hack patches.
Just as it is easy to implement a pattern matching solution,
it is usually easy to write a pattern itself.
In fact, it needs to be easy to do that.
Otherwise, pattern matching is useless.
However, that easiness gets some people carried away.
It is far too easy to write a pattern that
over-specifies the problem.
As a friend
of mine once said (again slightly paraphrased), “how often does a hacker gets
some sophisticated attack written, but they mess up and get the quotes wrong, and
does one really want to not detect those almost attacks?” To address that, you will find points where
this blog tries to indicate that a simpler solution might cover the same space.
It's one of those curious contradictions that a sophisticated, subtle, and pedantically correct solution may be “worse” than a simpler solution that only approximates the right answer, but does it in a way that works better for the user.
prev: The Alphabet
next: More on the Simplest Pattern Matching Problem
No comments:
Post a Comment