|
Sat
3
Dec '05
|
One strong point of PERL is regular expression matching. I am thinking about implementing a domain specific embeded language in Haskell for the same purpose. Inspired more or less by Parsec and after one hour of coding, I implemented such a simple matcher. It currently only matches a regular expression against a strings, tells us if it matches or not. It lacks the ability of binding certain parts of the string (which might be more useful). An example:
Reg> (many letter <+> (char ',') <+> many1 digit) "abc,123" Just ""
The results indicates that the input string matches and the whole string is consumed. The regular expression is constructed by two <+> (concatenation) combinators. The first sections is a matcher for zero or more letters ([a..z,A..Z]*), second one matches only a comma; the last section matches one or more digits.
Another example of using <-> combinator:
Reg> (many (any <-> (char ',')) <+> char ',' <+> many digit) "abc,123" Just ""
Code:
module Reg where import Monad type Matcher = String -> Maybe String -- character mathers char :: Char -> Matcher char c (x:xs) = if c==x then return xs else Nothing char _ _ = Nothing any :: Matcher any (_:xs) = return xs any _ = Nothing -- useful combinators empty :: Matcher empty = return none :: Matcher none = const Nothing neg :: Matcher -> Matcher neg x s = case x s of Just _ -> Nothing _ -> Just s (<|>) :: Matcher -> Matcher -> Matcher (<|>) x y s = x s ` mplus` y s (<+>) :: Matcher -> Matcher -> Matcher (<+>) x y s = x s >>= y (<->) :: Matcher -> Matcher -> Matcher (<->) x y s = neg y s >>= x many :: Matcher -> Matcher many x s = case x s of Nothing -> return s Just s' -> many x s' many1 :: Matcher -> Matcher many1 x s = x s >>= many x choice :: [Matcher] -> Matcher choice xs = foldr (<|>) none xs -- Commonly used matchers. lower :: Matcher lower = choice $ map char ['a'..'z'] upper :: Matcher upper = choice $ map char ['A'..'Z'] letter :: Matcher letter = choice [lower,upper] digit :: Matcher digit = choice $ map char ['0'..'9']
Now let’s move this to the next step: not only matching the regular expressions, but also bind substrings that are matched. Some definitions above need to be changed. But if the code is well written, only a very small core part needs changes. Let’s see how well it will work out.