Prolog in Haskell - Basic parsing
In the previous log, our Prolog implementation became a second-order programming only lacking a descent parser.
We will use parser, a parser combinator library which enables building parser based on smaller ones and composition.
This library is 20-years old, stable and well documented, yet, most of the tutorial miss tokenization, which is a way to describe a programming language tokens and derive some parsers, as follows:
lexer =
Tok.makeTokenParser
emptyDef
{ Tok.commentLine = "%",
Tok.identStart = letter <|> char '_',
Tok.identLetter = alphaNum <|> char '_',
Tok.reservedOpNames = [":-", ",", "."]
}
identP = Tok.identifier lexer
reservedOpP = Tok.reservedOp lexer
commaSepP = Tok.commaSep lexer
The next step is to parse terms, as follows:
termP = varP <|> atomP
where
varP = do
lookAhead (upper <|> char '_')
var <$> identP
atomP = do
lookAhead lower
atom <$> identP
We can continue by parsing goals, which is challenging because Goals are strong typed, so we have to explicitly convert each arity as follows:
functorArgsP = do
lookAhead lower
name <- identP
args <- option [] $ Tok.parens lexer (commaSepP termP)
return (name, args)
goalP = do
(name, args) <- functorArgsP
return $
case args of
[] -> name @ ()
[t1] -> name @ t1
[t1, t2] -> name @ (t1, t2)
[t1, t2, t3] -> name @ (t1, t2, t3)
_ -> error $ "Unsupported arity for goal: " ++ name
The next logic piece to implement would be Predicate, the problem being that in Prolog, all clauses of the same predicate do not need to co-located, plus, our current implementation of Predicate does not allow incrementally build them.
To do so, we have to implement a two-phases parsing, one to collect each clause, and another one to build Predicate as follows:
data RawClause = RawClause String [Term] [Goal]
clauseP = do
(name, args) <- functorArgsP
body <- option [] $ do
reservedOpP ":-"
commaSepP goalP
reservedOpP "."
return $ RawClause name args body
To group clauses, we should do it by functor name and arity, but also to convert all arguments list arity, as follows:
groupClauses =
map mkPredicate . NonEmpty.groupBy ((==) `on` functorArity) . NonEmpty.sortOn functorArity
where
functorArity (RawClause n args _) = (n, length args)
mkPredicate (x@(RawClause name args1 _) :| xs) =
case length args1 of
0 -> predicate name (toArity0 x) (map toArity0 xs)
1 -> predicate name (toArity1 x) (map toArity1 xs)
2 -> predicate name (toArity2 x) (map toArity2 xs)
_ -> error $ "Unsupported arity for predicate: " ++ name
toArity0 (RawClause _ [] body) = ((), body)
toArity1 (RawClause _ [a] body) = (a, body)
toArity2 (RawClause _ [a, b] body) = ((a, b), body)
The following last step is to parse a full programming:
prologP = Tok.whiteSpace lexer *> (maybe [] groupClauses . NonEmpty.nonEmpty <$> many clauseP) <* eof
parseProlog = parse prologP ""
We can also come up with the following tests:
describe "PrologParser" $ do
it "parses a basic fact with arity 1" $ do
let input = "fruit(tomato)."
let Right [p] = parseProlog input
-- Verify against the manual AST from Spec.hs
hasGoal [p] ("fruit" @ atom "tomato") `shouldBe` True
hasGoal [p] ("fruit" @ atom "carrots") `shouldBe` False
it "parses a fact with arity 0" $ do
let input = "sunny."
let Right [p] = parseProlog input
hasGoal [p] ("sunny" @ ()) `shouldBe` True
it "parses multiple clauses into a single Predicate" $ do
let input =
unlines
[ "fruit(tomato).",
"fruit(melon)."
]
let Right [p] = parseProlog input
hasGoal [p] ("fruit" @ atom "tomato") `shouldBe` True
hasGoal [p] ("fruit" @ atom "melon") `shouldBe` True
it "parses a rule with a body" $ do
let input =
unlines
[ "fruit(melon).",
"salad(melon).",
"fruitSalad(X) :- fruit(X), salad(X)."
]
let Right ps = parseProlog input
-- Should resolve the query just like the manual fixtures
runQuery ps ("fruitSalad" @ var "F") `shouldBe` [Map.singleton (Var "F") (atom "melon")]
it "parses a rule with arity 2" $ do
let input = "eq(X, X)."
let Right [p] = parseProlog input
runQuery [p] ("eq" @ (atom "tomato", var "Y")) `shouldBe` [Map.singleton (Var "Y") (atom "tomato")]
The next log will be the latest on this implementation of Prolog, we will lift parsing at compile time, and sum-up.