Prolog in Haskell - Basic parsing

In the previous log, our Prolog implementation became a second-order programming only lacking a descent parser.

We will use parser, a parser combinator library which enables building parser based on smaller ones and composition.

This library is 20-years old, stable and well documented, yet, most of the tutorial miss tokenization, which is a way to describe a programming language tokens and derive some parsers, as follows:

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (emptyDef)

lexer :: Tok.TokenParser ()
lexer =
  Tok.makeTokenParser
    emptyDef
      { Tok.commentLine = "%",
        Tok.identStart = letter <|> char '_',
        Tok.identLetter = alphaNum <|> char '_',
        Tok.reservedOpNames = [":-", ",", "."]
      }

identP :: Parser String
identP = Tok.identifier lexer

reservedOpP :: String -> Parser ()
reservedOpP = Tok.reservedOp lexer

commaSepP :: Parser a -> Parser [a]
commaSepP = Tok.commaSep lexer

The next step is to parse terms, as follows:

termP :: Parser Term
termP = varP <|> atomP
  where
    varP = do
      lookAhead (upper <|> char '_')
      var <$> identP

    atomP = do
      lookAhead lower
      atom <$> identP

We can continue by parsing goals, which is challenging because Goals are strong typed, so we have to explicitly convert each arity as follows:

functorArgsP :: Parser (String, [Term])
functorArgsP = do
  lookAhead lower
  name <- identP
  args <- option [] $ Tok.parens lexer (commaSepP termP)
  return (name, args)

goalP :: Parser Goal
goalP = do
  (name, args) <- functorArgsP
  return $
    case args of
      [] -> name @ ()
      [t1] -> name @ t1
      [t1, t2] -> name @ (t1, t2)
      [t1, t2, t3] -> name @ (t1, t2, t3)
      _ -> error $ "Unsupported arity for goal: " ++ name

The next logic piece to implement would be Predicate, the problem being that in Prolog, all clauses of the same predicate do not need to co-located, plus, our current implementation of Predicate does not allow incrementally build them.

To do so, we have to implement a two-phases parsing, one to collect each clause, and another one to build Predicate as follows:

data RawClause = RawClause String [Term] [Goal] deriving (Show)

clauseP :: Parser RawClause
clauseP = do
  (name, args) <- functorArgsP
  body <- option [] $ do
    reservedOpP ":-"
    commaSepP goalP
  reservedOpP "."
  return $ RawClause name args body

To group clauses, we should do it by functor name and arity, but also to convert all arguments list arity, as follows:

groupClauses :: NonEmpty RawClause -> [Predicate]
groupClauses =
  map mkPredicate . NonEmpty.groupBy ((==) `on` functorArity) . NonEmpty.sortOn functorArity
  where
    functorArity (RawClause n args _) = (n, length args)

    mkPredicate :: NonEmpty RawClause -> Predicate
    mkPredicate (x@(RawClause name args1 _) :| xs) =
      case length args1 of
        0 -> predicate name (toArity0 x) (map toArity0 xs)
        1 -> predicate name (toArity1 x) (map toArity1 xs)
        2 -> predicate name (toArity2 x) (map toArity2 xs)
        _ -> error $ "Unsupported arity for predicate: " ++ name

    toArity0 (RawClause _ [] body) = ((), body)
    toArity1 (RawClause _ [a] body) = (a, body)
    toArity2 (RawClause _ [a, b] body) = ((a, b), body)

The following last step is to parse a full programming:

prologP :: Parser [Predicate]
prologP = Tok.whiteSpace lexer *> (maybe [] groupClauses . NonEmpty.nonEmpty <$> many clauseP) <* eof

parseProlog :: String -> Either ParseError [Predicate]
parseProlog = parse prologP ""

We can also come up with the following tests:

describe "PrologParser" $ do
  it "parses a basic fact with arity 1" $ do
    let input = "fruit(tomato)."
    let Right [p] = parseProlog input

    -- Verify against the manual AST from Spec.hs
    hasGoal [p] ("fruit" @ atom "tomato") `shouldBe` True
    hasGoal [p] ("fruit" @ atom "carrots") `shouldBe` False

  it "parses a fact with arity 0" $ do
    let input = "sunny."
    let Right [p] = parseProlog input

    hasGoal [p] ("sunny" @ ()) `shouldBe` True

  it "parses multiple clauses into a single Predicate" $ do
    let input =
          unlines
            [ "fruit(tomato).",
              "fruit(melon)."
            ]
    let Right [p] = parseProlog input

    hasGoal [p] ("fruit" @ atom "tomato") `shouldBe` True
    hasGoal [p] ("fruit" @ atom "melon") `shouldBe` True

  it "parses a rule with a body" $ do
    let input =
          unlines
            [ "fruit(melon).",
              "salad(melon).",
              "fruitSalad(X) :- fruit(X), salad(X)."
            ]
    let Right ps = parseProlog input

    -- Should resolve the query just like the manual fixtures
    runQuery ps ("fruitSalad" @ var "F") `shouldBe` [Map.singleton (Var "F") (atom "melon")]

  it "parses a rule with arity 2" $ do
    let input = "eq(X, X)."
    let Right [p] = parseProlog input

    runQuery [p] ("eq" @ (atom "tomato", var "Y")) `shouldBe` [Map.singleton (Var "Y") (atom "tomato")]

The next log will be the latest on this implementation of Prolog, we will lift parsing at compile time, and sum-up.