step 03: Deriving Parsers
Derivation
In the last step, we embedded our nice, applicative row parser combinator API into the Cats ecosystem. Are we done now?
Well, one thing still looks like it could be improved… We said that our applicative chain traverses the parameters of the given parameter function. Assuming that we have an unambiguous mapping from parameter types to primitive/column parsers (which we may not always have) - why should we still have to specify the parsers at all? Can’t we just infer them from the constructor function and be done with it?
We might feel tempted to look into black Scala magic like macros to solve this, but let’s see if we can implement this using our standard tools…
Mapping
Creating a mapping from types to parsers is easy - we just need to provide
given
instances
of RowParser[T]
for each type T
we want to assign a default mapping for.
given RowParser[String] = string
given RowParser[Int] = int
given RowParser[LocalDate] = date
Armed with these, we can rewrite our user parser.
(
summon[RowParser[Int]],
summon[RowParser[String]],
summon[RowParser[LocalDate]]
).mapN(User.apply)
But again, this doesn’t really look like an improvement so far. How can we use these mappings and the constructor functions to derive a combined parser?
Chaining Type Classes
With the applicative approach, we basically are “folding left” over the constructor parameters. For derivation, it feels like we need to “fold right” instead: We start looking for a parser derivation for the output type and need to assemble its parts by traversing the parameters to the left.
Time to introduce our own type class.
trait RowParserDerivable[A, B]:
def deriveRowParser(a: A): RowParser[B]
B
obviously is the target type, but what’s A
? It’s the source of our derivation,
i.e. it’ll be some function.
Base…
The derivation base case is easy: Given A => B
and a row parser
for A
, we can derive a parser for B
simply by mapping:
given[A, B](using RowParser[A]): RowParserDerivable[A => B, B] with
def deriveRowParser(f: A => B): RowParser[B] = summon[RowParser[A]].map(f)
…and Step
Now the inductive step. If we already can derive a parser for some C
from a
function B
, then, given the (curried) function A => B
and a row parser for
A
, we can build a derived parser for C
by first parsing A
, passing the
result into A => B
, deriving a parser for C
from the remaining B
and
applying it to the remainder of the row. The actual code probably is easier
to read than this winded description.
given[A, B, C](
using RowParser[A], RowParserDerivable[B, C]
): RowParserDerivable[A => B, C] with
def deriveRowParser(f: A => B): RowParser[C] =
row =>
val (a, remA) = summon[RowParser[A]].parse(row)
summon[RowParserDerivable[B, C]].deriveRowParser(f(a)).parse(remA)
Profit
And that’s it already.
val userParser: RowParser[User] = User.apply.curried.deriveRowParser
Let’s try it out:
sbt:nanocsv> runMain de.sangamon.nanocsv.step03.main data/users.csv
1,Torsten Test,1970-01-01
2,Andrea Anders,2000-02-20
User(1,Torsten Test,1970-01-01)
User(2,Andrea Anders,2000-02-20)
The full code for this post can be found in package de.sangamon.nanocsv.step03
. This completes our quest
for a decent row parser combinator API for now. In the next step,
we will turn from these lofty heights to the clerical chores of error handling.