Consuming Columns

In the initial implementation, we provide some primitive “column parsers” of type String => T as building blocks for a caller-provided “conversion function” Row => T. To combine these into a “Lego brick” framework, we can’t use either type as is: a column parser consumes a single column only, while a conversion function consumes a whole row (or as much of it as it needs).

To reconcile these, we can think of a parser as consuming an arbitrary number of columns from a row - and returning the remainder along with the result.

trait RowParser[T]:
  def parse(row: Row): (T, Row)
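
To make the “arbitrary number of columns” idea concrete, here is a minimal sketch. It assumes that Row is a plain List[String] (the pattern matches later in this post suggest as much), and fullName is a hypothetical parser invented purely for illustration.

```scala
// Assumption: a Row is just the list of raw column strings.
type Row = List[String]

trait RowParser[T]:
  def parse(row: Row): (T, Row)

// Hypothetical parser: consumes two columns and joins them into one value.
val fullName: RowParser[String] = row =>
  row match
    case first :: last :: rest => (s"$first $last", rest)
    case _ => throw new IllegalArgumentException("need at least two columns")

// Two columns are consumed, the third is handed back as the remainder.
val result = fullName.parse(List("Torsten", "Test", "1970-01-01"))
// result == ("Torsten Test", List("1970-01-01"))
```

The remainder is what lets the next parser pick up exactly where this one stopped.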

We can redefine our primitive column parsers in terms of RowParser and expect a RowParser from the caller:

private def column[T](f: String => T): RowParser[T] =
  case h :: t => f(h) -> t
  case Nil => throw new CSVException("input exhausted")

val string: RowParser[String] = column(identity)
val int: RowParser[Int] = column(_.toInt)
val date: RowParser[LocalDate] = column(LocalDate.parse)

def parse[T](file: Path)(p: RowParser[T]): List[T] =
  lines(file).map(l => p.parse(row(l))(0))

…but this doesn’t look like we’re making anybody’s life much more comfortable yet:

  val userParser: RowParser[User] =
    row =>
      val (i, ir) = int.parse(row)
      val (n, nr) = string.parse(ir)
      val (d, dr) = date.parse(nr)
      (User(i, n, d), dr)

Note: We’re playing fast and loose with error semantics here. The previous version only allowed rows with exactly three columns; this one tolerates trailing data and fails with a different exception on missing columns. Let’s worry about this later.
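
The following sketch shows what this looks like in practice. It is self-contained and makes a few assumptions that are not taken from the actual code base: Row is a List[String], CSVException is a plain RuntimeException subclass, and the field names of User are made up.

```scala
import java.time.LocalDate
import scala.util.Try

type Row = List[String]                                        // assumption
class CSVException(msg: String) extends RuntimeException(msg)  // assumption

trait RowParser[T]:
  def parse(row: Row): (T, Row)

def column[T](f: String => T): RowParser[T] = row =>
  row match
    case h :: t => (f(h), t)
    case Nil    => throw new CSVException("input exhausted")

val string: RowParser[String] = column(identity)
val int: RowParser[Int] = column(_.toInt)
val date: RowParser[LocalDate] = column(s => LocalDate.parse(s))

case class User(id: Int, name: String, birthday: LocalDate)    // field names are hypothetical

val userParser: RowParser[User] = row =>
  val (i, ir) = int.parse(row)
  val (n, nr) = string.parse(ir)
  val (d, dr) = date.parse(nr)
  (User(i, n, d), dr)

// Trailing data is tolerated: the extra column simply ends up in the remainder.
val tooLong = userParser.parse(List("1", "Torsten Test", "1970-01-01", "extra"))

// Missing column: the date parser sees an empty remainder and throws CSVException.
val tooShort = Try(userParser.parse(List("1", "Torsten Test")))

// Malformed column: the conversion function's own exception escapes unchanged.
val malformed = Try(userParser.parse(List("x", "Torsten Test", "1970-01-01")))
```

Here tooShort fails with a CSVException, while malformed fails with the NumberFormatException raised by toInt.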

Combining Parsers

How can we generically combine, say, an int, a string and a date parser into a User parser?

Using a List[RowParser[T]] or similar as a building block won’t work - what’s T going to be? The target types of the individual parsers would be lost. We need to handle a heterogeneous list of types. And we kind of have a representation for it: the User.apply function of type (Int, String, LocalDate) => User, which we can curry to Int => String => LocalDate => User. Can we somehow use this?
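
As a quick refresher on what currying buys us, the sketch below feeds the constructor one argument at a time. The field names of User are assumptions, and the intermediate vals (mkUser, step1, step2) exist only to make the types visible.

```scala
import java.time.LocalDate

case class User(id: Int, name: String, birthday: LocalDate) // field names are hypothetical

// The constructor as a plain three-argument function...
val mkUser: (Int, String, LocalDate) => User = User.apply

// ...and its curried form: a chain of one-argument functions.
val curried: Int => String => LocalDate => User = mkUser.curried

// Each application strips one parameter and returns the "rest" of the function.
val step1: String => LocalDate => User = curried(1)
val step2: LocalDate => User = step1("Torsten Test")
val user: User = step2(LocalDate.parse("1970-01-01"))
```

Every step has a precise type, so the heterogeneous parameter list is encoded in the function itself rather than in a collection.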

Transforming Parsers

Given a function A => B and a parser RowParser[A], we can’t create a B - but we can easily create a RowParser[B]:

def transform[A, B](p: RowParser[A])(f: A => B): RowParser[B] =
  row =>
    val (a, ar) = p.parse(row)
    (f(a), ar)

That’s nice in its own right, i.e. we can modify existing parsers:

transform(string)(_.toLowerCase)
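
A small, self-contained sketch of what transformed parsers return. It assumes Row is a List[String] and uses a stand-in CSVException; nameLength is a hypothetical second example.

```scala
type Row = List[String]                                        // assumption
class CSVException(msg: String) extends RuntimeException(msg)  // assumption

trait RowParser[T]:
  def parse(row: Row): (T, Row)

def transform[A, B](p: RowParser[A])(f: A => B): RowParser[B] = row =>
  val (a, ar) = p.parse(row)
  (f(a), ar)

val string: RowParser[String] = row =>
  row match
    case h :: t => (h, t)
    case Nil    => throw new CSVException("input exhausted")

// Only the result changes; the remainder passes through untouched.
val lower: RowParser[String] = transform(string)(_.toLowerCase)
val nameLength: RowParser[Int] = transform(string)(_.length)

val sample: Row = List("Torsten Test", "1970-01-01")
val lowered = lower.parse(sample)      // ("torsten test", List("1970-01-01"))
val measured = nameLength.parse(sample) // (12, List("1970-01-01"))
```

Note that transform never touches the row itself: it delegates consumption to the wrapped parser and only post-processes the value.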

…and we can unify the definition of our primitives:

val string: RowParser[String] =
  case h :: t => h -> t
  case Nil => throw new CSVException("input exhausted")

val int: RowParser[Int] = transform(string)(_.toInt)
val date: RowParser[LocalDate] = transform(string)(LocalDate.parse)

…but it doesn’t help us with traversing a constructor function: We need to consume columns in lockstep with parameter types.

Non-parsing Parsers

Let’s take a step back and revisit our API. A RowParser consumes an arbitrary number of columns: primitive parsers consume one, the User parser consumes three - and there’s nothing that keeps a parser from consuming zero columns. We can have “constant” parsers:

def const[T](v: T): RowParser[T] = (v, _)

So we can combine parsers that actually consume columns with parsers that are simply containers for values. In particular, we can even have parsers of functions: RowParser[A => B].
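
To illustrate, here is a minimal sketch (again assuming Row is a List[String]; increment is a made-up example). A constant parser hands back its value together with the untouched row, and that value may just as well be a function.

```scala
type Row = List[String] // assumption

trait RowParser[T]:
  def parse(row: Row): (T, Row)

// Same behaviour as the (v, _) shorthand, written out as an explicit lambda.
def const[T](v: T): RowParser[T] = row => (v, row)

// A constant parser consumes nothing: the full row comes back as the remainder.
val answer = const(42).parse(List("a", "b"))
// answer == (42, List("a", "b"))

// The "value" can be a function, which we can apply after "parsing" it.
val increment: RowParser[Int => Int] = const((n: Int) => n + 1)
val parsed = increment.parse(List("a", "b"))
val applied = parsed._1(41) // 42
```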

Folding Functions

Why is this interesting? Well, it means we can thread functions through computations “inside” parsers that actually do some work…

def combine[A, B](fp: RowParser[A => B], p: RowParser[A]): RowParser[B] =
  row =>
    val (f, fr) = fp.parse(row)
    val (v, vr) = p.parse(fr)
    f(v) -> vr

With each #combine() step, we strip the leading parameter off our function and inject a parser operation for the corresponding argument instead. Now we can build a parser from parsers guided by a function…
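
It may help to spell out the intermediate types. The sketch below is self-contained under the usual assumptions (Row is a List[String], CSVException is a stand-in, User's field names are invented); p0 to p3 are hypothetical names for the individual steps.

```scala
import java.time.LocalDate

type Row = List[String]                                        // assumption
class CSVException(msg: String) extends RuntimeException(msg)  // assumption

trait RowParser[T]:
  def parse(row: Row): (T, Row)

def column[T](f: String => T): RowParser[T] = row =>
  row match
    case h :: t => (f(h), t)
    case Nil    => throw new CSVException("input exhausted")

val string: RowParser[String] = column(identity)
val int: RowParser[Int] = column(_.toInt)
val date: RowParser[LocalDate] = column(s => LocalDate.parse(s))

def const[T](v: T): RowParser[T] = row => (v, row)

def combine[A, B](fp: RowParser[A => B], p: RowParser[A]): RowParser[B] = row =>
  val (f, fr) = fp.parse(row)  // yields the function, consumes whatever fp consumes
  val (v, vr) = p.parse(fr)    // yields the argument from the remaining columns
  (f(v), vr)

case class User(id: Int, name: String, birthday: LocalDate)    // field names are hypothetical
val mkUser: (Int, String, LocalDate) => User = User.apply

// Each combine removes one parameter from the function type inside the parser.
val p0: RowParser[Int => String => LocalDate => User] = const(mkUser.curried)
val p1: RowParser[String => LocalDate => User] = combine(p0, int)
val p2: RowParser[LocalDate => User] = combine(p1, string)
val p3: RowParser[User] = combine(p2, date)

val result = p3.parse(List("1", "Torsten Test", "1970-01-01"))
```

Once the last parameter is gone, what remains inside the parser is the finished User.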

  val userParser: RowParser[User] =
    combine(combine(combine(const(User.apply.curried), int), string), date)

…and it looks terse, but somewhat illegible.

Syntactic Sugar

An operator alias for #combine()

extension [A, B](fp: RowParser[A => B])
  def <*>(p: RowParser[A]): RowParser[B] = combine(fp, p)

…will yield considerable improvement:

val userParser: RowParser[User] =
  const(User.apply.curried) <*> int <*> string <*> date
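
The chain needs no parentheses because Scala operators that don’t end in a colon are left-associative, so it groups exactly like the nested #combine() calls. Here is a self-contained sketch to check this on in-memory data; the naive row splitter, the CSVException stand-in and User's field names are assumptions, not the project’s actual code.

```scala
import java.time.LocalDate

type Row = List[String]                                        // assumption
class CSVException(msg: String) extends RuntimeException(msg)  // assumption

trait RowParser[T]:
  def parse(row: Row): (T, Row)

def column[T](f: String => T): RowParser[T] = row =>
  row match
    case h :: t => (f(h), t)
    case Nil    => throw new CSVException("input exhausted")

val string: RowParser[String] = column(identity)
val int: RowParser[Int] = column(_.toInt)
val date: RowParser[LocalDate] = column(s => LocalDate.parse(s))

def const[T](v: T): RowParser[T] = row => (v, row)

def combine[A, B](fp: RowParser[A => B], p: RowParser[A]): RowParser[B] = row =>
  val (f, fr) = fp.parse(row)
  val (v, vr) = p.parse(fr)
  (f(v), vr)

extension [A, B](fp: RowParser[A => B])
  def <*>(p: RowParser[A]): RowParser[B] = combine(fp, p)

case class User(id: Int, name: String, birthday: LocalDate)    // field names are hypothetical
val mkUser: (Int, String, LocalDate) => User = User.apply

// Operator form and nested form describe the same parser.
val viaOperator: RowParser[User] = const(mkUser.curried) <*> int <*> string <*> date
val viaNesting: RowParser[User] =
  combine(combine(combine(const(mkUser.curried), int), string), date)

// Naive stand-in for the real line splitter: no quoting or escaping support.
def row(line: String): Row = line.split(",").toList

val input = List("1,Torsten Test,1970-01-01", "2,Andrea Anders,2000-02-20")
val users: List[User] = input.map(l => viaOperator.parse(row(l))._1)
val usersNested: List[User] = input.map(l => viaNesting.parse(row(l))._1)
```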

Our example still works:

sbt:nanocsv> runMain de.sangamon.nanocsv.step01.main data/users.csv
1,Torsten Test,1970-01-01
2,Andrea Anders,2000-02-20
User(1,Torsten Test,1970-01-01)
User(2,Andrea Anders,2000-02-20)

Now this is starting to look like a decent combinator API.

The full code for this post can be found in package de.sangamon.nanocsv.step01. Next, let’s see if there’s prior art for this idea that we can integrate with.