step 04: Error
Failure Modes
In the last step, we concluded our parser API design for the time being. Let’s turn to error handling.
One common approach to handling failure is to distinguish between two types of errors: those that are a direct consequence of user input, and those that just happen - “internal” errors. There’s a strong correlation between this distinction the notion of pure and side-effecting functions. User errors usually arise from pure computations (they only depend on the input), internal errors result from side effects (I/O failures, etc.).
User errors can either be handled at some point, or they should be brought to the attention of the user. It is useful to be pretty clear about which computations can yield which kind of user errors. Internal errors, on the other hand, are of no interest to the user and will mostly just end up being logged/reported for future investigation. We don’t really care what kind of internal error may arise from a computation, only whether it can happen at all.
Representation
To encode this distinction, a common pattern is to have a specific set of user error
types, typically an ADT, for each module. The result of a potentially failing application
will be either a successful result of the appropriate type or a user error, often
represented by the Either
data type. Internal errors may be propagated as exceptions - not necessarily by throwing
them and letting them bubble up the stack, but by embedding them into a representation
of a side-effecting computation, such as Scala’s built-in
Future
or cats-effect’s
IO
.
Let’s assume that a successful result is of type T
, the user error type is E
and the
side-effecting computation context is F
. This gives rise to the following cases for
function result types:
T
- A pure computation that cannot raise user errors.Either[E, T]
- A pure computation that may raise user errors.F[T]
- A side-effecting computation that may suffer from internal errors.F[Either[E, T]]
- A side-effecting computation with potential internal and user errors.
Obviously we’ll need some API for Either
and F
to handle threading results through these
representations - and we’ll find that this will often be based on generic abstractions just
like Functor
and Applicative
…
So much for theory, let’s put it to practice. We will focus on equipping our core CSV parser with explicit user errors - i.e. for now we’ll continue to let internal errors bubble up as exceptions, and we’ll ignore the file reading part altogether.
Either
Let’s start by defining our user error types. So far we have encountered two error modes in the core parser: Row exhaustion, and failure to parse a column to the expected type.
enum CSVParseFailure:
case ColumnParseFailure(cause: Throwable)
case RowExhaustionFailure
We’ll provide a type alias for the Either
instantiation for this error and make it the return
type of our parse operation.
type CSVResult[T] = Either[CSVParseFailure, T]
trait RowParser[T]:
def parse(row: Row): CSVResult[(T, Row)]
We can easily convert #string
. Note that Either
has an Applicative
instance that threads computations
through its “success” (i.e. right-hand) side, so we can use #pure
.
val string: RowParser[String] =
case h :: t => (h, t).pure
case Nil => RowExhaustionFailure.asLeft[(String, Row)]
Monad
What about #int
and #date
, though? We cannot keep using #map()
over #string
.
We already get a CSVResult
from the string parser, and we want to produce another CSVResult
that
depends on the success value of the former upon conversion.
Fortunately there’s another abstraction
built on top of Applicative
that supports exactly this kind of dependent computation chaining.
Monad
adds another function #flatMap()
(and
its operator alias >>=
) to our tool case. It is somewhat supported by the Scala core language and
library already - many types, including Either
, have a #flatMap()
method, and there’s
for
expressions
that provide syntactic sugar for nested #flatMap()
/#map()
expressions.
To make it easier to digest, let’s split the task of providing a Monad
based replacement for our previous
#map()
usage in two: First define a function #emap()
that converts a RowParser
given a function
A => CSVResult[B]
, implemented using #flatMap()
. Then define a function #guardMap()
that takes an
(impure!) function A => B
, converts it to A => CSVResult[B]
by catching potential exceptions, and feeds
this to #emap()
.
extension[A](p: RowParser[A])
def emap[B](f: A => CSVResult[B]): RowParser[B] =
p.parse(_) >>= { case (res, rem) => f(res).map(_ -> rem) }
def guardMap[B](f: A => B): RowParser[B] =
emap { a => Either.catchNonFatal(f(a)).leftMap(ColumnParseFailure(_)) }
This gives us our new implementations for #int
and #date
.
val int: RowParser[Int] = string.guardMap(_.toInt)
val date: RowParser[LocalDate] = string.guardMap(LocalDate.parse)
We’ll also need to convert our Applicative
instance and the inductive row parser derivation
step to thread a CSVResult
through monad chaining - please refer to the code for details.
Traverse
There’s one last pitfall… In our top level #parse()
, we used to #map()
over the Row
(yes, List
has
Functor
/Applicative
/Monad
instances, as well). Doing this with the new API gives us a
List[CSVResult[T]]
. But we just want to fail the whole computation if any row cannot be parsed -
that is, we want a CSVResult[List[T]]
instead.
Another abstraction to the rescue! List
has a Traverse
instance, which gives us functions for upending this kind of nesting, which work with any Applicative
.
def parse[T](file: Path)(p: RowParser[T]): CSVResult[List[T]] =
lines(file).map(row).traverse(parseRow(p))
Resolution
Finally, we need to resolve the two Either
modes at the top level of our program, e.g. via
pattern matching.
CSVParser.parse(Paths.get(csvFile))(userParser) match
case Left(f) => println(s"ERROR: $f")
case Right(r) => r.foreach(println)
Success,…
sbt:nanocsv> runMain de.sangamon.nanocsv.step04.main data/users.csv
1,Torsten Test,1970-01-01
2,Andrea Anders,2000-02-20
User(1,Torsten Test,1970-01-01)
User(2,Andrea Anders,2000-02-20)
…, user error,…
sbt:nanocsv> runMain de.sangamon.nanocsv.step04.main data/users.csv
1,Torsten Test,1970-01-01
2,Andrea Anders,2000-02-20x
ERROR: ColumnParseFailure(java.time.format.DateTimeParseException:
Text '2000-02-20x' could not be parsed, unparsed text found at index 10)
…and I/O failure.
sbt:nanocsv> runMain de.sangamon.nanocsv.step04.main data/xyz.csv
[error] (run-main-1) java.io.FileNotFoundException:
data/xyz.csv (No such file or directory)
Whew. This was quite a ride, and the benefit/cost ratio of this approach to failure handling may seem somewhat questionable. In its defense…
- We now have a very explicit representation of failure modes - actually they have become part of the normal program flow now, which seems quite right. After all, you need to treat them with the same diligence as the “happy path” - keeping them kind of invisible seems somewhat absurd.
- The prospect of mentally juggling all these new abstractions may feel daunting, but the good news is:
There’s only so many of them (you can go a long way with
Functor
,Applicative
,Monad[Error]
,Traversable
andFoldable
), and they’re almost universally applicable.
I hope the benefit will become even clearer in subsequent posts. The full code for this post can be found in package
de.sangamon.nanocsv.step04
.
In the next step, we will extend failure handling to the file level.