Generic library for structured commands with explicit parallelism
Combining Free Monads and Free Applicatives is a great way to get the advantages of both, as seen in e.g. this talk by Markus Hauck. Tierney offers some simple wrappers to reduce the boilerplate of doing this with your own types, while also enforcing that code is always very explicit about which steps can and can't happen in parallel, freeing you to refactor with confidence that a simple change isn't going to accidentally serialise all of your operations.
<dependency>
<groupId>com.github.m50d</groupId>
<artifactId>tierney-free</artifactId>
<version>1.5</version>
</dependency>
Given a "deferred execution" type like FS2 Task
, Tierney lets you write code that's more explicit
about which things may happen in serial versus in parallel:
import tierney.free._, cats.syntax.all._
(for {
w <- ( (for {
x <- Serial(someTask)
y <- Serial(anotherTask(x))
} yield y).parallel,
Parallel(yetAnotherTask) ).mapN {
(y, z) => someMerge(y, z)
} .serial
v <- ...
} yield {...}).runParallel
y
will be computed after x
, but z
will be computed in parallel with both; attempting to use
for
/yield
on Parallel
s is a compilation error, while using Applicative
functionality
like mapN
on Serial
s will serialize their evaluation.
This means we leave some performance on the table compared to "opportunistic" approaches
that offer both parallel mapN
and serial for
/yield
on the same type,
but we ensure that equivalent code (according to the monad/applicative laws) has equivalent parallelism,
so "safe" refactors should never radically change performance characteristics (in either direction).
The only way to change parallel code to serial (or vice versa) is to change Serial(x)
to
Parallel(x)
or add/remove an explicit .serial
or .parallel
call.
Hopefully these strike the right balance between lightweight and explicit.
Tierney can be used with F[_]
being a custom command type - using TierneyFree[F, A]
like you'd use Free[F, A]
. In this case you simply pass an interpreter to the final
runParallel
call:
sealed trait MyCommand[A]
case class ReadInt(path: String) extends MyCommand[Int]
case class WriteInt(path: String, value: Int) extends MyCommand[Unit]
...
val program = for {
i <- Serial(ReadInt("src"))
_ <- (Parallel(WriteInt("dst1", i)) *> Parallel(WriteInt("dst2", i + 4))).serial
} yield {}
object MyInterpreter extends (MyCommand ~> Task) { ... }
program.runParallel(MyInterpreter)
See this example code for a more in-depth example that corresponds to Markus Hauck's talk.
- Conceptually there are three mutually recursive
TierneyFree
subtypes:Node[F, A] = Coproduct[F, Serial[F, ?], A]
Left
Node
s are where the recursive structure bottoms out in actualF[_]
s- The real type is slightly different since we do the recursion using
a fixed point combinator
FixKK
rather than directly - The type above is called
UNode
Parallel[F, A] = FreeApplicative[Node[F, ?], A]
- Can be used with
mapN
-style composition
- Can be used with
Serial[F, A] = Free[Parallel[F, ?], A
- Can be used with
for
/yield
composition
- Can be used with
- Any
TierneyFree
can be converted to any of the three types using.node
,.serial
and.parallel
- Interpretation functions are on
TierneyFree
itselfrunParallel
requires both aMonad
(for executing serial sections) and aParallelApplicative
, a custom type that representsApplicative
s that we know run operations in parallel. This is really "run as parallel as possible" - serial commands can only ever be executed serially.runSerialOrUnprincipled
executes both serial and parallel commands using the passedMonad
. Therefore every command will run serially, unless theMonad
uses unprincipled "opportunistic" parallelisation.shallowAnalyze
will analyze "as far as possible" i.e. it will "hide" any chained serial commands. E.g. if you're trying to prefetch users, this can only let you prefetch those users whose IDs were passed in to start with, not e.g. friends of those users, since it's not possible to find out their user IDs without actually running the commands.
- Add a deeper equivalent of
shallowAnalyze
to enable us to optimize code afterflatMap
chains - Look at integrating with cats
Parallel
(perhaps renaming ourParallel
) if it ever takes off. - Replace current basic performance test of
ParallelApplicative
with something more robust - Add more
ParallelApplicative
implementations e.g. scalaz-concurrentTask
- Add performance tests for all
ParallelApplicative
implementations - Add
tut-maven-plugin
(once implemented) to enforce that code examples in this document are correct
- Higher-kinded variants of some general constructs as necessary to support the rest of the project
(
core
, potentially to be replaced with use of polykinds from Typelevel Scala) - Recursion-schemes style
Fix
/FixK
/... constructs and traversal implementations (currently alsocore
, potentially to be spun out into a separate module if the above remains necessary) ParallelApplicative
type to express explicit, intentional parallelism (free
, potentially to be spun out into a separate module)- Implementations for common cases (stdlib
Future
, monixTask
, fs2Task
), potentially to be spun out into their own modules to minimize dependencies
- Implementations for common cases (stdlib
- Implementation of a free hybrid monad-applicative structure, making use of all the above (
free
)
- Usual Eclipse Scala development setup:
- Unzip Eclipse
- Install scala-ide plugin
- Install m2eclipse-scala plugin
- Install maven scm connector for git (possibly bundled in recent versions of eclipse)
- File > Import ... , Maven > Check Out Maven Projects from SCM
- Enter the clone URL from GitHub
- Click Next, then Finish immediately i.e. skip as much of the wizard as possible
- For each project:
- Right click > Properties, Scala Compiler
- Advanced tab, XPlugin box: fill in the full path to
kind-projector[...].jar
- Build manager tab, untick "withVersionClasspathValidator" as it is overly sensitive to patch versions for compiler plugins