This repository has been archived by the owner on Apr 2, 2019. It is now read-only.
When working on #2 and #4 I set up a dumb CLP wrapper with hacky caching, which I've been running through the sbt shell. This is fine for prototyping, but not good for the long run.
Apache Beam has been repeatedly mentioned as a data-processing framework which might be well-suited to the type of tasks run during data ingest.
Google provides Dataflow as a fully-managed runner for Beam, but it's not the only available solution.
Spotify maintains a Scala wrapper (scio) for the Java Beam APIs.
Porting the ENCODE metadata-processing code to scio and running it on Dataflow should give us a good idea of what it's like to use Beam.
danxmoran changed the title from "SPIKE - Investigate Apache Beam / Dataflow as a runner for custom ingest code" to "SPIKE - Investigate Apache Beam / Google Dataflow as a runner for custom ingest code" on Sep 25, 2018
danxmoran changed the title from "SPIKE - Investigate Apache Beam / Google Dataflow as a runner for custom ingest code" to "SPIKE - Investigate Apache Beam / Google Cloud Dataflow as a runner for custom ingest code" on Sep 25, 2018