21 changes: 21 additions & 0 deletions website/versioned_docs/version-0.12.2/hoodie_deltastreamer.md
@@ -339,6 +339,27 @@ to trigger/processing of new or changed data as soon as it is available on S3.

Insert code sample from this blog: https://hudi.apache.org/blog/2021/08/23/s3-events-source/#configuration-and-setup

### GCS Events
Google Cloud Storage (GCS) provides an event notification mechanism that posts notifications when certain
events happen in your GCS bucket. You can read more at [Pubsub Notifications](https://cloud.google.com/storage/docs/pubsub-notifications/).
GCS publishes these events to a Cloud Pubsub topic. Apache Hudi provides a GcsEventsSource that reads from Cloud Pubsub
to trigger processing of new or changed data as soon as it is available on GCS.

#### Setup
A detailed guide on [How to use the system](https://docs.google.com/document/d/1VfvtdvhXw6oEHPgZ_4Be2rkPxIzE0kBCNUiVDsXnSAA/edit#heading=h.tpmqk5oj0crt) is available.
A high-level overview of the same steps is provided below.

1. Configure Cloud Storage Pubsub Notifications for the bucket. Follow Google’s documentation here: [reporting changes](https://cloud.google.com/storage/docs/reporting-changes).
2. Create a Pubsub subscription corresponding to the topic.
3. Note the GCS Project Id and the GCS Subscription Id, and use them for the following Hoodie configurations:
   1. `hoodie.deltastreamer.source.gcs.project.id=GCP_PROJECT_ID`
   2. `hoodie.deltastreamer.source.gcs.subscription.id=SUBSCRIPTION_ID`
4. Start the GcsEventsSource using the `HoodieDeltaStreamer` utility with the `--source-class` parameter set to
   `org.apache.hudi.utilities.sources.GcsEventsSource` and `hoodie.deltastreamer.source.cloud.meta.ack=true`, along with the
   path-related configs described in the detailed guide mentioned above.
5. Start the GcsEventsHoodieIncrSource using the `HoodieDeltaStreamer` utility with the `--source-class` parameter set to
   `org.apache.hudi.utilities.sources.GcsEventsHoodieIncrSource` and the other parameters mentioned in the detailed guide above.
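
The two `HoodieDeltaStreamer` runs in the list above differ mainly in their `--source-class` and in the GCS properties passed to the first one. The sketch below is one possible way to assemble those arguments in Python. The helper names `gcs_events_props` and `deltastreamer_args` are hypothetical, and passing properties through `--hoodie-conf` is a choice made for this illustration; a properties file works equally well. A real job also needs the path-related and target-table settings from the detailed guide, which are deliberately left out here.

```python
from typing import Dict, List

# Source class names as given in the steps above.
GCS_EVENTS_SOURCE = "org.apache.hudi.utilities.sources.GcsEventsSource"
GCS_INCR_SOURCE = "org.apache.hudi.utilities.sources.GcsEventsHoodieIncrSource"


def gcs_events_props(project_id: str, subscription_id: str) -> Dict[str, str]:
    """Hypothetical helper: properties for the job that reads GCS events from Pubsub."""
    return {
        "hoodie.deltastreamer.source.gcs.project.id": project_id,
        "hoodie.deltastreamer.source.gcs.subscription.id": subscription_id,
        # Acknowledge Pubsub messages once they have been processed.
        "hoodie.deltastreamer.source.cloud.meta.ack": "true",
    }


def deltastreamer_args(source_class: str, props: Dict[str, str]) -> List[str]:
    """Hypothetical helper: build the source-related part of a DeltaStreamer command line."""
    args = ["--source-class", source_class]
    for key, value in props.items():
        # Each property is passed as a separate --hoodie-conf key=value pair.
        args += ["--hoodie-conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder IDs; replace them with the values noted during setup.
    props = gcs_events_props("GCP_PROJECT_ID", "SUBSCRIPTION_ID")
    print(" ".join(deltastreamer_args(GCS_EVENTS_SOURCE, props)))
    # The incremental job takes its own parameters from the detailed guide.
    print(" ".join(deltastreamer_args(GCS_INCR_SOURCE, {})))
```

The printed fragments are meant to be appended to the usual `spark-submit` invocation of `HoodieDeltaStreamer`, together with the remaining options for your table.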

### JDBC Source
Hudi can read from a JDBC source either with a full fetch of a table or incrementally, using checkpointing.
