diff --git a/website/versioned_docs/version-0.12.2/hoodie_deltastreamer.md b/website/versioned_docs/version-0.12.2/hoodie_deltastreamer.md
index 42b98a2b90e05..0ee08d16ddee9 100644
--- a/website/versioned_docs/version-0.12.2/hoodie_deltastreamer.md
+++ b/website/versioned_docs/version-0.12.2/hoodie_deltastreamer.md
@@ -339,6 +339,27 @@ to trigger/processing of new or changed data as soon as it is available on S3.
 
 Insert code sample from this blog: https://hudi.apache.org/blog/2021/08/23/s3-events-source/#configuration-and-setup
 
+### GCS Events
+Google Cloud Storage (GCS) provides an event notification mechanism that posts notifications when certain
+events happen in your GCS bucket. You can read more at [Pubsub Notifications](https://cloud.google.com/storage/docs/pubsub-notifications/).
+GCS puts these events in a Cloud Pubsub topic. Apache Hudi provides a GcsEventsSource that can read from Cloud Pubsub
+to trigger processing of new or changed data as soon as it is available on GCS.
+
+#### Setup
+A detailed guide on [How to use the system](https://docs.google.com/document/d/1VfvtdvhXw6oEHPgZ_4Be2rkPxIzE0kBCNUiVDsXnSAA/edit#heading=h.tpmqk5oj0crt) is available.
+A high-level overview of the same is provided below.
+
+1. Configure Cloud Storage Pubsub Notifications for the bucket. Follow Google’s documentation here: [reporting changes](https://cloud.google.com/storage/docs/reporting-changes)
+2. Create a Pubsub subscription corresponding to the topic.
+3. Note the GCS Project Id and the GCS Subscription Id, and use them for the following Hoodie configurations:
+   1. `hoodie.deltastreamer.source.gcs.project.id=GCP_PROJECT_ID`
+   2. `hoodie.deltastreamer.source.gcs.subscription.id=SUBSCRIPTION_ID`
+4. Start the GcsEventsSource using the `HoodieDeltaStreamer` utility with the `--source-class` parameter set to
+   `org.apache.hudi.utilities.sources.GcsEventsSource` and `hoodie.deltastreamer.source.cloud.meta.ack=true`, plus the path-related
+   configs described in the detailed guide mentioned above.
+5. Start the GcsEventsHoodieIncrSource using the `HoodieDeltaStreamer` utility with the `--source-class` parameter set to
+   `org.apache.hudi.utilities.sources.GcsEventsHoodieIncrSource` and the other parameters mentioned in the detailed guide above.
+
 ### JDBC Source
 Hudi can read from a JDBC source with a full fetch of a table, or Hudi can even read incrementally with checkpointing from a JDBC source.
 
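
Note on the GCS Events setup added in this patch: the three settings it names for the GcsEventsSource job could be collected in one properties file and handed to `HoodieDeltaStreamer` through its `--props` option, alongside `--source-class org.apache.hudi.utilities.sources.GcsEventsSource`. The fragment below is only a sketch. The file name `gcs-events.properties` is hypothetical, the two values are placeholders, and the path-related configs that the patch defers to the detailed guide are deliberately left out.

```properties
# gcs-events.properties (hypothetical file name)
# Placeholder values: substitute your own project and Pubsub subscription.
hoodie.deltastreamer.source.gcs.project.id=GCP_PROJECT_ID
hoodie.deltastreamer.source.gcs.subscription.id=SUBSCRIPTION_ID
# Setting named in the patch for the GcsEventsSource job.
hoodie.deltastreamer.source.cloud.meta.ack=true
# Path-related configs from the detailed guide are not shown here.
```

Keeping these keys in a file rather than in repeated `--hoodie-conf` arguments is a matter of preference; either route should supply the same values to the source.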