Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. This guide walks you through the steps needed to run an Apache Beam pipeline on the Google Cloud Dataflow runner.
The following instructions help you prepare your Google Cloud project.
-
Install the Cloud SDK.
Note: This is not required in Cloud Shell since it already has the Cloud SDK pre-installed.
-
Create a new Google Cloud project via the New Project page, or via the gcloud command-line tool.
    export PROJECT=your-google-cloud-project-id
    gcloud projects create $PROJECT
-
Set up the Cloud SDK for your GCP project.
    gcloud init
-
Enable the following APIs: Dataflow, Compute Engine, Cloud Storage, Cloud Storage JSON, Stackdriver Logging, Cloud Resource Manager, and IAM.
-
Create a service account JSON key via the Create service account key page, or via the gcloud command-line tool. Here is how to do it through the Create service account key page.
- From the Service account list, select New service account.
- In the Service account name field, enter a name.
- From the Role list, select Project > Owner (*).
- Click Create. A JSON file that contains your key downloads to your computer.
Alternatively, you can use gcloud through the command line.
    export PROJECT=$(gcloud config get-value project)
    export SA_NAME=samples
    export IAM_ACCOUNT=$SA_NAME@$PROJECT.iam.gserviceaccount.com
    # Create the service account.
    gcloud iam service-accounts create $SA_NAME --display-name $SA_NAME
    # Set the role to Project Owner (*).
    gcloud projects add-iam-policy-binding $PROJECT \
      --member serviceAccount:$IAM_ACCOUNT \
      --role roles/owner
    # Create a JSON file with the service account credentials.
    gcloud iam service-accounts keys create path/to/your/credentials.json \
      --iam-account=$IAM_ACCOUNT
(*) Note: The Role field authorizes your service account to access resources. You can view and change this field later by using the GCP Console IAM page. If you are developing a production app, specify more granular permissions than Project > Owner. For more information, see Granting roles to service accounts.
For more information, see Creating and managing service accounts.
-
Set your GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your service account key file.
    export GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
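The API-enabling step in the list above can likely also be done from the command line with `gcloud services enable`. The following is a minimal sketch, not an official recipe: the function name `enable_dataflow_apis` is hypothetical, and the service identifiers are assumptions mapped from the API names listed above, so check them against `gcloud services list --available` before relying on them.

```shell
# Hypothetical helper: enable the APIs named in the setup steps for one project.
# The service identifiers below are assumed mappings from the API display names;
# confirm them with `gcloud services list --available`.
enable_dataflow_apis() {
  if [ -z "$1" ]; then
    echo "usage: enable_dataflow_apis PROJECT_ID" >&2
    return 1
  fi
  gcloud services enable \
    dataflow.googleapis.com \
    compute.googleapis.com \
    storage-component.googleapis.com \
    storage-api.googleapis.com \
    logging.googleapis.com \
    cloudresourcemanager.googleapis.com \
    iam.googleapis.com \
    --project "$1"
}
```

Assuming `PROJECT` is exported as in the earlier step, the helper would be called as `enable_dataflow_apis "$PROJECT"`. Wrapping the call in a function keeps the project ID explicit instead of depending on whichever project is currently active in the gcloud configuration.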
For instructions on how to install Python, virtualenv, and the Cloud SDK, see the Setting up a Python development environment guide.
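Before running a pipeline, it may help to confirm that the tools and the credentials variable from the steps above are actually in place. The sketch below is one way to do that in a POSIX shell; the function name `check_dataflow_env` and its messages are hypothetical, and it only checks that the key file is readable, not that the key is valid.

```shell
# Hypothetical preflight check: verifies that each tool passed as an argument
# is on PATH and that GOOGLE_APPLICATION_CREDENTIALS points to a readable file.
check_dataflow_env() {
  failed=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      failed=1
    fi
  done
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    failed=1
  elif [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "cannot read key file: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    failed=1
  fi
  return "$failed"
}
```

A typical call might be `check_dataflow_env gcloud python3 virtualenv && echo "environment looks ready"`. The tool names are passed as arguments because the exact executables (for example `python` versus `python3`) vary between systems.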