Getting started with Google Cloud Dataflow

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. This guide walks you through the steps needed to run an Apache Beam pipeline on the Google Cloud Dataflow runner.

Setting up your Google Cloud project

The following instructions help you prepare your Google Cloud project.

  1. Install the Cloud SDK.

    Note: This step is not required in Cloud Shell, which has the Cloud SDK preinstalled.

  2. Create a new Google Cloud project via the New Project page, or via the gcloud command-line tool.

    export PROJECT=your-google-cloud-project-id
    gcloud projects create $PROJECT
  3. Set up the Cloud SDK for your GCP project.

    gcloud init
  4. Enable billing.

  5. Enable the following APIs: Dataflow, Compute Engine, Cloud Storage, Cloud Storage JSON, Stackdriver Logging, Cloud Resource Manager, and IAM.

  6. Create a service account JSON key via the Create service account key page, or via the gcloud command-line tool. To use the Create service account key page:

    • From the Service account list, select New service account.
    • In the Service account name field, enter a name.
    • From the Role list, select Project > Owner (*).
    • Click Create. A JSON file that contains your key downloads to your computer.

    Alternatively, you can use gcloud through the command line.

    export PROJECT=$(gcloud config get-value project)
    export SA_NAME=samples
    export IAM_ACCOUNT=$SA_NAME@$PROJECT.iam.gserviceaccount.com
    
    # Create the service account.
    gcloud iam service-accounts create $SA_NAME --display-name $SA_NAME
    
    # Set the role to Project Owner (*).
    gcloud projects add-iam-policy-binding $PROJECT \
      --member serviceAccount:$IAM_ACCOUNT \
      --role roles/owner
    
    # Create a JSON file with the service account credentials.
    gcloud iam service-accounts keys create path/to/your/credentials.json \
      --iam-account=$IAM_ACCOUNT

    (*) Note: The Role field authorizes your service account to access resources. You can view and change this field later by using the GCP Console IAM page. If you are developing a production app, specify more granular permissions than Project > Owner. For more information, see Granting roles to service accounts.

    For more information, see Creating and managing service accounts.

  7. Set your GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your service account key file.

    export GOOGLE_APPLICATION_CREDENTIALS=path/to/your/credentials.json
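Step 5 above names the APIs to enable but gives no command for it. The sketch below shows one way to do that with `gcloud services enable`. The service IDs are assumptions mapped from the API names in step 5 (they are not given in this guide), so confirm them with `gcloud services list --available` before relying on them. The `DRY_RUN` variable is a hypothetical convenience: by default the script only prints the command so you can review it.

```shell
# Sketch: enable the APIs listed in step 5.
# The service IDs are assumed mappings, in this order: Dataflow,
# Compute Engine, Cloud Storage, Cloud Storage JSON, Stackdriver Logging,
# Cloud Resource Manager, IAM.
APIS="dataflow.googleapis.com \
compute.googleapis.com \
storage-component.googleapis.com \
storage-api.googleapis.com \
logging.googleapis.com \
cloudresourcemanager.googleapis.com \
iam.googleapis.com"

# Print the command by default; set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"

if [ "$DRY_RUN" = "1" ]; then
  echo "gcloud services enable $APIS"
else
  # $APIS is intentionally unquoted so each service ID becomes its own argument.
  gcloud services enable $APIS
fi
```

Run the script with `DRY_RUN=0` once the printed command looks right. The command acts on the project currently configured in the Cloud SDK (step 3).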

Setting up a Python development environment

For instructions on how to install Python, virtualenv, and the Cloud SDK, see the Setting up a Python development environment guide.
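As a rough illustration of what that setup usually amounts to, the sketch below creates an isolated environment and installs the Apache Beam SDK with its Google Cloud extras. It uses the standard library `venv` module in place of virtualenv. The directory name `env`, the `run` helper, and the `DRY_RUN` variable are hypothetical names chosen for this example; the linked guide remains the authoritative reference.

```shell
# Hypothetical environment directory; any path works.
VENV_DIR="${VENV_DIR:-env}"

# Print commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the virtual environment (venv is used here instead of virtualenv).
run python3 -m venv "$VENV_DIR"

# Upgrade pip inside the environment, then install Beam with the GCP extras.
run "$VENV_DIR/bin/pip" install --upgrade pip
run "$VENV_DIR/bin/pip" install "apache-beam[gcp]"
```

Calling the environment's own `pip` by path avoids having to activate the environment inside a script. In an interactive shell, `source env/bin/activate` achieves the same effect.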