pyGA4
pyGA4 is a Python toolkit for extracting, processing, and analyzing data from Google Analytics 4 (GA4). Whether you are a digital marketer, a data analyst, or anyone else who wants insights from GA4, this package simplifies working with your GA4 data.
This guide assumes you have already added GA4 tracking to your website (many online tutorials cover this).
Next, stream the GA4 data into BigQuery using a free service; see the official documentation for detailed instructions.
If the export succeeds, BigQuery will contain a dataset named analytics_xxxx (where xxxx is your GA4 property ID) holding the exported event tables.
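GA4's BigQuery export names its event tables by date: events_YYYYMMDD for daily tables and events_intraday_YYYYMMDD for the streaming table, which is why the usage example selects a table such as events_intraday_20200812. The following minimal sketch, independent of pyGA4, picks the tables covering a date range from a list of table IDs; tables_in_range and qualified_table_id are hypothetical helper names, not part of the package.

```python
import re
from datetime import date
from typing import Iterable, List

# Daily export tables: events_YYYYMMDD; streaming tables: events_intraday_YYYYMMDD.
_TABLE_RE = re.compile(r"^events_(intraday_)?(\d{8})$")


def tables_in_range(
    table_ids: Iterable[str], start: date, end: date, include_intraday: bool = False
) -> List[str]:
    """Return table IDs whose date suffix lies in [start, end], sorted by date."""
    selected = []
    for table_id in table_ids:
        match = _TABLE_RE.match(table_id)
        if match is None:
            continue  # skip tables that do not follow the GA4 export naming
        if match.group(1) is not None and not include_intraday:
            continue  # skip streaming (intraday) tables unless requested
        suffix = match.group(2)
        table_date = date(int(suffix[:4]), int(suffix[4:6]), int(suffix[6:]))
        if start <= table_date <= end:
            selected.append((table_date, table_id))
    return [table_id for _, table_id in sorted(selected)]


def qualified_table_id(project_id: str, dataset_name: str, table_id: str) -> str:
    """Build the backtick-quoted, fully qualified table name used in BigQuery SQL."""
    return f"`{project_id}.{dataset_name}.{table_id}`"
```

For example, given ["events_20200810", "events_20200811", "events_intraday_20200812"], the range 2020-08-11 to 2020-08-12 yields only the daily table events_20200811 unless include_intraday=True is passed.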
- Query Cost Estimation: Uses BigQuery's dry run feature to estimate a query's cost before executing it.
- Data Extraction: Easily connect to your GA4 property, retrieve data, and save it for analysis.
- Data Preprocessing: Prepare and clean your GA4 data for analysis with built-in data preprocessing functions.
- Custom Queries: Execute custom queries to filter and aggregate data based on your specific needs.
- Data Analysis: Perform various types of analysis, including user behavior analysis, conversion tracking, and more.
- Data Visualization: Create informative visualizations and reports to communicate your findings effectively.
- Simple Integration: Seamlessly integrate pyGA4 into your data pipeline or analytics workflow.
For more features, please refer to the package documentation.
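As an illustration of the preprocessing GA4 data usually needs: the export stores event parameters in a repeated event_params column, where each entry has a key and a value record with string_value, int_value, float_value, and double_value fields (normally only one is non-null). The minimal sketch below flattens one row's parameters into a plain dictionary, assuming rows have already been fetched as Python dicts. flatten_event_params is a hypothetical helper, not pyGA4's API.

```python
from typing import Any, Dict, List, Optional

# Value fields of a GA4 event_params entry, checked in this order.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def flatten_event_params(event_params: List[Dict[str, Any]]) -> Dict[str, Optional[Any]]:
    """Turn GA4's repeated key/value records into a flat {key: value} dict.

    Each entry looks like {"key": ..., "value": {"string_value": ..., "int_value": ..., ...}};
    the first non-null value field is kept, or None if all are null or missing.
    """
    flat: Dict[str, Optional[Any]] = {}
    for param in event_params:
        value = param.get("value") or {}
        flat[param["key"]] = next(
            (value[field] for field in VALUE_FIELDS if value.get(field) is not None),
            None,
        )
    return flat
```

Keeping the first non-null field mirrors the export's convention that each parameter carries exactly one typed value; if a parameter key repeats within a row, the last occurrence wins.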
Installation

```bash
pip install pyga4
```

Usage
```python
from google.cloud import bigquery

from pyga4.model import Ga4Table

# Uses Application Default Credentials
client = bigquery.Client()
# Or authenticate with a service-account key file:
# client = bigquery.Client.from_service_account_json(
#     './private/service-project-data-dev-01d11c742ba1.json'
# )

# Replace with your project ID and GA4 dataset name (analytics_xxxx)
PROJECT_ID = "your-project-id"
DATASET_NAME = "analytics_xxxx"
ga4_table = Ga4Table(client, PROJECT_ID, DATASET_NAME)

# List the tables in the dataset, e.g., events_20200811, events_intraday_20200812
table_id_list = ga4_table.all_tables_list
print(table_id_list)

# Select the table you want to analyze
ga4_table.table_id = 'events_intraday_20200812'

# Estimate the query cost with a dry run instead of executing the query
ga4_table.query_config.dry_run = True
query = f"""
SELECT event_timestamp FROM `{PROJECT_ID}.{DATASET_NAME}.{ga4_table.table_id}`
"""
results = ga4_table.query(query)  # Returns None in dry-run mode, but shows the query usage
```
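A dry run reports how many bytes a query would scan (in google-cloud-bigquery, a dry-run job exposes this as total_bytes_processed) rather than a price. The minimal sketch below turns that byte count into a rough dollar figure. The default rate of 6.25 USD per TiB is an assumed on-demand list price; free-tier allowances, minimum billing per query, and regional price differences are deliberately ignored. estimate_query_cost is a hypothetical helper, not part of pyGA4.

```python
# BigQuery on-demand pricing is quoted per tebibyte (2**40 bytes).
BYTES_PER_TIB = 2 ** 40


def estimate_query_cost(total_bytes_processed: int, usd_per_tib: float = 6.25) -> float:
    """Convert a dry-run byte count into an approximate on-demand cost in USD.

    usd_per_tib is an assumption; check current BigQuery pricing for your region.
    """
    if total_bytes_processed < 0:
        raise ValueError("total_bytes_processed must be non-negative")
    return total_bytes_processed / BYTES_PER_TIB * usd_per_tib
```

Passing the rate explicitly keeps the helper usable if pricing changes or if you are on a different billing model.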
Query User ID and Country List
```python
# User attributes
user_id_list = ga4_table.user_id_list
user_country_list = ga4_table.geo_country_list
```
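How these properties are computed is not documented here. For readers who want to run a similar lookup directly, the sketch below builds one plausible BigQuery SQL query for the distinct values of a column, using the GA4 export fields user_id and geo.country. It is not pyGA4's implementation, and distinct_values_query is a hypothetical helper. The IS NOT NULL filter matters for user_id, which GA4 only fills when your site sends a user ID. The column name is interpolated directly, so only pass trusted column names.

```python
def distinct_values_query(project_id: str, dataset_name: str, table_id: str, column: str) -> str:
    """Build a query returning the distinct non-null values of one column.

    Nested GA4 fields use dot notation, e.g. "geo.country".
    """
    table = f"`{project_id}.{dataset_name}.{table_id}`"
    return (
        f"SELECT DISTINCT {column} AS value\n"
        f"FROM {table}\n"
        f"WHERE {column} IS NOT NULL"
    )
```

The resulting string can be executed with the BigQuery client, for example client.query(sql).result().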
Query User, Device, and Event Distributions
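```python
from pyga4.analytic import UserAnalytic
# DeviceAnalytic and EventAnalytic are assumed to be exported from pyga4.analytic as well
from pyga4.analytic import DeviceAnalytic, EventAnalytic

# UserAnalytic: country and user ID distributions
user_analytic = UserAnalytic(ga4_table)
countries_dist = user_analytic.countries_distribution
userid_dist = user_analytic.user_id_distribution

# DeviceAnalytic: mobile brand distribution
device_analytic = DeviceAnalytic(ga4_table)
mobile_brand_dist = device_analytic.mobile_brand_distribution

# EventAnalytic: page location distribution
event_analytic = EventAnalytic(ga4_table)
page_loc_dist = event_analytic.pages_distribution
```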
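The return type of these distribution properties is not shown here. To illustrate what a distribution means in this context, the minimal sketch below counts raw values (for example, country names, assuming ga4_table.geo_country_list yields plain strings) and computes each value's share of the total. distribution is a hypothetical helper, not part of pyGA4.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def distribution(
    values: Iterable[Hashable], drop_none: bool = True
) -> List[Tuple[Hashable, int, float]]:
    """Return (value, count, share) tuples sorted by descending count.

    None values (e.g. users without a user ID) are skipped unless drop_none is False.
    Ties keep the order in which values were first seen.
    """
    counts = Counter(v for v in values if not (drop_none and v is None))
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common()]
```

For example, distribution(["US", "TW", "US", "JP"]) ranks "US" first with a share of 0.5.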