In this tutorial, we will demonstrate how to build the data table from the exported GA4 analytics data in BigQuery. These table can be then used for producing a suitable dataset to interact with ChannelAttribution Pro libraries.
Compared with GA4, BigQuery offers more flexibility in the use of raw analytics data, providing a broader range of options to manage this information and gain powerful insights to fuel your marketing efforts.
To begin with, we will take a brief tour of the GA4 data exported to BigQuery and its nuances.

In order to generate df_session and df_conversion csv files, we only need to pay attention to the following event_* fields:
Event fields:
- event_timestamp: the time (in microseconds) when the event was collected (in UTC time zone)
- event_name: the name of the event (e.g. first_visit, session_start, purchase, etc.)
- event_params: keys and values of the event parameters collected with the event. It is a nested structure composed by a key and an array of four possible values. The code displayed below uses many JSON_* and UNNEST SQL commands to getting those values
User fields:
- user_pseudo_id: the pseudonymous ID (or client ID) of the user who triggered the event
E-commerce fields:
- purchase_revenue_in_usd: purchase revenue collected with the event (only for purchase events)
Collected_traffic_source fields:
- manual_campaign_name: the campaign name collected with the event (i.e. utm_campaign)
- manual_source: the source collected with the event (i.e. utm_source)
- manual_medium: the medium collected with the event (i.e. utm_medium)
The SQL script is as follows:
WITH prep_session_start AS (
SELECT
event_timestamp,
CASE
WHEN SAFE_CAST(event_timestamp AS INT64) IS NOT NULL THEN
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%E*S', TIMESTAMP_MICROS(CAST(event_timestamp AS INT64)))
ELSE NULL
END AS event_time,
CONCAT(
user_pseudo_id,
(
SELECT
value.int_value
FROM
UNNEST (event_params)
WHERE
key = 'ga_session_id'
)
) AS session_id,
user_pseudo_id,
ARRAY_AGG(
collected_traffic_source.manual_campaign_name IGNORE NULLS
ORDER BY
event_timestamp ASC
LIMIT
1
) [SAFE_OFFSET(0)] AS session_campaign_name,
ARRAY_AGG(
collected_traffic_source.manual_source IGNORE NULLS
ORDER BY
event_timestamp ASC
LIMIT
1
) [SAFE_OFFSET(0)] AS session_source,
ARRAY_AGG(
collected_traffic_source.manual_medium IGNORE NULLS
ORDER BY
event_timestamp ASC
LIMIT
1
) [SAFE_OFFSET(0)] AS session_medium
FROM
`raw_session_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240301' AND '20240303'
AND event_name = "session_start"
AND user_pseudo_id IS NOT NULL
GROUP BY
event_timestamp,
user_pseudo_id,
session_id
),
add_channel AS (
SELECT
*,
CASE
WHEN session_source = '(direct)' and (session_medium = '(not set)' or session_medium = '(none)')
THEN 'Direct'
WHEN session_medium = 'organic'
THEN 'Organic Search'
WHEN regexp_contains(session_source, '^yandex|yahoo|msn|adpiler$')
THEN 'Organic Search'
WHEN regexp_contains(session_medium, '^social|social-network|social-media|sm|social network|social media$')
THEN 'Organic Social'
WHEN regexp_contains(session_source, '^facebook|instagram|linkedin|pinterest|flipboard|youtube $')
THEN 'Social'
WHEN session_medium = 'paid_social'
THEN 'Paid Social'
WHEN session_medium = 'email'
THEN 'Email'
WHEN session_medium = 'affiliate'
THEN 'Affiliates'
WHEN session_medium = 'referral'
THEN 'Referral'
WHEN regexp_contains(session_medium, '^cpc|ppc|paidsearch$')
THEN 'Paid Search'
WHEN regexp_contains(session_medium, '^cpv|cpa|cpp|content-text$')
THEN 'Other Advertising'
WHEN regexp_contains(session_medium, '^display|cpm|banner$')
THEN 'Display'
WHEN regexp_contains(session_campaign_name, '^PLA$')
THEN 'Paid Search'
WHEN regexp_contains(session_campaign_name, '^PMAX$')
THEN 'Cross Network'
WHEN regexp_contains(session_source, '^IGShopping|amazon-adsystem|teamshopping|adpiler|sfmc$') THEN 'Organic Shopping'
ELSE '(Other)'
END AS channel
FROM prep_session_start
),
session_final AS (
SELECT
event_timestamp AS event_time,
channel,
user_pseudo_id AS user_id,
NULL AS revenue,
0 AS conversion
FROM
add_channel
),
prep_purchase AS (
SELECT
CASE
WHEN SAFE_CAST(event_timestamp AS INT64) IS NOT NULL THEN
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%E*S', TIMESTAMP_MICROS(CAST(event_timestamp AS INT64)))
ELSE NULL
END AS event_time,
event_name AS channel,
user_pseudo_id AS user_id,
ecommerce.purchase_revenue_in_usd AS revenue,
1 AS conversion
FROM
`raw_session_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240301' AND '20240303'
AND event_name = "purchase"
AND user_pseudo_id IS NOT NULL
),
purchase_final AS (
SELECT
event_time,
channel,
user_id,
revenue,
conversion
FROM
prep_purchase
),
final AS (
SELECT *
FROM session_final
UNION ALL
SELECT *
FROM purchase_final)
SELECT *
FROM final
ORDER BY user_id, event_time;
Here are some comments to clarify the logic behind the script:
-
Getting a true unique session id: To create the user journey, we need to find each visit of users. By concatenating
user_pseudo_id
and the GA4 property dataga_session_id
found in theevent_params
nested structure, we can create a unique session id, which allows us to retrieve only the first event each time users visit our page (per session). -
We convert
event_timestamp
, expressed as UTC integer, to a UTC string and assign it to theevent_time
field. -
Collected_traffic_source structure: This is a collection of the same event-level traffic source details (
manual_source
,manual_medium
, andmanual_campaign_name
dimensions, among others) that you can find nested inside theevent_params
JSON array. -
Function details:
-
IGNORE NULLS
Clause: When used withARRAY_AGG
, any null values in thecollected_traffic_source
column are excluded from the array, avoiding the need for post-processing to delete them. -
SAFE_OFFSET(0)
: This is used to safely access the first element of the array. If the array is empty (which can happen if all values are null and thus ignored),SAFE_OFFSET
returns null instead of an error. -
GA4 update: Since the last update in Nov 2023,
session_start
andfirst_visit
are events sharing the same information. Hence, we are usingsession_start
in our SQL scripts. -
_TABLE_SUFFIX: This is used for choosing from sharded tables whose name suffixes fall (e.g.,
events_20240301
) within a given date range. -
Be aware that sharded tables are generated automatically once BigQuery finds tables that share the following characteristics:
- Exist in the same dataset
- Have the exact same table schema
- The same prefix
- Have a suffix of the form _YYYYMMDD (eg. 20240501)
-
-
Channel Grouping: The channel grouping here is an example. One can define their own channel grouping based on their business strategy. It is implemented in the section
add_channel
. -
event_name: We are only interested in records where
event_name
= 'session_start' orevent_name
= 'purchase'. -
At the prep_purchase CTE, we rename the event as channel, this is on purpose so we can use this result directly in our python script for data transformation.
-
ecommerce.purchase_revenue_in_usd: Purchase revenue dimension collected with the event (only for purchase events).
Finally, the generated table looks like this:

The post-processing of this table will produce a final dataset suitable to be used in the ChannelAttribution Pro library interface.