Google BigQuery (Edge)

Google BigQuery (Edge) Setup Guide

Google BigQuery is a fully managed, serverless data warehouse that supports Structured Query Language (SQL) to derive meaningful insights from the data. With Hevo Edge, you can ingest data from any supported Source system and load it to a Google BigQuery Destination.

Hevo stages the Source data in Google Cloud Storage (GCS) buckets before loading it to your BigQuery Destination. Hevo can transfer data to the GCS buckets only if the bucket and the dataset are co-located, except when the datasets are in the US multi-region. Read Location considerations for further details about the requirements to transfer data from GCS buckets to BigQuery tables. Buckets are basic units of storage in GCS, and they hold your data until Hevo loads it into your BigQuery tables.

Roles and permissions

Hevo requires that the following predefined role be applied to your Google Cloud Platform (GCP) project and assigned to the connecting Google service account:

Role	Description
BigQuery User	Grants Hevo permission to: - Create new datasets in the project. - Read a dataset’s metadata and list the tables in the dataset. - Create, update, get, and delete tables and views from a dataset. - Read and update data and metadata for the tables and views in a dataset. - Run BigQuery jobs and queries in the GCP project. Note: Hevo is also granted the BigQuery Data Owner role on the datasets it creates.

Role

Description

BigQuery User

Grants Hevo permission to:
- Create new datasets in the project.
- Read a dataset’s metadata and list the tables in the dataset.
- Create, update, get, and delete tables and views from a dataset.
- Read and update data and metadata for the tables and views in a dataset.
- Run BigQuery jobs and queries in the GCP project.

Note: Hevo is also granted the BigQuery Data Owner role on the datasets it creates.

Hevo uses a GCS bucket to temporarily stage data ingested from the Source before loading it into your BigQuery Destination tables. By default, your data is staged in a Hevo-managed GCS bucket. However, you can specify your own GCS bucket while configuring the BigQuery Destination. In case of the latter, you must assign the following role to the connecting Google service account:

Role	Description
Storage Admin	Grants Hevo permission to list, create, view, and delete: - Objects in GCS buckets. - GCS buckets.

If you do not want to assign the Storage Admin role, you can create a custom role at the project or organization level and assign it to the connecting Google service account. Grant the following permissions to the role that you add to give Hevo access to your GCS resources:

resourcemanager.projects.get
storage.buckets.create
storage.buckets.delete
storage.buckets.get
storage.buckets.list
storage.buckets.update
storage.multipartUploads.abort
storage.multipartUploads.create
storage.multipartUploads.list
storage.multipartUploads.listParts
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
storage.objects.update

Google account authentication

In Edge, you connect to your BigQuery Destination with a service account. One service account can be mapped to your entire team. You must assign the required roles to your service account, enabling it to access GCP-hosted services, such as BigQuery and GCS buckets. Read Authentication for GCP-hosted Services to know how to assign these roles to your service account.

Getting Started

Setup

Prerequisites

A Google Cloud Platform (GCP) project is available.
A Google service account is created in your GCP project, or the details of Hevo’s service account are available.
The private key for the Google service account is available.
The essential roles are assigned to your Google service account at the project level.
An active billing account is associated with your GCP project.
(Optional) You have access to an existing GCS bucket if you want Hevo to use your GCS bucket.

Set up the Google Cloud Platform (Optional)

1. Access the Google Cloud Platform in one of the following ways

Sign in using your work email address if your organization has a Google Workspace or Cloud Identity account.
Sign up for the Google Cloud Free Program, which offers a 90-day, $300 Free Trial.

Note: A free trial allows you to evaluate Google Cloud products, such as BigQuery and Cloud Storage. However, to continue using Google Cloud, you must upgrade to a paid Cloud Billing account.
1. Go to Google Cloud Computing Services, and click Get started for free.
2. On the Account Information page, complete the form, and then click Agree and continue.
3. On the Verify Your Identity page, create your payment profile and click Start free.
  
  Note: Google verifies the payment method through a one-time transaction. If you want to allow multiple users to log in, create a profile with the Organization account type.

2. Create a Google Cloud Platform project

Note: To create a project, you must have the resourcemanager.projects.create permission. By default, all organization resource users and free trial users are granted this permission.

Sign in to the Google Cloud console. You can also use the account you created in Step 1.
Go to the Manage Resources page, and click + Create Project.
On the New Project page, specify the following, and click Create:
- Project Name: A name for your project that helps you identify it.
- Project ID: A string that uniquely identifies your project. You can keep the identifier assigned by Google Cloud or click Edit to change it.
  
  Note: You cannot change the ID once the project is created.
- Organization: From the drop-down list, select the name of the organization resource where you want to create the project.
  
  Note: This field is not visible if you are a free trial user.
- Location: The parent organization or folder resource where you want to create the project. Click Browse to view and then select a value from the list.

You can specify the ID of this GCP project while configuring BigQuery as a Destination in Edge

3. Enable billing for your Google Cloud Platform project

Note: You can link a billing account to a Google Cloud project if you have the Organization Administrator or the Billing Account Administrator role. Skip this step if you are a free trial user, as billing is enabled during the sign-up process itself.

Sign in to the Google Cloud console and go to the Manage Resources page.
On the Manage Resources page, click the More () icon next to the desired project and select Billing.
On the Billing page, click Link a billing account. If there are no active billing accounts associated with your organization, you need to create one. Read Create a new Cloud Billing account for the steps.
In the displayed pop-up window, select a billing account from the drop-down and click Set Account.

You have successfully enabled billing for your GCP project.

4. Create a Google Cloud Storage Bucket (Optional)

Note: You can perform the following steps only if you have the Storage Admin role and your GCP project is associated with a billing account.

Perform the following steps to create a GCS bucket:

Sign in to the Google Cloud console, go to the Dashboard page, and click the Select from ( ) icon at the top of the page.
In the window that appears, select the project for which you want to create the storage bucket.
Go to the Cloud Storage Buckets page and click + CREATE.
On the Create a bucket page, specify the details in each section and click CONTINUE to move to the next step:
- Name your bucket: A globally unique name for your bucket, which follows Google’s Bucket naming guidelines.
- Choose where to store your data: The location type and the geographical location in which the bucket’s data is stored.
  - Multi-region: A geographical area containing more than two regions, such as the United States or Asia. You should select this location type when you need higher availability of data, lower downtime because of geo-redundancy, or when your customers are spread over a large geographical area.
  - Dual-region: A geographical area containing a specific pair of regions, such as Singapore and Taiwan in Asia. You should select this location type for a higher availability of data than a regional location. The charges incurred to store data are the highest for this location type.
    
    Note: Hevo does not support the dual-region location type.
  - Region: A specific geographical location, such as Mumbai. You should select this location type to store data that needs to be available faster in your BigQuery Destination.
  Select the geographical location from the drop-down list displayed for your location type. This location must be the same as the data location of your dataset. Read Location considerations for further details on the requirements.
  
  Note: You cannot change the bucket’s location once the bucket is created.
- Choose a default storage class for your data: A parameter that affects the bucket’s data availability. The storage class is assigned by default to all objects uploaded to the bucket and also affects the bucket’s monthly costs for data storage and retrieval. Read Data storage pricing for a comparison of the storage costs.
  - Standard: This storage class is most suitable for storing data that must be highly available and/or stored for only a short amount of time. For example, data accessed around the world, streaming videos, or data used in data-intensive computations.
  - Nearline: This storage class is suitable for storing data read or modified once a month or less. For example, data accumulated over a month and accessed once to generate monthly reports.
  - Coldline: This storage class is ideal for storing data accessed at least once every 90 days. For example, a quarterly backup of an organization’s essential data for disaster recovery.
  - Archive: This storage class is appropriate for storing data accessed once a year or less. For example, the digitalization of data stored in physical files or data stored for legal or regulatory reasons.
- Choose how to control access to objects: An option to restrict the visibility and accessibility of your data.
  - Prevent public access: Select the Enforce public access prevention check box if you want to restrict public access to your data via the internet.
  - Access control: Controls the access and access level to your bucket and its data:
    - Uniform: This option ensures the same level of access to all the objects in the bucket. If you need to change this setting to fine-grained, you should change it within 90 days.
      
      Note: Hevo enables uniform access control for the bucket that it creates and manages.
    - Fine-grained: This option enables access control on individual objects (object-level) in addition to bucket-level permissions. This option is disabled for all Cloud Storage resources in the bucket if you grant Uniform access control.
- Choose how to protect object data: An option to provide additional protection and encryption to prevent the loss of data in your bucket. You can select one of the following data protection options:
  - Soft delete policy (For data recovery): This option allows you to retain deleted objects for a default duration of 7 days, during which you can restore them. You can customize this retention period or turn off soft delete by setting the duration to 0. When disabled, deleted objects are permanently removed.
    
    Note: Hevo uses GCS buckets only to temporarily stage data before loading it into your BigQuery Destination, and turning soft delete on or off does not affect this action.
  - Object versioning (For version control): This option allows you to retain multiple versions of an object, enabling recovery of previous versions if needed. However, as BigQuery does not support Cloud Storage object versioning, do not select this option while creating a bucket for Hevo.
  - Retention (For compliance): This option allows you to set a retention period on the bucket. The retention period does not allow an object to be deleted or replaced until the time specified by the retention period elapses. If this value is set too low, the data in the bucket may be deleted before it is loaded to the BigQuery Destination.
    
    Note: Hevo does not set a retention period on the bucket that it creates and manages. It uses the age parameter to set the lifecycle of objects in its GCS bucket. Hence, objects written to Hevo-managed buckets are deleted within 7 days of their creation.
Click CREATE.

You can specify the created GCS bucket if you want to use your own GCS bucket while configuring BigQuery as a Destination in Edge

Obtain your GCP Project ID

In Hevo Edge, you need to specify your GCP project ID when configuring BigQuery as a Destination. Hevo creates your BigQuery datasets in the project associated with this ID.

Perform the following steps to retrieve the globally unique identifier associated with your GCP project:

Sign in to the Google Cloud console and go to the Manage Resources page.
On the Manage Resources page, from the list of projects, identify the project in which you want Hevo to create your BigQuery datasets and make a note of its ID. For example, bq-destination-454711 in the image below.

Note: The project must be the one for which you have created the service account and GCS bucket.

You need to specify this project ID while configuring BigQuery as a Destination in Edge.

Configure BigQuery as a Destination in Edge

Perform the following steps to configure BigQuery as a Destination in Edge:

Select DESTINATIONS in the Navigation Bar.
Click the Edge tab in the Destinations List View and click + Create Edge Destination.
On the Create Destination page, click BigQuery.
In the screen that appears, specify the following:
- Destination Name: A unique name for your Destination, not exceeding 255 characters.
- In the Connect to your BigQuery section:
  - Project ID: The globally unique identifier for your GCP project. This ID is the one you retrieved in the Obtain your GCP Project ID step of the Getting Started section.
  - Location: The geographical region in which you want Hevo to create your BigQuery dataset. Select a location from the drop-down.
  - Use own GCS bucket: Enable this option if you want Hevo to stage the ingested Source data in your own GCS bucket.
    
    Note: Your service account must be assigned the Storage Admin role or a custom role to allow Hevo to read from and write to your GCS bucket.
    
    If enabled, Hevo displays the following field:
    - Bucket: The name of an existing container in your Google Cloud Storage. This bucket can be the one that you added in the Create a Google Cloud Storage Bucket step of the Getting Started section.
      
      Note: Your GCS bucket must be located in the same region as the one specified for your BigQuery dataset.
    If turned off, Hevo uses its own GCS bucket and grants your service account access to read from and write to the bucket.
    
    Note: Hevo uses the age parameter to set the lifecycle of objects in its GCS bucket. Hence, objects written to Hevo-managed buckets are deleted within 7 days of their creation.
- In the Authentication section:
  - Service account private key: A cryptographic password used along with a public key to generate digital signatures. Click the attach () icon to upload the private key file that you generated for your Google service account.
    
    Note: You must remove the extra blank space or empty line at the bottom of the service account private key file before uploading to Hevo.
Click Test & Save to test the connection to your BigQuery Destination. Once the test is successful, Hevo creates your BigQuery Destination. You can configure this Destination in your Edge Pipeline.

Additional Information

Modifying BigQuery Destination Configuration in Edge

You can modify some settings of your BigQuery Destination after its creation. However, any configuration changes will affect all the Pipelines using that Destination.

To modify the configuration of your BigQuery Destination in Edge:

In the detailed view of your Destination, do one of the following:
- Click the More () icon to access the Destination Actions menu, and then click Edit Destination.
- In the Destination Configuration section, click Edit.
On the <Your Destination Name> editing page:

Note: The settings that cannot be changed are grayed out.
- You can specify a new name for your Destination, not exceeding 255 characters.
- In the Authentication section, you can modify the service account configured in the Pipeline. Click the Service account private key field to upload the private key associated with the desired service account.
  
  Note:
  - The service account must be assigned the BigQuery User role.
  - If the BigQuery Destination is configured to use your GCS bucket, the Storage Admin role or a custom role with the necessary permissions must be assigned to the service account.
Click Test & Save to check the connection to your BigQuery Destination and then save the modified configuration.

Data Type Evolution in BigQuery Destinations

Hevo has a standardized data system that defines unified internal data types, referred to as Hevo data types. During the data ingestion phase, the Source data types are mapped to the Hevo data types, which are then transformed into the Destination-specific data types during the data loading phase. A mapping is then generated to evolve the schema of the Destination tables.

The following image illustrates the data type hierarchy applied to Google BigQuery Destination tables:

BigQuery DataType Hierarchy

Data Type Mapping

The following table shows the mapping between Hevo data types and BigQuery data types:

Hevo Data Type	BigQuery Data Type
- ARRAY - JSON	JSON
BOOLEAN	BOOL
BYTEARRAY	BYTES
- BYTE - SHORT - INTEGER - LONG	INT64
DATE	DATE
DATE_TIME	DATETIME
DATETIMETZ	TIMESTAMP
DECIMAL	- NUMERIC - BIGNUMERIC - TRING
- DOUBLE - FLOAT	FLOAT64
TIME	TIME
TIMETZ	STRING
- VARCHAR - XML	STRING

Handling Time and Timestamp data types

For BigQuery Destinations, Hevo maps the Source columns of DATETIMETZ data type to the Destination columns of TIMESTAMP data type.
BigQuery supports fractional seconds up to 9 digits of precision (nanoseconds) for both TIME and TIMESTAMP data types. Hevo truncates any digits beyond this limit. For example, a Source value 12:00:00.1234567890 will be stored as 12:00:00.123456789.

Handling the Decimal data type

For BigQuery Destinations, Hevo maps DECIMAL data values with a fixed precision (P) and scale (S) to the NUMERIC (DECIMAL), BIGNUMERIC (BIGDECIMAL), or STRING data types. This mapping is determined based on the number of significant digits (P) in the numeric value and the number of digits to the right of the decimal point (S).

Refer to the table below to understand the mapping:

Precision and Scale of the Decimal Data Value	BigQuery Data Type
Precision: 38 Scale: 9	NUMERIC
Precision: 76 Scale: 38	BIGNUMERIC
Precision: > 76 Scale: > 38	STRING

Read Numeric Types to know more about the data types, their range, and the literal representation BigQuery uses to represent various numeric values.

Handling of Unsupported Data Types

Hevo does not allow the direct mapping of a Source data type to any of the following BigQuery data types:

ARRAY
GEOGRAPHY
INTERVAL
RANGE
STRUCT
Any other data type not listed in the table above.

Hence, if the Source object is mapped to an existing BigQuery table with columns of unsupported data types, it may become inconsistent. To prevent any inconsistencies during schema evolution, Hevo maps all unsupported data types to the STRING data type in BigQuery.

Destination Considerations

BigQuery recommends using the Avro binary format to load data as it is faster. Hence, Hevo creates data files using this file format.
As BigQuery does not support loading encrypted files from GCS buckets, Hevo writes unencrypted data files to the GCS bucket. Hence, it is recommended to enable the Use your GCS bucket when configuring BigQuery as a Destination in Edge.

Note: By default, BigQuery applies encryption to data stored in your Destination tables.
BigQuery creates a job to track tasks run by applications, such as Hevo, in the system. It creates, schedules, and runs jobs, such as query, load, or copy, whenever data is queried, loaded, or copied. The query and load jobs that Hevo runs in your BigQuery Destination to identify data and load it are subject to the quotas and limits imposed by BigQuery. For example, the maximum row size in a query job can be 100 MB. Thus, if any rows in the query job, which Hevo runs, exceed this limit, Hevo does not load data from the Source object(s) containing those rows.
BigQuery APIs and jobs are subject to daily account limits. Your Hevo Edge Pipeline may fail if these limits are exceeded, as Hevo uses the BigQuery APIs and runs jobs to complete operations such as loading data or creating a dataset.

Limitations

At this time, Hevo supports only user-managed service accounts to authenticate connections to BigQuery.
Only private keys generated using the JSON format for user-managed service accounts are supported.
Hevo supports loading data to BigQuery partitioned tables. However, for field-based partitioning, your partition key must be the corresponding Source object’s primary key. Read Supporting Partitioned Tables in BigQuery for more information.
Hevo does not support GCS buckets created with the dual-region location type.
Hevo sanitizes your Source table and column names if they do not meet Hevo’s defined pattern for these identifiers. Hevo’s safe pattern allows alphanumeric characters and underscores in table and column names. Further, Hevo supports only up to 1024 characters for table names and 300 characters for column names.
Currently, Hevo ignores the casing of Source column names. Hence, it does not distinguish between a column named COLUMN1 and column1 when creating them in your BigQuery Destination tables. However, it propagates the casing for your Source schema and table names. For example, a schema named TEST is created as the dataset <destination_prefix>_TEST in your BigQuery Destination.

Last updated on Nov 18, 2025