Google BigQuery Setup Guide

Starting with Release 2.20, Hevo no longer supports the streaming inserts feature in Google BigQuery Destinations. As a result, you cannot configure a new Google BigQuery Destination with streaming inserts enabled. Any existing Destinations created with this feature continue to load data in near real-time, and you can still create new Pipelines to load data to such existing Destinations.

Google BigQuery is a fully managed, serverless data warehouse that supports Structured Query Language (SQL) to derive meaningful insights from the data. With Hevo Pipelines, you can ingest data from any Source system and load it to a Google BigQuery Destination.

Hevo Pipelines stage the Source data in Google Cloud Storage (GCS) buckets before loading it to your BigQuery Destination. Buckets are the basic units of storage in GCS; they hold your data until Hevo loads it into your BigQuery tables. Hevo can transfer data to the GCS buckets only if they are in the same location as the dataset, except when the dataset is in the US multi-region. Read Location considerations for further details about the requirements for transferring data from GCS buckets to BigQuery tables.
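Before configuring the Destination, you may want to confirm that your staging bucket and dataset are in the same location. The following is a minimal sketch of such a check using the Google Cloud Python clients; the project, dataset, and bucket names are hypothetical, and it does not account for the US multi-region exception mentioned above.

```python
from google.cloud import bigquery, storage

# Hypothetical identifiers; assumes Application Default Credentials are set up.
project_id = "my-gcp-project"
dataset_id = f"{project_id}.my_dataset"
bucket_name = "my-staging-bucket"

bq_client = bigquery.Client(project=project_id)
gcs_client = storage.Client(project=project_id)

dataset_location = bq_client.get_dataset(dataset_id).location
bucket_location = gcs_client.get_bucket(bucket_name).location

print(f"Dataset location: {dataset_location}, bucket location: {bucket_location}")
if dataset_location.upper() != bucket_location.upper():
    print("Warning: the bucket and the dataset are in different locations.")
```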

Note: For a select range of Sources, Hevo provides a pre-configured and fully managed Google BigQuery data warehouse Destination. Read Hevo Managed Google BigQuery.

Roles and permissions

Hevo requires the following predefined roles to be assigned to the connecting Google account in the configured GCP project so that it can read data from and write data to BigQuery and the GCS buckets.

BigQuery

  • BigQuery Data Editor: Grants Hevo permission to:

    - Create new datasets in the project.

    - Read a dataset’s metadata and list the tables in the dataset.

    - Create, update, get, and delete tables and views from a dataset.

    - Read and update data and metadata for the tables and views in a dataset.

  • BigQuery Job User: Grants Hevo permission to perform actions such as loading data, exporting data, copying data, and querying data in a project.

Google Cloud Storage

  • Storage Admin: Grants Hevo permission to list, create, view, and delete:

    - Objects in GCS buckets.

    - GCS buckets.

If you do not want to assign the Storage Admin role, you can instead create a custom role at the project or organization level, add the following permissions to it, and assign it to the connecting Google account to grant Hevo access to GCS resources (a sketch of creating such a role programmatically is shown after this list):

  • resourcemanager.projects.get

  • storage.buckets.create

  • storage.buckets.delete

  • storage.buckets.get

  • storage.buckets.list

  • storage.buckets.update

  • storage.multipartUploads.abort

  • storage.multipartUploads.create

  • storage.multipartUploads.list

  • storage.multipartUploads.listParts

  • storage.objects.create

  • storage.objects.delete

  • storage.objects.get

  • storage.objects.list

  • storage.objects.update
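If you prefer to create the custom role programmatically rather than through the Google Cloud console, the following is a minimal sketch using the IAM API via the google-api-python-client package; the role ID and title are hypothetical, and the call assumes Application Default Credentials with permission to manage roles in the project.

```python
import google.auth
from googleapiclient import discovery

# Assumes Application Default Credentials; the role ID and title are hypothetical.
credentials, project_id = google.auth.default()
iam = discovery.build("iam", "v1", credentials=credentials)

permissions = [
    "resourcemanager.projects.get",
    "storage.buckets.create",
    "storage.buckets.delete",
    "storage.buckets.get",
    "storage.buckets.list",
    "storage.buckets.update",
    "storage.multipartUploads.abort",
    "storage.multipartUploads.create",
    "storage.multipartUploads.list",
    "storage.multipartUploads.listParts",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.update",
]

iam.projects().roles().create(
    parent=f"projects/{project_id}",
    body={
        "roleId": "hevoGcsAccess",  # hypothetical role ID
        "role": {
            "title": "Hevo GCS Access",
            "includedPermissions": permissions,
            "stage": "GA",
        },
    },
).execute()
```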

Note: Hevo uses its own GCS bucket if the connecting Google account is not assigned the Storage Admin role or a custom role.

Google account authentication

You can connect to your BigQuery Destination with a user account (OAuth) or a service account. One service account can be mapped to your entire team. Whichever type of account you use, it must be assigned the required roles for accessing GCP-hosted services such as BigQuery and the GCS buckets. Read Authentication for GCP-hosted Services to know how to assign these roles to your user account or service account.
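Before configuring the Destination in Hevo, you may want to confirm that the account you plan to use can actually reach BigQuery. The following is a minimal sketch for a service account, using the google-cloud-bigquery package; the key file name and project ID are hypothetical.

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical key file and project ID.
credentials = service_account.Credentials.from_service_account_file(
    "hevo-service-account.json"
)
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

# Listing datasets confirms that the account can read BigQuery metadata
# in the project before you hand it over to Hevo.
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```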

The following image illustrates the key steps that you need to complete to configure Google BigQuery as a Destination in Hevo:

BigQuery Destination Setup


Handling Source Data with Different Data Types

For teams created in or after Hevo Release 1.56, Hevo automatically modifies the data type of Google BigQuery table columns to accommodate Source data with different data types. Data type promotion is performed on tables whose size is less than 50 GB. Read Handling Different Data Types in Source Data.

Note: Your Hevo release version is mentioned at the bottom of the Navigation Bar.


Handling JSON Fields

Read Parsing Nested JSON Fields in Events to know how Hevo parses the JSON data and makes it available at the Destination. Because BigQuery has built-in support for nested and repeated columns, JSON data is neither split nor compressed. However, depending on the Source type, arrays may be collapsed into JSON strings or passed as-is to the Destination. Read JSON functions in Standard SQL to know how to parse JSON strings back into arrays.

The fields of a JSON object are shown as a struct (for RECORD) or an array (mode: REPEATED) in the Schema Mapper. The nested fields are not editable.
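For reference, the following is a minimal sketch of expanding an array that was collapsed into a JSON string back into an ARRAY value, using BigQuery's JSON functions through the google-cloud-bigquery package; the project, dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

# Assumes Application Default Credentials; all identifiers are hypothetical.
client = bigquery.Client(project="my-gcp-project")

sql = """
SELECT JSON_EXTRACT_ARRAY(payload, '$.items') AS items
FROM `my-gcp-project.my_dataset.events`
"""
for row in client.query(sql).result():
    print(row.items)  # each value is an array of JSON-formatted strings
```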


Modifying BigQuery Destination Configuration

You can modify some of the settings after creating the BigQuery Destination. However, changing any configuration affects all the Pipelines that use this Destination. Refer to the section Potential impacts of modifying the configuration settings to understand the impacts.

To modify the BigQuery Destination configuration:

  1. Navigate to the Destination Detailed View.

  2. Do one of the following:

    • Click the Settings icon next to the Destination name, and then click the Edit icon.

      Modify BQ settings

    • Click the kebab menu on the right and click Edit.

  3. Update the required field values in the Edit Google BigQuery Destination connection settings page. You can modify the following settings:

    • Destination name

    • Project ID

    • Dataset ID

    • GCS Bucket

    • Populate Loaded Timestamp

      Read the following section to know the impact of making these changes.

  4. Click SAVE DESTINATION.

Potential impacts of modifying the configuration settings

  • Changing any configuration setting impacts all Pipelines using the Destination.

  • Changing the Project ID or Dataset ID may lead to data getting split between the old and the new Dataset. This is because the Events in the old Dataset are not copied to the new one. Further, new tables for all the existing Pipelines that are using this Destination are created in the new Dataset.

  • Changing the GCS Bucket may lead to Events getting lost. Events that have not yet been copied from the old bucket are not copied to the Dataset.


Destination Considerations

  • Hevo supports loading Events in Geometry formats (such as Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection). However, if you are loading any latitude values to your BigQuery Destination, these values must lie between -90 and +90, as that is the range BigQuery supports. If the data lies outside this range, you can either change the data type in the Source or create a Transformation to change the data type to varchar.
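The check such a Transformation needs is straightforward. The following is a minimal sketch of the logic in plain Python; the function name and field value are illustrative only, and the exact Transformation interface is described in Hevo's Transformations documentation.

```python
def normalize_latitude(value):
    """Return the latitude unchanged if it is within BigQuery's supported
    range (-90 to +90); otherwise return it as a string (varchar)."""
    try:
        lat = float(value)
    except (TypeError, ValueError):
        return str(value)
    return lat if -90.0 <= lat <= 90.0 else str(lat)

print(normalize_latitude(45.2))    # 45.2 - within range, loaded as a number
print(normalize_latitude(123.45))  # "123.45" - out of range, kept as a string
```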

Limitations

  • Hevo currently does not map arrays of type struct in the Source data to arrays of type struct in the BigQuery Destination. Instead, any struct arrays in the Source are mapped as varchar arrays. If needed, you can parse these varchar arrays in the Destination to change them back to struct, as illustrated in the sketch after this list. Read Strategy: Replicate JSON structure as is while collapsing arrays into strings.

  • Hevo replicates a maximum of 4096 columns to each BigQuery table, of which six are Hevo-reserved metadata columns used during data replication. Therefore, your Pipeline can replicate up to 4090 (4096-6) columns for each table. Read Limits on the Number of Destination Columns.

    To circumvent this limitation, you can use Transformations to remove the columns you do not need.
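Relating to the first limitation above, the following is a minimal sketch of parsing a varchar array of JSON objects back into an array of structs in the Destination, using the google-cloud-bigquery package; the table, column, and field names are hypothetical.

```python
from google.cloud import bigquery

# Assumes Application Default Credentials; all identifiers are hypothetical.
client = bigquery.Client(project="my-gcp-project")

# `line_items` is assumed to be an ARRAY<STRING> of JSON objects; each element
# is rebuilt as a STRUCT with typed fields.
sql = """
SELECT
  order_id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(item, '$.sku') AS sku,
      CAST(JSON_EXTRACT_SCALAR(item, '$.qty') AS INT64) AS qty
    FROM UNNEST(line_items) AS item
  ) AS line_items_struct
FROM `my-gcp-project.my_dataset.orders`
"""
for row in client.query(sql).result():
    print(row.order_id, row.line_items_struct)
```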


Last updated on Sep 03, 2024
