Share

Parsing Nested JSON Fields in Events

To load JSON-formatted Source data to the Destination, you must select how Hevo must parse the nested fields, objects, and arrays so that these are read correctly. The parsing strategy depends upon the Destination type and is independent of the Pipeline mode or the data Source.

You can select the parsing strategy as the last step of creating your Pipeline. If only one strategy is available, it is applied by default and no selection is required, as in the case of Amazon Redshift, Aurora MySQL, MySQL, Postgres, and Snowflake Destinations.

Parsing strategy selection

The following table lists the parsing strategies that Hevo supports for each Destination, the first option being the default one in each case:

Destinations Parsing Strategies
SQL Server - Flatten structs and split arrays to new Events
- Replicate JSON fields as JSON strings and array fields as strings
Amazon Redshift - Replicate JSON fields to JSON column
- Replicate JSON fields as JSON strings and array fields as strings
PostgreSQL, Snowflake - Replicate JSON fields to JSON column
Aurora MySQL, MySQL - Replicate JSON fields to JSON columns
- Replicate JSON fields as JSON strings and array fields as strings
Google BigQuery Two of the following are displayed based on the Source Type:
- Replicate JSON structure as is, while collapsing arrays into strings
- Replicate JSON structure as is, while collapsing nested arrays into strings
- Replicate JSON fields as JSON strings and array fields as strings
Azure Synapse Analytics, Databricks - Replicate JSON fields as JSON strings and array fields as strings

Example

Consider a sample orgs object that contains multiple fields, child objects, and arrays in different nesting orders. Its format in different Sources is illustrated below. The example explores how this data is replicated based on the parsing strategy that is selected.

Sample Data Structure 1

The following is the orgs object as it appears in MongoDB:

Sample JSON data from MongoDB

Sample Data Structure 2

The following is the orgs object as it appears in Sources such as MySQL and PostgreSQL:

Sample JSON data from MySQL


The following sections help you understand how the replicated data appears in the Destination based on the different parsing strategies:

Strategy: Replicate JSON fields as JSON strings and array fields as strings

In this strategy, nested JSON fields are collapsed and serialized as a JSON string of type varchar.

In the current example, the address object, which contains several fields and an array, is collapsed into a single JSON string in the Destination table. Same behaviour can be observed for the contact array, where the name and phone_no fields are collapsed into a single JSON string.

Sample Destination Record

String output


Strategy: Replicate JSON structure as is while collapsing nested arrays into strings

In this parsing strategy, the JSON structure of the Source data is maintained except for nested arrays, which are collapsed into JSON strings. Any nested arrays of arrays or a mix of data types in the array items are collapsed into a JSON string. Read Working with Arrays.

This strategy is used only in case of Google BigQuery Destinations, as it offers specialized support for JSON and also enforces the need for a well-defined schema for the fields within a record.

Sample Destination Record

As seen in the image below, the Source JSON format is retained for all the fields and well-defined arrays.

Arays collapsed into strings in BigQuery

Now, suppose the following record is added to the Source data:

Source data with mixed arrays

Here, the contact array within the address object has differing counts of phone numbers. Such non-uniform arrays are collapsed into a JSON string, as shown below.

Non-uniform arrays collapsed into JSON string


Strategy: Replicate JSON structure as is while collapsing arrays into strings

In this parsing strategy, the JSON structure of the Source data is maintained except for arrays, which are collapsed into JSON strings.

This strategy is used only in the case of Google BigQuery Destination with the following specific Sources:

  • MySQL: Generic MySQL, Amazon RDS MySQL, Amazon Aurora MySQL, Azure MySQL, Google Cloud MySQL, Oracle MySQL, TokuDb MySQL

  • MariaDB: Generic MariaDB, Amazon RDS MariaDB, Azure MariaDB

  • Oracle: Generic Oracle, Amazon RDS Oracle

  • PostgreSQL: Generic PostgreSQL, Aws Aurora PostgreSQL, Amazon RDS PostgreSQL, Google Cloud PostgreSQL, Azure PostgreSQL, Heroku PostgreSQL

  • MS SQL: Generic MS SQL, Amazon RDS MS SQL, Azure MS SQL, Google Cloud SQL Server

  • MongoDB: Generic MongoDB, MongoDB Atlas

  • Amazon DynamoDB

Sample Destination Record

For the Sample Data Structure 1 above, the Source JSON format is retained at the Destination for all the fields and well-defined arrays.

Array to String


Strategy: Flatten structs and split arrays to new Events

Each JSON object or array is ingested as a separate Event Type, and a new table is created for it in the Destination. The child Event carries the property __hevo_ref_id, representing the identifier of the parent object. In addition, the parent object name is prefixed to the Event name. In this structure, each key:value pair holds only one value.

Note: In case the Source table does not have a primary key, then this strategy is not used but the parsing defaults to the Collapse strategy. This is done because there is no primary key to reference in the new Event Types.

The following image illustrates the tables that are created in the Destination for the orgs Event Type and its child arrays:

Different tables created for each object in the Source

  • In the flat_orgs table:

    • Each field of the orgs object is parsed as a separate Event in a flat JSON format.

    • The fields within the address object are converted to separate Events, and the parent field name is prefixed to each Event’s name, for example, address.street. Here, the _id field, which is the same as the primary key of the parent field, maps the child to the parent.

    • The coord array within the address object, and the contact array are not seen here, as these are converted into new Event Types.

  • In the flat_orgs_address_coord table created for the coord array:

    • The __hevo_ref_id is mapped to the primary key (__hevo_ref_id field) of the contact object in the Source.

    • The sequence of the array items is maintained using the __hevo_array_index.

    • The combination of __hevo_array_index and _id becomes the primary key of the Event.

  • Similarly, a separate table flat_orgs_contact is created for the contact array within the address object.

Sample Destination Record

The following image displays a sample of the replicated data for the flat_orgs object:

Flat table with only fields

The following image displays a sample of the replicated data for the flat_orgs_contact object:

Flattened array as new table in Destination


Strategy: Replicate JSON fields to JSON columns

While this format appears similar to the Replicate JSON Fields as JSON Strings parsing strategy, additional metadata is maintained by the system to differentiate the JSON and String data. This metadata is useful while writing the data to appropriate columns (optimized) in the tables of the Destination warehouse where both JSON and String data types are supported.

Sample Destination Record

As seen in the image below, the address and contact objects are replicated as separate JSONs.

JSON fields as JSON columns


Last updated on Jun 16, 2023

Tell us what went wrong

Skip to the section