How do I set a field as a primary key to avoid duplication?
You can set a field as a primary key in the Destination table through either Transformations or Schema Mapper. The Transformation method works only for warehouse Destinations and not for database and Firebolt Destinations. In the Schema Mapper, you need to drop the existing Destination table and create a new table with the desired primary keys.
-
Using Python Transformations. This method works only for warehouse Destinations and not for database and Firebolt Destinations.
-
Using Schema Mapper. This method works for all Destinations. In this method, you can set a primary key for your Destination table, or Event Type, directly from the Mapping Summary page. In case you want to change an existing primary key, you need to drop the existing Destination table and create a new table with the desired primary keys. Read How can I make sure that each record is loaded only once? for the steps to do this.
Define a primary key through Python Transformations
-
Navigate to the Pipeline for which you want to define the primary key.
-
In the left navigation pane, click Transformations to go to the Python-based Transformation interface.
-
Write the following script in the CODE panel:
from io.hevo.api import Event def transform(event): eventName = event.getEventName() if eventName == "<country>": event.setPrimaryKeyFields(['<name>', '<continent>']) properties = event.getProperties() properties['new_pk_fields'] = event.getPrimaryKeyFields() return event
Here, the fields
name
andcontinent
are set as the primary key for thecountry
Event Type.NOTE:
-
Replace the placeholder values in the script with the fields you need.
-
Nullable columns cannot be marked as primary key in Redshift Destination.
-
-
Click DEPLOY to apply the Transformation.
-
In the Schema Mapper page of your Pipeline, click the Event Type for which you deployed the Transformation.
-
In the Event Types section, click the More ( ) icon corresponding to the Event Type for which you added the primary key and disable the Auto Mapping option.
-
In the Mapping Summary section, click the drop-down next to the Destination table name and click Drop Table.
-
In the Event Types section, enable the Auto Mapping option for the Event Type.
-
Navigate to the Pipeline Overview page, and click the More ( ) icon corresponding to the Event Type for which you dropped the Destination table, and click Restart Object.
NOTE:
-
The re-ingested data counts towards your Events quota consumption and is billed.
-
You may get duplicate records in the Destination table; you need to remove them manually.
-
Define a primary key through Schema Mapper
-
Navigate to the Schema Mapper page of your Pipeline.
-
Click on the Event Type for which you want to set the primary key.
-
In the Mapping Summary section, click SET PRIMARY KEY.
-
In the SELECT FIELDS FOR PRIMARY KEY pop-up window, select the field that you want to set as the primary key.
You can also add multiple fields to create a composite key by clicking ADD MORE+. -
Click ADD PRIMARY KEY.
You can view the updated primary key in the Mapping Summary page of the Pipeline.