Handling Deletes in Log-based Pipelines
In the case of Pipelines, while Hevo tracks the records deleted in the Source, it does not delete that data in the Destination. Instead, it updates the metadata column, __hevo__marked_deleted
to True to identify deleted records. In the case of Activate, Hevo does not track delete operations performed on the Warehouse data.
Understanding Deletes in Sources
In Sources, a delete could mean one of the following:
-
A field dropped due to a schema change: A field is dropped from the Event Type resulting in a schema change. A schema change does not affect the Pipeline execution or the existing data in the dropped field. However, in subsequent Pipeline runs, null values are loaded to the dropped field in the Destination.
-
A record deleted in an RDBMS Source: For Pipelines created using the Custom SQL and Table ingestion modes, data is fetched using SQL queries. As it is not possible to capture delete operations using SQL queries, Hevo does not handle deleted Events for these ingestion modes. If you want to capture deletes in the Destination with each Pipeline run, you need to create a new Pipeline with the log-based ingestion mode. As Hevo uses the database logs to ingest the incremental data, it can track all the inserts, updates, and deletes that occur in the Source and replicate these to the Destination. Read Handling Deletes in Log-based Pipelines. If you do not want to retain the deleted data in the Destination, you can truncate the Destination table(s) and manually restart the historical load in every Pipeline run. In this way, you can ensure that only inserts and updates are available in the Destination.
-
A record deleted in a SaaS Source: Some SaaS Sources such as Salesforce move the deleted data to a recycle bin, while other Sources such as Stripe and Shopify capture information about deleted records and objects as an event in their
events
log. In both cases, Hevo replicates the deleted data to the Destination tables and uses the__hevo__marked_deleted
oris_row_deleted
metadata column along with thedeleted_timestamp
metadata column to track this information. Read the respective SaaS Source documentation for information on whether Hevo supports replication of deletes for the Source.
Tracking Deletes
Hevo tracks deleted Events through the metadata column, __hevo__marked_deleted
, which is of the boolean data type. Default value: False.
This column is created by Hevo in the Destination tables and can contain one of the following values:
-
False: If the ingested Event was not deleted in the Source.
-
True: If the ingested Event was deleted in the Source.
-
Null: If the metadata column is added to an existing table. All records with a null value in this column are considered not deleted.
You can add this column in your SQL query to ignore or include the deleted data from/in your analysis. You can also delete the data from your Destination table based on the value of this column.
Note:
-
The
__hevo__marked_deleted
column was visible in the Schema Mapper until before September 3, 2019. It is not visible in the Schema Mapper any longer but is added to the Destination tables automatically. -
Hevo tracks deleted data in a Stripe Source through the metadata columns
is_row_deleted
anddeleted_timestamp
. Read Data Replication in Stripe. -
Hevo does not track deletes for a table if it is deleted in the Source post-Pipeline creation.
Read Additions to the Destination Schema.
Handling Deletes in Destinations
In data warehouses and databases, deleted Events are handled during the deduplication process when primary keys are defined for the Destination tables. Read Loading Data in a Database and Loading Data to a Data Warehouse for details on the deduplication process.
If no primary keys are defined, the deleted Events are not deduplicated; the append-only approach is taken, and the ingested data is appended at the end in the Destination tables. However, to identify a deleted Event, Hevo updates the metadata columns in the Event as follows:
-
Sets the metadata column,
__hevo__marked_deleted
to True. -
Updates the
__hevo__modified_source_at
to the timestamp of when the Event was deleted in the Source.