Share

Handling Deletes in Log-based Pipelines

In the case of Pipelines, while Hevo tracks the records deleted in the Source, it does not delete that data in the Destination. Instead, it updates the metadata column, __hevo__marked_deleted to True to identify deleted records. In the case of Activate, Hevo does not track delete operations performed on the Warehouse data.

Understanding Deletes in Sources

In Sources, a delete could mean one of the following:

  • A field dropped due to a schema change: A field is dropped from the Event Type resulting in a schema change. A schema change does not affect the Pipeline execution or the existing data in the dropped field. However, in subsequent Pipeline runs, null values are loaded to the dropped field in the Destination.

  • A record deleted in an RDBMS Source: For Pipelines created using the Custom SQL and Table ingestion modes, data is fetched using SQL queries. As it is not possible to capture delete operations using SQL queries, Hevo does not handle deleted Events for these ingestion modes. If you want to capture deletes in the Destination with each Pipeline run, you need to create a new Pipeline with the log-based ingestion mode. As Hevo uses the database logs to ingest the incremental data, it can track all the inserts, updates, and deletes that occur in the Source and replicate these to the Destination. Read Handling Deletes in Log-based Pipelines. If you do not want to retain the deleted data in the Destination, you can truncate the Destination table(s) and manually restart the historical load in every Pipeline run. In this way, you can ensure that only inserts and updates are available in the Destination.

  • A record deleted in a SaaS Source: Some SaaS Sources such as Salesforce move the deleted data to a recycle bin, while other Sources such as Stripe and Shopify capture information about deleted records and objects as an event in their events log. In both cases, Hevo replicates the deleted data to the Destination tables and uses the __hevo__marked_deleted or is_row_deleted metadata column along with the deleted_timestampmetadata column to track this information. Read the respective SaaS Source documentation for information on whether Hevo supports replication of deletes for the Source.


Tracking Deletes

Hevo tracks deleted Events through the metadata column, __hevo__marked_deleted, which is of the boolean data type. Default value: False.

This column is created by Hevo in the Destination tables and can contain one of the following values:

  • False: If the ingested Event was not deleted in the Source.

  • True: If the ingested Event was deleted in the Source.

  • Null: If the metadata column is added to an existing table. All records with a null value in this column are considered not deleted.

You can add this column in your SQL query to ignore or include the deleted data from/in your analysis. You can also delete the data from your Destination table based on the value of this column.

Note:

  • The __hevo__marked_deleted column was visible in the Schema Mapper until before September 3, 2019. It is not visible in the Schema Mapper any longer but is added to the Destination tables automatically.

  • Hevo tracks deleted data in a Stripe Source through the metadata columns is_row_deleted and deleted_timestamp. Read Data Replication in Stripe.

  • Hevo does not track deletes for a table if it is deleted in the Source post-Pipeline creation.

Read Additions to the Destination Schema.


Handling Deletes in Destinations

In data warehouses and databases, deleted Events are handled during the deduplication process when primary keys are defined for the Destination tables. Read Loading Data in a Database and Loading Data to a Data Warehouse for details on the deduplication process.

If no primary keys are defined, the deleted Events are not deduplicated; the append-only approach is taken, and the ingested data is appended at the end in the Destination tables. However, to identify a deleted Event, Hevo updates the metadata columns in the Event as follows:

  • Sets the metadata column, __hevo__marked_deleted to True.

  • Updates the __hevo__modified_source_at to the timestamp of when the Event was deleted in the Source.




Last updated on Sep 23, 2024

Tell us what went wrong

Skip to the section