Hevo Operator
HevoPipelineOperator is Hevo’s implementation of an Airflow operator. It defines the Pipeline sync task in a Directed Acyclic Graph (DAG).
Parameters
The HevoPipelineOperator accepts the following parameters:
Required
| Name | Type | Default | Description |
|---|---|---|---|
| pipeline_id (Required). Note: Supports Jinja templating. | INT | NA | The unique identifier of the Hevo Pipeline to be synced. Note: The Pipeline must be in the Initialized state. |
Optional
| Name | Type | Default | Description |
|---|---|---|---|
| action | PipelineAction | SYNC_NOW | The operation to trigger for the specified Pipeline. Allowable values: SYNC_NOW: Trigger the incremental sync for the Pipeline. Note: The Pipeline must be in the Initialized state. RESYNC: Restart the historical load for all the active objects in the specified Pipeline. |
| job_type | JobType or STR | INCREMENTAL or RESYNC | The job to track after initiating a sync. Allowable values: INCREMENTAL: A job that syncs new and updated data; the default for the SYNC_NOW action. HISTORICAL: A job that syncs all existing data from the Source. RESYNC: A job that re-ingests historical data for all active objects in the Pipeline and updates the Destination tables if changes are detected in the re-ingested data; the default for the RESYNC action. RESYNC_WITH_DROP_AND_LOAD: A job that drops and recreates the Destination tables for all active objects in the Pipeline. During this process, the tables are temporarily unavailable. Hevo then re-ingests all data from the Source objects and loads it into the newly created Destination tables. |
| connection_id | STR | hevo_airflow_conn_id | The ID of the connection created in Airflow for Hevo API credentials and connection details. |
| poll_interval | INT | 15 | The time (in seconds) between status checks while waiting for the Pipeline job to complete. |
| retry_limit | INT | 10 | The maximum number of attempts after initiating the sync to identify the active job. |
| deferrable | BOOL | True. Note: It is recommended to set this parameter to True in production environments. | A flag to determine whether the task should initiate the sync, release the worker slot, and defer the monitoring to the Airflow triggerer. Note: The Airflow Triggerer service must be running. |
| wait_for_completion | BOOL | True | A flag to determine whether the Operator should wait for the Pipeline job to complete. Allowable values: True: Wait for the job to complete. False: Do not wait for the job to complete; capture the job ID and transfer it via Airflow’s cross-communication mechanism (XCom) to a HevoSensor. |
| accept_completed_with_failures | BOOL | False | A flag to accept partial failures for the job. Allowable values: True: Treat the Pipeline job’s Completed with Failures status as success. False: Fail the task when the status of the Pipeline job is Completed with Failures. |
| ensure_new_job | BOOL | True | A flag to prevent the Operator from triggering a new job when a previous job is still running. Allowable values: True: Fail the task if a job is already in progress in the Pipeline. False: Trigger a new job even when an active job is detected in the Pipeline. |
| drop_and_load | BOOL | False | A flag to determine whether the Drop and Load operation should run for the specified Pipeline during the RESYNC action. Allowable values: True: Drop existing data from the Destination tables for all active objects in the Pipeline, then load the ingested data into them. False: Evolve the schema and load data ingested from the Source for all active objects in the Pipeline into the existing Destination tables. |
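Taken together, poll_interval and accept_completed_with_failures govern how a synchronous run is monitored. The following is a minimal sketch of that monitoring loop in plain Python; the function name, status strings, and the `get_status` callable are illustrative stand-ins, not the provider's actual API:

```python
import time

# Illustrative final states; the operator reports Completed,
# Completed with Failures, and Failed.
FINAL_STATES = {"COMPLETED", "COMPLETED_WITH_FAILURES", "FAILED"}


def wait_for_job(get_status, poll_interval=15,
                 accept_completed_with_failures=False):
    """Poll get_status() until the job reaches a final state.

    get_status is a hypothetical stand-in for the Hevo API call
    that returns the job's current status string.
    """
    while True:
        status = get_status()
        if status in FINAL_STATES:
            break
        time.sleep(poll_interval)  # block between status checks
    if status == "COMPLETED":
        return True
    if status == "COMPLETED_WITH_FAILURES":
        # Treat partial failure as success only when configured to.
        return accept_completed_with_failures
    return False  # FAILED
```

Note that this blocking style is what a worker slot does when deferrable=False; with deferrable=True the equivalent waiting is handed off to the triggerer instead.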
Methods
The following table lists the methods that the HevoPipelineOperator provides:
| Method | Description |
|---|---|
| execute | Runs the sync operation for the specified Pipeline after validating its status. |
| execute_complete | Completes task execution when the Pipeline job reaches a final state, such as Completed, Completed with Failures, or Failed. |
| _wait_synchronously | Waits for the Pipeline job to complete, blocking the worker slot. |
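The retry_limit parameter bounds how long the Operator searches for the job it has just initiated before giving up. A rough sketch of that lookup in plain Python; `list_jobs` is a hypothetical stand-in for the Hevo API call, not the provider's actual code:

```python
import time


def find_active_job(list_jobs, retry_limit=10, poll_interval=15):
    """Identify the active job started by the sync request.

    list_jobs is a hypothetical stand-in for the Hevo API call that
    returns the Pipeline's currently running job IDs.
    """
    for attempt in range(retry_limit):
        jobs = list_jobs()
        if jobs:
            return jobs[0]  # track the first active job found
        time.sleep(poll_interval)  # wait before the next attempt
    raise RuntimeError(
        f"No active job found after {retry_limit} attempts"
    )
```

If no job appears within retry_limit attempts, the task fails rather than monitoring a job that was never started.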
Example
The following example creates three HevoPipelineOperator tasks: trigger_pipeline_sync, resync_pipeline, and trigger_only.
- The `trigger_pipeline_sync` task runs the SYNC_NOW action for the specified Pipeline in the Trigger and Asynchronous Wait mode, as indicated by the `deferrable=True` and `wait_for_completion=True` parameters.
- The `resync_pipeline` task runs the RESYNC action for the specified Pipeline in the Trigger and Asynchronous Wait mode, as indicated by the `deferrable=True` and `wait_for_completion=True` parameters.
- The `trigger_only` task runs in the Trigger and Continue mode, as indicated by the `deferrable=False` and `wait_for_completion=False` parameters.
```python
from datetime import datetime

from airflow import DAG
from airflow.hevo.models.job import JobType
from airflow.hevo.models.pipeline import PipelineAction
from airflow.hevo.operators import HevoPipelineOperator

with DAG(
    dag_id="hevo_pipeline_example",  # illustrative DAG ID
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Deferrable operator with SYNC_NOW (default)
    trigger_sync = HevoPipelineOperator(
        task_id="trigger_pipeline_sync",
        pipeline_id=123,
        action=PipelineAction.SYNC_NOW,  # Default - can be omitted
        job_type=JobType.INCREMENTAL,  # Default for SYNC_NOW - can be omitted
        deferrable=True,
        wait_for_completion=True,
        poll_interval=10,
        accept_completed_with_failures=False,
        ensure_new_job=True,  # Default: True - prevents duplicate jobs
        connection_id="hevo_production",
    )

    # Full historical resync
    resync_pipeline = HevoPipelineOperator(
        task_id="resync_pipeline",
        pipeline_id=123,
        action=PipelineAction.RESYNC,  # Full historical reload
        deferrable=True,
        wait_for_completion=True,
        poll_interval=30,  # Less frequent polling for long jobs
        accept_completed_with_failures=True,
    )

    # Trigger and continue
    trigger_only = HevoPipelineOperator(
        task_id="trigger_only",
        pipeline_id=456,
        deferrable=False,
        wait_for_completion=False,  # Returns job_id via XCom
    )
```