
Ingestion and Loading Frequency

Hevo Pipelines ingest data from your Sources via the Source Connectors and load it to a Destination of your choice using the Consumers. The Source Connectors run multiple ingestion tasks on a schedule to fetch data from your Source. This schedule is the ingestion frequency, and its default value depends on the Source type.

Similarly, to load the ingested data to your Destinations, the Consumers run several tasks on a schedule, termed the loading frequency. The loading frequency affects all Pipelines using the Destination, and its default value depends on the Destination type.

Ingestion Frequency

The ingestion frequency is driven by factors such as the API rate limits, the network throughput, and the Source throughput. As a result, most Sources have a:

  • Minimum ingestion frequency: The minimum amount of time that Hevo must wait before it polls the Source for new data. For example, some Sources may support ingestion as often as every 15 minutes.

    Note: For teams created after Release 2.21, Hevo has changed the minimum ingestion frequency for all Sources to 30 minutes.

  • Default ingestion frequency: The amount of time that Hevo waits before it polls the Source for new data if you do not change the schedule.

    Note: For teams created after Release 2.21, Hevo has changed the default ingestion frequency to 6 hours for table-mode Pipelines and 30 minutes for log-based Pipelines.

  • Maximum ingestion frequency: The maximum amount of time that Hevo can wait before it polls the Source for new data. For example, Hevo may need to ingest data from the Source at least once every 24 hours to avoid the risk of losing data.

    Note: For teams created after Release 2.21, Hevo has changed the maximum ingestion frequency to 24 hours for table-mode Pipelines and 12 hours for log-based Pipelines.

  • Custom ingestion frequency: Any user-specified frequency within the range allowed for the Source, that is, between its minimum and maximum ingestion frequencies. Read the Data Replication section in your Source documentation for this range.
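
The relationship between the minimum, default, maximum, and custom ingestion frequencies can be sketched as a simple range check. The snippet below is illustrative only and is not a Hevo API: the names FREQUENCY_LIMITS and resolve_ingestion_frequency are hypothetical, and the values are the ones stated in the notes above for teams created after Release 2.21. Your Source may allow a different range.

```python
from datetime import timedelta

# Values from the notes above (teams created after Release 2.21).
# The structure and names are hypothetical, for illustration only.
FREQUENCY_LIMITS = {
    "table": {
        "minimum": timedelta(minutes=30),
        "default": timedelta(hours=6),
        "maximum": timedelta(hours=24),
    },
    "log": {
        "minimum": timedelta(minutes=30),
        "default": timedelta(minutes=30),
        "maximum": timedelta(hours=12),
    },
}


def resolve_ingestion_frequency(mode, custom=None):
    """Return the polling interval to use for a Pipeline mode.

    Falls back to the default when no custom value is given and
    rejects custom values outside the allowed range.
    """
    limits = FREQUENCY_LIMITS[mode]
    if custom is None:
        return limits["default"]
    if not limits["minimum"] <= custom <= limits["maximum"]:
        raise ValueError(
            f"Custom frequency {custom} is outside the allowed range "
            f"{limits['minimum']} to {limits['maximum']}"
        )
    return custom
```

For instance, under these assumed limits a custom interval of two hours is accepted for a table-mode Pipeline, while 13 hours is rejected for a log-based Pipeline.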

You can change the ingestion frequency for most Sources by changing the Pipeline schedule, except the following:

  • Webhooks: As these Sources ingest data in real-time, it is not possible to change the ingestion frequency.

  • Kafka: A Kafka Source ingests data in real-time. As a result, it is not possible to change the ingestion frequency.

  • SaaS Sources: Some SaaS Sources impose strict API limits to prevent frequent data reads. As a result, the ingestion frequency for such SaaS Sources cannot be modified.

For all other Pipelines, you can schedule the ingestion to run:

  • Daily: The ingestion runs as per a fixed schedule every day. For example, you may want to ingest data every two hours, after peak hours.

  • At a fixed interval: The ingestion may be scheduled to ingest data every n minutes or n hours, where n is an integer value. For example, you may want to ingest data from your Facebook Ads every two hours instead of the default one hour.

Read Creating a Custom Ingestion Schedule.
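
To make the difference between the two schedule types concrete, here is a minimal sketch of how the next run time could be computed for each. It is an assumption about how such a scheduler might work, not a description of Hevo's implementation; the function names and the example run times are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_fixed_interval_run(last_run, every):
    """Fixed interval: the next run is simply one interval after the last one."""
    return last_run + every


def next_daily_run(now, run_times):
    """Daily schedule: return the next configured time of day after `now`.

    `run_times` is a list of datetime.time values that repeat every day.
    """
    for slot in sorted(run_times):
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # Every slot for today has passed, so use the earliest slot tomorrow.
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, min(run_times))
```

With a daily schedule of 02:00 and 14:00, a call at 10:00 yields 14:00 on the same day, and a call at 15:00 yields 02:00 on the following day. With a fixed interval of two hours, a run at 10:00 is followed by one at 12:00.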

The Pipeline ingestion frequency directly impacts the Events quota and consumption. Read Pipeline Frequency.


Loading Frequency

The process of writing the ingested data to the Destination is termed data loading. Hevo loads data to database Destinations in real time. For data warehouse Destinations, it syncs data as per the loading frequency.

In the case of real-time loading, data is read from the messaging queue and written to the Destination tables. This process takes place via in-memory caching. However, the time taken to load the data to the Destination may exceed the retention period of the messaging queues. In that case, data is written to temporary files in Hevo’s Amazon S3 bucket and then loaded to the Destination tables from there.

For each data warehouse Destination, Hevo writes data at a default loading frequency, which is optimized to reduce the cost of synchronizing the data with the Destination tables. The loading task checks the staging location for files to be read, and if there is no new data to be synced, it skips the loading. You can set a higher loading frequency to suit your business requirements. However, depending on the Destination, increasing the loading frequency may increase the cost of your load queries.
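
The check-then-skip behavior of the loading task can be sketched as follows. This is a simplified illustration under stated assumptions: the staging location is modeled as a local directory, and run_loading_task and load_file are hypothetical names rather than part of Hevo.

```python
from pathlib import Path


def run_loading_task(staging_dir, load_file):
    """Load every staged file and return how many were loaded.

    A return value of 0 means there was nothing new to sync,
    so the run was skipped and no load query was issued.
    """
    staged = sorted(p for p in Path(staging_dir).iterdir() if p.is_file())
    if not staged:
        return 0  # No new data: skip this run.
    for path in staged:
        load_file(path)  # Hypothetical callback that writes to the Destination.
        path.unlink()    # Remove the file once it has been loaded.
    return len(staged)
```

Because an empty staging location results in a skipped run, a higher loading frequency adds cost mainly when new data is actually present at each run.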

Most data warehouse Destinations in Hevo allow you to set a custom loading schedule. You can schedule the data loading to run:

  • Daily: Set up a fixed daily schedule. For example, you may want to load data every four hours during peak hours, and every hour after peak hours.

  • At a fixed interval: Schedule the data to be loaded every n minutes or n hours, where n is an integer value. For example, you may want to sync data with your high-availability BigQuery instance every 15 minutes instead of the default one hour.

Read Creating a Custom Data Loading Schedule.

Effect of Ingestion Frequency on Data Loading

The ingestion frequency defined at the Source does not directly affect the Destination or the loading frequency. However, if data is loaded to the Destination at a much slower rate than the rate at which it is ingested, the ingested data accumulates at the staging location. The same applies to log-based Pipelines: when data is written to the Destination slowly, it remains at the staging location until it is loaded.
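
As a rough illustration of this effect, the following sketch tracks how many Events would be waiting at the staging location when ingestion outpaces loading. The function name and the per-hour rates are made-up values for the example, not Hevo measurements.

```python
def staging_backlog(ingested_per_hour, loaded_per_hour, hours):
    """Return the number of Events waiting at staging after each hour."""
    backlog = 0
    history = []
    for _ in range(hours):
        # Events arrive from ingestion and leave through loading;
        # the backlog can never drop below zero.
        backlog = max(0, backlog + ingested_per_hour - loaded_per_hour)
        history.append(backlog)
    return history
```

With 1000 Events ingested and 600 loaded per hour, the backlog grows by 400 Events each hour. When loading keeps pace with ingestion, the backlog stays at zero.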


Last updated on Jun 21, 2024
