Apify is an online platform that provides custom solutions, such as automating data extraction from websites and its related processes, and creating bots that can perform recurring actions for you. It also allows you to integrate with multiple platforms, such as Google Sheets, GitHub, and Slack, as well as many other APIs and webhooks.
You can replicate the data from your Apify account to a Destination database or data warehouse using Hevo Pipelines. Apify uses an API token to identify Hevo and authorize its requests to access your account data. Hevo ingests the data in Full Load mode. Refer to the Data Model section for the list of supported objects.
Prerequisites
Obtaining the API Token
The API tokens you generate in Apify do not expire. Therefore, you can use an existing token or create a new one to authenticate Hevo on your Apify account.
Note: You must log in as an Admin user to perform these steps.
To obtain the API token:
- Log in to your Apify account.
- In the left navigation pane, click Settings.
- In the Settings page, click Integrations.
- Under the Personal API tokens section, either copy an existing token or create a new one.
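Before configuring Hevo, you can optionally confirm that the token works by calling Apify's public Get current user endpoint with it. The following is a minimal sketch, assuming Python with the requests library; it is illustrative only and is not a step Hevo requires.

```python
import requests

APIFY_API_TOKEN = "apify_api_xxxxxxxxxxxx"  # placeholder; replace with the token you copied

# Apify accepts the API token as a Bearer token in the Authorization header.
response = requests.get(
    "https://api.apify.com/v2/users/me",
    headers={"Authorization": f"Bearer {APIFY_API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()  # raises an error if the token is not accepted

user = response.json()["data"]
print(f"Token is valid for account: {user.get('username')}")
```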
Configuring Apify as a Source
Perform the following steps to configure Apify as the Source in your Pipeline:
- Click PIPELINES in the Navigation Bar.
- Click + CREATE PIPELINE in the Pipelines List View.
- In the Select Source Type page, select Apify.
- In the Configure your Apify Source page, specify the following:
  - Pipeline Name: A unique name for the Pipeline, not exceeding 255 characters.
  - API Token: The API token that you retrieved from your Apify account.
- Click TEST & CONTINUE.
- Proceed to configuring the data ingestion and setting up the Destination.
Data Replication
| For Teams Created | Default Ingestion Frequency | Minimum Ingestion Frequency | Maximum Ingestion Frequency | Custom Frequency Range (in Hrs) |
| --- | --- | --- | --- | --- |
| Before Release 2.21 | 1 Hr | 1 Hr | 24 Hrs | 1-24 |
| After Release 2.21 | 6 Hrs | 30 Mins | 24 Hrs | 1-24 |
Note: The custom frequency must be set in hours as an integer value. For example, 1, 2, or 3 but not 1.5 or 1.75.
Hevo ingests all the objects in Full Load mode in each run of the Pipeline.
Schema and Primary Keys
Hevo uses the following schema to upload the records in the Destination database:
Data Model
The following is the list of tables (objects) that are created at the Destination when you run the Pipeline:
| Object | Description |
| --- | --- |
| Actor Builds | Contains details of the executable code (builds) created corresponding to the actors in your Apify account. |
| Actor Runs | Contains details of the executions of an actor. |
| Actor Tasks | Contains details of the multiple variants of an actor created for specific scenarios. |
| Actor Versions | Contains details of the various versions of an actor. |
| Actors | Contains details of the serverless cloud programs created by a user to perform human-like actions. These programs can perform various tasks, such as filling forms, crawling websites, and unsubscribing from promotional emails. |
| Datasets | Contains details of the results retrieved from actions performed by an actor. |
| Key Value Stores | Contains details of the saved records that can be read later. These records can be of any kind, such as images, inputs and outputs of an actor, or HTML documents. |
| Schedules | Contains details of the specific times at which you want to run your actors. Multiple actors and actor tasks can be run in a schedule. |
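Each of these objects loosely corresponds to a resource in Apify's public API v2. The mapping below is an illustrative sketch only; the endpoint paths are assumptions based on Apify's API documentation, and the exact calls Hevo makes internally are not documented on this page.

```python
# Assumed mapping of Destination objects to Apify API v2 list endpoints.
# For reference only; not a statement of how Hevo ingests the data.
APIFY_OBJECT_ENDPOINTS = {
    "Actors": "https://api.apify.com/v2/acts",
    "Actor Tasks": "https://api.apify.com/v2/actor-tasks",
    "Actor Runs": "https://api.apify.com/v2/actor-runs",
    "Actor Builds": "https://api.apify.com/v2/actor-builds",
    "Actor Versions": "https://api.apify.com/v2/acts/{actorId}/versions",  # listed per actor
    "Datasets": "https://api.apify.com/v2/datasets",
    "Key Value Stores": "https://api.apify.com/v2/key-value-stores",
    "Schedules": "https://api.apify.com/v2/schedules",
}
```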
Read the detailed Hevo documentation for the following related topics:
Source Considerations
- Pagination: Each API response for an Apify object fetches one page of up to 1000 records, except for the Actor Versions object, for which fewer records may be fetched. A sketch of such paginated, rate-limited fetching is shown after this list.
- Rate Limit: Apify imposes a limit of 30 API calls per second. If this limit is exceeded, Hevo defers the ingestion until the limit resets. Read API Rate Limits to configure a suitable ingestion frequency for your Pipeline.
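The following is a minimal sketch of how a client could page through an Apify list endpoint while backing off when the rate limit is hit, assuming Python with the requests library. It uses Apify's public /v2/datasets endpoint and offset/limit parameters as an example and does not reflect Hevo's internal implementation.

```python
import time
import requests

APIFY_API_TOKEN = "apify_api_xxxxxxxxxxxx"  # placeholder token
BASE_URL = "https://api.apify.com/v2/datasets"  # example list endpoint
PAGE_SIZE = 1000  # maximum records returned per page


def fetch_all_records():
    """Page through the endpoint, pausing briefly when rate limited."""
    records, offset = [], 0
    while True:
        response = requests.get(
            BASE_URL,
            headers={"Authorization": f"Bearer {APIFY_API_TOKEN}"},
            params={"offset": offset, "limit": PAGE_SIZE},
            timeout=30,
        )
        if response.status_code == 429:  # rate limit exceeded; wait and retry
            time.sleep(1)
            continue
        response.raise_for_status()
        items = response.json()["data"]["items"]
        records.extend(items)
        if len(items) < PAGE_SIZE:  # last page reached
            return records
        offset += PAGE_SIZE


if __name__ == "__main__":
    print(f"Fetched {len(fetch_all_records())} dataset records")
```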
Limitations
- Hevo currently does not support deletes. Therefore, any data deleted in the Source may continue to exist in the Destination.
- You cannot specify a duration for loading the historical data. Hevo loads the entire data present in your Apify account.
- Hevo does not load an Event into the Destination table if its size exceeds 128 MB, which may lead to discrepancies between your Source and Destination data. To avoid such a scenario, ensure that each row in your Source objects contains less than 100 MB of data.
Revision History
Refer to the following table for the list of key updates made to this page:
| Date | Release | Description of Change |
| --- | --- | --- |
| Jan-07-2025 | NA | Updated the Limitations section to add information on Event size. |
| Mar-05-2024 | 2.21 | Updated the ingestion frequency table in the Data Replication section. |
| Sep-22-2023 | NA | Updated the page contents to reflect the latest Apify user interface (UI). |
| Jan-23-2023 | 2.06 | New document. |