Amplitude Analytics helps generate thorough product analytics of web and mobile application usages to help you make data driven decisions. You can replicate data from your Amplitude account to a database, data warehouse, or file storage system using Hevo Pipelines.
Note: Hevo fetches data from Amplitude Analytics in a zipped folder to perform the data query.
Prerequisites
Retrieving the Amplitude API Key and Secret
You require an API key and a secret to authenticate Hevo on your Amplitude Analytics account.
-
Log in to your Amplitude account.
-
In the top right corner of the home page, click the Settings ( ) icon, and then select Organization settings.
-
Under Organization settings, click Projects, and then select a project whose data you would like to sync.
-
Under the General tab, click Show to reveal the API Key and Secret Key. Copy and save them securely like any other password.
You can use these credentials while configuring your Hevo Pipeline.
Configuring Amplitude Analytics as a Source
Perform the following steps to configure Amplitude Analytics as the Source in your Pipeline:
-
Click PIPELINES in the Navigation Bar.
-
Click + CREATE PIPELINE in the Pipelines List View.
-
On the Select Source Type page, select Amplitude Analytics.
-
On the Configure your Amplitude Analytics Source page, specify the following:
-
Pipeline Name: A unique name for your Pipeline.
-
API Key: The API key you retrieved from your Amplitude account.
-
Secret Key: The secret key you retrieved from your Amplitude account.
-
Historical Sync Duration: The duration for which you want to ingest the existing data from the Source. Default duration: 3 Months.
Note: If you select All Available Data, Hevo ingests all the data available in your Amplitude Analytics account since January 01, 2012.
-
Click TEST & CONTINUE.
-
Proceed to configuring the data ingestion and setting up the Destination.
Data Replication
For Teams Created |
Default Ingestion Frequency |
Minimum Ingestion Frequency |
Maximum Ingestion Frequency |
Custom Frequency Range (in Hrs) |
Before Release 2.21 |
1 Hr |
1 Hr |
24 Hrs |
1-24 |
After Release 2.21 |
6 Hrs |
30 Mins |
24 Hrs |
1-24 |
Note: The custom frequency must be set in hours as an integer value. For example, 1, 2, or 3 but not 1.5 or 1.75.
-
Historical Data : The first run of the Pipeline ingests historical data for the selected objects on the basis of the historical sync duration specified at the time of creating the Pipeline and loads it to the Destination. Default duration: 3 Months.
-
Incremental Data: Once the historical load is complete, data is ingested as per the ingestion frequency in Full Load or Incremental mode, as applicable.
Note: The size of the zipped folder from the Source must not exceed 4 GB, else, the data query fails with an exception, Invalid CEN header. In the case of an exception, Hevo automatically adjusts the ingestion duration of the historical load and the incremental data, ingesting the data in smaller zip files over multiple cycles.
Data Model
The following is the list of tables (objects) that are created at the Destination when you run the Pipeline.
Table Name |
Description |
Cohort |
A list of all unique behavioural cohorts created within Amplitude |
Event |
An action that a user takes in your product. This could be anything from pushing a button, completing a level, or making a payment |
Event Category |
All event data is mapped to an Event Category entity which helps to categorise and describe live events and properties. |
Event Type |
All events are mapped to an Event Type entity which is maintained in this table. |
Group |
Each grouping of users that is created in Amplitude along with their dedicated name and description. |
User |
Any person who has logged at least one event and to whom events are attributed. |
User Cohort |
A mapping between User and the User Cohort they belong in. |
User Group |
Groups of users defined by their actions within a specific time period. |
Schema and Primary Keys
Hevo uses the following schema to upload the records in the Destination:
The User object defines each unique user through a combination of User ID, Amplitude ID, and Device ID. You can reference these three columns while making joins to the Event object.
Read the detailed Hevo documentation for the following related topics:
Limitations
-
There is a two hour delay in the data exported from Amplitude Analytics getting loaded into your data warehouse.
For example, data sent between 8-9 PM begins to load at 9 PM and becomes available in your Destination after 11 PM, depending on the load frequency you have set.
-
Hevo does not load an Event into the Destination table if its size exceeds 128 MB, which may lead to discrepancies between your Source and Destination data. To avoid such a scenario, ensure that each row in your Source objects contains less than 100 MB of data.
Revision History
Refer to the following table for the list of key updates made to this page:
Date |
Release |
Description of Change |
Jan-07-2025 |
NA |
Updated the Limitations section to add information on Event size. |
Nov-05-2024 |
NA |
Updated section, Retrieving the Amplitude API Key and Secret as per the latest Amplitude Analytics UI. |
Mar-05-2024 |
2.21 |
Updated the ingestion frequency table in the Data Replication section. |
Apr-04-2023 |
NA |
Updated section, Configuring Amplitude Analytics as a Source to update the information about historical sync duration. |
Jun-21-2022 |
1.91 |
- Modified section, Configuring Amplitude Analytics as a Source to reflect the latest UI changes. - Updated the Pipeline frequency information in the Data Replication section. |
Mar-07-2022 |
1.83 |
Updated the introduction paragraph and the section,Data Replication, about automatic adjustment of ingestion duration. |
Oct-25-2021 |
NA |
Added the Pipeline frequency information in the Data Replication section. |
Apr-06-2021 |
1.60 |
- Added a note to the section Schema and Primary Keys - Updated the ERD. The User object now has three fields, user_id , amplitude_id and device_id as primary keys. The field uuid in the Event object is also a primary key now. |