Elasticsearch Setup Guide

Elasticsearch is a distributed, RESTful search and analytics engine that centrally stores your data so you can search, index, and analyze data of all shapes and sizes. Because Elasticsearch relies on indices to search and fetch documents, it preempts operations that may cause memory issues and stops them with exceptions. Hevo parses some of these exceptions and recommends corrective actions. Read Configuration Changes in Elasticsearch to learn more about these.

Hevo connects to your Elasticsearch cluster using the Elasticsearch Transport Client and synchronizes the data available in the cluster to your preferred data warehouse using indices. Currently, Hevo supports the following variants:

  • Generic Elasticsearch
  • AWS Elasticsearch

Source Considerations

  • Elasticsearch does not expose individual document modifications. As a result, documents that have the same value in the field used for sorting cannot be ordered reliably, and an additional unique identifier is required to sort them properly. This identifier differs based on your Elasticsearch version.

    • For versions 8.0 and above: Specify the unique identifier field while configuring your objects. This field must be one of the following data types: unsigned long, long, integer, short, byte, float, double, half float, or scaled float.
    • For versions below 8.0: The _id field is used by default as the unique identifier field.
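The role of the unique identifier can be illustrated with a minimal Python sketch. It is not Hevo's implementation. It only shows how a request body for the Elasticsearch search API can combine a primary sort field with a tiebreaker field and a `search_after` cursor, and how a field's mapping type can be checked against the types listed above. The function names and the field names `updated_at` and `event_seq` are hypothetical.

```python
from typing import Any, Optional, Sequence

# Mapping type names for the data types listed above, spelled as they
# appear in Elasticsearch mappings.
ALLOWED_ID_TYPES = {
    "unsigned_long", "long", "integer", "short", "byte",
    "float", "double", "half_float", "scaled_float",
}


def is_valid_identifier_type(mapping_type: str) -> bool:
    """Return True if the mapping type is one of the listed numeric types."""
    return mapping_type in ALLOWED_ID_TYPES


def build_search_body(
    sort_field: str,
    tiebreaker_field: str,
    search_after: Optional[Sequence[Any]] = None,
    size: int = 1000,
) -> dict:
    """Build a search request body sorted by sort_field, then tiebreaker_field.

    The second sort key gives documents that share a sort_field value a
    stable order, so a search_after cursor does not skip or repeat them.
    """
    body: dict = {
        "size": size,
        "sort": [{sort_field: "asc"}, {tiebreaker_field: "asc"}],
    }
    if search_after is not None:
        # One cursor value per sort key, in the same order as "sort".
        body["search_after"] = list(search_after)
    return body


# Hypothetical field names, for illustration only.
first_page = build_search_body("updated_at", "event_seq")
next_page = build_search_body(
    "updated_at", "event_seq", search_after=[1700000000000, 42]
)
```

Without the second sort key, two documents with the same `updated_at` value could be returned in either order, and a cursor placed between them might miss one.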

Limitations

  • Only Native Realm authentication is supported.

  • Hevo currently does not support deletes. Therefore, any data deleted in the Source may continue to exist in the Destination.

  • Hevo does not support the replication of hidden objects.

  • Hevo does not load an Event into the Destination table if its size exceeds 128 MB, which may lead to discrepancies between your Source and Destination data. To avoid such a scenario, ensure that each row in your Source objects contains less than 100 MB of data.
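To find documents that may exceed the size limit before they are ingested, you can estimate each document's serialized size. The sketch below is a rough check under stated assumptions: it measures the UTF-8 length of the compact JSON form and treats 1 MB as 1024 × 1024 bytes. How Hevo itself measures Event size is not specified here, so treat the result as an approximation. The function and constant names are hypothetical.

```python
import json

# Assumption: 1 MB is taken as 1024 * 1024 bytes.
RECOMMENDED_MAX_BYTES = 100 * 1024 * 1024


def document_size_bytes(doc: dict) -> int:
    """Approximate a document's size as the UTF-8 length of its compact JSON."""
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def exceeds_recommended_size(doc: dict, limit: int = RECOMMENDED_MAX_BYTES) -> bool:
    """Return True if the document is not strictly below the given limit."""
    return document_size_bytes(doc) >= limit


# Small illustrative document; real documents come from your index.
sample = {"title": "example", "views": 10}
print(document_size_bytes(sample), exceeds_recommended_size(sample))
```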





Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Jan-07-2025 | NA | Updated the Limitations section to add information on Event size.
Nov-18-2024 | NA | Renamed section Set up the EC2 instance to Set up the EC2 instance and Whitelist Hevo’s IP addresses and updated it as per the latest Elasticsearch UI.
Mar-05-2024 | 2.21 | Updated the ingestion frequency table in the Data Replication section.
Jan-16-2024 | NA | Updated section, Source Considerations to add information about _id field being used for sorting only in specific Elasticsearch versions.
Jul-21-2023 | NA | Updated section, Limitations to add information about Hevo not supporting replication for hidden objects.
Nov-22-2022 | NA | Updated section, Limitations to add information about Hevo not capturing deletes.
Aug-24-2022 | NA | Updated sections, Data Replication and Configure Elasticsearch Connection Settings to restructure the content for better understanding and coherence.
Jun-09-2022 | NA | Added a reference to the Configuration Changes in Elasticsearch page in the Overview section.
Apr-11-2022 | 1.86 | Added a note in the Connection Settings about setting up a reverse proxy server for connecting to an AWS Elasticsearch Source.
Feb-21-2022 | 1.82 | Added section, (Optional) Connect to Elasticsearch hosted inside a Virtual Private Cloud (VPC)
Jan-03-2022 | 1.79 | Updated the description of the Include New Tables in the Pipeline advance setting in the Configure Elasticsearch Connection Settings section.
Jul-26-2021 | 1.68 | Added a note for the Database Host field.
Jul-12-2021 | 1.67 | Added the field Include New Tables in the Pipeline under Source configuration settings.
Jun-01-2021 | 1.64 | Updated the Configure Elasticsearch Connection Settings section to include the Connect Through HTTPS setting.
Last updated on Mar 03, 2025
