Amazon S3 Setup Guide

Prerequisites


Create an IAM Policy

Create an IAM policy with the s3:ListBucket and s3:GetObject permissions. These permissions are required for Hevo to access data from your S3 bucket.

To do this:

  1. Log in to the AWS IAM Console.

  2. In the left navigation pane, under Access management, click Policies.

  3. In the Policies page, click Create policy.

  4. In the Specify permissions page, click JSON, and in the Policy editor section, paste the following JSON statements:

    Note: Replace the placeholder values in the JSON statements below with your own. For example, replace <your_bucket_name> with Hevo-S3-bucket.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::<your_bucket_name>",
                    "arn:aws:s3:::<your_bucket_name>/*"
                ]
            }
        ]
    }
    

    The JSON statements allow Hevo to access and ingest data from the bucket that you specify.

  5. At the bottom of the page, click Next.

  6. In the Review and create page, specify the Policy name, and at the bottom of the page, click Create policy.
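The policy document from step 4 can also be built and sanity-checked programmatically before pasting it into the Policy editor. A minimal sketch, using the example bucket name Hevo-S3-bucket as a placeholder:

```python
import json

def build_s3_read_policy(bucket_name):
    """Build an IAM policy granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself (needed for ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",  # all objects in it (needed for GetObject)
                ],
            }
        ],
    }

policy = build_s3_read_policy("Hevo-S3-bucket")  # placeholder bucket name
policy_json = json.dumps(policy, indent=4)       # paste this into the Policy editor
```

Note that ListBucket applies to the bucket ARN while GetObject applies to the object ARNs (the `/*` entry), which is why the Resource list needs both.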


Obtain Amazon S3 Credentials

You must generate either access credentials or IAM role-based credentials and attach the IAM policy to them, so that Hevo can access and ingest your S3 data.

Generate IAM role-based credentials

To generate your IAM role-based credentials, you need to create an IAM role for Hevo and attach the policy that you created in Step 1 above to the role. Use the Amazon Resource Name (ARN) and external ID from this role while creating your Pipeline.

1. Create an IAM role and assign the IAM policy

  1. Log in to the AWS IAM Console.

  2. In the left navigation pane, under Access Management, click Roles.

  3. In the Roles page, click Create role.

  4. In the Trusted entity type section, select AWS account.

  5. In the An AWS account section, select Another AWS account, and in the Account ID field, specify Hevo’s Account ID, 393309748692.

    Specifying this account ID allows Hevo to assume the role and ingest data from your S3 bucket for replication to your desired Destination.

  6. In the Options section, select the Require external ID check box, specify an External ID of your choice, and click Next.

  7. In the Add Permissions page, select the policy that you created in Step 1 above, and at the bottom of the page, click Next.

  8. In the Name, review, and create page, specify the Role name and Description of your choice, and at the bottom of the page, click Create role.

You are redirected to the Roles page.
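The trust relationship that steps 4-6 create can be sketched as a JSON document. Hevo's account ID comes from step 5 above; the external ID shown here is a placeholder for the value you choose yourself in step 6:

```python
import json

HEVO_ACCOUNT_ID = "393309748692"  # Hevo's AWS account ID (from step 5)
EXTERNAL_ID = "my-external-id"    # placeholder; use the external ID you chose in step 6

# Trust policy allowing Hevo's account to assume this role,
# but only when it presents the agreed external ID.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{HEVO_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}
trust_policy_json = json.dumps(trust_policy, indent=4)
```

You can compare this against the Trust relationships tab of the role to confirm the console produced the same shape.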

2. Obtain the ARN and external ID

  1. In the Roles page of your IAM console, click the role that you created above.

  2. In the <Role name> page, Summary section, click the copy icon below the ARN field and save it securely like any other password.

  3. In the Trust relationships tab, copy the external ID corresponding to the sts:ExternalID field. For example, hevo-role-external-id in the image below.

You can use this ARN and external ID while configuring your Pipeline.
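If you want to sanity-check the ARN you copied, IAM role ARNs follow a fixed arn:aws:iam::<account-id>:role/<role-name> layout. A minimal parser as a sketch (the account ID and role name below are fabricated examples):

```python
def parse_role_arn(arn):
    """Split an IAM role ARN into its account ID and role name."""
    prefix, _, role_name = arn.partition(":role/")
    if not role_name or not prefix.startswith("arn:aws:iam::"):
        raise ValueError(f"not an IAM role ARN: {arn}")
    account_id = prefix.split(":")[4]  # arn:aws:iam::<account-id>
    return account_id, role_name

# Fabricated example ARN:
account, role = parse_role_arn("arn:aws:iam::123456789012:role/hevo-s3-role")
```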

Generate access credentials

Your access credentials include the access key and the secret access key. To generate these, you need to create an IAM user for Hevo and attach the policy you created in Step 1 above to that user.

Note: The secret key is associated with an access key and is visible only once. Therefore, you must make sure to save the details or download the key file for later use.

1. Create an IAM user and assign the IAM policy

  1. Log in to the AWS IAM Console.

  2. In the left navigation pane, under Access management, click Users.

  3. In the Users page, click Add users.

  4. In the Specify user details page, specify the User name, and click Next.

  5. In the Set permissions page, Permissions options section, click Attach policies directly.

  6. In the Permissions policies section, search and select the check box corresponding to the policy that you created in Step 1 above, and at the bottom of the page, click Next.

  7. At the bottom of the Review and create page, click Create user.

2. Generate the access keys

  1. In the Users page of your IAM console, click the user that you created above.

  2. In the <User name> page, select the Security credentials tab.

  3. In the Access keys section, click Create access key.

  4. In the Access key best practices & alternatives page, select Command Line Interface (CLI).

  5. At the bottom of the page, select the I understand the above… check box and click Next.

  6. (Optional) Specify a description for the access key.

  7. Click Create access key.

  8. In the Retrieve access keys page, Access key section, click the copy icon in the Access key and Secret access key fields and save the keys securely like any other password.

    Optionally, click Download .csv file to save the keys on your local machine.

    Note: Once you leave this page, you cannot view these keys again.

You can use these keys while configuring your Pipeline.
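As a quick sanity check before configuring the Pipeline: long-term IAM access key IDs are 20 characters and typically begin with AKIA (temporary keys begin with ASIA), and secret access keys are 40 characters. A heuristic validation sketch, using AWS's documented example key values, not real credentials:

```python
def looks_like_access_key_pair(access_key_id, secret_access_key):
    """Heuristic shape check of an IAM access-key pair; not a guarantee of validity."""
    return (
        len(access_key_id) == 20
        and access_key_id.startswith(("AKIA", "ASIA"))  # long-term vs. temporary keys
        and len(secret_access_key) == 40
    )

# AWS's published example credentials -- never put real keys in code:
ok = looks_like_access_key_pair(
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
```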


Configure Amazon S3 as a Source

Perform the following steps to configure S3 as the Source in your Pipeline:

  1. Click PIPELINES in the Navigation Bar.

  2. Click + CREATE PIPELINE in the Pipelines List View.

  3. In the Select Source Type page, select S3.

  4. In the Configure your S3 Source page, specify the following:

    • Pipeline Name: A unique name for the Pipeline.

    • Source Setup: The credentials needed to allow Hevo to access data from your S3 account. Select one of the setup methods: IAM role-based credentials (the ARN and external ID you obtained above) or access credentials (the access key and secret access key you obtained above).

  5. Click TEST & CONTINUE.

  6. In the Data Root section, specify the following. The data root signifies the directories or files that contain your data. By default, the files are listed from the root directory.

    • Select the folders from which you want to ingest data.

      Note: If Hevo cannot retrieve the list of files from your S3 bucket, it displays the Path Prefix field. In this situation, you must specify the prefix of the path for the directory that contains your data. To specify the path prefixes for multiple files, you can click the Plus ( Plus icon ) icon.

    • File Format: The format of the data file in the selected folders. Hevo supports AVRO, CSV, JSON, TSV, and XML formats.

      Note: You can select only one file format at a time. If your Source data is in a different format, you can export the data to either of the supported formats and then ingest the files.

      Based on the format you select, you must specify some additional settings:

      • Field Delimiter: The character that separates the fields in each line. For example, \t or ,.

        Note: This field is visible only for CSV data.

      • Create Events from child nodes: If enabled, Hevo loads each node present under the root node in the XML file as a separate Event. If disabled, Hevo combines and loads all nodes present in the XML file as a single Event.

        Note: This field is visible only for XML data.

      • Header Row: The row number in your CSV file whose data you want Hevo to use as column headers. Hevo starts ingesting data from the specified header row in your CSV file, thus skipping all the rows before it. Default value: 1.

        If you set the header row to 0, Hevo automatically generates the column headers during ingestion. Refer to the Example to understand this behavior.

        Note: This field is visible only for CSV data.

      • Include compressed files: If enabled, Hevo also ingests the compressed files of the selected file format from the folders. Hevo supports the tar.gz and zip compression types only. If disabled, Hevo does not ingest any compressed file present in the selected folders.

        Note: This field is visible for all supported data formats.

      • Create Event Types from folders: If enabled, Hevo ingests each subfolder as a separate Event Type. If disabled, Hevo merges subfolders into their parent folders and ingests them as one Event Type.

        Note: This field is visible for all supported data formats.

      • Convert date/time format fields to timestamp: If enabled, Hevo converts the date/time values within the files of the selected folders to timestamps. For example, 07/11/2022, 12:39:23 to 1667804963. If disabled, Hevo ingests the date/time fields in their original format.

        Note: This field is visible for all supported data formats.

    • Click CONFIGURE SOURCE.

  7. Proceed to configuring the data ingestion and setting up the Destination.
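The date/time-to-timestamp conversion described above can be reproduced as a sketch. The document's example value corresponds to a DD/MM/YYYY string in the IST (+05:30) timezone, which is an assumption made here for illustration:

```python
from datetime import datetime, timezone, timedelta

IST = timezone(timedelta(hours=5, minutes=30))  # assumed timezone for the example value

def to_epoch(datetime_str, tz=IST):
    """Convert a 'DD/MM/YYYY, HH:MM:SS' string to a Unix timestamp."""
    dt = datetime.strptime(datetime_str, "%d/%m/%Y, %H:%M:%S").replace(tzinfo=tz)
    return int(dt.timestamp())

epoch = to_epoch("07/11/2022, 12:39:23")  # → 1667804963, matching the example above
```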


Data Replication

For Teams Created   | Default Ingestion Frequency | Minimum Ingestion Frequency | Maximum Ingestion Frequency | Custom Frequency Range (in Hrs)
Before Release 2.21 | 1 Hr                        | 5 Mins                      | 24 Hrs                      | 1-24
After Release 2.21  | 6 Hrs                       | 30 Mins                     | 24 Hrs                      | 1-24

Note: The custom frequency must be set in hours as an integer value. For example, 1, 2, or 3 but not 1.5 or 1.75.
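The integer-hours constraint on the custom frequency can be sketched as a small validator (the function name is illustrative, not part of Hevo's API):

```python
def is_valid_custom_frequency(hours):
    """Custom frequency must be a whole number of hours between 1 and 24."""
    # Exclude bool explicitly, since bool is a subclass of int in Python.
    return isinstance(hours, int) and not isinstance(hours, bool) and 1 <= hours <= 24

# 1, 2, or 3 are valid; 1.5 and 1.75 are not, per the note above.
```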


Last updated on Sep 03, 2024
