© 2023 Yandex.Cloud LLC

Configuring S3 source endpoint

Written by
Yandex Cloud

When creating or updating an endpoint, configure access to S3-compatible storage.

Settings

Management console
  • Dataset: Specify the name of an auxiliary table that will be used for the connection.

  • Path Pattern: Enter the path pattern. If the bucket only includes files, use the value **.

  • Schema: Specify a JSON schema in {"column": "data type"} format. Use the value {} for automatic schema detection based on files.

  • Format: Select a format that matches your files (CSV, parquet, Avro, or JSON Lines).

    • CSV: Specify the settings of CSV files:

      • Delimiter: Delimiter character.
      • Quote Char: Character used to quote CSV values.
      • Escape Char: Character used to escape special characters.
      • Encoding: The encoding used.
      • Double Quote: Enable this option to treat two consecutive quote characters inside a quoted value as a single quote character in the data.
      • Newlines In Values: Enable the option if your text data values might include newline characters.
      • Block Size: Size of a data chunk used to read data from files, in bytes.
      • Additional Reader Options: PyArrow CSV ConvertOptions to override, specified as a JSON string.
      • Advanced Options: PyArrow CSV ReadOptions to override, specified as a JSON string.
    • parquet: Specify the settings of Parquet files:

      • Buffer Size: Size of the buffer used when deserializing individual column chunks.
      • Columns: Columns for reading data. Leave this field empty to read all the columns.
      • Batch Size: Maximum number of records in a batch.
    • JSON Lines: Specify the settings for JSON Lines:

      • Allow newlines in values: Enable this option to allow newlines in JSON values. This may affect the transfer speed.
      • Unexpected field behavior: Specify how to handle JSON fields that are not listed in the explicit_schema (if a schema is set). For more information, see the PyArrow documentation.
      • Block Size: Size of the chunk, in bytes, that is read from each file and processed in memory at a time. If the value is too large, an out-of-memory error may occur during the transfer.
  • S3: Amazon Web Services: Specify the settings of the S3 provider:

    • Bucket: Bucket name.
    • AWS Access Key Id and AWS Secret Access Key: ID and secret of the AWS access key used to access a private bucket.
    • (Optional) Path Prefix: Prefix of the folders and files to process. Set it to narrow the search when the bucket contains many folders and files that do not need to be transferred.
    • (Optional) Endpoint: Endpoint of an S3-compatible service, if you use one instead of Amazon S3. Leave this field empty to use the Amazon service.
    • Use SSL: Enable this option if the custom endpoint is served over HTTPS. Ignored when using the Amazon service.
    • Verify SSL Cert: Enable this option to verify the server's SSL certificate. Disable it to allow self-signed certificates. Ignored when using the Amazon service.
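
The Schema, Additional Reader Options, and Advanced Options fields above all take JSON strings. The following is a minimal sketch of one way to prepare such strings in Python so that they are guaranteed to be valid JSON. The column names are hypothetical, the data type names are assumed to follow the Airbyte S3 source conventions, and the option keys are assumed to be parameters of PyArrow's CSV ConvertOptions and ReadOptions; check them against the respective documentation before use.

```python
import json

# Hypothetical columns; type names are an assumption based on the
# Airbyte S3 source conventions ("string", "integer", "number", ...).
schema = {"id": "integer", "name": "string", "price": "number"}

# Assumed to correspond to PyArrow csv.ConvertOptions parameters.
additional_reader_options = {
    "strings_can_be_null": True,
    "null_values": ["", "NULL"],
}

# Assumed to correspond to a PyArrow csv.ReadOptions parameter.
advanced_options = {"skip_rows": 1}


def to_field_value(options: dict) -> str:
    """Serialize a dict to a compact JSON string for pasting into a field."""
    return json.dumps(options, separators=(",", ":"))


schema_field = to_field_value(schema)
reader_options_field = to_field_value(additional_reader_options)
advanced_options_field = to_field_value(advanced_options)

# An empty object requests automatic schema detection in the Schema field.
auto_schema_field = to_field_value({})

for label, value in [
    ("Schema", schema_field),
    ("Additional Reader Options", reader_options_field),
    ("Advanced Options", advanced_options_field),
]:
    print(f"{label}: {value}")
```

Generating the strings with a JSON serializer rather than typing them by hand avoids common mistakes such as single quotes or trailing commas, which are not valid JSON.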

For more information about settings, see the Airbyte® documentation.
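
The Delimiter, Quote Char, Escape Char, and Double Quote settings of the CSV format correspond to common CSV dialect options. As an illustration only, the sketch below uses Python's standard csv module, not the reader the connector actually uses, to show how the same value is parsed when quotes are doubled and when they are escaped. The sample data and the parse_csv helper are hypothetical.

```python
import csv
import io


def parse_csv(text: str, **dialect) -> list:
    """Parse CSV text with the given dialect options and return all rows."""
    return list(csv.reader(io.StringIO(text), **dialect))


# Double Quote enabled: two quote characters inside a quoted value
# stand for one quote character in the data.
doubled = 'id;comment\n1;"She said ""hi"""\n'
rows_doubled = parse_csv(doubled, delimiter=";", quotechar='"', doublequote=True)

# Double Quote disabled: quotes inside a value are protected
# by the escape character instead.
escaped = 'id;comment\n1;"She said \\"hi\\""\n'
rows_escaped = parse_csv(
    escaped, delimiter=";", quotechar='"', doublequote=False, escapechar="\\"
)

print(rows_doubled[1])
print(rows_escaped[1])
```

Both calls yield the same parsed value for the comment column, so the choice between the two settings depends only on how the source files were written.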

Airbyte® is a registered trademark of Airbyte, Inc. in the United States and/or other countries.
