Configuring an S3 source endpoint
When creating or updating an endpoint, configure access to S3-compatible storage.
Settings
- Dataset: Specify the name of an auxiliary table to be used for the connection.
- Path Pattern: Enter the path pattern that selects the files to process (see the examples after this list). If the bucket only includes files, use the value `**`.
- Schema: Specify a JSON schema in `{"column": "data type"}` format (see the examples after this list). Use the value `{}` for automatic schema detection based on the files.
- Format: Select a format that matches your files (`CSV`, `parquet`, `Avro`, or `JSON Lines`):
  - CSV: Specify the settings for CSV files:
    - Delimiter: Delimiter character.
    - Quote Char: Character used to quote values that contain reserved characters.
    - Escape Char: Character used to escape special characters.
    - Encoding: File encoding.
    - Double Quote: Enable this option if two consecutive quote characters inside a quoted value denote a single quote character.
    - Newlines In Values: Enable this option if your text data values might include newline characters.
    - Block Size: Size of a data chunk used to read data from files, in bytes.
    - Additional Reader Options: Additional options for the CSV reader (PyArrow `ConvertOptions`), specified as a JSON string (see the examples after this list).
    - Advanced Options: Advanced options for the CSV reader (PyArrow `ReadOptions`), specified as a JSON string (see the examples after this list).
  - parquet: Specify the settings for Parquet files:
    - Buffer Size: Size of the buffer used when deserializing individual column chunks.
    - Columns: Columns to read. Leave this field empty to read all columns.
    - Batch Size: Maximum number of records per batch.
  - JSON Lines: Specify the settings for JSON Lines files:
    - Allow newlines in values: Enable this option to allow newlines in JSON values. This may affect the transfer speed.
    - Unexpected field behavior: Specify how to handle JSON fields outside the `explicit_schema` (if the field values are set). The possible PyArrow values are `ignore`, `infer`, and `error`; for more information, see the PyArrow documentation.
    - Block Size: Specify the block size (in bytes) of each file to be processed in memory at the same time. If the value is too large, an `Out of memory` error may occur during the transfer.
- S3: Amazon Web Services. Specify the settings for the S3 provider:
  - Bucket: Bucket name.
  - AWS Access Key Id and AWS Secret Access Key: The ID and the secret part of the AWS key used to access a private bucket.
  - (Optional) Path Prefix: Path prefix under which the relevant files reside. Specify it to skip folders and files outside this prefix and speed up the file search.
  - (Optional) Endpoint: Endpoint of an S3-compatible service that isn't Amazon S3. Leave this field empty to use the Amazon service.
  - Use SSL: Enable this option to access custom endpoints over HTTPS. Ignored when using the Amazon service.
  - Verify SSL Cert: Enable this option to verify the server's SSL certificate. Disable it if you use self-signed certificates. Ignored when using the Amazon service.
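For example, if the bucket mixes CSV files with other content, a glob pattern can narrow the Path Pattern selection to CSV files under a particular folder. The folder name below is hypothetical, and the exact matching rules are described in the Airbyte® documentation:

```
myFolder/**/*.csv
```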
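As an illustration of the Schema setting, a table with three columns might be described as follows. The column names and data types here are examples only; the supported type names are listed in the Airbyte® documentation:

```json
{"id": "integer", "name": "string", "is_active": "boolean"}
```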
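The Additional Reader Options and Advanced Options settings for CSV are JSON strings whose keys correspond to the fields of PyArrow's `pyarrow.csv.ConvertOptions` and `pyarrow.csv.ReadOptions`, respectively. The keys below are real PyArrow option names, but the values are purely illustrative. A sketch of Additional Reader Options that treats `NA` and `-` as nulls:

```json
{"null_values": ["NA", "-"], "strings_can_be_null": true}
```

And a sketch of Advanced Options that skips the first row of each file and sets the encoding:

```json
{"skip_rows": 1, "encoding": "utf8"}
```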
For more information about settings, see the Airbyte® documentation.
Airbyte® is a registered trademark of Airbyte, Inc. in the United States and/or other countries.