Configuring a ClickHouse source endpoint
When creating or editing an endpoint, you can define:
- Yandex Managed Service for ClickHouse cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for ClickHouse cluster
Connecting to the database with the cluster ID specified in Yandex Cloud. Available only for clusters deployed in Managed Service for ClickHouse.
- MDB cluster ID: Select the cluster to connect to.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.
- Username: Specify the username that Data Transfer will use to connect to the database.
- Password: Enter the user's password to the database.
- Database name: Specify the name of the database in the selected cluster.
- Endpoint type: clickhouse-source.
- --cluster-id: ID of the cluster you need to connect to.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.
- To set a user password to access the database, use one of the parameters:
  - --raw-password: Password as text.
  - --password-file: The path to the password file.
- Endpoint type: clickhouse_source.
- connection.connection_options.mdb_cluster_id: ID of the cluster to connect to.
- subnet_id: ID of the subnet hosting the cluster. If not specified, the cluster must be accessible from the internet.
  If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- security_groups: Specify the security groups for network traffic.
  This will let you apply the specified security group rules to the VMs and clusters in the subnet_id network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.
- connection.connection_options.database: Database name.
- connection.connection_options.user: Username that Data Transfer will use to connect to the database.
- connection.connection_options.password.raw: Password in text form.
Example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint name in Terraform>" {
  name = "<endpoint name>"
  settings {
    clickhouse_source {
      security_groups = [ "list of security group IDs" ]
      subnet_id       = "<subnet ID>"
      connection {
        connection_options {
          mdb_cluster_id = "<Managed Service for ClickHouse cluster ID>"
          database       = "<name of database to transfer>"
          user           = "<username to connect>"
          password {
            raw = "<user password>"
          }
        }
      }
      <advanced endpoint settings>
    }
  }
}
For more information, see the Terraform provider documentation.
- securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.
- mdbClusterId: ID of the cluster you need to connect to.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Custom installation
Connecting to the database with explicitly specified network addresses and ports.
- HTTP port: Set the number of the port that Data Transfer will use for the connection.
  When connecting via the HTTP port:
  - For optional fields, default values are used (if any).
  - Recording complex types is supported (such as array and tuple).
- Native port: Set the number of the native port that Data Transfer will use for the connection.
- PEM certificate: If transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS, upload the certificate file or add its contents as text.
- Shards
  - ID: Specify a string that will allow the service to distinguish shards from each other.
  - Hosts: Specify FQDNs or IP addresses of the hosts in the shard.
- Connection via SSL: Enable if the cluster supports only encrypted connections.
- Subnet ID: Select or create a subnet in the desired availability zone.
  If the source and target are geographically close, connecting over the selected subnet speeds up the transfer.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.
- Username: Specify the username that Data Transfer will use to connect to the database.
- Password: Enter the user's password to the database.
- Database name: Specify the name of the database in the selected cluster.
- Endpoint type: clickhouse-source.
- --host: IP address or FQDN of the master host you want to connect to.
- --port: Number of the port that Data Transfer will use for the connection.
- --ca-certificate: CA certificate, used if the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS.
- --subnet-id: ID of the subnet the host resides in.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.
- To set a user password to access the database, use one of the parameters:
  - --raw-password: Password as text.
  - --password-file: The path to the password file.
- Endpoint type: clickhouse_source.
- Shard settings:
  - connection.connection_options.on_premise.shards.name: Shard name that the service will use to distinguish shards from each other.
  - connection.connection_options.on_premise.shards.hosts: FQDNs or IP addresses of the hosts in the shard.
- connection.connection_options.on_premise.http_port: Port number that Data Transfer will use for HTTP connections.
- connection.connection_options.on_premise.native_port: Port number that Data Transfer will use for connections to the ClickHouse native interface.
- connection.connection_options.on_premise.tls_mode.enabled.ca_certificate: CA certificate, used if the data to transfer must be encrypted, for example, to comply with PCI DSS requirements.
- subnet_id: ID of the subnet hosting the cluster. If not specified, the cluster must be accessible from the internet.
  If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- security_groups: Specify the security groups for network traffic.
  This will let you apply the specified security group rules to the VMs and clusters in the subnet_id network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.
- connection.connection_options.database: Database name.
- connection.connection_options.user: Username that Data Transfer will use to connect to the database.
- connection.connection_options.password.raw: Password in text form.
Example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint name in Terraform>" {
  name = "<endpoint name>"
  settings {
    clickhouse_source {
      security_groups = [ "list of security group IDs" ]
      subnet_id       = "<subnet ID>"
      connection {
        connection_options {
          on_premise {
            http_port   = "<HTTP connection port>"
            native_port = "<native interface connection port>"
            shards {
              name  = "<shard name>"
              hosts = [ "shard host IP or FQDN list" ]
            }
            tls_mode {
              enabled {
                ca_certificate = "<certificate in PEM format>"
              }
            }
          }
          database = "<name of database being transferred>"
          user     = "<username for connection>"
          password {
            raw = "<user password>"
          }
        }
      }
      <advanced endpoint settings>
    }
  }
}
For more information, see the Terraform provider documentation.
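As an illustration of the shard settings above, the following is a minimal sketch of a custom installation with two shards. It assumes that the shards block may be repeated once per shard and that the port fields accept numbers; check both against the Terraform provider documentation. The resource name, host names, database, user, certificate path, and the ch_password variable are all hypothetical. Ports 8443 and 9440 are the ports ClickHouse conventionally uses for HTTPS and the TLS-protected native interface; substitute the ports of your installation.

```hcl
# Hypothetical variable that keeps the password out of the configuration file.
variable "ch_password" {
  type      = string
  sensitive = true
}

resource "yandex_datatransfer_endpoint" "ch_source_custom" {
  name = "ch-source-custom"
  settings {
    clickhouse_source {
      subnet_id       = "<subnet ID>"
      security_groups = ["<security group ID>"]
      connection {
        connection_options {
          on_premise {
            http_port   = 8443 # HTTPS interface (assumed port)
            native_port = 9440 # TLS native interface (assumed port)

            # One shards block per shard (assumption: the block is repeatable).
            shards {
              name  = "shard1"
              hosts = ["ch-s1-a.example.internal", "ch-s1-b.example.internal"]
            }
            shards {
              name  = "shard2"
              hosts = ["ch-s2-a.example.internal", "ch-s2-b.example.internal"]
            }

            tls_mode {
              enabled {
                # Reads the PEM certificate from a local file (hypothetical path).
                ca_certificate = file("${path.module}/ca.pem")
              }
            }
          }
          database = "analytics"
          user     = "transfer_user"
          password {
            raw = var.ch_password
          }
        }
      }
    }
  }
}
```

Passing the password through a sensitive variable is a design choice of this sketch, not a requirement of the endpoint; a literal string in password.raw works the same way.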
- onPremise: Database connection parameters:
  - hosts: IP address or FQDN of the master host to connect to.
  - port: Number of the port that Data Transfer will use for the connection.
  - tlsMode: Encryption parameters for the transmitted data, if encryption is required, for example, to meet the requirements of PCI DSS.
  - subnetId: ID of the subnet the host resides in.
- securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Additional settings
- List of included tables: Data is only transferred from the listed tables.
  If you add new tables while editing an endpoint used in Snapshot and increment or Increment transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects to be transferred field in the transfer settings.
- List of excluded tables: Data from the listed tables is not transferred.
Both lists support expressions in the following format:
- <schema name>.<table name>: Fully qualified table name.
- <schema name>.*: All tables in the specified schema.
- <table name>: Table in the default schema.
Leave the lists empty to transfer all the tables.
- include_tables: List of included tables. If set, data will only be transferred from the tables in this list.
  If you add new tables while editing an endpoint used in Snapshot and increment or Increment transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects to be transferred field in the transfer settings.
- exclude_tables: List of excluded tables. Data from the tables in this list will not be transferred.
Both lists support expressions in the following format:
- <schema name>.<table name>: Fully qualified table name.
- <schema name>.*: All tables in the specified schema.
- <table name>: Table in the default schema.
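The table lists can be sketched in Terraform as follows. This is an illustrative fragment, not a complete reference: it assumes that include_tables and exclude_tables are set at the clickhouse_source level, next to subnet_id and security_groups, and that both take lists of strings. The resource name and the analytics schema and table names are hypothetical.

```hcl
resource "yandex_datatransfer_endpoint" "ch_source_filtered" {
  name = "ch-source-filtered"
  settings {
    clickhouse_source {
      connection {
        connection_options {
          mdb_cluster_id = "<Managed Service for ClickHouse cluster ID>"
          database       = "<name of database to transfer>"
          user           = "<username to connect>"
          password {
            raw = "<user password>"
          }
        }
      }

      # "<schema name>.*" form: all tables in the hypothetical "analytics" schema.
      include_tables = ["analytics.*"]

      # "<schema name>.<table name>" form: one fully qualified table to skip.
      # Assumption: a table matched by both lists is excluded; verify this
      # behavior before relying on it.
      exclude_tables = ["analytics.events_tmp"]
    }
  }
}
```

Omitting both attributes corresponds to leaving the lists empty, in which case all tables are transferred.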
Known limitations
Transfers will fail if ClickHouse source tables contain the following types of columns:
Type | Error example |
---|---|
Int128 | unhandled type Int128 |
Int256 | unhandled type Int256 |
UInt128 | unhandled type UInt128 |
UInt256 | unhandled type UInt256 |
Bool | unhandled type Bool |
Date32 | unhandled type Date32 |
JSON | unhandled type '<field name> <type name>' |
Array(Date) | Can't transfer type 'Array(Date)', column '<column name>' |
Array(DateTime) | Can't transfer type 'Array(DateTime)', column '<column name>' |
Array(DateTime64) | Can't transfer type 'Array(DateTime64)', column '<column name>' |
Map(<type name>, <type name>) | unhandled type Map(<type name>, <type name>) |
Supported table types
If a ClickHouse cluster is multi-host, you can only transfer tables and materialized views based on the ReplicatedMergeTree or Distributed engines. Moreover, these tables and views must be present on all cluster hosts.
If the list of included tables contains tables or views with other engines, or if they are missing on some cluster hosts, the transfer will fail with the following error: the following tables have not Distributed or Replicated engines and are not yet supported.