© 2023 Yandex.Cloud LLC

Configuring a ClickHouse target endpoint

Written by Yandex Cloud, improved by Alexey K.

In this article:
  • Managed Service for ClickHouse cluster
  • Custom installation
  • Additional settings

When creating or editing an endpoint, you can define:

  • Yandex Managed Service for ClickHouse cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
  • Additional parameters.

Managed Service for ClickHouse cluster

Connecting to the database with the cluster ID specified in Yandex Cloud. Available only for clusters deployed in Yandex Managed Service for ClickHouse.

Management console
  • MDB cluster ID: Select the cluster to connect to.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.

  • Username: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's database password.

  • Database name: Specify the name of the database in the selected cluster.

CLI

  • Endpoint type: clickhouse-target.
  • --cluster-id: ID of the cluster you need to connect to.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.

  • To set a user password to access the database, use one of the parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.

Terraform

  • Endpoint type: clickhouse_target.
  • connection.connection_options.mdb_cluster_id: ID of the cluster to connect to.

  • subnet_id: ID of the subnet hosting the cluster. If not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • security_groups: Specify the security groups for network traffic.

    This will let you apply the specified security group rules to the VMs and clusters in the subnet_id network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.

  • connection.connection_options.database: Database name.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint name in Terraform>" {
  name = "<endpoint name>"
  settings {
    clickhouse_target {
      security_groups = [ "list of security group IDs" ]
      subnet_id       = "<subnet ID>"
      connection {
        connection_options {
          mdb_cluster_id = "<Managed Service for ClickHouse cluster ID>"
          database       = "<name of database to transfer>"
          user           = "<username to connect>"
          password {
            raw = "<user password>"
          }
        }
      }
      <advanced endpoint settings>
    }
  }
}

For more information, see the Terraform provider documentation.

API

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.

  • mdbClusterId: ID of the cluster you need to connect to.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Custom installation

Connecting to the database with explicitly specified network addresses and ports.

Management console
  • HTTP port: Set the number of the port that Data Transfer will use for the connection.

    When connecting via the HTTP port:

    • For optional fields, default values are used (if any).
    • Writing complex types (such as array and tuple) is supported.
  • Native port: Set the number of the native port that Data Transfer will use for the connection.

  • PEM certificate: If transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS, upload the certificate file or add its contents as text.

  • Shards

    • ID: Specify a string that will allow the service to distinguish shards from each other.
    • Hosts: Specify FQDNs or IP addresses of the hosts in the shard.
  • Connection via SSL: Enable if the cluster supports only encrypted connections.

  • Subnet ID: Select or create a subnet in the desired availability zone.

    If the source and target are geographically close, connecting over the selected subnet speeds up the transfer.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.

  • Username: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's database password.

  • Database name: Specify the name of the database in the selected cluster.

CLI

  • Endpoint type: clickhouse-target.
  • --host: IP address or FQDN of the master host you want to connect to.

  • --port: Number of the port that Data Transfer will use for the connection.

  • --ca-certificate: CA certificate to use if the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS.

  • --subnet-id: ID of the subnet the host resides in.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.

  • To set a user password to access the database, use one of the parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.

Terraform

  • Endpoint type: clickhouse_target.
  • Shard settings:

    • connection.connection_options.on_premise.shards.name: Shard name that the service will use to distinguish shards from each other.
    • connection.connection_options.on_premise.shards.hosts: FQDNs or IP addresses of the hosts in the shard.
  • connection.connection_options.on_premise.http_port: Port number that Data Transfer will use for HTTP connections.

  • connection.connection_options.on_premise.native_port: Port number that Data Transfer will use for connections to the ClickHouse native interface.

  • connection.connection_options.on_premise.tls_mode.enabled.ca_certificate: CA certificate to use if the transferred data must be encrypted, for example, to comply with PCI DSS requirements.

  • subnet_id: ID of the subnet hosting the cluster. If not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • security_groups: Specify the security groups for network traffic.

    This will let you apply the specified security group rules to the VMs and clusters in the subnet_id network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.

  • connection.connection_options.database: Database name.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint name in Terraform>" {
  name = "<endpoint name>"
  settings {
    clickhouse_target {
      security_groups = [ "list of security group IDs" ]
      subnet_id       = "<subnet ID>"
      connection {
        connection_options {
          on_premise {
            http_port   = "<HTTP connection port>"
            native_port = "<native interface connection port>"
            shards {
              name  = "<shard name>"
              hosts = [ "list of shard host IPs or FQDNs" ]
            }
            tls_mode {
              enabled {
                ca_certificate = "<certificate in PEM format>"
              }
            }
          }
          database = "<name of database to transfer>"
          user     = "<username to connect>"
          password {
            raw = "<user password>"
          }
        }
      }
      <advanced endpoint settings>
    }
  }
}

For more information, see the Terraform provider documentation.

API

  • onPremise: Database connection parameters:
    • hosts: IP address or FQDN of the master host to connect to.

    • port: Number of the port that Data Transfer will use for the connection.

    • tlsMode: Encryption parameters for the transmitted data, if encryption is required, for example, to meet the requirements of PCI DSS.

    • subnetId: ID of the subnet the host resides in.

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Network in Yandex Data Transfer.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Additional settings

Management console
  • Cleanup policy: Select a way to clean up data in the target database before the transfer:

    • Drop: Fully delete tables included in the transfer (default).

      Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.

    • Disabled: Do not clean up.

      Select this option if the transfer only replicates data without copying it.

    • Truncate: Delete only the data from the tables included in the transfer but leave the schema.

      Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.

  • Sharding configuration: Specify the settings for sharding:

    • No sharding: No sharding is used.

    • Shard by column value: Data is sharded by the value of a table column. Rows are distributed uniformly across shards based on a hash of this column's value. Specify the name of the column in the appropriate field.

      To shard by specific column values, specify them in the Mapping field. This field maps column values to shard indexes, where a shard's index is its sequential number in the list of shards sorted by name.

    • Shard by transfer ID: Data will be distributed across shards based on the transfer ID value. The transfer will ignore the Mapping setting and will only shard the data based on the transfer ID.

      Warning

      If you omit the sharding columns and the Shard by transfer ID setting, all the data will be transferred to the same shard.

  • Rename tables: If necessary, specify the settings for renaming tables during a transfer.

  • Write interval: Specify the delay with which the data should arrive at the target cluster. Increase the value in this field if ClickHouse fails to merge data parts.

Terraform

  • clickhouse_cluster_name: Specify the name of the cluster that the data will be transferred to.

  • alt_names: If necessary, set rules for renaming the source database tables when transferring them to the target database:

    • from_name: Source table name.
    • to_name: Target table name.
  • sharding.column_value_hash.column_name: Name of the table column that data will be sharded by. Rows are distributed uniformly across shards based on a hash of this column's value.

  • sharding.transfer_id: If true, the data is sharded based on the transfer ID value. The transfer will ignore the sharding.column_value_hash.column_name setting and will only shard the data based on the transfer ID.

    Warning

    If you omit the sharding columns and the sharding.transfer_id setting, all the data will be moved to the same shard.

  • cleanup_policy: Select a way to clean up data in the target database before the transfer:

    • CLICKHOUSE_CLEANUP_POLICY_DROP: Fully delete tables included in the transfer (default).

      Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.

    • CLICKHOUSE_CLEANUP_POLICY_DISABLED: Do not clean up.

      Select this option if the transfer only replicates data without copying it.
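
As an illustration, the additional Terraform settings listed above might be combined with a Managed Service for ClickHouse connection roughly as follows. This is a sketch, not a verified configuration: the attribute names are taken from the list above, but the nesting of alt_names and sharding as blocks is an assumption inferred from the dotted parameter names, the resource label ch_target and the endpoint name are hypothetical, and all values in angle brackets are placeholders. Check the Terraform provider documentation before using it.

```hcl
# Sketch only. Attribute names come from the settings list above;
# block nesting for alt_names and sharding is assumed, not confirmed.
resource "yandex_datatransfer_endpoint" "ch_target" { # hypothetical label
  name = "<endpoint name>"
  settings {
    clickhouse_target {
      connection {
        connection_options {
          mdb_cluster_id = "<Managed Service for ClickHouse cluster ID>"
          database       = "<database name>"
          user           = "<username to connect>"
          password {
            raw = "<user password>"
          }
        }
      }

      # Additional settings
      clickhouse_cluster_name = "<name of the target cluster>"

      # Keep existing tables in the target as they are
      cleanup_policy = "CLICKHOUSE_CLEANUP_POLICY_DISABLED"

      # Rename a source table when writing it to the target
      alt_names {
        from_name = "<source table name>"
        to_name   = "<target table name>"
      }

      # Distribute rows across shards by a hash of one column's value
      sharding {
        column_value_hash {
          column_name = "<sharding column name>"
        }
      }
    }
  }
}
```

Omitting the sharding block entirely would, per the warning above, send all the data to the same shard.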
