Migrating data to Managed Service for ClickHouse
To migrate your database to Managed Service for ClickHouse, you need to transfer the data directly, close the old database for writing, and then transfer the load to the database cluster in Yandex.Cloud.
To transfer data to a Managed Service for ClickHouse cluster, you can use Apache ZooKeeper and clickhouse-copier.
Transfer data to an intermediate virtual machine in Compute Cloud if:
- The Managed Service for ClickHouse cluster isn't accessible from the internet.
- The network equipment or connection to the ClickHouse cluster in Yandex.Cloud isn't reliable enough.
- There is no environment to run
clickhouse-copier
.
Migration stages:
- Prepare for migration.
- Install ZooKeeper.
- Create a cluster Managed Service for ClickHouse.
- Create a task for
clickhouse-copier
. - Add a task for
clickhouse-copier
in ZooKeeper. - Run
clickhouse-copier
.
Prepare for migration
-
Compatible software versions
- ClickHouse versions must be the same in both clusters.
- The
clickhouse-copier
version must be the same or higher than the ClickHouse version in the Managed Service for ClickHouse cluster. - ZooKeeper ver. 3.4.10 and higher.
-
Check that the source cluster is ready to migrate:
- SSL is enabled for encrypting traffic.
- The load on the database or shard that data will be copied from won't create any problems.
clickhouse-copier
has access to the database, and the account used has read-only access.
-
If you are using a virtual machine in Yandex.Cloud for migration:
- Create your VMs in the same cloud network as the Managed Service for ClickHouse cluster.
- The computing power of the VM should be chosen based on the amount of data transferred.
Install Zookeeper
To migrate data, start a ZooKeeper node.
-
Install Java Runtime Environment:
$ sudo apt-get install default-jre
-
Add the user that you want to run ZooKeeper under:
$ sudo adduser hadoop
-
Create a directory for ZooKeeper data:
$ sudo mkdir -p /var/data/zookeeper $ sudo chown -R hadoop:hadoop /var/data
-
Install ZooKeeper (single-node setup):
-
Download a distribution, such as version 3.4.13:
$ cd /opt $ sudo wget http://apache.is.co.za/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz $ sudo tar -xvf zookeeper-3.4.13.tar.gz $ sudo chown hadoop:hadoop -R zookeeper-3.4.13
-
Switch to the user you created earlier to launch ZooKeeper:
$ su hadoop
-
Create the file
/opt/zookeeper-3.4.13/conf/zoo.cfg
with the following contents:tickTime=2000 dataDir=/var/data/zookeeper clientPort=2181
-
The master node must have a unique ID. To do this, create a
/var/data/zookeeper/myid
file with ID as its contents (for example, "1").
-
-
To launch ZooKeeper for debugging:
$ bash /opt/zookeeper-3.4.13/bin/zkServer.sh start-foreground
-
To launch ZooKeeper as normal:
$ bash /opt/zookeeper-3.4.13/bin/zkServer.sh start
Create a cluster Managed Service for ClickHouse
Make sure that the computing power and storage size of the cluster are appropriate for the environment where the existing databases are deployed and create a cluster.
Create a task for clickhouse-copier
To launch clickhouse-copier
using ZooKeeper, you need to prepare:
- A ZooKeeper configuration file (
config.xml
). - A file describing the task (
cp-task.xml
).
Instructions for clickhouse-copier
can be found in the ClickHouse documentation.
Prepare a configuration file for ZooKeeper
The configuration file (config.xml
) has to specify:
- In the
<zookeeper>
element — The address of the host where you installed ZooKeeper. - In the
<caConfig>
element — The path to the certificate for connecting to Managed Service for ClickHouse.
You can download a certificate at https://storage.yandexcloud.net/cloud-certs/CA.pem.
Sample configuration:
<yandex>
<zookeeper>
<node>
<host>127.0.0.1</host>
<port>2181</port>
</node>
</zookeeper>
<logger>
<level>trace</level>
<log>log.log</log>
<errorlog>log.err.log</errorlog>
<size>never</size>
</logger>
<openSSL>
<client>
<loadDefaultCAFile>true</loadDefaultCAFile>
<caConfig>CA.pem</caConfig>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>
</yandex>
Describe the task
Example of data migration task description (cp-task.xml
):
<yandex>
<tcp_port_secure>9440</tcp_port_secure> <!-- optional -->
<remote_servers>
<source_cluster>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>your-clickhouse-server.com</host>
<port>9440</port>
<user>your-user</user>
<password>password</password>
<secure>1</secure>
</replica>
</shard>
</source_cluster>
<destination_cluster>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>your-cloud-clickhouse.mdb.cloud.yandex.net</host>
<port>9440</port>
<user>your-user</user>
<password>password</password>
<secure>1</secure>
</replica>
</shard>
</destination_cluster>
</remote_servers>
<max_workers>2</max_workers>
<settings_pull>
<readonly>1</readonly>
</settings_pull>
<settings_push>
<readonly>0</readonly>
</settings_push>
<settings>
<connect_timeout>3</connect_timeout>
<insert_distributed_sync>1</insert_distributed_sync>
</settings>
<tables>
<table_hits>
<cluster_pull>source_cluster</cluster_pull>
<database_pull>your-db</database_pull>
<table_pull>your-table</table_pull>
<cluster_push>destination_cluster</cluster_push>
<database_push>your-db</database_push>
<table_push>your-table</table_push>
<engine>
<!-- Description of the table engine, the same as in the CREATE TABLE procedure
in the source cluster. For an existing table, you can
get a description with the query SHOW CREATE table_name -->
ENGINE = MergeTree() PARTITION BY Year ORDER BY (Year, FlightDate) SETTINGS index_granularity=8192
</engine>
<sharding_key>jumpConsistentHash(intHash64(Year), 2)</sharding_key>
<enabled_partitions>
<partition>'2017'</partition>
</enabled_partitions>
</table_hits>
</tables>
</yandex>
Add a task for clickhouse-copier to Zookeeper
To add a task in ZooKeeper, run the following commands:
/opt/zookeeper-3.4.13/bin/zkCli.sh -server localhost:2181 rmr /cp-task.xml/description
/opt/zookeeper-3.4.13/bin/zkCli.sh -server localhost:2181 rmr /cp-task.xml/task_active_workers
/opt/zookeeper-3.4.13/bin/zkCli.sh -server localhost:2181 rmr /cp-task.xml
fc=$(cat ./cp-task.xml)
/opt/zookeeper-3.4.13/bin/zkCli.sh -server localhost:2181 create /cp-task.xml ""
/opt/zookeeper-3.4.13/bin/zkCli.sh -server localhost:2181 create /cp-task.xml/description "$fc"
Launch clickhouse-copier
Warning
If you didn't create the folders that you specified in the --base-dir
flag or for saving logs, clickhouse-copier
might not start.
You can start the copier using the following command (to run it in daemon mode, add the --daemon
flag):
$ clickhouse-copier
--config ./config.xml \
--task-path ./cp-task.xml \
--base-dir ./clickhouse \
--log-level debug
Once the operation is complete, check the logs to make sure that everything was copied successfully.