Delivering data from Greenplum® to ClickHouse
You can migrate a database from Greenplum® to ClickHouse using Yandex Data Transfer.
To transfer a database from Greenplum® to ClickHouse:
If you no longer need these resources, delete them.
Before you begin
We'll create all the required resources for the example in Yandex Cloud. Prepare the infrastructure:
-
Create a Yandex Managed Service for Greenplum® source cluster of any suitable configuration.
-
Create a Yandex Managed Service for ClickHouse target cluster with any configuration with a database called
db1
. -
Make sure that the cluster's security groups have been set up correctly and allow connecting to them:
-
If you don't have Terraform, install and configure it.
-
Download the file with provider settings. Place it in a separate working directory and specify the parameter values.
-
Download the greenplum-clickhouse.tf configuration file to the same working directory.
This file describes:
- Networks and subnets hosting the clusters.
* Security groups for connecting to clusters. - Managed Service for Greenplum® source cluster.
- Managed Service for ClickHouse target cluster.
- Networks and subnets hosting the clusters.
-
In the
greenplum-clickhouse.tf
configuration file, specify the Greenplum® and ClickHouse administrator passwords. -
Run the command
terraform init
in the directory with the configuration file. This command initializes the provider specified in the configuration files and enables you to use the provider resources and data sources. -
Make sure the Terraform configuration files are correct using the command:
terraform validate
If there are errors in the configuration files, Terraform will point to them.
-
Create the required infrastructure:
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the update of resources.
-
Wait for the operation to complete.
-
All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
-
Set up the transfer
-
Create a source endpoint of type
Greenplum®
, and within the endpoint, specify the cluster connection settings. -
Create a target endpoint with the
ClickHouse
type. -
Create a transfer of the Snapshot type that will use the created endpoints.
Replication is not available for this endpoint pair, but you can set up regular copying when creating a transfer. To do this, in the Transfer parameters Copy field, select Regular and specify the copy interval. This will activate a transfer automatically after the specified time interval.
Warning
Before configuring regular copy, please make sure that the target endpoint parameters include either a
Drop
or aTruncate
cleanup policy. Otherwise, data on the target will be duplicated.
Activate the transfer
-
Connect to the Managed Service for Greenplum® cluster, create a table called
x_tab
, and populate it with data:CREATE TABLE x_tab ( id NUMERIC, name CHARACTER(5) ); CREATE INDEX ON x_tab (id); INSERT INTO x_tab (id, name) VALUES (40, 'User1'), (41, 'User2'), (42, 'User3'), (43, 'User4'), (44, 'User5');
-
Activate the transfer and wait for its status to change to Completed.
-
To verify that the data has transferred correctly, connect to the Managed Service for ClickHouse target cluster and make sure that the columns of the
x_tab
table in thedb1
database match those of the source databasex_tab
table:SELECT id, name FROM db1.x_tab;
┌─id─┬─name──┐ │ 40 │ User1 │ │ 41 │ User2 │ │ 42 │ User3 │ │ 43 │ User4 │ │ 44 │ User5 │ └────┴───────┘
Check the copy function upon re-activation
-
Connect to the Managed Service for Greenplum® cluster, drop a single row in the
x_tab
table and update another:DELETE FROM x_tab WHERE id = 41; UPDATE x_tab SET name = 'Key3' WHERE id = 42;
-
Re-activate the transfer and wait for its status to change to Completed.
-
Make sure that you see the changes in the
x_tab
table on the ClickHouse target:SELECT id, name FROM db1.x_tab;
┌─id─┬─name──┐ │ 42 │ Key3 │ │ 40 │ User1 │ │ 43 │ User4 │ │ 44 │ User5 │ └────┴───────┘
Delete the resources you created
If you no longer need these resources, delete them:
-
Make sure that the transfer's status is Completed and delete it.
-
Delete the clusters:
ManuallyUsing TerraformIf you created your resources using Terraform:
-
In the terminal window, change to the directory containing the infrastructure plan.
-
Delete the
greenplum-clickhouse.tf
configuration file. -
Make sure the Terraform configuration files are correct using the command:
terraform validate
If there are errors in the configuration files, Terraform will point to them.
-
Confirm the update of resources.
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the update of resources.
-
Wait for the operation to complete.
-
All the resources described in the
greenplum-clickhouse.tf
configuration file will be deleted. -
-