Yandex Cloud
  • Services
  • Solutions
  • Why Yandex Cloud
  • Blog
  • Pricing
  • Documentation
  • Contact us
Get started
Language / Region
Yandex project
© 2023 Yandex.Cloud LLC
Yandex Data Transfer
  • Available transfers
  • Getting started
  • Step-by-step guide
  • Practical guidelines
    • All tutorials
    • Data migration
    • Data delivery
      • Delivering data from Yandex Managed Service for Apache Kafka® to Yandex Managed Service for ClickHouse
      • Delivering data from Yandex Managed Service for Apache Kafka® to Yandex Managed Service for YDB
      • Delivering data from Greenplum® to ClickHouse
      • Delivering data from Greenplum® to PostgreSQL
      • Delivering data from Yandex Managed Service for MySQL to Yandex Managed Service for Apache Kafka®
      • Delivering data from Yandex Managed Service for MySQL to Yandex Object Storage
      • Delivering data from Yandex Managed Service for PostgreSQL to Yandex Managed Service for Apache Kafka®
      • Delivering data from Yandex Managed Service for PostgreSQL to Yandex Object Storage
    • Asynchronous replication of data
    • Saving data streams
  • Concepts
  • Troubleshooting
  • Access management
  • Pricing policy
  • API reference
  • Questions and answers
  1. Practical guidelines
  2. Data delivery
  3. Delivering data from Greenplum® to ClickHouse

Delivering data from Greenplum® to ClickHouse

Written by
Yandex Cloud
  • Before you begin
  • Set up the transfer
  • Activate the transfer
  • Check the copy function upon re-activation
  • Delete the resources you created

You can migrate a database from Greenplum® to ClickHouse using Yandex Data Transfer.

To transfer a database from Greenplum® to ClickHouse:

  1. Set up the transfer.
  2. Activate the transfer.
  3. Check the copy function upon re-activation.

If you no longer need these resources, delete them.

Before you begin

We'll create all the required resources for the example in Yandex Cloud. Prepare the infrastructure:

Manually
Using Terraform
  1. Create a Yandex Managed Service for Greenplum® source cluster of any suitable configuration.

  2. Create a Yandex Managed Service for ClickHouse target cluster with any configuration with a database called db1.

  3. Make sure that the cluster's security groups have been set up correctly and allow connecting to them:

    • Managed Service for ClickHouse.
    • Managed Service for Greenplum®.
  1. If you don't have Terraform, install and configure it.

  2. Download the file with provider settings. Place it in a separate working directory and specify the parameter values.

  3. Download the greenplum-clickhouse.tf configuration file to the same working directory.

    This file describes:

    • Networks and subnets hosting the clusters.
      * Security groups for connecting to clusters.
    • Managed Service for Greenplum® source cluster.
    • Managed Service for ClickHouse target cluster.
  4. In the greenplum-clickhouse.tf configuration file, specify the Greenplum® and ClickHouse administrator passwords.

  5. Run the command terraform init in the directory with the configuration file. This command initializes the provider specified in the configuration files and enables you to use the provider resources and data sources.

  6. Make sure the Terraform configuration files are correct using the command:

    terraform validate
    

    If there are errors in the configuration files, Terraform will point to them.

  7. Create the required infrastructure:

    1. Run the command to view planned changes:

      terraform plan
      

      If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

    2. If you are happy with the planned changes, apply them:

      1. Run the command:

        terraform apply
        
      2. Confirm the update of resources.

      3. Wait for the operation to complete.

    All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.

Set up the transfer

  1. Create a source endpoint of type Greenplum®, and within the endpoint, specify the cluster connection settings.

  2. Create a target endpoint with the ClickHouse type.

  3. Create a transfer of the Snapshot type that will use the created endpoints.

    Replication is not available for this endpoint pair, but you can set up regular copying when creating a transfer. To do this, in the Transfer parameters Copy field, select Regular and specify the copy interval. This will activate a transfer automatically after the specified time interval.

    Warning

    Before configuring regular copy, please make sure that the target endpoint parameters include either a Drop or a Truncate cleanup policy. Otherwise, data on the target will be duplicated.

Activate the transfer

  1. Connect to the Managed Service for Greenplum® cluster, create a table called x_tab, and populate it with data:

    CREATE TABLE x_tab
    (
        id NUMERIC,
        name CHARACTER(5)
    );
    CREATE INDEX ON x_tab (id);
    INSERT INTO x_tab (id, name) VALUES
    (40, 'User1'),
    (41, 'User2'),
    (42, 'User3'),
    (43, 'User4'),
    (44, 'User5');
    
  2. Activate the transfer and wait for its status to change to Completed.

  3. To verify that the data has transferred correctly, connect to the Managed Service for ClickHouse target cluster and make sure that the columns of the x_tab table in the db1 database match those of the source database x_tab table:

    SELECT id, name FROM db1.x_tab;
    
    ┌─id─┬─name──┐
    │ 40 │ User1 │
    │ 41 │ User2 │
    │ 42 │ User3 │
    │ 43 │ User4 │
    │ 44 │ User5 │
    └────┴───────┘
    

Check the copy function upon re-activation

  1. Connect to the Managed Service for Greenplum® cluster, drop a single row in the x_tab table and update another:

    DELETE FROM x_tab WHERE id = 41;
    UPDATE x_tab SET name = 'Key3' WHERE id = 42;
    
  2. Re-activate the transfer and wait for its status to change to Completed.

  3. Make sure that you see the changes in the x_tab table on the ClickHouse target:

    SELECT id, name FROM db1.x_tab;
    
    ┌─id─┬─name──┐
    │ 42 │ Key3  │
    │ 40 │ User1 │
    │ 43 │ User4 │
    │ 44 │ User5 │
    └────┴───────┘
    

Delete the resources you created

If you no longer need these resources, delete them:

  • Make sure that the transfer's status is Completed and delete it.

  • Delete the source endpoint and the target endpoint.

  • Delete the clusters:

    Manually
    Using Terraform
    • Managed Service for ClickHouse.
    • Managed Service for Greenplum®.

    If you created your resources using Terraform:

    1. In the terminal window, change to the directory containing the infrastructure plan.

    2. Delete the greenplum-clickhouse.tf configuration file.

    3. Make sure the Terraform configuration files are correct using the command:

      terraform validate
      

      If there are errors in the configuration files, Terraform will point to them.

    4. Confirm the update of resources.

      1. Run the command to view planned changes:

        terraform plan
        

        If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

      2. If you are happy with the planned changes, apply them:

        1. Run the command:

          terraform apply
          
        2. Confirm the update of resources.

        3. Wait for the operation to complete.

      All the resources described in the greenplum-clickhouse.tf configuration file will be deleted.

Was the article helpful?

Language / Region
Yandex project
© 2023 Yandex.Cloud LLC
In this article:
  • Before you begin
  • Set up the transfer
  • Activate the transfer
  • Check the copy function upon re-activation
  • Delete the resources you created