Yandex Cloud
  • Services
  • Solutions
  • Why Yandex Cloud
  • Blog
  • Pricing
  • Documentation
  • Contact us
Get started
Language / Region
Yandex project
© 2023 Yandex.Cloud LLC
Yandex Data Transfer
  • Available transfers
  • Getting started
  • Step-by-step guide
  • Practical guidelines
    • All tutorials
    • Data migration
    • Data delivery
    • Asynchronous replication of data
      • Asynchronously replicating data from Yandex Managed Service for MySQL to Yandex Managed Service for ClickHouse
      • Asynchronously replicating data from Yandex Managed Service for MySQL to Yandex Managed Service for YDB
      • Asynchronously replicating data from PostgreSQL to ClickHouse
      • Asynchronously replicating data from Yandex Managed Service for PostgreSQL to Yandex Managed Service for YDB
    • Saving data streams
  • Concepts
  • Troubleshooting
  • Access management
  • Pricing policy
  • API reference
  • Questions and answers
  1. Practical guidelines
  2. Asynchronous replication of data
  3. Asynchronously replicating data from PostgreSQL to ClickHouse

Asynchronously replicating data from PostgreSQL to ClickHouse

Written by
Yandex Cloud
  • Before you begin
  • Set up the transfer
  • Activate the transfer
  • Test the replication process
  • Select the data from ClickHouse .
  • Delete the resources you created

You can migrate a database from PostgreSQL to ClickHouse using Yandex Data Transfer. To do this:

  1. Set up the transfer.
  2. Activate the transfer.
  3. Test the replication process.
  4. Select the data from the target.

If you no longer need these resources, delete them.

Before you begin

We'll create all the required resources for the example in Yandex Cloud. Prepare the infrastructure:

Manually
Using Terraform
  1. Create a source Managed Service for PostgreSQL cluster in any applicable configuration with publicly available hosts and the following settings:

    • Database name: db1.
    • Username: pg-user.
    • Password: <password for the source>.
  2. Create a Managed Service for ClickHouse target cluster in any applicable configuration with publicly available hosts and the following settings:

    • Number of ClickHouse hosts: At least 2 (to enable replication in the cluster).
    • Database name: db1.
    • Username: ch-user.
    • Password: <password for the target>.
  3. Make sure that the cluster's security groups have been set up correctly and allow connecting to them:

    • Managed Service for ClickHouse.
    • Managed Service for PostgreSQL.
  4. Grant the mdb_replication role to the pg-user in the Managed Service for PostgreSQL cluster.

  1. If you don't have Terraform, install and configure it.

  2. Download the file with provider settings. Place it in a separate working directory and specify the parameter values.

  3. Download the postgresql-to-clickhouse.tf configuration file to the same working directory.

    This file describes:

    • Networks.
    • Subnets.
    • Security groups for making cluster connections.
    • Managed Service for PostgreSQL source cluster.
    • Managed Service for ClickHouse target cluster.
    • Source endpoint.
    • Target endpoint.
    • Transfer.
  4. In the postgresql-to-clickhouse.tf configuration file, specify the PostgreSQL and ClickHouse administrator passwords.

  5. Run the command terraform init in the directory with the configuration file. This command initializes the provider specified in the configuration files and enables you to use the provider resources and data sources.

  6. Make sure the Terraform configuration files are correct using the command:

    terraform validate
    

    If there are errors in the configuration files, Terraform will point to them.

  7. Create the required infrastructure:

    1. Run the command to view planned changes:

      terraform plan
      

      If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

    2. If you are happy with the planned changes, apply them:

      1. Run the command:

        terraform apply
        
      2. Confirm the update of resources.

      3. Wait for the operation to complete.

    All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.

Set up the transfer

  1. Connect to the cluster Managed Service for PostgreSQL

  2. In the db1 database, create a table named x_tab, and populate it with data:

    CREATE TABLE x_tab
    (
        id NUMERIC PRIMARY KEY,
        name CHAR(5)
    );
    CREATE INDEX ON x_tab (id);
    INSERT INTO x_tab (id, name) VALUES
      (40, 'User1'),
      (41, 'User2'),
      (42, 'User3'),
      (43, 'User4'),
      (44, 'User5');
    
  3. Create a transfer:

    Manually
    Using Terraform
    1. Create a source endpoint of the PostgreSQL type and specify the cluster connection parameters in it:

      • Connection type: Managed Service for PostgreSQL cluster.
      • Cluster: <PostgreSQL source cluster name> from the drop-down list.
      • Database: db1.
      • Password: pg-user.
      • Password: <user password>.
    2. Create a target endpoint of the ClickHouse type and specify the cluster connection parameters in it.

      • Connection type: Managed Service for ClickHouse cluster.
      • Cluster: <ClickHouse target cluster name> from the drop-down list.
      • Database: db1.
      • User: ch-user.
      • Password: <user password>.
      • Cleanup policy: DROP.
    3. Create a transfer of the Snapshot and increment type that will use the created endpoints.

    1. Set the transfer_enabled variable to 1 in the postgresql-to-clickhouse.tf file.

    2. Make sure the Terraform configuration files are correct using the command:

      terraform validate
      

      If there are errors in the configuration files, Terraform will point to them.

    3. Create the required infrastructure:

      1. Run the command to view planned changes:

        terraform plan
        

        If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

      2. If you are happy with the planned changes, apply them:

        1. Run the command:

          terraform apply
          
        2. Confirm the update of resources.

        3. Wait for the operation to complete.

Activate the transfer

  1. Activate the transfer and wait for its status to change to Replicating.

  2. To check that the transfer has moved the replicated data to the target, connect to the target Yandex Managed Service for ClickHouse cluster and make sure that the x_tab table in db1 includes the same columns as the x_tab table in the source database, as well as the timestamp columns: __data_transfer_commit_time and __data_transfer_delete_time:

    SELECT * FROM db1.x_tab WHERE id = 41;
    
    ┌─id─┬──name──┬─── __data-transfer_commit_time─┬───__data-transfer-delete_time─┐
    │ 41 │  User2 │   1633417594957267000          │ 0                             │
    └────┴────────┴────────────────────────────────┴───────────────────────────────┘
    

Test the replication process

  1. Connect to the source cluster.

  2. Delete the row with the 41 ID and edit the row with the 42 ID in the x_tab table of the PostgreSQL source database:

    DELETE FROM db1.public.x_tab WHERE id = 41;
    UPDATE db1.public.x_tab SET name = 'Key3' WHERE id = 42;
    
  3. Make sure that you see the changes in the x_tab table on the ClickHouse target:

    SELECT * FROM db1.x_tab WHERE (id >= 41) AND (id <= 42);
    
    ┌─id─┬──name──┬─── __data-transfer_commit_time─┬───__data-transfer-delete_time─┐
    │ 41 │  User2 │   1633417594957267000          │ 1675417594957267000           │
    │ 42 │  Key3  │   1675417594957267000          │ 0                             │
    │ 42 │  User3 │   1633417594957268000          │ 1675417594957267000           │
    └────┴────────┴────────────────────────────────┴───────────────────────────────┘
    

Select the data from ClickHouse .

For table recovery, the ClickHouse target with replication enabled uses the ReplicatedReplacingMergeTree and the ReplacingMergeTree. The following columns are added automatically to each table:

  • __data_transfer_commit_time: The time allowed for the row to change to this value, in TIMESTAMP format.

  • __data_transfer_delete_time: The time allowed for deleting the row, in TIMESTAMP format, if the row has been deleted from the source. If the row hasn't been deleted, 0 is set.

    The __data_transfer_commit_time column is needed for the ReplicatedReplacedMergeTree engine. If a record is deleted or updated, a new row with the value in this column is inserted. The query by a single primary key returns multiple records with different values from the __data_transfer_commit_time column.

If the transfer status is Replicating, the source data can be added or deleted. To ensure the standard behavior of SQL commands when a primary key points to a single record, add a construction that filters data by the __data_transfer_delete_time column to your queries to tables moved to ClickHouse. For example, for the x_tab table:

SELECT * FROM x_tab FINAL
WHERE __data_transfer_delete_time = 0;

To make it easier to run a SELECT, create a view with filtering by the __data_transfer_delete_time column and access it. For example, for the x_tab table:

CREATE VIEW x_tab_view AS SELECT * FROM x_tab FINAL
WHERE __data_transfer_delete_time == 0;

Note

The FINAL keyword noticeably decreases the query efficiency. Avoid it when working with large tables whenever possible.

Delete the resources you created

If you no longer need these resources, delete them:

  • Make sure that the transfer's status is Completed and delete it.

  • Delete the endpoints and clusters:

    Manually
    Using Terraform
    • The source endpoint and the target endpoint.
    • Managed Service for PostgreSQL.
    • Managed Service for ClickHouse.

    If you created your resources using Terraform:

    1. In the terminal window, change to the directory containing the infrastructure plan.

    2. Delete the postgresql-to-clickhouse.tf configuration file.

    3. Make sure the Terraform configuration files are correct using the command:

      terraform validate
      

      If there are errors in the configuration files, Terraform will point to them.

    4. Confirm the update of resources.

      1. Run the command to view planned changes:

        terraform plan
        

        If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

      2. If you are happy with the planned changes, apply them:

        1. Run the command:

          terraform apply
          
        2. Confirm the update of resources.

        3. Wait for the operation to complete.

      All the resources described in the postgresql-to-clickhouse.tf configuration file will be deleted.

Was the article helpful?

Language / Region
Yandex project
© 2023 Yandex.Cloud LLC
In this article:
  • Before you begin
  • Set up the transfer
  • Activate the transfer
  • Test the replication process
  • Select the data from ClickHouse .
  • Delete the resources you created