
Migrating data between Yandex Managed Service for Apache Kafka® clusters

  • Before you begin
  • Prepare and activate the transfer
  • Test the transfer
  • Delete the resources you created

You can move data in Apache Kafka® topics from one Managed Service for Apache Kafka® cluster to another in real time. Migration across versions is also supported: for example, you can move topics from Apache Kafka® 2.8 to 3.1.

This method of data migration enables you to:

  • Set up topic replication in the management console interface or in Terraform.
  • Track the migration process using transfer monitoring.
  • Avoid creating an intermediate VM or opening internet access to your Managed Service for Apache Kafka® target cluster.

To migrate data:

  1. Prepare and activate the transfer.
  2. Test the transfer.

If you no longer need these resources, delete them.

Before you begin

  1. Prepare the data delivery infrastructure:

    Manually

    1. Create a source and a target Managed Service for Apache Kafka® cluster with public internet access, in any suitable configuration.
    2. In the source cluster, create a topic named sensors.
    3. In the source cluster, create a user with the ACCESS_ROLE_PRODUCER and ACCESS_ROLE_CONSUMER permissions for the created topic.
    4. In the target cluster, create a user with the ACCESS_ROLE_PRODUCER and ACCESS_ROLE_CONSUMER permissions for all topics.

    Using Terraform

    1. If you don't have Terraform, install and configure it.

    2. Download the file with provider settings. Place it in a separate working directory and specify the parameter values.

    3. Download the data-transfer-mkf-mkf.tf configuration file to the same working directory.

      This file describes:

      • Network.
      • Subnet.
      • Security groups and the rule required to connect to a Managed Service for Apache Kafka® cluster.
      • A source Managed Service for Apache Kafka® cluster with public internet access.
      • A target Managed Service for Apache Kafka® cluster.
      • Apache Kafka® topic.
      • Transfer.
    4. In the data-transfer-mkf-mkf.tf file, specify the parameter values:

      • source_kf_version: Apache Kafka® version in the source cluster.
      • source_user_name: Username for establishing a connection to the Apache Kafka® topic.
      • source_user_password: User password.
      • target_kf_version: Apache Kafka® version in the target cluster.
      • transfer_enabled: Set to 0 so that no transfer is created until you have manually created the source and target endpoints.
    5. Run the command terraform init in the directory with the configuration file. This command initializes the provider specified in the configuration files and enables you to use the provider resources and data sources.

    6. Make sure the Terraform configuration files are correct using the command:

      terraform validate
      

      If there are errors in the configuration files, Terraform will point to them.

    7. Create the required infrastructure:

      1. Run the command to view planned changes:

        terraform plan
        

        If the resource configuration descriptions are correct, the terminal will display a list of the resources to create and their parameters. This is a dry run: no resources are changed yet.

      2. If you are happy with the planned changes, apply them:

        1. Run the command:

          terraform apply
          
        2. Confirm the update of resources.

        3. Wait for the operation to complete.

      All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.

    The sensors topic created in the source cluster will receive test data from car sensors in JSON format, for example:

    {
        "device_id":"iv9a94th6rztooxh5ur2",
        "datetime":"2020-06-05 17:27:00",
        "latitude":"55.70329032",
        "longitude":"37.65472196",
        "altitude":"427.5",
        "speed":"0",
        "battery_voltage":"23.5",
        "cabin_temperature":"17",
        "fuel_level":null
    }
    
  2. Install the utilities:

    • kafkacat to read and write data to Apache Kafka® topics.

      sudo apt update && sudo apt install --yes kafkacat
      

      Check that you can use it to connect to the Managed Service for Apache Kafka® source cluster over SSL; see the connectivity-check sketch after this list.

    • jq for JSON stream processing.

      sudo apt update && sudo apt install --yes jq
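
    Before moving on, you can run a quick connectivity check against the source cluster by listing its metadata with kafkacat. A minimal sketch: the broker FQDN and credentials are placeholders for your own values, and the certificate path assumes the Yandex CA certificate is installed as described in Connecting to topics in an Apache Kafka® cluster.

      # List brokers and topics in the source cluster over SASL_SSL.
      kafkacat -L \
               -b <broker FQDN in the source cluster>:9091 \
               -X security.protocol=SASL_SSL \
               -X sasl.mechanisms=SCRAM-SHA-512 \
               -X sasl.username="<username in the source cluster>" \
               -X sasl.password="<user password in the source cluster>" \
               -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexCA.crt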
      

Prepare and activate the transfer

  1. Create a target endpoint:

    • Database type: Apache Kafka®.

    • Endpoint parameters:

      • Connection settings: Managed Service for Apache Kafka® cluster.

        Select a target cluster from the list and specify the cluster connection settings.

      • Apache Kafka® topic settings:

        • Topic full name: measurements.
  2. Create a source endpoint:

    • Database type: Apache Kafka®.
    • Endpoint parameters:
      • Connection settings: Managed Service for Apache Kafka® cluster.

        Select a source cluster from the list and specify the cluster connection settings.

      • Topic full name: sensors.

  3. Create a transfer:

    Manually

    1. Create a transfer of the Replication type that will use the created endpoints.
    2. Activate it.

    Using Terraform

    1. In the data-transfer-mkf-mkf.tf file, specify the parameter values:

      • source_endpoint_id: ID of the source endpoint.
      • target_endpoint_id: ID of the target endpoint.
      • transfer_enabled: Set to 1 to enable transfer creation.
    2. Make sure the Terraform configuration files are correct using the command:

      terraform validate
      

      If there are errors in the configuration files, Terraform will point to them.

    3. Create the required infrastructure:

      1. Run the command to view planned changes:

        terraform plan
        

        If the resource configuration descriptions are correct, the terminal will display a list of the resources to create and their parameters. This is a dry run: no resources are changed yet.

      2. If you are happy with the planned changes, apply them:

        1. Run the command:

          terraform apply
          
        2. Confirm the update of resources.

        3. Wait for the operation to complete.

      Once created, a transfer is activated automatically.
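
      You can also check the transfer status from the terminal. This sketch assumes the yc CLI is installed and configured, that the yc datatransfer transfer get subcommand and the --format json flag are available in your CLI version, and that the JSON output exposes a status field; check yc datatransfer --help for the exact syntax.

        # Check the transfer status; it should reach Replicating (assumed yc subcommand).
        yc datatransfer transfer get <transfer ID> --format json | jq -r '.status'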

Test the transfer

  1. Wait for the transfer status to change to Replicating.

  2. Make sure that the data from the topic in the source cluster moves to the topic in the target Managed Service for Apache Kafka® cluster:

    1. Create a sample.json file with the following test data:

      {
          "device_id": "iv9a94th6rztooxh5ur2",
          "datetime": "2020-06-05 17:27:00",
          "latitude": 55.70329032,
          "longitude": 37.65472196,
          "altitude": 427.5,
          "speed": 0,
          "battery_voltage": 23.5,
          "cabin_temperature": 17,
          "fuel_level": null
      }
      
      {
          "device_id": "rhibbh3y08qmz3sdbrbu",
          "datetime": "2020-06-06 09:49:54",
          "latitude": 55.71294467,
          "longitude": 37.66542005,
          "altitude": 429.13,
          "speed": 55.5,
          "battery_voltage": null,
          "cabin_temperature": 18,
          "fuel_level": 32
      }
      
      {
          "device_id": "iv9a94th6rztooxh5ur2",
          "datetime": "2020-06-07 15:00:10",
          "latitude": 55.70985913,
          "longitude": 37.62141918,
          "altitude": 417.0,
          "speed": 15.7,
          "battery_voltage": 10.3,
          "cabin_temperature": 17,
          "fuel_level": null
      }
      
    2. Send data from the sample.json file to the sensors topic in the source Managed Service for Apache Kafka® cluster using jq and kafkacat:

      jq -rc . sample.json | kafkacat -P \
         -b <broker FQDN in the source cluster>:9091 \
         -t sensors \
         -k key \
         -X security.protocol=SASL_SSL \
         -X sasl.mechanisms=SCRAM-SHA-512 \
         -X sasl.username="<username in the source cluster>" \
         -X sasl.password="<user password in the source cluster>" \
         -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexCA.crt -Z
      

      The data is sent on behalf of the created user. To learn more about setting up an SSL certificate and working with kafkacat, see Connecting to topics in an Apache Kafka® cluster.

    3. Use the kafkacat utility to make sure that the data from the source cluster has moved to the target Managed Service for Apache Kafka® cluster:

      kafkacat -C \
               -b <broker FQDN in the target cluster>:9091 \
               -t measurements \
               -X security.protocol=SASL_SSL \
               -X sasl.mechanisms=SCRAM-SHA-512 \
               -X sasl.username="<username in the target cluster>" \
               -X sasl.password="<user password in the target cluster>" \
               -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexCA.crt -Z -K:
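
      To spot-check a few migrated records instead of streaming indefinitely, you can limit kafkacat to the first messages and pretty-print them with jq. A minimal sketch: -o beginning, -e, and -c are standard kafkacat options that set the starting offset, exit at the end of the partition, and cap the message count.

      # Read the first three messages from the target topic and pretty-print them.
      kafkacat -C \
               -b <broker FQDN in the target cluster>:9091 \
               -t measurements \
               -o beginning -e -c 3 \
               -X security.protocol=SASL_SSL \
               -X sasl.mechanisms=SCRAM-SHA-512 \
               -X sasl.username="<username in the target cluster>" \
               -X sasl.password="<user password in the target cluster>" \
               -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexCA.crt | jq .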
      

Delete the resources you created

Note

Before deleting the created resources, disable the transfer.

If you no longer need these resources, delete them:

  1. Delete the transfer.
  2. Delete the source and target endpoints.
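
If you manage these objects from the terminal, the deactivation and deletion steps might look like the sketch below. The yc datatransfer subcommands are assumptions based on the usual yc command layout; check yc datatransfer --help for the exact syntax.

  # Deactivate, then delete the transfer (assumed yc subcommands).
  yc datatransfer transfer deactivate <transfer ID>
  yc datatransfer transfer delete <transfer ID>

  # Delete the source and target endpoints (assumed yc subcommands).
  yc datatransfer endpoint delete <source endpoint ID>
  yc datatransfer endpoint delete <target endpoint ID>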

Delete the other resources, depending on the method used to create them:

Manually

Delete the Managed Service for Apache Kafka® clusters.
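
For example, with the yc CLI (the cluster names are placeholders for the names you chose when creating the clusters):

  # Delete the source and target Managed Service for Apache Kafka® clusters.
  yc managed-kafka cluster delete <source cluster name>
  yc managed-kafka cluster delete <target cluster name>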

Using Terraform

  1. In the terminal window, change to the directory containing the infrastructure plan.

  2. Delete the data-transfer-mkf-mkf.tf configuration file.

  3. Make sure the Terraform configuration files are correct using the command:

    terraform validate
    

    If there are errors in the configuration files, Terraform will point to them.

  4. Confirm the deletion of resources:

    1. Run the command to view planned changes:

      terraform plan
      

      If the resource configuration descriptions are correct, the terminal will display a list of the resources to delete and their parameters. This is a dry run: no resources are changed yet.

    2. If you are happy with the planned changes, apply them:

      1. Run the command:

        terraform apply
        
      2. Confirm the deletion of resources.

      3. Wait for the operation to complete.

    All the resources described in the data-transfer-mkf-mkf.tf configuration file will be deleted.
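
Alternatively, if your Terraform state tracks only the resources from this tutorial, the standard terraform destroy command removes everything described in the configuration in one step, without deleting the configuration file first:

  # Destroy all resources tracked in the current Terraform state.
  terraform destroy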
