© 2023 Yandex.Cloud LLC

Migrating data to Yandex Managed Service for Apache Kafka®

Written by Yandex Cloud, improved by Dmitry A.
  • Data migration using Yandex Managed Service for Apache Kafka® Connector
    • Create a cluster and a connector
    • Check the target cluster topic for data
  • Migrating data using MirrorMaker
    • Before you begin
    • Configure MirrorMaker
    • Start replication
    • Check the target cluster topic for data
    • Delete the resources you created

There are two ways to migrate topics from an Apache Kafka® source cluster to a Managed Service for Apache Kafka® target cluster:

  • Using the built-in Yandex Managed Service for Apache Kafka® MirrorMaker connector.

    This method is easy to configure and does not require you to create an intermediate VM.

  • Using the MirrorMaker 2.0 utility.

    This method requires you to install and configure the utility on an intermediate VM. Use it only if you cannot migrate data using the built-in MirrorMaker connector.

Data migration using Yandex Managed Service for Apache Kafka® Connector

  1. Create a cluster and a connector.
  2. Check the target cluster topic for data.

Create a cluster and a connector

You can do this manually or using Terraform. Manually:
  1. Prepare the target cluster:

    • Enable topic management via the Admin API.
    • Create an admin user named admin-cloud.
    • Enable the Auto create topics enable setting.
    • Configure security groups, if required, to connect to the target cluster.
  2. Create a source cluster user named admin-source that is authorized to manage topics via the Admin API.

  3. Make sure that the network hosting the source cluster is configured to allow source cluster connections from the internet.

  4. For the target cluster, create a connector of the MirrorMaker type, configured as follows:

    • Topics: List of topics to migrate. You can also specify a regular expression for selecting topics. To migrate all topics, specify .*.

    • Under Source cluster, specify the parameters for connecting to the source cluster:

      • Alias: A prefix to indicate the source cluster in the connector settings. Defaults to source. Topics in the target cluster are created with the indicated prefix.

      • Bootstrap servers: Comma-separated list of source cluster broker host FQDNs with port numbers, for example:

        FQDN1:9091,FQDN2:9091,...,FQDNN:9091
        
      • SASL username, SASL password: Username and password for the previously created admin-source user.

      • SASL mechanism: SCRAM-SHA-512, the mechanism used to authenticate with the username and password.

      • Security protocol: Select a protocol for connecting the connector:

        • SASL_PLAINTEXT: For connecting to the source cluster without SSL.
        • SASL_SSL: For SSL connections to the source cluster.
    • Under Target cluster, select Use this cluster.
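
The Bootstrap servers value described above is a comma-separated list of host:port pairs. As a minimal sketch, a small shell helper can assemble it from a list of broker FQDNs. The function name and the example.com host names are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Join broker host FQDNs into a "host:port,host:port" list.
# Usage: join_bootstrap_servers PORT FQDN...
join_bootstrap_servers() {
    port="$1"
    shift
    result=""
    for fqdn in "$@"; do
        # Prepend a comma to every entry except the first one
        result="${result:+${result},}${fqdn}:${port}"
    done
    printf '%s\n' "$result"
}

# Hypothetical host names, used only for illustration
join_bootstrap_servers 9091 broker1.example.com broker2.example.com
```

The printed string can be pasted into the Bootstrap servers field of the connector settings.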

Using Terraform:

  1. If you don't have Terraform, install it.

  2. Download the file with provider settings. Place it in a separate working directory and specify the parameter values.

  3. Download the kafka-mirrormaker-connector.tf configuration file to the same working directory.

    This file describes:

    • Network.
    • Subnet.
    • Default security group and rules required to connect to the cluster from the internet.
    • Managed Service for Apache Kafka® cluster with topic management via the Admin API, an administrator user named admin-cloud, and the Auto create topics enable setting enabled.
    • MirrorMaker connector.
  4. In kafka-mirrormaker-connector.tf, specify:

    • Usernames and passwords for the source and the target cluster users.
    • Source cluster broker host FQDNs.
    • Source and target cluster aliases.
    • Filter template for the topics to be transferred.
    • Apache Kafka® version.
  5. Run the command terraform init in the directory with the configuration file. This command initializes the provider specified in the configuration files and enables you to use the provider resources and data sources.

  6. Make sure the Terraform configuration files are correct using the command:

    terraform validate
    

    If there are errors in the configuration files, Terraform will point to them.

  7. Create the required infrastructure:

    1. Run the command to view planned changes:

      terraform plan
      

      If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

    2. If you are happy with the planned changes, apply them:

      1. Run the command:

        terraform apply
        
      2. Confirm the update of resources.

      3. Wait for the operation to complete.

    All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
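
The Terraform steps above can be sketched as one shell function that runs the commands in order and stops at the first failure. The function name and the TERRAFORM override variable are assumptions made for this sketch; the terraform subcommands are the ones listed in the steps.

```shell
#!/bin/sh
# Run the Terraform steps in order; stop at the first command that fails.
# TERRAFORM is a hypothetical override for the executable name,
# for example to point at a wrapper script.
run_terraform_steps() {
    tf="${TERRAFORM:-terraform}"
    "$tf" init &&
    "$tf" validate &&
    "$tf" plan &&
    "$tf" apply
}
```

Call run_terraform_steps from the working directory that contains the configuration files. The apply step still asks for confirmation interactively, so you can review the plan output before the resources are created.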

Check the target cluster topic for data

  1. Connect to the target cluster topic using the kafkacat utility. Add the source prefix to the name of the source cluster topic: for example, the topic mytopic is transferred to the target cluster as source.mytopic.
  2. Make sure that the console displays messages from the source cluster topic.
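
As a rough sketch of this check, the helper below builds the target topic name from the alias and the source topic name, and a second function reads that topic with kafkacat. BROKER, KAFKA_USER, KAFKA_PASSWORD, and CA_FILE are hypothetical variable names that you would set yourself. The connection options assume a SASL_SSL connection with SCRAM-SHA-512 on port 9091, as used for the target cluster elsewhere in this article.

```shell
#!/bin/sh
# Print the name a source topic gets in the target cluster: <alias>.<topic>
target_topic_name() {
    printf '%s.%s\n' "$1" "$2"
}

# Read messages from a target cluster topic with kafkacat.
# BROKER, KAFKA_USER, KAFKA_PASSWORD and CA_FILE are hypothetical variables;
# set them to your own values before calling this function.
consume_target_topic() {
    kafkacat -C \
        -b "${BROKER}:9091" \
        -t "$(target_topic_name "$1" "$2")" \
        -X security.protocol=SASL_SSL \
        -X sasl.mechanisms=SCRAM-SHA-512 \
        -X sasl.username="$KAFKA_USER" \
        -X sasl.password="$KAFKA_PASSWORD" \
        -X ssl.ca.location="$CA_FILE"
}

# Show the topic name to look for in the target cluster
target_topic_name source mytopic
```

With the variables set, consume_target_topic source mytopic would read from source.mytopic.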

Migrating data using MirrorMaker

  1. Configure MirrorMaker.
  2. Start replication.
  3. Check the target cluster topic for data.

If you no longer need the resources you created, delete them.

Before you begin

Prepare the infrastructure

You can prepare the infrastructure manually or using Terraform. Manually:
  1. Create a Managed Service for Apache Kafka® target cluster:

    • With topic management via the Admin API.
    • With the admin-cloud admin user.
    • With the Auto create topics enable setting turned on.
  2. Create a new Linux VM for MirrorMaker on the same network as the target cluster. To connect from your local machine rather than from within the Yandex Cloud network, enable public access when creating the VM.

Using Terraform:

  1. If you don't have Terraform, install it.

  2. Download the file with provider settings. Place it in a separate working directory and specify the parameter values.

  3. Download the kafka-mirror-maker.tf configuration file to the same working directory.

    This file describes:

    • Network.
    • Subnet.
    • Default security group and rules required to connect to the cluster and VM from the internet.
    • A Managed Service for Apache Kafka® cluster with topic management enabled via the Admin API, the admin-cloud admin user, and the Auto create topics enable setting turned on.
    • A virtual machine with public internet access.
  4. In kafka-mirror-maker.tf, specify:

    • Managed Service for Apache Kafka® admin user password.
    • ID of a public Ubuntu image without GPU support, for example, Ubuntu 20.04 LTS.
    • Username and path to the public key file used to access the virtual machine. By default, the image ignores the specified username and creates a user named ubuntu instead. Use it to connect to the instance.
  5. Run the command terraform init in the directory with the configuration file. This command initializes the providers specified in the configuration files and lets you work with the provider resources and data sources.

  6. Make sure the Terraform configuration files are correct using the command:

    terraform validate
    

    If there are errors in the configuration files, Terraform will point to them.

  7. Create the required infrastructure:

    1. Run the command to view planned changes:

      terraform plan
      

      If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

    2. If you are happy with the planned changes, apply them:

      1. Run the command:

        terraform apply
        
      2. Confirm the update of resources.

      3. Wait for the operation to complete.

    All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.

Configure additional settings

  1. Create a source cluster user named admin-source that is authorized to manage topics via the Admin API.

  2. Connect to the virtual machine over SSH and do the following:

    1. Install the JDK:

      sudo apt update && sudo apt install --yes default-jdk
      
    2. Download and unpack the Apache Kafka® archive with the same version number as the version installed in the target cluster. For example, for version 2.8:

      wget https://archive.apache.org/dist/kafka/2.8.0/kafka_2.12-2.8.0.tgz && \
      tar -xvf kafka_2.12-2.8.0.tgz
      
    3. Install the kafkacat utility:

      sudo apt update && sudo apt install --yes kafkacat
      

      Make sure that you can use it to connect to the source and target clusters via SSL.

  3. Configure a firewall and security groups, if required, to connect MirrorMaker to the target and the source clusters.
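
After the installation steps above, a quick check that the tools are on PATH can save time later. This is a minimal sketch; the check_tools function name is hypothetical.

```shell
#!/bin/sh
# Report which of the given commands are available on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# java comes with the default-jdk package; kafkacat is installed separately
check_tools java kafkacat || echo "Install the missing tools before continuing."
```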

Configure MirrorMaker

  1. Connect to the MirrorMaker VM over SSH.

  2. Download an SSL certificate for connecting to the Managed Service for Apache Kafka® cluster.

  3. In the home directory, create a subfolder called mirror-maker to store Java Keystore certificates and MirrorMaker configuration files.

    mkdir --parents /home/<home directory>/mirror-maker
    
  4. Choose a password for the certificate store, create the store, and add the SSL certificate for connecting to the cluster:

    sudo keytool --noprompt -importcert -alias YandexCA \
       -file /usr/local/share/ca-certificates/Yandex/YandexCA.crt \
       -keystore /home/<home directory>/mirror-maker/keystore \
       -storepass <certificate store password, at least 6 characters>
    
  5. In the mirror-maker folder, create a MirrorMaker configuration file called mm2.properties:

    # Kafka clusters
    clusters=cloud, source
    source.bootstrap.servers=<FQDN of source cluster broker>:9092
    cloud.bootstrap.servers=<FQDN of 1 target cluster broker>:9091, ..., <FQDN of N target cluster broker>:9091
    
    # Source and target cluster settings
    source->cloud.enabled=true
    cloud->source.enabled=false
    source.cluster.alias=source
    cloud.cluster.alias=cloud
    
    # Internal topics settings
    source.config.storage.replication.factor=<R>
    source.status.storage.replication.factor=<R>
    source.offset.storage.replication.factor=<R>
    source.offsets.topic.replication.factor=<R>
    source.errors.deadletterqueue.topic.replication.factor=<R>
    source.offset-syncs.topic.replication.factor=<R>
    source.heartbeats.topic.replication.factor=<R>
    source.checkpoints.topic.replication.factor=<R>
    source.transaction.state.log.replication.factor=<R>
    cloud.config.storage.replication.factor=<R>
    cloud.status.storage.replication.factor=<R>
    cloud.offset.storage.replication.factor=<R>
    cloud.offsets.topic.replication.factor=<R>
    cloud.errors.deadletterqueue.topic.replication.factor=<R>
    cloud.offset-syncs.topic.replication.factor=<R>
    cloud.heartbeats.topic.replication.factor=<R>
    cloud.checkpoints.topic.replication.factor=<R>
    cloud.transaction.state.log.replication.factor=<R>
    
    # Topics
    topics=.*
    groups=.*
    topics.blacklist=.*[\-\.]internal, .*\.replica, __consumer_offsets
    groups.blacklist=console-consumer-.*, connect-.*, __.*
    replication.factor=<M>
    refresh.topics.enabled=true
    sync.topic.configs.enabled=true
    refresh.topics.interval.seconds=10
    
    # Tasks
    tasks.max=<T>
    
    # Source cluster authentication parameters. Comment out if no authentication required
    source.client.id=mm2_consumer_test
    source.group.id=mm2_consumer_group
    source.security.protocol=SASL_PLAINTEXT
    source.sasl.mechanism=SCRAM-SHA-512
    source.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin-source" password="<password>";
    
    # Target cluster authentication parameters
    cloud.client.id=mm2_producer_test
    cloud.group.id=mm2_producer_group
    cloud.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
    cloud.ssl.truststore.location=/home/<home directory>/mirror-maker/keystore
    cloud.ssl.truststore.password=<certificate store password>
    cloud.ssl.protocol=TLS
    cloud.security.protocol=SASL_SSL
    cloud.sasl.mechanism=SCRAM-SHA-512
    cloud.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin-cloud" password="<password>";
    
    # Enable heartbeats and checkpoints
    source->cloud.emit.heartbeats.enabled=true
    source->cloud.emit.checkpoints.enabled=true
    

    Notes on MirrorMaker configuration:

    • It performs one-way replication (source->cloud.enabled = true, cloud->source.enabled = false).
    • In the topics parameter, list topics you want to migrate. You can also specify a regular expression for selecting topics. To migrate all topics, specify .*. In this configuration, all the topics are replicated.
    • Topic names in the target cluster get the source cluster alias as a prefix: for example, the topic mytopic is replicated as source.mytopic.
    • <R> is the replication factor for MirrorMaker service topics. It should not exceed the smaller of the two broker counts: the number of brokers in the source cluster and the number of brokers in the target cluster.
    • <M> is the default replication factor defined for topics in the target cluster.
    • <T> is the number of concurrent MirrorMaker processes. A value of at least 2 is recommended for even replication load distribution. For more information, see the Apache Kafka® documentation.

    You can get the Managed Service for Apache Kafka® broker FQDNs by requesting the list of hosts in the cluster.
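
To illustrate the note on <R>, here is a minimal sketch that takes the smaller of the two broker counts and prints the service topic replication factor lines for both clusters. The function names and the broker counts are hypothetical; the property names are the ones from the mm2.properties file above.

```shell
#!/bin/sh
# Print the smaller of two broker counts: the upper bound for <R>.
max_service_replication_factor() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Print the service topic replication factor lines for both clusters.
# Usage: print_replication_settings R
print_replication_settings() {
    for cluster in source cloud; do
        for key in config.storage status.storage offset.storage \
                   offsets.topic errors.deadletterqueue.topic \
                   offset-syncs.topic heartbeats.topic checkpoints.topic \
                   transaction.state.log; do
            printf '%s.%s.replication.factor=%s\n' "$cluster" "$key" "$1"
        done
    done
}

# Hypothetical broker counts: 1 in the source cluster, 3 in the target cluster
r="$(max_service_replication_factor 1 3)"
print_replication_settings "$r"
```

The output can replace the "Internal topics settings" block of the configuration file.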

Start replication

Launch MirrorMaker on the VM as follows:

<Kafka install path>/bin/connect-mirror-maker.sh /home/<home directory>/mirror-maker/mm2.properties
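
Before launching MirrorMaker, it may help to confirm that no angle-bracket placeholders such as <R> or <password> remain in the configuration file. The sketch below assumes placeholders always have the <...> form; the find_placeholders function name is hypothetical.

```shell
#!/bin/sh
# List the lines of a file that still contain an unfilled <placeholder>.
# Returns 0 if placeholders were found and 1 if the file is clean.
find_placeholders() {
    grep -n '<[^>]*>' "$1"
}

# Demonstration on a temporary file with one unfilled value
demo_file="$(mktemp)"
printf '%s\n' 'tasks.max=2' 'replication.factor=<M>' > "$demo_file"
find_placeholders "$demo_file"
rm -f "$demo_file"
```

To check the real file, pass the path to mm2.properties instead of the temporary file.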

Check the target cluster topic for data

  1. Connect to the target cluster topic using the kafkacat utility. Add the source prefix to the name of the source cluster topic: for example, the topic mytopic is transferred to the target cluster as source.mytopic.
  2. Make sure that the console displays messages from the source cluster topic.
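
To see which topics have been replicated, one option is to filter the metadata listing printed by kafkacat -L for names that start with the source alias. The sketch below assumes that the listing contains lines of the form topic "name" with N partitions: and uses made-up sample text instead of a live cluster. The list_replicated_topics function name is hypothetical.

```shell
#!/bin/sh
# Read `kafkacat -L` output on stdin and print the names of topics
# that start with the given alias prefix.
# Assumption: topics appear as lines of the form
#   topic "name" with N partitions:
list_replicated_topics() {
    sed -n 's/^ *topic "\('"$1"'\.[^"]*\)" with .*/\1/p'
}

# Sample metadata text, made up for illustration
printf '%s\n' \
    ' 2 topics:' \
    '  topic "mytopic" with 1 partitions:' \
    '  topic "source.mytopic" with 1 partitions:' |
    list_replicated_topics source
```

On the VM, the real kafkacat -L output for the target cluster would be piped into list_replicated_topics source in the same way.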

To learn more about MirrorMaker 2.0, see the Apache Kafka® documentation.

Delete the resources you created

If you no longer need the resources you created, delete them.

To delete the resources manually:

  • Delete the Yandex Managed Service for Apache Kafka® cluster.
  • Delete the virtual machine.
  • If you reserved public static IP addresses, release and delete them.

To delete the infrastructure created with Terraform:

  1. In the terminal window, change to the directory containing the infrastructure plan.

  2. Delete the kafka-mirror-maker.tf or the kafka-mirrormaker-connector.tf configuration file.

  3. Make sure the Terraform configuration files are correct using the command:

    terraform validate
    

    If there are errors in the configuration files, Terraform will point to them.

  4. Update the infrastructure to remove the resources:

    1. Run the command to view planned changes:

      terraform plan
      

      If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.

    2. If you are happy with the planned changes, apply them:

      1. Run the command:

        terraform apply
        
      2. Confirm the update of resources.

      3. Wait for the operation to complete.

    This will delete all the resources described in the kafka-mirror-maker.tf or the kafka-mirrormaker-connector.tf configuration file.
