# Migrating data between Yandex Managed Service for Apache Kafka® clusters
You can move your data from Apache Kafka® topics between one Managed Service for Apache Kafka® cluster and another in real time. Migration across versions is also supported. For example, you can move topics from Apache Kafka® ver. 2.8 to ver. 3.1.
This method of data migration enables you to:
- Set up topic replication in the management console interface or in Terraform.
- Track the migration process using transfer monitoring.
- Avoid creating an intermediate VM or opening internet access to your Managed Service for Apache Kafka® target cluster.
To migrate data:

1. Prepare and activate the transfer.
2. Test the transfer.

If you no longer need these resources, delete them.
## Before you begin
1. Prepare the data delivery infrastructure:

   **Manually**

   1. Create a source and a target Managed Service for Apache Kafka® cluster with public internet access, in any suitable configuration.
   1. In the source cluster, create a topic named `sensors`.
   1. In the source cluster, create a user with the `ACCESS_ROLE_PRODUCER` and `ACCESS_ROLE_CONSUMER` permissions for the created topic.
   1. In the target cluster, create a user with the `ACCESS_ROLE_PRODUCER` and `ACCESS_ROLE_CONSUMER` permissions for all topics.
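As a rough sketch, the topic and source-cluster user from the manual steps above could also be declared with the yandex Terraform provider's Managed Kafka resources. The cluster reference, user name, and password below are hypothetical placeholders, and the downloaded `data-transfer-mkf-mkf.tf` file already contains equivalent definitions:

```hcl
# Sketch only: assumes a cluster resource named yandex_mdb_kafka_cluster.source
# exists; "mkf-user" and the password placeholder are hypothetical.
resource "yandex_mdb_kafka_topic" "sensors" {
  cluster_id         = yandex_mdb_kafka_cluster.source.id
  name               = "sensors"
  partitions         = 1
  replication_factor = 1
}

resource "yandex_mdb_kafka_user" "producer_consumer" {
  cluster_id = yandex_mdb_kafka_cluster.source.id
  name       = "mkf-user"
  password   = "<password>"

  # Grant both producer and consumer access to the created topic.
  permission {
    topic_name = "sensors"
    role       = "ACCESS_ROLE_PRODUCER"
  }
  permission {
    topic_name = "sensors"
    role       = "ACCESS_ROLE_CONSUMER"
  }
}
```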
   **Using Terraform**

   1. If you don't have Terraform yet, install and configure it.

   1. Download the file with provider settings. Place it in a separate working directory and specify the parameter values.

   1. Download the `data-transfer-mkf-mkf.tf` configuration file to the same working directory.

      This file describes:

      - A network.
      - A subnet.
      - Security groups and the rule required to connect to a Managed Service for Apache Kafka® cluster.
      - A source Managed Service for Apache Kafka® cluster with public internet access.
      - A target Managed Service for Apache Kafka® cluster.
      - An Apache Kafka® topic.
      - A transfer.

   1. In the `data-transfer-mkf-mkf.tf` file, specify the parameter values:

      - `source_kf_version`: Apache Kafka® version in the source cluster.
      - `source_user_name`: Username for establishing a connection to the Apache Kafka® topic.
      - `source_user_password`: User password.
      - `transfer_enabled`: Set to `0` so that no transfer is created until you create the source and target endpoints manually.
      - `target_kf_version`: Apache Kafka® version in the target cluster.
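Depending on how `data-transfer-mkf-mkf.tf` declares these parameters, the assignments might look like the following. All values here are illustrative placeholders (the versions echo the 2.8-to-3.1 example from the introduction), not real credentials:

```hcl
# Hypothetical example values; substitute your own.
source_kf_version    = "2.8"
source_user_name     = "mkf-user"
source_user_password = "<password>"
target_kf_version    = "3.1"
transfer_enabled     = 0
```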
   1. In the directory with the configuration file, run:

      ```bash
      terraform init
      ```

      This command initializes the provider specified in the configuration files and enables you to use the provider's resources and data sources.

   1. Make sure the Terraform configuration files are correct:

      ```bash
      terraform validate
      ```

      If there are errors in the configuration files, Terraform will point them out.

   1. Create the required infrastructure:

      1. Run this command to view the planned changes:

         ```bash
         terraform plan
         ```

         If the resource configuration descriptions are correct, the terminal will display a list of the resources to create and their parameters. This is a test step; no resources are changed.

      1. If you are happy with the planned changes, apply them:

         1. Run this command:

            ```bash
            terraform apply
            ```

         1. Confirm that you want to update the resources.
         1. Wait for the operation to complete.

      All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
The created `sensors` topic in the source cluster will receive test data from car sensors in JSON format, for example:

```json
{
    "device_id": "iv9a94th6rztooxh5ur2",
    "datetime": "2020-06-05 17:27:00",
    "latitude": "55.70329032",
    "longitude": "37.65472196",
    "altitude": "427.5",
    "speed": "0",
    "battery_voltage": "23.5",
    "cabin_temperature": "17",
    "fuel_level": null
}
```
1. Install the utilities:

   - **kafkacat**, to read and write data to Apache Kafka® topics:

     ```bash
     sudo apt update && sudo apt install --yes kafkacat
     ```

     Make sure you can use it to connect to the Managed Service for Apache Kafka® source cluster over SSL.

   - **jq**, for JSON stream processing:

     ```bash
     sudo apt update && sudo apt install --yes jq
     ```
## Prepare and activate the transfer

1. Create a target endpoint:

   - **Database type**: `Apache Kafka®`.
   - **Endpoint parameters**:
     - **Connection settings**: `Managed Service for Apache Kafka® cluster`.

       Select the target cluster from the list and specify the cluster connection settings.

     - **Apache Kafka® topic settings**:
       - **Topic full name**: `measurements`.

1. Create a source endpoint:

   - **Database type**: `Apache Kafka®`.
   - **Endpoint parameters**:
     - **Connection settings**: `Managed Service for Apache Kafka® cluster`.

       Select the source cluster from the list and specify the cluster connection settings.

     - **Topic full name**: `sensors`.
1. Create the transfer:

   **Manually**

   1. Create a transfer of the **Increment** type that will use the endpoints you created.
   1. Activate the transfer.

   **Using Terraform**

   1. In the `data-transfer-mkf-mkf.tf` file, specify the parameter values:

      - `source_endpoint_id`: ID of the source endpoint.
      - `target_endpoint_id`: ID of the target endpoint.
      - `transfer_enabled`: Set to `1` to enable transfer creation.
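For example, the assignments in `data-transfer-mkf-mkf.tf` might look like this. The endpoint IDs are placeholders; copy the real IDs of the endpoints you created from the management console:

```hcl
# Placeholder IDs; replace with the IDs of your source and target endpoints.
source_endpoint_id = "<source endpoint ID>"
target_endpoint_id = "<target endpoint ID>"
transfer_enabled   = 1
```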
   1. Make sure the Terraform configuration files are correct:

      ```bash
      terraform validate
      ```

      If there are errors in the configuration files, Terraform will point them out.

   1. Create the required infrastructure:

      1. Run this command to view the planned changes:

         ```bash
         terraform plan
         ```

         If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step; no resources are changed.

      1. If you are happy with the planned changes, apply them:

         1. Run this command:

            ```bash
            terraform apply
            ```

         1. Confirm that you want to update the resources.
         1. Wait for the operation to complete.

      Once created, the transfer is activated automatically.
## Test the transfer

1. Wait until the transfer status changes to **Replicating**.

1. Make sure that data from the topic in the source cluster moves to the topic in the target Managed Service for Apache Kafka® cluster:

   1. Create a `sample.json` file with the following test data:

      ```json
      {
          "device_id": "iv9a94th6rztooxh5ur2",
          "datetime": "2020-06-05 17:27:00",
          "latitude": 55.70329032,
          "longitude": 37.65472196,
          "altitude": 427.5,
          "speed": 0,
          "battery_voltage": 23.5,
          "cabin_temperature": 17,
          "fuel_level": null
      }
      {
          "device_id": "rhibbh3y08qmz3sdbrbu",
          "datetime": "2020-06-06 09:49:54",
          "latitude": 55.71294467,
          "longitude": 37.66542005,
          "altitude": 429.13,
          "speed": 55.5,
          "battery_voltage": null,
          "cabin_temperature": 18,
          "fuel_level": 32
      }
      {
          "device_id": "iv9a94th6rztooxh5ur2",
          "datetime": "2020-06-07 15:00:10",
          "latitude": 55.70985913,
          "longitude": 37.62141918,
          "altitude": 417.0,
          "speed": 15.7,
          "battery_voltage": 10.3,
          "cabin_temperature": 17,
          "fuel_level": null
      }
      ```
   1. Send the data from the `sample.json` file to the `sensors` topic in the source Managed Service for Apache Kafka® cluster using `jq` and `kafkacat`:

      ```bash
      jq -rc . sample.json | kafkacat -P \
         -b <broker FQDN in the source cluster>:9091 \
         -t sensors \
         -k key \
         -X security.protocol=SASL_SSL \
         -X sasl.mechanisms=SCRAM-SHA-512 \
         -X sasl.username="<username in the source cluster>" \
         -X sasl.password="<user password in the source cluster>" \
         -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexCA.crt -Z
      ```

      The data is sent on behalf of the created user. To learn more about setting up an SSL certificate and working with `kafkacat`, see Connecting to topics in an Apache Kafka® cluster.

   1. Use the `kafkacat` utility to make sure that the data from the source cluster has moved to the target Managed Service for Apache Kafka® cluster:

      ```bash
      kafkacat -C \
         -b <broker FQDN in the target cluster>:9091 \
         -t measurements \
         -X security.protocol=SASL_SSL \
         -X sasl.mechanisms=SCRAM-SHA-512 \
         -X sasl.username="<username in the target cluster>" \
         -X sasl.password="<user password in the target cluster>" \
         -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexCA.crt -Z -K:
      ```
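The `jq -rc .` step above turns the pretty-printed JSON stream in `sample.json` into one compact line per record, and `kafkacat` then sends each line as a separate message. A minimal Python sketch of that transformation (illustration only, with a shortened two-record stream; not part of the tutorial):

```python
import json

# sample.json is a stream of concatenated JSON objects (not a JSON array),
# which is the format `jq -rc .` reads one object at a time.
sample = '''
{ "device_id": "iv9a94th6rztooxh5ur2", "speed": 0 }
{ "device_id": "rhibbh3y08qmz3sdbrbu", "speed": 55.5 }
'''

def iter_json_stream(text):
    """Yield each object from a whitespace-separated JSON stream."""
    decoder = json.JSONDecoder()
    text = text.strip()
    idx = 0
    while idx < len(text):
        obj, idx = decoder.raw_decode(text, idx)
        yield obj
        # Skip whitespace between concatenated objects.
        while idx < len(text) and text[idx].isspace():
            idx += 1

# One compact line per record: the payload kafkacat -P sends as one message.
lines = [json.dumps(obj, separators=(",", ":")) for obj in iter_json_stream(sample)]
for line in lines:
    print(line)
```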
## Delete the resources you created

**Note**: Before deleting the created resources, disable the transfer.

If you no longer need these resources, delete them:

1. Delete the transfer.
1. Delete the source and target endpoints.

Delete the other resources depending on the method used to create them:
**Using Terraform**

1. In the terminal window, change to the directory containing the infrastructure plan.

1. Delete the `data-transfer-mkf-mkf.tf` configuration file.

1. Make sure the Terraform configuration files are correct:

   ```bash
   terraform validate
   ```

   If there are errors in the configuration files, Terraform will point them out.

1. Run this command to view the planned changes:

   ```bash
   terraform plan
   ```

   If the resource configuration descriptions are correct, the terminal will display a list of the resources to delete and their parameters. This is a test step; no resources are changed.

1. If you are happy with the planned changes, apply them:

   1. Run this command:

      ```bash
      terraform apply
      ```

   1. Confirm that you want to delete the resources.
   1. Wait for the operation to complete.

All the resources described in the `data-transfer-mkf-mkf.tf` configuration file will be deleted.