Creating an Apache Kafka® cluster
A cluster in Managed Service for Apache Kafka® is one or more broker hosts where topics and their partitions are located. Producers and consumers can work with these topics by connecting to cluster hosts.
Note
- The number of broker hosts you can create together with an Apache Kafka® cluster depends on the selected disk type and host class.
- Available disk types depend on the selected host class.
Warning
If you create a cluster with more than one host, three dedicated ZooKeeper hosts will be added to the cluster. For more information, see Relationship between resources in Managed Service for Apache Kafka®.
How to create a Managed Service for Apache Kafka® cluster
Prior to creating a cluster, calculate the minimum storage size for topics.
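As a rough back-of-the-envelope sketch (not a substitute for the full calculation), the required storage can be estimated from the expected write rate, retention period, and replication factor. The figures below are illustrative assumptions, not service defaults:

# Illustrative sizing sketch: 5 MB/s of writes, 72 hours of retention, replication factor 3.
# The result is a rough lower bound across all brokers; leave extra headroom for growth.
WRITE_MB_PER_SEC=5
RETENTION_HOURS=72
REPLICATION_FACTOR=3
echo "$(( WRITE_MB_PER_SEC * 3600 * RETENTION_HOURS * REPLICATION_FACTOR / 1024 )) GB minimum across the cluster"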
-
In the management console, go to the desired folder.
-
In the list of services, select Managed Service for Apache Kafka®.
-
Click Create cluster.
-
Under Basic parameters:
-
Enter a name for the cluster and, if necessary, a description. The cluster name must be unique within the folder.
-
Select the environment where you want to create the cluster (you can't change the environment once the cluster is created):
PRODUCTION: For stable versions of your apps.
PRESTABLE: For testing, including testing of the Managed Service for Apache Kafka® service itself. The Prestable environment is the first to receive new features, improvements, and bug fixes. However, not every update ensures backward compatibility.
-
Select the Apache Kafka® version.
-
To manage topics via the Apache Kafka® Admin API:
- Enable Manage topics via the API.
- After creating a cluster, create an admin user.
Alert
Once you create a cluster, you cannot change the Manage topics via the API setting.
-
To manage data schemas using Managed Schema Registry, enable the Data Schema Registry setting.
Warning
You cannot edit the Data Schema Registry setting after a cluster is created.
-
-
Under Host class, select the platform, host type, and host class.
The host class defines the technical capabilities of the virtual machines that Apache Kafka® brokers are deployed on. All available options are listed under Host classes.
By changing the host class for a cluster, you also change the characteristics of all the existing instances.
-
Under Storage:
-
Select the disk type.
The selected disk type determines the increments in which you can change the storage size:
- Local SSD storage for Intel Broadwell and Intel Cascade Lake: In increments of 100 GB.
- Local SSD storage for Intel Ice Lake: In increments of 368 GB.
- Non-replicated SSD storage: In increments of 93 GB.
You can't change the disk type for Managed Service for Apache Kafka® clusters after creation.
-
Select the size of storage to be used for data.
-
-
Under Network settings:
-
Select one or more availability zones to host Apache Kafka® brokers. If you create a cluster with a single availability zone, you will not be able to increase the number of zones and brokers later.
-
Select the network.
-
Select subnets in each availability zone for this network. To create a new subnet, click Create new subnet next to the desired availability zone.
Note
For a cluster with multiple broker hosts, you need to specify subnets in each availability zone even if you plan to host brokers only in some of them. These subnets are required to host three ZooKeeper hosts — one in each availability zone. For more information, see Resource relationships in Managed Service for Apache Kafka®.
-
Select security groups to control the cluster's network traffic.
-
To access broker hosts from the internet, select Public access. In this case, you can only connect to them over an SSL connection. You can't request public access after creating a cluster. For more information, see Connecting to topics in an Apache Kafka® cluster.
-
-
Under Hosts:
-
Specify the number of Apache Kafka® broker hosts to be located in each of the selected availability zones.
When choosing the number of hosts, keep in mind that:
- The Apache Kafka® cluster hosts will be evenly deployed in the selected availability zones. Decide on the number of zones and hosts per zone based on the required fault tolerance model and cluster load.
- Replication is possible if there are at least two hosts in the cluster.
- If you selected local-ssd or network-ssd-nonreplicated under Storage, you need to add at least three hosts to the cluster.
- Adding more than one host to the cluster automatically adds three ZooKeeper hosts.
-
(Optional) Select groups of dedicated hosts to host the cluster on.
Alert
You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
-
-
If you specify two or more broker hosts, then under ZooKeeper host class, specify the characteristics of the ZooKeeper hosts to place in each of the selected availability zones.
-
If necessary, configure additional cluster settings:
-
Maintenance window: Settings for the maintenance window:
- To enable maintenance at any time, select arbitrary (default).
- To specify the preferred maintenance start time, select by schedule and specify the desired day of the week and UTC hour. For example, you can choose a time when cluster load is lightest.
Maintenance operations are carried out on both enabled and disabled clusters. They may include updating the DBMS version, applying patches, and so on.
-
Access from Data Transfer: Enable this option to allow access to the cluster from Yandex Data Transfer in Serverless mode.
This will enable you to connect to Yandex Data Transfer running in Kubernetes via a special network. It will also cause other operations to run faster, such as transfer launch and deactivation.
-
Deletion protection: Manages cluster protection from accidental deletion by a user.
Cluster deletion protection will not prevent a manual connection to a cluster to delete data.
-
-
If necessary, configure the Apache Kafka® settings.
-
Click Create cluster.
-
Wait until the cluster is ready: its status on the Managed Service for Apache Kafka® dashboard changes to Running and its state to Alive. This may take some time.
If you don't have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
-
View a description of the CLI's create cluster command:
yc managed-kafka cluster create --help
-
Specify the cluster parameters in the create command (only some of the supported parameters are given in the example):
yc managed-kafka cluster create \
   --name <cluster name> \
   --environment <environment: prestable or production> \
   --version <Apache Kafka® version: 2.8, 3.0, 3.1, or 3.2> \
   --network-name <network name> \
   --brokers-count <number of brokers in zone> \
   --resource-preset <host class> \
   --disk-type <network-hdd | network-ssd | local-ssd | network-ssd-nonreplicated> \
   --disk-size <storage size, GB> \
   --assign-public-ip <public access> \
   --security-group-ids <security group ID list> \
   --deletion-protection=<cluster deletion protection: true or false>
Tip
If necessary, you can also configure the Apache Kafka® settings here.
Cluster deletion protection will not prevent a manual connection to a cluster to delete data.
-
To set up a maintenance window (including windows for disabled clusters), pass the required value in the --maintenance-window parameter when creating your cluster:

yc managed-kafka cluster create \
   ...
   --maintenance-window type=<maintenance type: anytime or weekly>,`
                        `day=<day of week for weekly>,`
                        `hour=<hour for weekly>

Where:

- type: Maintenance type:
  - anytime: Any time.
  - weekly: On a schedule.
- day: Day of the week in DDD format for the weekly type. For example, MON.
- hour: Hour in HH format for the weekly type. For example, 21.
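For example, to run maintenance on a weekly schedule on Mondays at 21:00 (values taken from the formats above; other flags omitted as in the template):

yc managed-kafka cluster create \
   ...
   --maintenance-window type=weekly,day=MON,hour=21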
-
To manage topics via the Apache Kafka® Admin API:
-
When creating a cluster, set the --unmanaged-topics parameter to true:

yc managed-kafka cluster create \
   ...
   --unmanaged-topics true
You cannot edit this setting after you create a cluster.
-
After creating a cluster, create an admin user.
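As a sketch of this follow-up step, the admin user can be created with the yc CLI once the cluster is Running. The user name and password below are placeholders, and the exact --permission value syntax is an assumption to verify against the CLI help:

# Hypothetical values; confirm the flag format with `yc managed-kafka user create --help`.
yc managed-kafka user create admin-user \
   --cluster-name <cluster name> \
   --password <password> \
   --permission topic=*,role=admin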
-
-
To allow access to the cluster from Yandex Data Transfer in Serverless mode, pass the --datatransfer-access parameter.
This will enable you to connect to Yandex Data Transfer running in Kubernetes via a special network. It will also cause other operations to run faster, such as transfer launch and deactivation.
-
To create a cluster hosted on groups of dedicated hosts, specify a comma-separated list of dedicated host group IDs in the --host-group-ids parameter when creating the cluster:

yc managed-kafka cluster create \
   ...
   --host-group-ids <IDs of dedicated host groups>
Alert
You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
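If you need to look up the IDs of your dedicated host groups, one option is the Compute Cloud CLI; this assumes the host-group command group is available in your version of yc:

# List dedicated host groups in the current folder to find their IDs.
yc compute host-group list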
With Terraform, you can quickly create a cloud infrastructure in Yandex Cloud and manage it using configuration files. These files store the infrastructure description in HashiCorp Configuration Language (HCL). Terraform and its providers are distributed under the Mozilla Public License.
For more information about the provider resources, see the documentation on the Terraform site or mirror site.
If you change the configuration files, Terraform automatically determines which part of your configuration is already deployed and what should be added or removed.
If you don't have Terraform, install it and configure the provider.
To create a cluster:
-
In the configuration file, describe the parameters of resources that you want to create:
-
Apache Kafka® cluster: Description of a cluster and its hosts. If necessary, you can also configure the Apache Kafka® settings here.
-
Network: Description of the cloud network where a cluster will be located. If you already have a suitable network, you don't have to describe it again.
-
Subnets: Description of the subnets to connect the cluster hosts to. If you already have suitable subnets, you don't have to describe them again.
Example configuration file structure:
terraform {
  required_providers {
    yandex = {
      source = "yandex-cloud/yandex"
    }
  }
}

provider "yandex" {
  token     = "<OAuth or static key of service account>"
  cloud_id  = "<cloud ID>"
  folder_id = "<folder ID>"
  zone      = "<availability zone>"
}

resource "yandex_mdb_kafka_cluster" "<cluster name>" {
  environment         = "<environment: PRESTABLE or PRODUCTION>"
  name                = "<cluster name>"
  network_id          = "<network ID>"
  security_group_ids  = ["<list of cluster security group IDs>"]
  deletion_protection = <cluster deletion protection: true or false>

  config {
    assign_public_ip = "<cluster public access: true or false>"
    brokers_count    = <number of brokers>
    version          = "<Apache Kafka® version: 2.8, 3.0, 3.1, or 3.2>"
    schema_registry  = "<data schema management: true or false>"
    kafka {
      resources {
        disk_size          = <storage size, GB>
        disk_type_id       = "<disk type>"
        resource_preset_id = "<host class>"
      }
    }
    zones = [ "<availability zones>" ]
  }
}

resource "yandex_vpc_network" "<network name>" {
  name = "<network name>"
}

resource "yandex_vpc_subnet" "<subnet name>" {
  name           = "<subnet name>"
  zone           = "<availability zone>"
  network_id     = "<network ID>"
  v4_cidr_blocks = ["<range>"]
}
Cluster deletion protection will not prevent a manual connection to a cluster to delete data.
To set up the maintenance window (including for disabled clusters), add the maintenance_window section to the cluster description:

resource "yandex_mdb_kafka_cluster" "<cluster name>" {
  ...
  maintenance_window {
    type = <maintenance type: ANYTIME or WEEKLY>
    day  = <day of the week for the WEEKLY type>
    hour = <hour of the day for the WEEKLY type>
  }
  ...
}
Where:
- type: Maintenance type:
  - ANYTIME: Any time.
  - WEEKLY: On a schedule.
- day: Day of the week for the WEEKLY type, in DDD format. For example, MON.
- hour: Hour of the day for the WEEKLY type, in HH format. For example, 21.
-
-
Make sure the settings are correct.
-
Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
-
Run the command:
terraform validate
If there are errors in the configuration files, Terraform will point to them.
-
-
Create a cluster.
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to be created and their parameters. This is a test step: no resources are created or updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the creation of the resources.
-
Wait for the operation to complete.
-
After this, all the necessary resources will be created in the specified folder and the IP addresses of the VMs will be displayed in the terminal. You can use the management console to check that the resources have been created with the correct settings.
-
For more information, see the Terraform provider documentation.
Warning
The Terraform provider limits the amount of time for all Managed Service for Apache Kafka® cluster operations to complete to 60 minutes.
Operations exceeding the set timeout are interrupted.
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_kafka_cluster" "<cluster name>" {
...
timeouts {
create = "1h30m" # 1 hour 30 minutes
update = "2h" # 2 hours
delete = "30m" # 30 minutes
}
}
Use the create API method and pass the following information in the request:
- The ID of the folder where the cluster should be placed, in the folderId parameter.
- The cluster name, in the name parameter.
- Security group IDs, in the securityGroupIds parameter.
- Settings for the maintenance window (including for disabled clusters), in the maintenanceWindow parameter.
- Cluster deletion protection settings, in the deletionProtection parameter.
  Cluster deletion protection will not prevent a manual connection to a cluster to delete data.

To manage topics via the Apache Kafka® Admin API:

- Pass true for the unmanagedTopics parameter. You cannot edit this setting after you create a cluster.
- After creating a cluster, create an admin user.

To manage data schemas using Managed Schema Registry, pass true for the configSpec.schemaRegistry parameter. You cannot edit this setting after you create a cluster.

To allow access to the cluster from Yandex Data Transfer in Serverless mode, pass true for the configSpec.access.dataTransfer parameter.
This will enable you to connect to Yandex Data Transfer running in Kubernetes via a special network. It will also cause other operations to run faster, such as transfer launch and deactivation.

To create a cluster deployed on groups of dedicated hosts, pass a list of dedicated host group IDs in the hostGroupIds parameter.
Alert
You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
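Below is a minimal request sketch using curl. The endpoint URL and the nesting of fields inside configSpec are assumptions to check against the create method reference; diskSize is given in bytes, and zone placement fields are omitted:

# Hedged sketch: verify field names and nesting against the API reference before use.
IAM_TOKEN=$(yc iam create-token)

curl --request POST \
  --header "Authorization: Bearer ${IAM_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{
    "folderId": "<folder ID>",
    "name": "<cluster name>",
    "environment": "PRODUCTION",
    "networkId": "<network ID>",
    "securityGroupIds": ["<security group ID>"],
    "deletionProtection": true,
    "configSpec": {
      "version": "3.2",
      "brokersCount": 1,
      "assignPublicIp": true,
      "kafka": {
        "resources": {
          "resourcePresetId": "s2.micro",
          "diskTypeId": "network-ssd",
          "diskSize": "10737418240"
        }
      }
    }
  }' \
  https://mdb.api.cloud.yandex.net/managed-kafka/v1/clusters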
Warning
If you specified security group IDs when creating a cluster, you may also need to configure security groups to connect to the cluster.
Examples
Creating a single-host cluster
Create a Managed Service for Apache Kafka® cluster with test characteristics:
- With the name mykf.
- In the production environment.
- With Apache Kafka® version 3.2.
- In the default network.
- In the security group enp6saqnq4ie244g67sb.
- With one s2.micro host in the ru-central1-a availability zone.
- With one broker.
- With 10 GB of network SSD storage (network-ssd).
- With public access.
- With protection against accidental cluster deletion.
Run the following command:
yc managed-kafka cluster create \
--name mykf \
--environment production \
--version 3.2 \
--network-name default \
--zone-ids ru-central1-a \
--brokers-count 1 \
--resource-preset s2.micro \
--disk-size 10 \
--disk-type network-ssd \
--assign-public-ip \
--security-group-ids enp6saqnq4ie244g67sb \
--deletion-protection=true
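Once the cluster status changes to Running, a quick way to check the result (assuming the cluster is named mykf as above):

yc managed-kafka cluster get mykf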
Create a Managed Service for Apache Kafka® cluster with test characteristics:
- In the cloud with the ID b1gq90dgh25bebiu75o.
- In the folder with the ID b1gia87mbaomkfvsleds.
- With the name mykf.
- In the PRODUCTION environment.
- With Apache Kafka® version 3.2.
- In the new mynet network with the mysubnet subnet.
- In the new mykf-sg security group allowing connection to the cluster from the internet via port 9091.
- With one s2.micro host in the ru-central1-a availability zone.
- With one broker.
- With 10 GB of network SSD storage (network-ssd).
- With public access.
- With protection against accidental cluster deletion.
The configuration file for the cluster looks like this:
terraform {
required_providers {
yandex = {
source = "yandex-cloud/yandex"
}
}
}
provider "yandex" {
token = "<OAuth or static key of service account>"
cloud_id = "b1gq90dgh25bebiu75o"
folder_id = "b1gia87mbaomkfvsleds"
zone = "ru-central1-a"
}
resource "yandex_mdb_kafka_cluster" "mykf" {
environment = "PRODUCTION"
name = "mykf"
network_id = yandex_vpc_network.mynet.id
security_group_ids = [ yandex_vpc_security_group.mykf-sg.id ]
deletion_protection = true
config {
assign_public_ip = true
brokers_count = 1
version = "3.2"
kafka {
resources {
disk_size = 10
disk_type_id = "network-ssd"
resource_preset_id = "s2.micro"
}
}
zones = [
"ru-central1-a"
]
}
}
resource "yandex_vpc_network" "mynet" {
name = "mynet"
}
resource "yandex_vpc_subnet" "mysubnet" {
name = "mysubnet"
zone = "ru-central1-a"
network_id = yandex_vpc_network.mynet.id
v4_cidr_blocks = ["10.5.0.0/24"]
}
resource "yandex_vpc_security_group" "mykf-sg" {
name = "mykf-sg"
network_id = yandex_vpc_network.mynet.id
ingress {
description = "Kafka"
port = 9091
protocol = "TCP"
v4_cidr_blocks = [ "0.0.0.0/0" ]
}
}