Questions and answers about Managed Service for ClickHouse

General questions

What is Managed Service for ClickHouse?

Managed Service for ClickHouse is a service that helps you create, operate, and scale ClickHouse databases in a cloud infrastructure.

With Managed Service for ClickHouse, you can:

  • Create a database with the required performance characteristics.
  • Scale the processing power and storage dedicated to your databases as needed.
  • Get database logs.

Managed Service for ClickHouse takes on labor-intensive ClickHouse infrastructure administration tasks:

  • Monitors resource usage.
  • Automatically creates DB backups.
  • Provides fault tolerance through automatic failover to backup replicas.
  • Keeps the database software updated.

You interact with a database cluster in Managed Service for ClickHouse in the same way as with a regular database in your local infrastructure. This allows you to manage internal database settings to meet your app's requirements.

What part of DB management and maintenance is Managed Service for ClickHouse responsible for?

When creating clusters, Managed Service for ClickHouse allocates resources, installs the DBMS, and creates databases.

For the created and running databases, Managed Service for ClickHouse automatically creates backups and applies fixes and updates to the DBMS.

Managed Service for ClickHouse also provides data replication between database hosts (both inside and between availability zones) and automatically switches the load over to a backup replica in the event of a failure.

When should I use Managed Service for ClickHouse, and when should I use VMs with databases?

Yandex.Cloud offers two ways to work with databases:

  • Managed Service for ClickHouse allows you to operate standard databases without having to handle administration yourself.
  • With Yandex Compute Cloud VMs, you can create and configure your own databases. This approach allows you to use any database management systems, access databases via SSH, and so on.

What is a database host and database cluster?

A database host is an isolated database environment in the cloud infrastructure with dedicated computing resources and reserved data storage.

A database cluster is one or more database hosts between which replication can be configured.

How do I get started with Managed Service for ClickHouse?

Managed Service for ClickHouse is available to all registered Yandex.Cloud users.

To create a database cluster in Managed Service for ClickHouse, first decide on its characteristics:

  • Host class (performance characteristics such as CPUs, memory, and so on).
  • Storage size (reserved in full when you create the cluster).
  • The network your cluster will be connected to.
  • The number of hosts for the cluster and the availability zone for each host.
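
As a rough illustration, the characteristics above can be written down as a plain data structure before you create a cluster. This is only a sketch: the class and field names (`ClusterSpec`, `HostSpec`, `host_class`, and so on) and all values are hypothetical and are not part of the Managed Service for ClickHouse API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostSpec:
    zone: str  # availability zone chosen for this host (placeholder value)


@dataclass
class ClusterSpec:
    host_class: str   # performance characteristics: CPUs, memory, and so on
    storage_gb: int   # reserved in full when the cluster is created
    network: str      # network the cluster is connected to
    hosts: List[HostSpec] = field(default_factory=list)

    def zones(self) -> List[str]:
        """Return the distinct availability zones used by the hosts."""
        return sorted({h.zone for h in self.hosts})


# All values below are placeholders, not real class, network, or zone names.
spec = ClusterSpec(
    host_class="example-class",
    storage_gb=100,
    network="example-network",
    hosts=[HostSpec("zone-a"), HostSpec("zone-b")],
)
```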

For detailed instructions, see the section Getting started with Managed Service for ClickHouse.

How many DB hosts can a cluster contain?

For network storage (NBS), the number of hosts in a cluster is limited only by the requested computing resources and the size of the storage for the cluster.

For NVMe SSD storage, the number of hosts is constrained at cluster creation: a ClickHouse cluster must have at least three hosts.

How can I access a running DB host?

You can connect to Managed Service for ClickHouse databases using standard DBMS methods.

For details, see how to connect to clusters.

How many clusters can I create within a single cloud?

For MDB technical and organizational limitations, see the section Quotas and limits.

How do I maintain database clusters?

Maintenance in Managed Service for ClickHouse includes:

  • Automatic installation of DBMS updates and fixes for your database hosts.
  • Changes to the host class and storage size.
  • Other Managed Service for ClickHouse maintenance activities.

Which ClickHouse version does Managed Service for ClickHouse use?

Managed Service for ClickHouse uses the latest stable version of ClickHouse.

What happens when a new DBMS version is released?

When new minor versions are released, the cluster software is updated after a short testing period. The owners of the affected DB clusters receive advance notice of the scheduled work and the expected DB availability.

What happens when a DBMS version becomes deprecated?

One month after the database version becomes deprecated, Managed Service for ClickHouse automatically sends email notifications to the owners of DB clusters created with this version.

New hosts can no longer be created using deprecated DBMS versions. Seven days after such a notification for minor versions, and one month after it for major versions, the database clusters are automatically upgraded to the next supported version. Deprecated major versions are upgraded even if you have disabled their automatic updates.
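
The timing described above can be sketched as simple date arithmetic. The function name is hypothetical, and treating "one month" as 30 days is an assumption made only for this illustration; the service may count the period differently.

```python
from datetime import date, timedelta

# Grace period between the notification and the automatic upgrade.
# ASSUMPTION: "one month" is approximated as 30 days for illustration.
GRACE_PERIOD = {
    "minor": timedelta(days=7),
    "major": timedelta(days=30),
}


def forced_upgrade_date(notified_on: date, version_kind: str) -> date:
    """Estimate when a cluster on a deprecated version is upgraded."""
    return notified_on + GRACE_PERIOD[version_kind]
```

For example, under these assumptions a notification sent on 1 March 2024 about a deprecated minor version would mean an automatic upgrade on 8 March 2024.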

How is the cost of usage calculated for a database host?

In Managed Service for ClickHouse, the usage cost is calculated based on the following parameters:

  • Selected host class.
  • Size of the storage reserved for the database host.
  • Size of the database cluster backups. Backup storage up to the size of the reserved storage is free of charge. Backups in excess of this size are charged at special rates.
  • Number of hours of database host operation. Partial hours are rounded to an integer value. The cost per hour of operation for each host class is given in the section Pricing policy for Managed Service for ClickHouse.
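
The backup storage rule above can be sketched as follows. The function name and the example sizes are hypothetical; actual rates are given in the pricing section and are not reproduced here.

```python
def billable_backup_gb(backup_gb: float, reserved_storage_gb: float) -> float:
    """Return the backup volume that is charged for.

    Backups up to the size of the reserved storage are free,
    so only the excess (if any) is billable.
    """
    return max(0.0, backup_gb - reserved_storage_gb)


# Hypothetical sizes: 100 GB of reserved storage, 150 GB of backups.
excess = billable_backup_gb(backup_gb=150, reserved_storage_gb=100)
```

With these example sizes, 50 GB of backups would be charged; with 80 GB of backups, nothing would be.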

How can I change the computing resources and storage size for a database cluster?

You can change the computing resources and storage size in the management console. All you need to do is choose a different host class for the required cluster.

The cluster characteristics change within 30 minutes. During this period, other maintenance activities, such as installing updates, may also be performed on the cluster.

Is DB host backup enabled by default?

Yes, backup is enabled by default. For ClickHouse, a full backup is performed once a day, and you can restore a cluster from any saved backup.

By default, backups are stored for seven days.

When is backup performed? Is a DB cluster available during backup?

The backup window is the interval during which a full daily backup of the DB cluster is performed. The backup window is from 01:00 to 05:00 (UTC+3).

Clusters remain fully accessible during the backup window.
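
If you want to check whether a given moment falls inside the backup window (for example, to schedule heavy jobs outside it), a minimal sketch could look like this. The function name is hypothetical, and treating the end of the window as exclusive is an assumption.

```python
from datetime import datetime, time, timedelta, timezone

BACKUP_TZ = timezone(timedelta(hours=3))  # UTC+3, as stated above
WINDOW_START = time(1, 0)
WINDOW_END = time(5, 0)  # ASSUMPTION: the end of the window is exclusive


def in_backup_window(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime is within 01:00-05:00 UTC+3."""
    local_time = moment.astimezone(BACKUP_TZ).time()
    return WINDOW_START <= local_time < WINDOW_END


# 23:30 UTC corresponds to 02:30 in UTC+3, which is inside the window.
inside = in_backup_window(datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc))
```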

What metrics and processes can be tracked using monitoring?

For all DBMS types, you can track:

  • CPU, memory, network, or disk usage, in absolute terms.
  • Memory, network, or disk usage as a percentage of the set limits for the corresponding cluster's host class.
  • The amount of data in the DB cluster and the remaining free space in the data storage.

For any DB hosts, you can track metrics specific to the type of the corresponding DBMS. For example, for PostgreSQL, you can track:

  • Average query execution time
  • Number of queries per second
  • Number of errors in logs, etc.

Monitoring can be performed with a minimum granularity of 5 seconds.

Questions about ClickHouse

Why should I use ClickHouse in Managed Service for ClickHouse and not my own installation on a VM?

Managed Service for ClickHouse automates routine database maintenance:

  • Quick DB deployment with the necessary available resources.
  • Data backup.
  • Regular software updates.
  • DB cluster failover.
  • Database usage monitoring and statistics.

When should I use ClickHouse instead of PostgreSQL?

ClickHouse supports only adding and reading data, as it is designed primarily for analytics (OLAP). In other cases, it is probably more convenient to use PostgreSQL.

Is it possible to connect to individual ClickHouse hosts?

Yes. You can connect to the hosts of a ClickHouse cluster via an encrypted connection.

SSH connections are not supported.

How can I load data to ClickHouse?

Use the INSERT query described in the ClickHouse documentation.
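
As an illustration, the sketch below builds an `INSERT ... FORMAT CSV` statement and the matching CSV payload. The helper name `build_csv_insert`, the table `db1.visits`, and its columns are hypothetical. The sketch does not send anything: how you pass the query and data to the cluster depends on the client you use.

```python
import csv
import io
from typing import List, Sequence, Tuple


def build_csv_insert(
    table: str, columns: Sequence[str], rows: List[tuple]
) -> Tuple[str, str]:
    """Return (query, body) for an INSERT ... FORMAT CSV request.

    Table and column names are inserted as-is, so pass only trusted names.
    """
    query = f"INSERT INTO {table} ({', '.join(columns)}) FORMAT CSV"
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return query, buf.getvalue()


# Hypothetical table and data, for illustration only.
query, body = build_csv_insert(
    "db1.visits", ["user_id", "url"], [(1, "/home"), (2, "/about")]
)
```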

How do I load a very large amount of data to ClickHouse?

Use the CLI, which compresses data efficiently during transmission. We recommend sending no more than one INSERT command per second.
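
One way to respect the recommended INSERT frequency is to group rows into large batches and pause between them. The sketch below is an assumption about how a loader might be structured, not part of the service: `chunked`, `insert_throttled`, and the `send` callback (which would perform the actual INSERT) are hypothetical names.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive batches of at most `size` rows."""
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def insert_throttled(
    rows: Iterable,
    size: int,
    send: Callable[[List], None],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `send` once per batch, at most once per `min_interval` seconds.

    Returns the number of batches sent.
    """
    sent = 0
    last_sent_at = None
    for batch in chunked(rows, size):
        if last_sent_at is not None:
            wait = min_interval - (clock() - last_sent_at)
            if wait > 0:
                sleep(wait)
        last_sent_at = clock()
        send(batch)  # the caller's function that issues one INSERT
        sent += 1
    return sent
```

Larger batches mean fewer INSERT commands for the same amount of data, which is why batch size and interval are both parameters here.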

Data transfer from physical media is not yet supported.

What happens to a cluster if one of its nodes fails?

DB clusters consist of at least two replicas, so the cluster continues working if one of its nodes fails.

Data may be lost only if a node with a non-replicated table fails.

Is it possible to deploy a ClickHouse DB cluster in multiple availability zones?

Yes. A database cluster may consist of hosts that reside in different availability zones and even different availability regions.

How can I back up a ClickHouse database?

Backups are created every 24 hours and stored for seven days after being created. You can restore data only to its state at the time a backup was created.

How does replication work for ClickHouse?

ZooKeeper is used for replication. Managed Service for ClickHouse creates a separate ZooKeeper cluster for each ClickHouse cluster.

Cloud users cannot access or configure ZooKeeper.

Why does a ClickHouse cluster take up three more hosts than expected?

When creating a ClickHouse cluster with two or more hosts, Managed Service for ClickHouse automatically creates a cluster of three ZooKeeper hosts for managing replication and fault tolerance. These hosts are included when calculating the resource quotas used in the cloud and when calculating the cost of the cluster. By default, ZooKeeper hosts are created with a minimal host class.
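
The host count that applies to quotas and cost can be sketched as follows. The function name is hypothetical; the rule itself (three ZooKeeper hosts added for clusters of two or more ClickHouse hosts) is taken from the paragraph above.

```python
ZOOKEEPER_HOSTS = 3  # created automatically for clusters with 2+ hosts


def total_hosts(clickhouse_hosts: int) -> int:
    """Return the number of hosts counted toward quotas and cost."""
    if clickhouse_hosts < 1:
        raise ValueError("a cluster needs at least one ClickHouse host")
    extra = ZOOKEEPER_HOSTS if clickhouse_hosts >= 2 else 0
    return clickhouse_hosts + extra
```

For example, a cluster with two ClickHouse hosts counts as five hosts, while a single-host cluster counts as one.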

For more information about using ZooKeeper, see the ClickHouse documentation.