Yandex.Cloud
Yandex Managed Service for ClickHouse

Sharding ClickHouse tables

  • How to start sharding tables
  • Example of sharding

Sharding a table makes sense when splitting the data across shards significantly improves DBMS performance or data availability. To increase availability, each shard should consist of 3 or more database hosts.

Data should be split into shards if:

  • Your tables are very big. Consider sharding if your table is 200 GB or larger.
  • The content of your tables is non-uniform. For example, data can be clearly classified as frequently queried and rarely queried.
  • Your tables require high read and write performance. Sharding helps distribute workloads among the hosts to bypass technical limitations.

Both ease of use and the actual performance gain depend strongly on the sharding key you choose. Make sure that the key distributes data across shards logically and that data in one shard isn't linked to data in another.

For more information about sharding, see Sharding.

How to start sharding tables

By default, Managed Service for ClickHouse creates the first shard together with the cluster. This shard includes all the hosts in the cluster. To start using sharding, add the number of shards you need and create a table that uses the Distributed engine. The documentation for the Distributed engine describes sharding strategies and guidelines for creating such tables, as well as the limits of distributed tables.

Managed Service for ClickHouse automatically creates the shard configuration in the cluster. You can manage this configuration.
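
To check the shard configuration that the service generated, one option is to query the system.clusters system table from any host. This is a minimal sketch; the cluster names, shard numbers, and host names it returns depend on your own cluster:

```sql
-- List every shard and replica that this host knows about.
SELECT cluster, shard_num, shard_weight, replica_num, host_name
FROM system.clusters
ORDER BY cluster, shard_num, replica_num;
```

The value in the cluster column is the name to pass as the first argument of the Distributed engine.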

Example of sharding

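Let's say you already have a sharded Managed Service for ClickHouse cluster hosting the db1 database, and your task is to enable sharding for the hits table. In the statement below, logs is the cluster name, db1 is the database, and hits is the underlying table on each shard. The example uses a random number, rand(), as the sharding expression: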

  1. Connect to the database.

  2. Create a distributed table:

    CREATE TABLE sharding AS db1.hits
    ENGINE = Distributed(logs, db1, hits, rand());
    

After that, you can run SELECT and INSERT queries against the distributed table. The queries are processed according to the shard configuration of the cluster.
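
As a rough way to confirm that queries really fan out across shards, the sketch below reads through the distributed table and groups rows by the host that returned them. It assumes that the sharding table from the example exists in the current database and that the hits table exists on every shard:

```sql
-- Total number of rows across all shards, read through the distributed table.
SELECT count() FROM sharding;

-- Rows returned by each host that served the query.
SELECT hostName() AS host, count() AS row_count
FROM sharding
GROUP BY host
ORDER BY host;
```

With rand() as the sharding expression and equal shard weights, rows inserted through the distributed table should be spread roughly evenly. Rows that were written directly to hits on a particular host stay where they are.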

© 2021 Yandex.Cloud LLC