
Sharding

  • Benefits of sharding
  • Use of sharding
    • Technical restrictions
    • Geographically distributed consumers
    • Insufficient fault-tolerance
    • Slow query processing

Note

Sharding in Managed Service for MongoDB is available for clusters running MongoDB version 4.0 or higher. If your cluster is running version 3.6, you can upgrade it.

Sharding is a horizontal data scaling strategy that places parts of MongoDB collections on different hosts in the cluster. Each shard is a set of hosts, and the shard key determines which shard stores a given part of the data. MongoDB supports sharding to handle large data volumes and increase DBMS throughput. Sharding is particularly useful when vertical scaling (upgrading server capacity) isn't cost-efficient or possible.

Managed Service for MongoDB supports the core data sharding strategies:

  • Hashed sharding (by a range of hashed shard key values)
  • Ranged sharding (by a shard key value range)
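
The difference between the two strategies can be illustrated with a minimal Python sketch. It is not how MongoDB is implemented: the shard names, the split points, and the use of MD5 as the hash function are all assumptions made for the example. It only shows that ranged sharding compares the raw key with split points, while hashed sharding applies the same range lookup to a hash of the key.

```python
import bisect
import hashlib

# Hypothetical shard names, used only for illustration.
SHARDS = ["shard-a", "shard-b", "shard-c"]

# Assumed split points for ranged sharding: keys below 1000 go to the first
# shard, keys from 1000 to 1999 to the second, and the rest to the third.
RANGE_BOUNDARIES = [1000, 2000]

# Assumed split points for hashed sharding: the 64-bit hash space is cut
# into three equal ranges, one per shard.
HASH_SPACE = 2 ** 64
HASH_BOUNDARIES = [HASH_SPACE // 3, 2 * HASH_SPACE // 3]


def ranged_shard(key: int) -> str:
    """Pick a shard by comparing the raw key with the split points."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDARIES, key)]


def hashed_shard(key: int) -> str:
    """Pick a shard by the range that the hash of the key falls into.

    MD5 is only a stand-in hash function for this sketch.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    hashed_key = int.from_bytes(digest[:8], "big")
    return SHARDS[bisect.bisect_right(HASH_BOUNDARIES, hashed_key)]


if __name__ == "__main__":
    for key in (1, 2, 3, 1500, 2500):
        print(key, ranged_shard(key), hashed_shard(key))
```

In this sketch, neighboring keys stay on the same shard under ranged sharding, which suits range queries, while hashed sharding spreads them across shards, which evens out the load. In MongoDB itself, the strategy is chosen by the shard key specification: `{field: "hashed"}` for hashed sharding and `{field: 1}` for ranged sharding.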

Benefits of sharding

Sharding allows you to distribute loads across database hosts, which lets you overcome the resource restrictions of a single server. This is particularly important when you handle large amounts of data or run compute-intensive jobs.

Horizontal scaling is the distribution of data sets and workloads across multiple nodes. You can increase disk space by adding more servers. While a single machine may be slow or low-capacity, in a horizontally-scaled cluster, each machine handles only part of the total load and stores only part of the total data. This makes the system potentially more efficient than a single server with a large capacity and fast disks.

The downside of sharding is the complexity of the infrastructure, deployment, and maintenance.

More information on MongoDB database sharding can be found in the MongoDB documentation.

Use of sharding

Sharding is often used in the following cases:

  • A high frequency of database queries and fast data growth are expected.
  • An app requires more and more resources, but the replicated cluster can't be scaled further by adding larger and faster disks, more server RAM, or more powerful CPUs.

Sharding can help you solve the following problems:

  • Technical restrictions.
  • Geographically distributed data consumers.
  • Insufficient fault-tolerance.
  • Low query processing speed.

Technical restrictions

Working with large data sets can push your data storage infrastructure to the maximum capacity of commercially available hardware (for example, disk subsystem IOPS).

When your apps approach the performance limits, it's a good idea to split the data into shards and distribute the read operations.

Geographically distributed consumers

By distributing your cluster shards across regions, you can:

  • Improve availability for regional users.
  • Comply with the local laws, for example, by storing your data in a particular country or region.

Insufficient fault-tolerance

Sharding lets you isolate malfunctions of individual hosts or replica sets. Without sharding, when a host fails, access to the entire data set it stores is lost. With sharding, if one shard out of five fails, for example, 80% of the collection data is still available.
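
The arithmetic behind this figure can be written as a short Python sketch. It assumes that data is spread evenly across shards, which the text implies but a real cluster does not guarantee; the function name is hypothetical.

```python
def available_fraction(total_shards: int, failed_shards: int) -> float:
    """Return the share of collection data that is still reachable.

    Assumes every shard holds an equal part of the data.
    """
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    if not 0 <= failed_shards <= total_shards:
        raise ValueError("failed_shards must be between 0 and total_shards")
    return (total_shards - failed_shards) / total_shards


if __name__ == "__main__":
    # One shard out of five is down.
    print(f"{available_fraction(5, 1):.0%}")
```

With a single unsharded host, the same function gives `available_fraction(1, 1)`, that is, no data available at all.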

To reduce the risk of a whole shard going offline, we recommend configuring shards as a set of three replicas. Moreover, if you distribute shard hosts across different Yandex.Cloud availability zones, you increase data availability.

Slow query processing

Queries can slow down when they begin to compete for resources. This usually happens as the number of read operations or the CPU time per query grows.

However, in a sharded cluster, each shard processes its own part of a collection in parallel with the others, so queries compete less for shared resources (CPU, disk subsystem) and query processing time is reduced.
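
The pattern of querying shards in parallel and merging the partial results can be sketched in Python as follows. This is only an illustration under stated assumptions: shards are modeled as in-memory lists of documents, the function names are hypothetical, and threads stand in for separate hosts. In a real cluster, each shard runs on its own hosts with its own CPU and disks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Document = Dict[str, int]


def count_on_shard(docs: List[Document],
                   predicate: Callable[[Document], bool]) -> int:
    """Count matching documents on one shard (here, one in-memory list)."""
    return sum(1 for doc in docs if predicate(doc))


def scatter_gather_count(shards: List[List[Document]],
                         predicate: Callable[[Document], bool]) -> int:
    """Send the same query to every shard and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_counts = pool.map(
            lambda docs: count_on_shard(docs, predicate), shards
        )
        return sum(partial_counts)


if __name__ == "__main__":
    # Three hypothetical shards, each holding part of one collection.
    shards = [
        [{"age": 10}, {"age": 30}],
        [{"age": 40}],
        [],
    ]
    print(scatter_gather_count(shards, lambda doc: doc["age"] > 20))
```

Each shard scans only its own part of the data, so the time for the whole query is bounded by the slowest shard rather than by the size of the full collection.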

© 2021 Yandex.Cloud LLC