Sharding MongoDB collections


Sharding in Managed Service for MongoDB is available for clusters running MongoDB version 4.0 or higher. If your cluster is deployed with version 3.6, you can update it.

It makes sense to shard collections when splitting data into shards significantly helps improve DBMS performance or data availability. To increase availability, each shard should consist of 3 or more database hosts.

Ease of use and actual performance improvements depend strongly on the sharding key you choose: make sure that the data of the collection is logically distributed across shards and isn't linked to data in different shards.

You should use sharding if:

  • Significantly large volumes of data. Consider sharding if your collection is 200 GB or more.
  • Collections with non-uniform contents. For example, data can be clearly classified as frequently queried and rarely queried.
  • Collections requiring high read and write speeds. Sharding helps distribute workloads among hosts to bypass technical limitations.

For more information about sharding, see Sharding.

How to enable collection sharding

  1. Open a Managed Service for MongoDB cluster page in the management console.

  2. Go to the Shards tab and click Enable.

  3. Specify the host class, storage type and size, and subnets for shard hosts (mongocfg and mongos).

  4. Connect to the mongos host using the mongo CLI and enable sharding:

    sh.enableSharding(<database name>)
  5. Define an index for the sharded collection:

    db.<collection name>.ensureIndex( { "<index>": "hashed" } )
  6. Enable collection sharding:

    sh.shardCollection( "<collection>", { "<index>": "hashed" } )

    For a detailed description of the shardCollection command, see the MongoDB documentation.

  7. Modify applications that use your database to only use the mongos hosts.

From the MongoDB documentation, you can learn how to solve issues related to sharding:

Example of sharding

Let's say you already have a Managed Service for MongoDB sharded cluster hosting the billing database. Your task is to enable sharding for the payment and addresses collections. In the example, the payment index hash and the value of the addresses field are used as the shard key.

Sequence of operations:

  1. Connect to the database.

  2. Enable billing database sharding:

  3. Define the index for the sharded collection:

    db.payments.ensureIndex( { "_id": "hashed" } )
  4. Create the necessary number of shards in management console.

  5. Shard the collection based on its namespace:

    sh.shardCollection( "billing.payments", { "_id": "hashed" } )

Sharding is now enabled and configured. To make sure, try listing the available shards using the command db.adminCommand( { listShards: 1 } ).