Sharding MongoDB collections
While sharding a Managed Service for MongoDB cluster, the following service hosts are automatically created and billed separately from the main DBMS hosts:
- either MONGOS and MONGOCFG,
- or MONGOINFRA.
Alert
You can't unshard a cluster: to return a cluster to the state before it was sharded, you have to recreate it from a backup copy.
It makes sense to shard collections when splitting data into shards significantly helps improve DBMS performance or data availability. To increase availability, each shard should consist of 3 or more database hosts.
Ease of use and actual performance improvements significantly depend on the shard key you choose: make sure that the collection data is logically distributed across shards and is not linked to data in different shards.
You should use sharding for:
- Large data volumes. Consider sharding if your collection size is 200 GB or larger.
- Collections with non-uniform contents. For example, data can be clearly classified as frequently queried and rarely queried.
- Collections requiring high read and write speeds. Sharding helps distribute workloads among hosts to bypass technical limitations.
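As a rough illustration of the size guideline above, the sketch below compares a collection size in bytes with the 200 GB threshold. The function name shouldConsiderSharding is hypothetical, and treating 1 GB as 1024^3 bytes is an assumption. In mongosh, the size in bytes could be obtained with db.<collection_name>.dataSize().

```javascript
// Hypothetical helper: not part of mongosh or Managed Service for MongoDB.
// Assumption: 1 GB is treated as 1024^3 bytes.
const SHARDING_THRESHOLD_BYTES = 200 * 1024 ** 3;

// Returns true if the collection size reaches the 200 GB guideline.
function shouldConsiderSharding(collectionSizeBytes) {
  if (!Number.isFinite(collectionSizeBytes) || collectionSizeBytes < 0) {
    throw new RangeError("collectionSizeBytes must be a non-negative number");
  }
  return collectionSizeBytes >= SHARDING_THRESHOLD_BYTES;
}
```

Size is only one of the criteria listed above, so a result of false does not rule out sharding for collections with non-uniform contents or high read and write loads.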
For more information about the sharding concept, see Sharding in Managed Service for MongoDB.
How to enable collection sharding
Warning
Run all your sharding setup commands via the mongosh CLI as a user granted the mdbShardingManager role in the admin database.
- Enable sharding for the cluster.
- Connect to the MONGOS or MONGOINFRA host using the mongosh CLI and enable sharding:
  sh.enableSharding("<DB_name>")
  To find out the host type, get a list of hosts in the cluster.
- Define an index for the sharded collection:
  db.getSiblingDB("<DB_name>").<collection_name>.createIndex( { "<index>": <index_type> } )
- Enable collection sharding:
  sh.shardCollection( "<DB_name>.<collection>", { "<index>": <index_type> } )
  For a detailed description of the shardCollection command, see the MongoDB documentation.
- Modify applications that access your database to use only the MONGOS or MONGOINFRA hosts.
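The mongosh commands from the steps above could be grouped into a single function, as in the minimal sketch below. The function name enableCollectionSharding and its parameters are hypothetical. The sh and db arguments are expected to be the mongosh globals, passed in explicitly. The sketch assumes that sharding is already enabled for the cluster and that the session is connected to a MONGOS or MONGOINFRA host.

```javascript
// Hypothetical wrapper around the mongosh commands listed above.
// `sh` and `db` are the mongosh globals, passed in explicitly.
function enableCollectionSharding(sh, db, dbName, collectionName, shardKey) {
  // Enable sharding for the database.
  sh.enableSharding(dbName);

  // Define the index that backs the shard key.
  db.getSiblingDB(dbName).getCollection(collectionName).createIndex(shardKey);

  // Shard the collection by its namespace, "<DB_name>.<collection>".
  const namespace = `${dbName}.${collectionName}`;
  return sh.shardCollection(namespace, shardKey);
}

// Possible call from a mongosh session (placeholders as in the steps above):
// enableCollectionSharding(sh, db, "<DB_name>", "<collection>", { "<index>": 1 })
```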
Sharding heterogeneous data
If a collection includes documents with heterogeneous data types _id key values of a single type using Type Bracketing _id values of different types.
Useful links
You can learn how to solve issues related to sharding in the MongoDB documentation:
- Sharding overview: Sharding.
- Choosing a shard key and sharding strategies: Shard Keys.
Example of sharding
Let's say you already have a sharded Managed Service for MongoDB cluster hosting the billing database. Your task is to enable sharding for the payments and addresses collections. In this example, the hash of the payments index and the value of the addresses field are used as the shard key.
Sequence of operations:
- Connect to the billing database. Make sure that the user connecting to the database has the mdbShardingManager role in the admin database.
- Enable sharding for the billing database:
  sh.enableSharding("billing")
- Define the index for the sharded collection:
  db.payments.createIndex( { "_id": "hashed" } )
- Create the necessary number of shards in the management console.
- Shard the collection based on its namespace:
  sh.shardCollection( "billing.payments", { "_id": "hashed" } )
Sharding is now enabled and configured. To verify this, list the available shards by running sh.status().
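If you only need the shard names rather than the full sh.status() report, a minimal sketch based on the listShards admin command might look as follows. The function name listShardNames is hypothetical, db is expected to be the mongosh global, and it is an assumption that the connected user is allowed to run listShards in your cluster.

```javascript
// Hypothetical helper. `db` is the mongosh global, passed in explicitly.
// Assumption: the connected user is allowed to run the listShards command.
function listShardNames(db) {
  const result = db.adminCommand({ listShards: 1 });
  // Each entry in result.shards describes one shard; _id holds its name.
  return (result.shards || []).map((shard) => shard._id);
}

// Possible call from a mongosh session connected to a MONGOS or MONGOINFRA host:
// listShardNames(db)
```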