Migrating databases to Managed Service for PostgreSQL
To migrate your database to Managed Service for PostgreSQL, you need to transfer the data directly, close the old database for writing, and then transfer the load to the database cluster in Yandex.Cloud.
There are two ways to migrate data to a Managed Service for PostgreSQL cluster:
- Logical replication is the recommended method (subscriptions). The subscription mechanism that logical replication is built on lets data be transferred to the Managed Service for PostgreSQL cluster with minimal downtime.
- Restoring a database from a dump using
pg_dump
.
Below, the DBMS server you are migrating data from is called the source server, and the Managed Service for PostgreSQL cluster that you are migrating to is the destination server.
The instructions assume that you are familiar with basic Linux administration.
Logical replication
Logical replication is supported as of PostgreSQL version 10. In addition to migrating data between the same DBMS versions, logical replication lets you migrate to later PostgreSQL versions: set up replication from the source server to the destination server running a later PostgreSQL version following the migration steps.
In Managed Service for PostgreSQL clusters, subscriptions can be used by the database owner (a user created simultaneously with the cluster) and users with the mdb.admin
role for the cluster.
Migration stages:
- Configure the source server.
- Export the database schema on the source.
- Create a destination cluster and restore the database schema.
- Create a publication and subscription PostgreSQL.
- Move the PostgreSQL sequence after replication.
- Disable replication and transfer the load.
Configure the data source server
-
Specify the necessary SSL and WAL settings in the
postgresql.conf
file. In Debian and Ubuntu, the default path to this file is/etc/postgresql/10/main/postgresql.conf
.-
We recommend using SSL for migrating data: this will not only help encrypt data, but also compress it. For more information, see SSL Support and Database Connection Control Functions in the PostgreSQL documentation.
To enable SSL, set the desired value in the configuration:
ssl = on # on, off
-
Change the logging level for Write Ahead Log (WAL) to add the information needed for logical replication. To do this, set the value of the wal_level as
logical
.The setting can be changed in
postgresql.conf
. Find the line with thewal_level
setting, comment it out if necessary, and set the value aslogical
:wal_level = logical # minimal, replica, or logical
-
-
Configure host authentication on the source. To do this, add the cluster hosts in Yandex.Cloud to the file
pg_hba.conf
(on Debian and Ubuntu distributions, the default path is/etc/postgresql/10/main/pg_hba.conf
).For this the lines that will allow incoming connections to the database from the specified hosts should be added.
-
If you use SSL:
hostssl all all <host address> md5 hostssl replication all <host address> md5
-
If you don't use SSL:
host all all <host address> md5 host replication all <host address> md5
-
-
If the source server has a firewall, allow incoming connections from the Managed Service for PostgreSQL cluster hosts. For example, for Ubuntu 18:
sudo ufw allow from <host address> to any port 5432
-
Restart the database server to apply all these settings:
sudo systemctl restart postgresql
-
Check the PostgreSQL status after restarting:
sudo systemctl status postgresql
Export the DB schema from the source
Using the pg_dump
utility, create a file with the database schema to be applied in the Managed Service for PostgreSQL cluster.
pg_dump -h <DBMS server address> - U <username> -p <port> --schema-only --no-privileges --no-subscriptions -d <database name> -Fd -f /tmp/db_dump
This command excludes all data associated with privileges and roles in order to avoid conflicts with the database settings in Yandex.Cloud. If your database requires additional users, create them.
Create a Managed Service for PostgreSQL cluster and restore the database schema
If you don't have a PostgreSQL cluster in Yandex.Cloud, create one Managed Service for PostgreSQL. When creating a cluster, specify the same database name that's on the source server.
Restore the schema in the new cluster:
pg_restore -Fd -v --single-transaction -s --no-privileges \
-h <receiving address> \
-U <username> \
-p 6432 \
-d <database name> /tmp/db_dump
Create a publication and subscription
For logical replication to work, you need to define a publication (a group of logically replicated tables) on the source server and a subscription (a description of a connection to another database) on the receiving server.
-
On the source server, create a publication for all database tables. When migrating multiple databases, you need to create a separate publication for each of them.
Note
You need superuser rights to create publications to all tables, but not to transfer the selected tables. For more information about creating publications, see the documentation PostgreSQL.
Request:
CREATE PUBLICATION p_data_migration FOR ALL TABLES;
-
On the Managed Service for PostgreSQL cluster host, create a subscription with a connection string to the publication. Learn more about creating
subscriptions in the PostgreSQL documentation.Request with SSL enabled:
CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source server address> port=<port> user=<username> sslmode=verify-full dbname=<database name>' PUBLICATION p_data_migration;
Without SSL:
CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source server address> port=<port> user=<username> sslmode=disable dbname=<database name>' PUBLICATION p_data_migration;
-
You can monitor replication status from folders
pg_subscription_rel
. You can get the general replication status of the receiving server usingpg_stat_subscription
and on the source server usingpg_stat_replication
.select * from pg_subscription_rel;
First of all, you should monitor replication status on the receiving server with the
srsubstate
field.r
in thesrsubstate
field means that synchronization has ended and the databases are ready to be replicated.
Migrate PostgreSQL sequences after replication
To fully complete synchronization between the source and the destination, disable writing new data on the source server and transfer the PostgreSQL sequences to the Managed Service for PostgreSQL cluster:
-
Export PostgreSQL sequences from the source:
pg_dump -h <DBMS server address> -U <username> -p <port> -d <database name> \ --data-only -t '*.*_seq' > /tmp/seq-data.sql
Pay attention to the pattern used: if the database you're migrating has sequences that don't match the
*.*_seq
pattern, you have to enter a different pattern to upload them. For more information about patterns, see the documentation PostgreSQL. -
Restore sequences on the Managed Service for PostgreSQL host:
psql -h <DBMS server address> -U <username> -p 6432 -d <database name> \ < /tmp/seq-data.sql
Disable replication and transfer the load
When replication is complete and you move the sequences, remove the subscription on the destination server (in the Managed Service for PostgreSQL cluster):
DROP SUBSCRIPTION s_data_migration;
Afterwards, the load can be transferred to the receiving server. Since transferring sequences is a relatively quick and easily automated process, migrating to Managed Service for PostgreSQL is possible with minimal downtime.
Restore a database from a dump
To migrate data from an existing PostgreSQL database to Managed Service for PostgreSQL, use pg_dump
and pg_restore
: create a dump of a working database and restore it to the Yandex.Cloud PostgreSQL cluster.
Note
To use pg-restore
, you may need to expand the pg_repack
database.
Before trying to import your data, check whether the DBMS versions of the existing database and your cluster in Yandex.Cloud match. If not, you won't be able to restore the created dump. To migrate to PostgreSQL version 11 or 12, you can use logical replication.
Migration stages:
- Create a dump of the database you want to migrate.
- (optional) Create a virtual machine in Yandex.Cloud and upload the database dump to it.
- Create a cluster Managed Service for PostgreSQL.
- Restore data from the dump to the cluster.
Create a database dump
Use pg_dump to create a database dump.
-
Before creating a dump, we recommend switching the database to
read-only
to avoid losing data that might appear while the dump is created. The database dump itself is created using the following command:$ pg_dump -h <DBMS server address> -U <username> -Fd -d <DB name> -f ~/db_dump
-
To speed up the process, you can start dumping with multiple processor cores. To do this, set the
-j
flag with the number equal to the number of cores available to the DBMS:$ pg_dump -h <DBMS server address> -U <username> -j 4 -Fd -d <database name> ~/db_dump
-
Archive the dump:
$ tar -cvzf db_dump.tar.gz ~/db_dump
For more information about pg_dump
, see the documentation PostgreSQL.
(optional) Create a virtual machine in Yandex.Cloud and upload a dump to it
Transfer your data to an intermediate VM in Compute Cloud if:
- Your Managed Service for PostgreSQL cluster is not accessible from the internet.
- Your hardware or connection to the cluster in Yandex.Cloud is not very reliable.
The required amount of RAM and processor cores depends on the amount of data to migrate and the required migration speed.
To prepare the virtual machine to restore the dump:
-
In the management console, create a new virtual machine from the Ubuntu 18.04 image. The VM parameters depend on the size of the database you want to migrate. The minimum configuration (1 core, 2 GB RAM, 10 GB disk space) should be sufficient to migrate a database that's up to 1 GB in size. The bigger the database being migrated, the more RAM and storage space you need (at least twice as large as the size of the database).
The virtual machine must be in the same network and availability zone as the PostgreSQL cluster. Additionally, the VM must be assigned an external IP address so that you can load the dump from outside Yandex.Cloud.
-
Set up the PostgreSQL apt repository.
-
Install the PostgreSQL client and additional utilities for working with the DBMS:
$ sudo apt install postgresql-client-common # For PostgreSQL 10 $ sudo apt install postgresql-client-10 # For PostgreSQL 11 $ sudo apt install postgresql-client-11 # For PostgreSQL 12 $ sudo apt install postgresql-client-12
-
Move the DB dump to the VM. For example, you can use
scp
:scp ~/db_dump.tar.gz <VM username>@<VM public address>:/tmp/db_dump.tar.gz
-
Unpack the dump:
tar -xzf /tmp/db_dump.tar.gz
Create a cluster Managed Service for PostgreSQL
Make sure that the computing power and storage size of the cluster are appropriate for the environment where the existing databases are deployed and create a cluster.
Restore data in the new environment
Use the pg_restore utility to restore your DB dump.
The version of pg_restore
must match the pg_dump
version, and the major version must be at least as high as on the DB where the dump is deployed.
For example, to restore a PostgreSQL 10, PostgreSQL 11, or PostgreSQL 12 database from a dump, use pg_restore 10
, pg_restore 11
, or pg_restore 12
, respectively.
If you only need to restore a single schema, add the -n <schema name>
flag (without it, the command only runs on behalf of the database owner). Best practice is to restore data with the --single-transaction
flag to avoid an inconsistent state of the database if an error occurs:
pg_restore -Fd \
-v \
-h <pgsql_host_address> \
-U <username>
-d <database_name> \
-p 6432 \
/tmp/db_dump \
--single-transaction \
--no-privileges
For more information about pg_restore
, see the documentation PostgreSQL.