Migrating databases to Managed Service for PostgreSQL

To migrate your database to Managed Service for PostgreSQL, you need to transfer the data directly, close the old database for writing, and then transfer the load to the database cluster in Yandex.Cloud.

There are two ways to migrate data to a Managed Service for PostgreSQL cluster:

  • Logical replication is the recommended method (subscriptions). The subscription mechanism that logical replication is built on lets data be transferred to the Managed Service for PostgreSQL cluster with minimal downtime.
  • Restoring a database from a dump using pg_dump.

Below, the DBMS server you are transferring data from is called the source server, and the Managed Service for PostgreSQL cluster that you are migrating to is the destination server.

The instructions assume that you are familiar with basic Linux administration.

Logical replication

Logical replication is supported as of PostgreSQL ver. 10. In addition to migrating data between the same DBMS versions, logical replication allows you to migrate from PostgreSQL version 10 to 11: after configuring replication from the source server with PostgreSQL 10 to the destination server with PostgreSQL 11, just follow the migration steps.

In Managed Service for PostgreSQL clusters, subscriptions can be used by the database owner (a user created simultaneously with the cluster) and users with the mdb.admin role for the cluster.

Migration stages:

  1. Configure the source server.
  2. Export the database schema on the source.
  3. Create a destination cluster and restore the database schema.
  4. Create a PostgreSQL publication and subscription.
  5. Transfer the PostgreSQL sequence after replication.
  6. Disable the replication and transfer the load.

Configure the server with the data source

  1. Specify the necessary SSL and WAL settings in the postgresql.conf file. In Debian and Ubuntu, the default path to this file is /etc/postgresql/10/main/postgresql.conf.

    1. We recommend using SSL for migrating data: this will not only help encrypt data, but also compress it. For more information, see SSL Support and Database Connection Control Functions in the PostgreSQL documentation.

      To enable SSL, set the desired value in the configuration:

      ssl = on                   # on, off
      
    2. Change the logging level for Write Ahead Log (WAL) to add the information needed for logical replication. To do this, set the value of the wal_level as logical.

      The setting can be changed in postgresql.conf. Find the line with the wal_level setting, comment it out if necessary, and set the value as logical:

      wal_level = logical                    # minimal, replica, or logical
      
  2. Configure host authentication on the source. To do this, add the cluster hosts in Yandex.Cloud to the file pg_hba.conf (on Debian and Ubuntu distributions, the default path is /etc/postgresql/10/main/pg_hba.conf).

    For this the lines that will allow incoming connections to the database from the specified hosts should be added.

    • If you use SSL:

      hostssl         all            all             <host addres>      md5
      hostssl    replication         all             <host address>      md5
      
    • If you don't use SSL:

      host         all            all             <host address>      md5
      host    replication         all             <host address>      md5
      
  3. If the source server has a firewall, allow incoming connections from the Managed Service for PostgreSQL cluster hosts. For example, for Ubuntu 18:

    sudo ufw allow from <host address> to any port 5432
    
  4. Restart the database server to apply all these settings:

    sudo systemctl restart postgresql
    
  5. Check the PostgreSQL status after restarting:

    sudo systemctl status postgresql
    

Export the database schema on the source

Using the pg_dump utility, create a file with the database schema to be applied in the Managed Service for PostgreSQL cluster.

pg_dump -h <DBMS server address> - U <username> -p <port> --schema-only --no-privileges --no-subscriptions-d <database name> -Fd-f/tmp/db_dump

This command excludes all data associated with privileges and roles in order to avoid conflicts with the database settings in Yandex.Cloud. If your database requires additional users, create them.

Create a Managed Service for PostgreSQL cluster and restore the database schema

If you don't have a PostgreSQL cluster in the Cloud yet, create a Managed Service for PostgreSQL cluster. When creating a cluster, specify the same database name that's on the source server.

Restore the schema in the new cluster:

pg_restore -Fd -v --single-transaction -s --no-privileges \
          -h <source address> \
          -u <username> \
          -p 6432 \
          -d <database name> /tmp/db_dump

Create a publication and subscription

For logical replication to work, you need to define a publication (a group of logically replicated tables) on the source server and a subscription (a description of a connection to another database) on the destination server.

  1. On the source server, create a publication for all database tables. When migrating multiple databases, you need to create a separate publication for each of them.

    Note

    You need superuser rights to create publications to all tables, but not to transfer the selected tables. For more information about creating publications, see the PostgreSQL documentation.

    Request:

    CREATE PUBLICATION p_data_migration FOR ALL TABLES;
    
  2. On the Managed Service for PostgreSQL cluster host, create a subscription with a connection string to the publication. For more information about creating subscriptions, see the PostgreSQL documentation.

    Request with SSL enabled:

    CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source server address> port=<port> user=<username> sslmode=verify-full dbname=<database name>' PUBLICATION p_data_migration;
    

    Without SSL:

    CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source server address> port=<port> user=<username> sslmode=disable dbname=<database name>' PUBLICATION p_data_migration;
    
  3. You can monitor replication status from folders pg_subscription_rel. You can get the general replication status of the destination server using pg_stat_subscription and on the source server using pg_stat_replication.

    select * from pg_subscription_rel;
    

    First of all, you should monitor replication status on the destination server with the srsubstate field. r in the srsubstate field means that synchronization has ended and the databases are ready to be replicated.

Migrate PostgreSQL sequences after replication

To fully complete synchronization of the source and destination server, disable writing new data on the source server and transfer PostgreSQL sequences to the Managed Service for PostgreSQL cluster:

  1. Export PostgreSQL sequences on the source:

    pg_dump -h <DBMS server address> -U <username> -p <port> -d <database name> \
            --data-only -t '*.*_seq' > /tmp/seq-data.sql
    

    Pay attention to the pattern used: if the migrating database has sequences that don't match the *.*_seq pattern, you have to enter a different pattern to upload them. For more information about patterns, see the PostgreSQL documentation.

  2. Restore sequences on the Managed Service for PostgreSQL host:

    psql -h <DBMS server address> -U <username> -p 6432 -d <database name> \
         < /tmp/seq-data.sql
    

Disable replication and transfer the load

When replication is complete and you move the sequences, remove the subscription on the destination server (in the Managed Service for PostgreSQL cluster):

DROP SUBSCRIPTION s_data_migration;

Afterwards, the load can be transferred to the destination server. Since transferring sequences is a relatively quick process and easy automated, migrating to Managed Service for PostgreSQL is possible with minimal downtime.

Restore a database from a dump

To transfer data from an existing PostgreSQL database to Managed Service for PostgreSQL, use pg_dump and pg_restore: create a dump of a working database and restore it in the Yandex.Cloud PostgreSQL cluster.

Before trying to import your data, check whether the DBMS versions of the existing database and your cluster in Yandex.Cloud match. If not, you won't be able to restore the created dump. To migrate from version 10 to 11, you can use logical replication.

Migration stages:

  1. Create a dump of the migrated database.
  2. Create a virtual machine in Yandex.Cloud and upload the database dump to it (optional).
  3. Create a Managed Service for PostgreSQL cluster.
  4. Restore data from the dump to the cluster.

Create a database dump

Use pg_dump to create a database dump.

  1. Before creating a dump, it is recommended to switch the database to read-only to avoid losing data that might appear while the dump is created. The database dump itself is created using the following command:

    $ pg_dump -h <DBMS server address> -U <username> -Fd -d <DB name> -f ~/db_dump
    
  2. To speed up the process, you can start dumping with multiple processor cores. To do this, set the -j flag with the number equal to the number of cores available to the DBMS:

    $ pg_dump -h <DBMS server address> -U <username> -j 4 -Fd -d <database name> ~/db_dump
    
  3. Archive the dump:

    $ tar -cvzf db_dump.tar.gz ~/db_dump
    

Fore more information about pg_dump, see the PostgreSQL documentation.

(optional) Create a virtual machine in Yandex.Cloud and upload a dump to it

Transfer data to an intermediate virtual machine in Compute Cloud if:

  • Your Managed Service for PostgreSQL cluster is not accessible from the internet.
  • Your hardware or connection to the cluster in Yandex.Cloud is not very reliable.

The required amount of RAM and processor cores depends on the amount of data to migrate and the required migration speed.

To prepare the virtual machine to restore the dump:

  1. In the management console, create a new virtual machine from the Ubuntu 18.04 image. The VM parameters depend on the size of the database you want to migrate. The minimum configuration (1 core, 2 GB RAM, 10 GB disk space) should be sufficient to migrate a database that's up to 1 GB in size. The bigger the migrating database, the more RAM and the bigger the storage space you need (at least twice as large as the size of the database).

    The virtual machine must be in the same network and availability zone as the PostgreSQL cluster. Additionally, the VM must be assigned an external IP address so that you can load the dump from outside Yandex.Cloud.

  2. Install the PostgreSQL client and additional utilities for working with the DBMS:

    $ sudo apt install postgresql-client-common
    
    $ sudo apt install postgresql-client-10 # For PostgreSQL 10
    
    $ sudo apt install postgresql-client-11 # For PostgreSQL 11
    
  3. Move the DB dump to the VM. For example, you can use the scp utility:

    scp ~/db_dump.tar.gz <VM username>@<VM public address>:/tmp/db_dump.tar.gz
    
  4. Unpack the dump:

    tar -xzf /tmp/db_dump.tar.gz
    

Create a Managed Service for PostgreSQL cluster

Make sure that the computing power and storage size of the cluster are appropriate for the environment, where the existing databases are deployed, and create a cluster.

Restore data in the new environment

Use the pg_restore utility to restore your DB dump.

The version of pg_restore must match the pg_dump version, and the major version must be at least as high as on the DB where the dump is deployed. For example, pg_restore 10 should be used with PostgreSQL 10, but if you need to deploy the dump for PostgreSQL 11, use pg_restore 11 instead.

Best practice is to restore data with the --single-transaction flag to avoid an inconsistent state of the database if an error occurs:

pg_restore -Fd \
           -v \
           -h <pgsql_host_address> \
           -U <username>
           -d <database_name> \
           -p 6432 \
           /tmp/db_dump \
           --single-transaction \
           --no-privileges

If you only need to restore a single schema, add the -n <schema name> flag.

Fore more information about pg_restore, see the PostgreSQL documentation.