Transferring data from a PostgreSQL source endpoint

Written by

Improved by

Updated at April 12, 2024

Scenarios for transferring data from PostgreSQL
Preparing the source database
Configuring the PostgreSQL source endpoint
Configuring the data target
Operations with the database during transfer
Troubleshooting data transfer issues

Yandex Data Transfer enables you to migrate data from a PostgreSQL database and implement various scenarios of data transfer, processing and transformation. To implement a transfer:

Explore possible data transfer scenarios.
Prepare the PostgreSQL database for the transfer.
Set up an endpoint source in Yandex Data Transfer.
Set up one of the supported data targets.
Create a transfer and start it.
Perform required operations with the database and control the transfer.
In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data from PostgreSQL

Migration: Moving data from one repository to another; it often suggests transferring a database to the cloud, from outdated local databases to managed cloud ones.
Data change capture means tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.

Uploading data to data marts is a process of transferring prepared data to storage for subsequent visualization.
- Loading data from PostgreSQL to the ClickHouse® data mart.
Uploading data to scalable Object Storage storage allows you to save on data storage and simplifies the exchange with contractors.
- Loading PostgreSQL data to the Object Storage scalable storage.

For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.

Preparing the source database

Note

When performing a transfer from PostgreSQL to a target of any type, objects of the large object type will not get transferred.

When transferring data of the TIMESTAMP WITHOUT TIME ZONE type, the time zone set in the timezone parameter of the PostgreSQL source database applies.

Example

The column with the TIMESTAMP WITHOUT TIME ZONE data type contains the 1970-01-01 00:00:00 value. The way the value will be transferred depends on the timezone parameter in the database:

If the parameter is set to Etc/UTC, the time will be transferred as 1970-01-01 00:00:00+00.
If the parameter is set to Europe/Moscow, the time will be transferred as 1970-01-01 00:00:00+03.

Data stored in a MATERIALIZED VIEW is not transferred. To transfer MATERIALIZED VIEW data, create an ordinary VIEW that refers to the MATERIALIZED VIEW to be transferred.

If the definition of the VIEW to be transferred contains an invocation of the VOLATILE function, the transfer reads data from this VIEW with the READ UNCOMMITTED isolation level. No consistency between the VIEW data and the data of other objects being transferred is guaranteed. Reading data from a MATERIALIZED VIEW in the VIEW definition are equivalent to invoking the VOLATILE function.

Large objects in the TOAST storage system and those of the bytea type get transferred without restrictions.

Managed Service for PostgreSQL

PostgreSQL

Configure the user the transfer will use to connect to the source:
1. Create a user.
2. For the Replication and Snapshot and increment transfer types, assign the mdb_replication role to this user.
3. Connect to the database you want to migrate as the database owner and configure privileges:
  - SELECT for all the database tables to be transferred.
  - SELECT for all the database sequences to be transferred.
  - USAGE for the schemas of these tables and sequences.
  - ALL PRIVILEGES (CREATE and USAGE) to the __consumer_keeper and __data_transfer_mole_finder housekeeping table schema defined by the endpoint parameter if the endpoint is going to be used for the Replication or Snapshot and increment transfer types.
If the replication source is a cluster, enable the pg_tm_aux extension for it. This will allow replication to continue even after changing the master host. In some cases, a transfer may end in an error after you replace a master in your cluster. For more information, see Troubleshooting.
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred. To make sure the transfer is running properly when migrating a database with such tables:
- Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
- Add the replica ID to tables without primary keys:
  - For tables with an index, set REPLICA IDENTITY by unique key:
```
ALTER TABLE MY_TBL REPLICA IDENTITY USING INDEX MY_IDX;
```
  - For tables without an index, change REPLICA IDENTITY:
```
ALTER TABLE MY_TBL REPLICA IDENTITY FULL;
```
If there are no primary keys in a table, logical replication will not include any changes in the rows (UPDATE or DELETE).
- If, while trying to transfer tables from PostgreSQL to PostgreSQL, you do not exclude a table without primary keys on the transfer source, the following error message will be returned:
```
 failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
```
- If, while trying to transfer tables from PostgreSQL to a different database, you add a table without primary keys, the following error message will be returned:
```
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed:
"public"."MY_TBL": no key columns found
```
Disable the transfer of external keys when creating a source endpoint. Recreate them once the transfer is completed.
Find and terminate DDL queries that are running for too long. To do this, make a selection from the PostgreSQL pg_stat_activity housekeeping table:
```
SELECT NOW() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle' ORDER BY 1 DESC;
```
This will return a list of queries running on the server. Pay attention to queries with a high duration value.
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the PostgreSQL source.
To enable parallel data reads from the table, set its primary key to serial mode.

Then specify the number of jobs and threads in the transfer parameters under Runtime environment.
Configure WAL monitoring. For Replication and Snapshot and increment transfers, logical replication is used. To perform it, the transfer creates a replication slot with the slot_name equal to the transfer ID that you can get by selecting the transfer from the list of your transfers. The WAL size may increase for different reasons: due to a long-running transaction or a transfer issue. Therefore, we recommend setting up WAL monitoring on the source side.
1. To monitor the size of the used storage or disk, set up an alert using the monitoring tools (see the disk.used_bytes description).
2. Set the maximum WAL size for replication in the Max slot wal keep size setting. The value of this setting can be edited as of PostgreSQL version 13. To urgently disable a transfer to perform data reads, delete the replication slot.
  
  Warning
  
  If set to -1 (unlimited size), you will not be able to delete WAL files due to open logical replication slots that information is not read from. As a result, the WAL files will take up the entire disk space and you will not be able to connect to the cluster.
3. Set up an alert with the Yandex Monitoring tools for the metric used for Total size of WAL files. Make sure the threshold values are less than those specified for the disk.used_bytes metric because, apart from the data, the disk stores temporary files, the WAL, and other types of data. You can monitor the current slot size by making a DB request with the correct slot_name equal to the transfer ID:
```
SELECT slot_name, pg_size_pretty(pg_current_wal_lsn() - restart_lsn), active_pid, catalog_xmin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'
```

Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer.
Create a user account the transfer will use to connect to the source:
- For the Snapshot transfer type, create a user with the following command:
```
CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
```
- For the Replication and Snapshot and increment transfer types, create a user with the REPLICATION privilege using this command:
```
CREATE ROLE <username> WITH REPLICATION LOGIN ENCRYPTED PASSWORD '<password>';
```
Grant the created user the privilege to perform SELECT on all the database tables to be transferred and the USAGE privilege for the schemas of these tables:
```
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>;
GRANT USAGE ON SCHEMA <schema_name> TO <username>;
```
Grant the created user the privileges to the __consumer_keeper and __data_transfer_mole_finder housekeeping table schema defined by the endpoint parameter if the endpoint is going to be used for the Replication or Snapshot and increment transfer types.
```
GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
```
Install and enable the wal2json extension.

Installation
- Linux
  1. Add the PostgreSQL official repository for your distribution.
  2. Update the list of available packages and install the wal2json package for the active version of PostgreSQL:
- Windows 10, 11
  1. If you do not have Microsoft Visual Studio installed yet, download and install it. To build the wal2json extension, the Community Edition is sufficient. During installation, select the following components:
    - MSBuild
    - MSVC v141 x86/x64 build tools
    - C++\CLI support for v141 build tools
    - MSVC v141 - VS 2017 C++ x64\x86 build tools
    - MSVC v141 - VS 2017 C++ x64\x86 Spectre-mitigated libs
    - The latest version of the Windows SDK for the active OS version
    - Other dependencies that are installed automatically for selected components
    Take note of the version number of the installed Windows SDK. You will need it when you are asked to specify the wal2json build parameters.
  2. Download the wal2json source code from the project page.
  3. Unpack the archive with the source code to the C:\wal2json\ folder.
  4. Go to C:\wal2json.
  5. In one PowerShell session, make changes to the wal2json.vcxproj file:
    - Replace the lines C:\postgres\pg103 with the path to the installed PostgreSQL version, for example:
      (Get-Content .\wal2json.vcxproj).replace('C:\postgres\pg103', 'C:\PostgreSQL\14') | ` Set-Content .\wal2json.vcxproj
    - Replace the /MP extension parameter to /MT, for example:
      (Get-Content .\wal2json.vcxproj).replace('/MP', '/MT') | Set-Content .\wal2json.vcxproj
    - Specify the version number of the installed Windows SDK in <WindowsTargetPlatformVersion>:
      (Get-Content .\wal2json.vcxproj).replace('<WindowsTargetPlatformVersion>8.1', '<WindowsTargetPlatformVersion><installed_version_of_Windows_SDK>') | ` Set-Content .\wal2json.vcxproj
    1. Enter the value of the extension variable required for building wal2json. For example, for Visual Studio Community Edition 2022:
      $VCTargetsPath='C:\Program Files\Microsoft Visual Studio\2022\Comminuty\MSBuild\Microsoft\VC\v150'
    2. Run the build:
      & 'C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe' /p:Configuration=Release /p:Platform=x64
    3. Copy wal2json.dll from the build/release folder to the lib folder of the installed PostgreSQL version.
Configuration
1. In the postgresql.conf file, change the value of the wal_level parameter to logical:
```
wal_level = logical
```
2. Restart PostgreSQL.
If the replication source is a cluster, install and enable the pg_tm_aux extension on its hosts. This will allow replication to continue even after changing the master host. In some cases, a transfer may end in an error after you replace a master in your cluster. For more information, see Troubleshooting.
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred. To make sure the transfer is running properly when migrating a database with such tables:
- Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
- Add the replica ID to tables without primary keys:
  - For tables with an index, set REPLICA IDENTITY by unique key:
```
ALTER TABLE MY_TBL REPLICA IDENTITY USING INDEX MY_IDX;
```
  - For tables without an index, change REPLICA IDENTITY:
```
ALTER TABLE MY_TBL REPLICA IDENTITY FULL;
```
If there are no primary keys in a table, logical replication will not include any changes in the rows (UPDATE or DELETE).
- If, while trying to transfer tables from PostgreSQL to PostgreSQL, you do not exclude a table without primary keys on the transfer source, the following error message will be returned:
```
 failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
```
- If, while trying to transfer tables from PostgreSQL to a different database, you add a table without primary keys, the following error message will be returned:
```
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed:
"public"."MY_TBL": no key columns found
```
Disable the transfer of external keys when creating a source endpoint. Recreate them once the transfer is completed.
Find and terminate DDL queries that are running for too long. To do this, make a selection from the PostgreSQL pg_stat_activity housekeeping table:
```
SELECT NOW() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle' ORDER BY 1 DESC;
```
This will return a list of queries running on the server. Pay attention to queries with a high duration value.
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the PostgreSQL source.
To enable parallel data reads from the table, set its primary key to serial mode.

Then specify the number of jobs and threads in the transfer parameters under Runtime environment.
If replication via Patroni is configured on the source, add an ignore_slots block to the source configuration:
```
ignore_slots:
  - database: <database>
    name: <replication_slot>
    plugin: wal2json
    type: logical
```
Where:
- database: Name of the database the transfer is configured for.
- name: Replication slot name.
The database and the replication slot names must match the values specified in the source endpoint settings. By default, the replication slot name is the same as the transfer ID.

Otherwise, the start of the replication phase will fail:
```
Warn(Termination): unable to create new pg source: Replication slotID <replication_slot_name> does not exist.
```
Configure WAL monitoring. For Replication and Snapshot and increment transfers, logical replication is used. To perform it, the transfer creates a replication slot with the slot_name equal to the transfer ID that you can get by selecting the transfer from the list of your transfers. The WAL size may increase for different reasons: due to a long-running transaction or a transfer issue. Therefore, we recommend setting up WAL monitoring on the source side.
1. Set up alerts following the disk usage recommendations.
2. Set the maximum WAL size. This feature is available starting with PostgreSQL version 13.
3. You can track the current slot size by making a DB request with the correct slot_name equal to the transfer ID:
```
SELECT slot_name, pg_size_pretty(pg_current_wal_lsn() - restart_lsn), active_pid, catalog_xmin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'
```

Note

For things to note about data transfer from PostgreSQL to ClickHouse® using Replication and Snapshot and increment transfers, see Asynchronously replicating data from PostgreSQL to ClickHouse®.

Configuring the PostgreSQL source endpoint

When creating or editing an endpoint, you can define:

Yandex Managed Service for PostgreSQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
Additional parameters.

Before you get started, check the Service specifics for PostgreSQL sources and targets.

Managed Service for PostgreSQL cluster

Warning

To create or edit an endpoint of a managed database, you need to have the managed-postgresql.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.

Connecting to the database with the cluster ID specified in Yandex Cloud.

Management console

CLI

Terraform

API

Managed Service for PostgreSQL cluster: Specify ID of the cluster to connect to.
Database: Specify the name of the database in the selected cluster.
User: Specify the username that Data Transfer will use to connect to the database.
Password: Enter the user's password to the database.
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

Endpoint type: postgres-source.

--cluster-id: ID of the cluster you need to connect to.
--database: Database name.
--user: Username that Data Transfer will use to connect to the database.
To set a user password to access the DB, use one of the following parameters:
- --raw-password: Password as text.
- --password-file: The path to the password file.

Endpoint type: postgres_source.

connection.mdb_cluster_id: ID of cluster to connect to.
database — Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Password in text form.

Here is an example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    postgres_source {
      security_groups = ["<list_of_security_group_IDs>"]
      connection {
        mdb_cluster_id = "<cluster_ID>"
      }
      database = "<migrated_database_name>"
      user     = "<username_for_connection>"
      password {
        raw = "<user_password>"
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

mdbClusterId: ID of the cluster you need to connect to.
database: Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Database user password (in text form).

Custom installation

For OnPremise, all fields are filled in manually.

Management console

CLI

Terraform

API

Host: Specify the IP address or FQDN of the master host. If the hosts are on different ports, you can specify multiple host values in the host:port format. If you choose this format, the value of the Port field will be disregarded.
Port: Set the number of the port that Data Transfer will use for the connection.
CA certificate: Upload the certificate file or add its contents as text if transmitted data must be encrypted, for example, to meet PCI DSS requirements.
Subnet ID: Select or create a subnet in the required availability zone.

If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
Database: Specify the name of the database in the selected cluster.
User: Specify the username that Data Transfer will use to connect to the database.
Password: Enter the user's password to the database.
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

Endpoint type: postgres-source.

--host: IP address or FQDN of the master host you want to connect to.
--port: Number of the port that Data Transfer will use for the connection.
--ca-certificate — If the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS.
--subnet-id: ID of the subnet the host resides in.
--database: Database name.
--user: Username that Data Transfer will use to connect to the database.
To set a user password to access the DB, use one of the following parameters:
- --raw-password: Password as text.
- --password-file: The path to the password file.

Endpoint type: postgres_source.

security_groups: Security groups for network traffic.

Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.

Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.

Note

In Terraform, it is not required to specify a network for security groups.
on_premise.hosts: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address.
on_premise.port: Port number that Data Transfer will use for connections.
on_premise.tls_mode.enabled.ca_certificate: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.
on_premise.subnet_id: ID of the subnet where the host is located.
database — Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Password in text form.

Here is an example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    postgres_source {
      security_groups = ["<list_of_security_group_IDs>"]
      connection {
        on_premise {
          hosts = ["<list_of_hosts>"]
          port  = <port_for_connection>
        }
      }
      database = "<migrated_database_name>"
      user     = "<username_for_connection>"
      password {
        raw = "<user_password>"
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

onPremise: Database connection parameters:
- hosts — IP address or FQDN of the master host to connect to.
- port: The number of the port that Data Transfer will use for the connection.
- tlsMode: Parameters of encryption of transmitted data if it is required, for example, to meet the requirements of PCI DSS.
- subnetId: ID of the subnet the host resides in.
database: Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Database user password (in text form).

Additional settings

Management console

CLI

Terraform

API

Table filter:
- Included tables: Data is only transferred from listed tables.
  
  When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
- Excluded tables: Data from the listed tables is not transferred.
The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Fully qualified table name.
- <schema_name>.*: All tables in the specified schema.
Table names must match this regular expression:
```
^"?[-_a-zA-Z0-9.]+"?\\."?[$-_a-zA-Z0-9.*]+"?$
```
Double quotes within a table name are not supported. Outer quotes are only used as delimiters and will be deleted when processing paths.

Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
Schema migration: If required, select the DB schema elements to transfer when activating or deactivating a transfer.
Advanced settings:
- Maximum WAL size for a replication slot: Maximum size of Write-Ahead Log kept in replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB. This setting does not prevent disk overflow in the source database. You can only use it for PostgreSQL version below 13, and we recommend monitoring the WAL slot value in the source database.
- Database schema for auxiliary tables: Specify the name of the schema to store service tables (__consumer_keeper and __data_transfer_mole_finder).
The schema name must match this regular expression:
```
^[-_a-zA-Z0-9]*$
```
Double quotes in a schema name are not supported.
- Merge inherited tables: Select to merge the contents of tables. For more information, see Specifics of working with endpoints.
- Parallel table copying settings: If necessary, set detailed settings for parallel copying of tables (if parallel copying parameters are set in the transfer).

--include-table: List of included tables. If this is on, the data will only be transferred from the tables in this list.

When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
--exclude-table: List of excluded tables. Data from the listed tables will not be transferred.

The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Fully qualified table name.
- <schema_name>.*: All tables in the specified schema.
Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
--slot-lag-limit: Maximum size of the write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB.
--service-schema: Name of the DB schema for service tables.
Schema transfer settings:
- --transfer-before-data: At the initial stage of the transfer.
- --transfer-after-data: At the final stage of the transfer.

include_tables: List of included tables. If this is on, the data will only be transferred from the tables in this list.

When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
exclude_tables: List of excluded tables. Data from the listed tables will not be transferred.

The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Fully qualified table name.
- <schema_name>.*: All tables in the specified schema.
Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
slot_gigabyte_lag_limit: Maximum size of Write-Ahead Log kept in replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB.
service_schema: DB schema name for housekeeping tables.
object_transfer_settings: Schema transfer settings:
- sequence: Sequences.
- sequence_owned_by: User sequences.
- table: Tables.
- primary_key: Primary keys.
- fk_constraint: Foreign keys.
- default_values: Default values.
- constraint: Constraints.
- index: Indexes.
- view: Views.
- function: Functions.
- trigger: Triggers.
- type: Types.
- rule: Rules.
- collation: Collation rules.
- policy: Policies.
- cast: Type casts.
You can specify one of the following values for each entity:
- BEFORE_DATA: Move when activating the transfer.
- AFTER_DATA: Move when deactivating the transfer.
- NEVER: Do not move.

For more information, see the Terraform provider documentation.

includeTables: List of included tables. If this is on, the data will only be transferred from the tables in this list.

When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
excludeTables: List of excluded tables. Data from the listed tables will not be transferred.

The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Fully qualified table name.
- <schema_name>.*: All tables in the specified schema.
Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
slotByteLagLimit: Maximum size of the write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB.
serviceSchema: Name of the DB schema for service tables.
objectTransferSettings: Settings for transferring the schema at the initial and final stages of the transfer (BEFORE_DATA and AFTER_DATA values, respectively).

Settings for transferring a DB schema when enabling and disabling a transfer

Note

The default settings of the source endpoint allow you to successfully perform a transfer for most databases. Change the settings of the initial and final stages of the transfer only if it is necessary.

During a transfer, the database schema is transferred from the source to the target. The transfer is performed in two stages:

At the activation stage.

This step is performed when the transfer is activated before copying or replicating data to create a schema on the target. You can choose which parts of the schema will be migrated. By default, this is TABLE, VIEW, PRIMARY KEY, SEQUENCE, SEQUENCE OWNED BY, RULE, TYPE, FUNCTION, and DEFAULT.
At the deactivation stage.

This step is performed at the end of the transfer operation when it is deactivated. If the transfer keeps running in replication mode, the final stage of the transfer will be performed only when replication stops. You can choose which parts of the schema will be migrated.

At the final stage, it is assumed that when the transfer is deactivated, there is no writing load on the source. You can ensure this by switching to read-only mode. At this stage, the database schema on the target is brought to a state where it will be consistent with the schema on the source.

It's recommended to include resource-intensive operations like index migration in the final stage of migration. Migrating indexes at the beginning of the transfer can slow it down.

The transfer of the schema at both the initial and final stages is performed using the pg_dump utility.

Note

When editing the settings of an endpoint of the transfer in the Replicating status, the table schemas on the target are preserved. In this case, only the schemas of the tables that are missing on the target at the time of restart are transferred to the target.

Replication cannot guarantee that sequence values are preserved, so we recommend updating the sequences on the target.

Configuring the data target

Configure one of the supported data targets:

For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.

After configuring the data source and target, create and start the transfer.

Operations with the database during transfer

Tip

The PostgreSQL replication protocol does not support transferring schema changes. Avoid changing the data schema in the source and target databases during the transfer. If this cannot be avoided, perform explicit checks on the target.

For Snapshot and Snapshot and increment transfers:

In the Copying status, changes to the data schema on the source and target are not allowed.
In the Replicating status, any changes to the data schema on the source should be manually applied to the target, otherwise the transfer will not be able to continue.

For example, suppose we added a new column to the test_table table on the source:
```
ALTER TABLE test_table ADD COLUMN val2 TEXT;
```
If writing to this table continues, the transfer will not be able to insert data on the target. For replication to continue, run a similar query to change the data schema on the target:
```
ALTER TABLE test_table ADD COLUMN val2 TEXT;
```
After that, the transfer can continue.

Troubleshooting data transfer issues

Known issues when using a PostgreSQL endpoint:

Stopping a transfer's master transaction session.
Exceeding the connection time-to-live quota.
VIEW transfer error.
Error when adding a table entry by constraint.
Error when migrating all schema tables.
Low transfer speed.
Unable to transfer child tables.
Insufficient replication slots in a source database.
No data transfer after changing a source endpoint.
Transfer error when changing a master host.
Error when transferring nested transactions.
Error transferring tables with deferred constraints.
Cannot create a replication slot at the activation step.
Excessive WAL size increase.
Error when replicating from an external source.
Error when transfering tables without primary keys.
Error when dropping a table under the Drop cleanup policy.

See a full list of recommendations in the Troubleshooting section.

Stopping a transfer's master transaction session

Error message:

Cannot set transaction snapshot:
ERROR: invalid snapshot identifier: "<snapshot_ID>" (SQLSTATE 22023).

Possible causes include:

The source is running a cron job or other application that periodically terminates sessions that are too long.
Someone manually terminated the master transaction.
The source's CPU resources are insufficient to execute a query.

Solution: Disable this cron job and add more CPU resources to the source. After you make the changes, re-activate the transfer.

Exceeding the connection time-to-live quota

Yandex Managed Service for PostgreSQL has a connection time-to-live quota of 12 hours.

Solution: If transferring a database requires more time, [change the Session duration timeout setting of the Yandex Managed Service for PostgreSQL cluster](../../../../managed-postgresql/operations/update.md#change-postgresql-config).

VIEW transfer error

Error message:

[ERROR] "... _view": no key columns found

Can't replicate new data from Views. Under PostgreSQL — PostgreSQL transfers, only those views are transferred which are listed in the Table filter → Included tables source endpoint parameter.

Solution: manually exclude all views from the transfer, list them in the Table filter → Included tables source endpoint parameter, and then reactivate the transfer.

Error when adding a table entry by constraint

Solution: prepare the source as described in Preparing for the transfer.

Error when transferring all schema tables

Error message:

Unable to apply DDL of type 'TABLE', name '<schema>'.'<table>', error:
ERROR: schema "<schema name>" does not exist (SQLSTATE 3F000)

A transfer fails if tables of a specific schema are listed as <schema>.*. This occurs due to the specifics of pg_dump that is used for a schema transfer. When specifying tables of the entire schema in <schema>.* format in the source endpoint parameter Table filter → Included tables, the PostgreSQL types from this schema aren't extracted even if it contains tables that depend on these types.

Solution: Create PostgreSQL types in the target database manually.

Low transfer speed

Snapshot or Snapshot and increment transfers from PostgreSQL to PostgreSQL may slow down.

Possible cause:

Write protocol.

Under normal operation, a transfer is performed via the fast copy protocol. However, if there's a batch write conflict, it switches to slow line-by-line writes. The more write conflicts, the lower the transfer speed.

Solution: set the target endpoint's cleanup policy type to Drop and exclude other writing processes.
Parallel table reading.

Parallel reading is only available for tables that contain a primary key in serial mode.

Solution: Set up parallel copy and reactivate the transfer.

Unable to transfer child tables

Child tables are either not transferred or transferred without data if a table is partitioned.

Solution: set the following source endpoint parameters:

Select Merge inherited tables in advanced settings.
In the Included tables field, specify all child tables to transfer data from.
Make sure the user has access to the child tables.

To speed up the transfer of child tables, set up parallel copy.

Insufficient replication slots in a source database

Error message:

Warn(Activate): failed to create a replication slot "<transfer_ID>" at source:
failed to create a replication slot:
failed to create a replication slot:
ERROR: all replication slots are in use
(SQLSTATE 53400)

Solution: Increase the number of replication slots in the source database (by default, 10).

No data transfer after changing a source endpoint

After adding tables to the List of included tables in the source endpoint parameters, a data transfer restarted and stopped.

Solution:

Create tables in the target database manually.

1. Create new tables with a Primary key and without a Foreign key in the target database.
2. Add the new tables to the List of included tables in the source endpoint parameters.
3. Transfer a dump with historical data to the target database.
4. If there are errors in the logs, fix them depending on a specific error.
5. If there are no errors but the logs are empty, contact technical support or your account manager to dump goroutines. This may help resume replication without restarting the transfer.
Deactivate and reactivate the transfer.
Create a separate transfer of the Snapshot type for the new tables. Deactivating the original transfer is not required.

Transfer error when changing a master host

The error occurs in Replication or Snapshot and increment transfers due to required Write-Ahead Log (WAL) parts missing. This occurs when the Write Ahead Log (WAL) logical replication lag between the current master and a replica is greater than the allowable WAL size on other hosts, so, when switching from the master to the replica, the replication slot can't sync to the WAL on the new master.

Solution: set a limit in the additional target endpoint parameter Maximum WAL size for the replication slot and reactivate the transfer.

Error when transferring nested transactions

Transfers of PostgreSQL below version 14 don't support a transfer of tables with committed transactions that are nested more than 1024 times with replication changes at each nesting level. The degree of nesting depends on the number of nested begin; .. commit; statements.

Solution:

Use PostgreSQL 14 or higher.
Exclude transactions with a nesting level like this from a transfer.

Error transferring tables with deferred constraints

The error arises in the transfers of type Replication or Snapshot and increment because updating tables and transactions with DEFERRABLE constraints isn't supported by Data Transfer. To learn more about deferred constraints, see the PostgreSQL documentation.

Solution: Change the constraint type in such tables to IMMEDIATE and activate the transfer again.

Could not create a replication slot at the activation step

In the beginning of the transfer, one or more replication slots are created in the source database. The database objects are locked at this point. If some object is locked by another transaction, this results in a competing lock, which will terminate the transfer with an error.

Solution:

Get the PID of the process that competes for locks with the transfer:

/* Get PID of the transfer */
SELECT active_pid
  FROM pg_replication_slots
  WHERE slot_name = '<transfer_ID>';

/* search the PID of the locking process */
SELECT pid, pg_blocking_pids(pid) as blocked_by
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0;

        pid      | blocked_by
-----------------+-------------------
 <transfer_PID> | {<locking_transaction_PID>}
(1 row)

Look up the locking query:

SELECT query, username
  FROM pg_stat_activity
  WHERE pid = <locking_transaction_PID>;

(Optional) Stop the transaction by the command:

SELECT pg_terminate_backend(<locking_transaction_PID>);

Reactivate the transfer.

Excessive WAL size increase

In the PostgreSQL source database, the size of the Write-Ahead Log (WAL) may reach up to dozens of GB due to the queries running for over five minutes. These queries block the WAL and do not allow it to move forward and reset.

You can see the WAL size has gone up when:

The space used by the source database has increased.
The Read buffer size chart in Data Transfer monitoring is rising.

Solution:

Find the query sessions running for more than five minutes:

SELECT now()-xact_start, query, pid FROM pg_stat_activity
WHERE (now()-xact_start)>'5minute'::interval AND STATE != 'idle'
ORDER BY 1 DESC;

Terminate the sessions found. Make sure to avoid such queries moving forward.

Error when replicating data from an external source

Error message:

[XX000] ERROR: could not connect to the publisher:
SSL error: certificate verify failed FATAL:
no pg_hba.conf entry for replication connection
from host "<PostgreSQL_host_IP_address>", user "postgres", SSL off

Solution: Prepare the source as described in Preparing for the transfer.

Error when transfering tables without primary keys

Error message:

Primary key check failed: 14 Tables errors: Table no key columns found

For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.

Solution: Prepare the source as described in Preparing for the transfer.

Error when dropping a table under the Drop cleanup policy

Error message:

ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)

With the Drop cleanup policy, a transfer drops tables in multiple iterations:

The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, due to its links to external keys, an error occurs, but the transfer will continue dropping tables.
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table linked to external keys will also be dropped. In this case, the error is fixed while Data Transfer is running and no other actions are required.
If the transfer does not drop any table during another iteration, table dropping stops. In which case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.

Solution:

If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
- To the list of included tables in the source endpoint parameters.
- To the list of objects to be transferred in the transfer parameters.
Delete the blocking child tables in the target database manually.
Use the Truncate cleanup policy.
Recreate the target database.

Warning

This will lead to losing all data in the database.

Transferring data from a PostgreSQL source endpoint

Scenarios for transferring data from PostgreSQLScenarios for transferring data from PostgreSQL

Preparing the source databasePreparing the source database

Configuring the PostgreSQL source endpointConfiguring the PostgreSQL source endpoint

Managed Service for PostgreSQL clusterManaged Service for PostgreSQL cluster

Custom installationCustom installation

Additional settingsAdditional settings

Settings for transferring a DB schema when enabling and disabling a transferSettings for transferring a DB schema when enabling and disabling a transfer

Configuring the data targetConfiguring the data target

Operations with the database during transferOperations with the database during transfer

Troubleshooting data transfer issuesTroubleshooting data transfer issues

Stopping a transfer's master transaction sessionStopping a transfer's master transaction session

Exceeding the connection time-to-live quotaExceeding the connection time-to-live quota

VIEW transfer errorVIEW transfer error

Error when adding a table entry by constraintError when adding a table entry by constraint

Error when transferring all schema tablesError when transferring all schema tables

Low transfer speedLow transfer speed

Unable to transfer child tablesUnable to transfer child tables

Insufficient replication slots in a source databaseInsufficient replication slots in a source database

No data transfer after changing a source endpointNo data transfer after changing a source endpoint

Transfer error when changing a master hostTransfer error when changing a master host

Error when transferring nested transactionsError when transferring nested transactions

Error transferring tables with deferred constraintsError transferring tables with deferred constraints

Could not create a replication slot at the activation stepCould not create a replication slot at the activation step

Excessive WAL size increaseExcessive WAL size increase

Error when replicating data from an external sourceError when replicating data from an external source

Error when transfering tables without primary keysError when transfering tables without primary keys

Error when dropping a table under the Drop cleanup policyError when dropping a table under the Drop cleanup policy

Was the article helpful?

Scenarios for transferring data from PostgreSQL

Preparing the source database

Configuring the PostgreSQL source endpoint

Managed Service for PostgreSQL cluster

Custom installation

Additional settings

Settings for transferring a DB schema when enabling and disabling a transfer

Configuring the data target

Operations with the database during transfer

Troubleshooting data transfer issues

Stopping a transfer's master transaction session

Exceeding the connection time-to-live quota

VIEW transfer error

Error when adding a table entry by constraint

Error when transferring all schema tables

Low transfer speed

Unable to transfer child tables

Insufficient replication slots in a source database

No data transfer after changing a source endpoint

Transfer error when changing a master host

Error when transferring nested transactions

Error transferring tables with deferred constraints

Could not create a replication slot at the activation step

Excessive WAL size increase

Error when replicating data from an external source

Error when transfering tables without primary keys

Error when dropping a table under the Drop cleanup policy