© 2023 Yandex.Cloud LLC
Yandex Data Transfer

Configuring a Greenplum® source endpoint

Written by
Yandex Cloud
  • Managed Service for Greenplum® cluster
  • Custom installation
  • Additional settings
  • Specifics of working with the Greenplum source
    • Snapshot consistency

When creating or editing an endpoint, you can define:

  • Yandex Managed Service for Greenplum® cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
  • Additional parameters.

Managed Service for Greenplum® cluster

Connecting to the database with the cluster ID specified in Yandex Cloud. Available only for clusters deployed in Yandex Managed Service for Greenplum®.

Management console
  • MDB cluster ID: Select the cluster to connect to.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.

  • Database: Specify the name of the database in the selected cluster.

  • Username: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the password for the database user.

  • CA certificate: Upload the certificate file or add its contents as text if encryption of the transmitted data is required, for example, to meet the PCI DSS requirements.

  • Subnet ID: Select or create a subnet in the desired availability zone.

    If the source and target are geographically close, connecting over the selected subnet speeds up the transfer.

Custom installation

Connecting to the database with explicitly specified network addresses and ports.

Management console
  • Coordinator host: Specify the IP or FQDN of the primary master host to connect to.

  • Coordinator port: Specify the port for Data Transfer to use to connect to the primary master host.

  • Coordinator mirror host: Specify the IP or FQDN of the backup master host to connect to (optional if there is only one master host).

  • Coordinator mirror port: Specify the port for Data Transfer to use to connect to the backup master host (optional if there is only one master host).

  • Greenplum® cluster segments: Specify segment host connection information. If you omit these, segment host addresses will be retrieved automatically from the master host housekeeping table.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Network in Yandex Data Transfer.

  • Database: Specify the name of the database in the selected cluster.

  • Username: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the password for the database user.

  • CA certificate: Upload the certificate file or add its contents as text if encryption of the transmitted data is required, for example, to meet the PCI DSS requirements.

  • Subnet ID: Select or create a subnet in the desired availability zone.

    If the source and target are geographically close, connecting over the selected subnet speeds up the transfer.

Additional settings

Management console
  • List of included tables: Data is only transferred from the listed tables.

    If a table is partitioned, you can use this field to specify both the entire table and individual partitions.

    Make sure the user on whose behalf data is transferred has all the necessary privileges on the tables in this list.

    If you add new tables while editing an endpoint used in Snapshot and increment or Increment transfers with the Replicating status, the data history for these tables will not be uploaded. To add a table with its historical data, use the List of objects to be transferred field in the transfer settings.

  • List of excluded tables: Data from the listed tables is not transferred.

    To exclude a partitioned table from the transfer, make sure to list all of its partitions.

    Both lists support expressions in the following format:

    • <schema name>.<table name>: Fully qualified table name.
    • <schema name>.*: All tables in the specified schema.
    • <table name>: Table in the default schema.

  • Snapshot consistency: When enabled, Data Transfer takes additional steps on the source to ensure snapshot consistency.

  • Auxiliary object schema: A schema for placing auxiliary objects of the transfer.
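
The expression format accepted by the included and excluded table lists can be illustrated with a small matcher. This is a minimal sketch, not the actual Data Transfer implementation. The function names are hypothetical, the default schema is assumed to be public, an empty include list is assumed to mean that all tables are eligible, and the exclude list is assumed to take precedence over the include list. Quoted identifiers that contain dots are not handled.

```python
from typing import Sequence

# Assumption: the default schema is "public"; adjust for your database.
DEFAULT_SCHEMA = "public"


def matches(pattern: str, schema: str, table: str) -> bool:
    """Return True if one list expression covers the table schema.table."""
    if "." in pattern:
        # "<schema name>.<table name>" or "<schema name>.*"
        pat_schema, pat_table = pattern.split(".", 1)
    else:
        # A bare "<table name>" refers to a table in the default schema.
        pat_schema, pat_table = DEFAULT_SCHEMA, pattern
    if pat_schema != schema:
        return False
    return pat_table == "*" or pat_table == table


def is_transferred(schema: str, table: str,
                   included: Sequence[str], excluded: Sequence[str]) -> bool:
    """Check the include list first (assumed: empty means all), then the exclude list."""
    if included and not any(matches(p, schema, table) for p in included):
        return False
    return not any(matches(p, schema, table) for p in excluded)


if __name__ == "__main__":
    print(is_transferred("sales", "orders", ["sales.*"], ["sales.tmp"]))
    print(is_transferred("sales", "tmp", ["sales.*"], ["sales.tmp"]))
```

Under these assumptions, the first call reports that sales.orders is transferred and the second that sales.tmp is not.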

Specifics of working with the Greenplum source

Data Transfer only supports Greenplum® version 6. Greenplum® versions 4 and 5 are not supported.

The service works with a Greenplum® cluster at the READ COMMITTED isolation level.

Data Transfer supports sharded copy for a Greenplum® source.

While sharded copy is enabled, Data Transfer keeps a transaction open on the Greenplum® master host. If this transaction is interrupted, the transfer returns an error.

With sharded copy disabled, a transfer moves data from the following Greenplum® objects: TABLE, VIEW, FOREIGN TABLE, and EXTERNAL TABLE. Data from these objects is treated as data from ordinary tables and processed by the target accordingly. With sharded copy enabled, a transfer only moves tables (TABLE objects). Tables with the DISTRIBUTED REPLICATED distribution policy are not transferred.
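
The object type rules above can be summarized as a small decision function. This is an illustrative sketch only: the function and constant names are hypothetical, and because the text states the DISTRIBUTED REPLICATED restriction without tying it to a copy mode, the sketch assumes it applies in both modes.

```python
# Object types moved in each mode, as listed in the text above.
SUPPORTED_WITHOUT_SHARDED_COPY = {"TABLE", "VIEW", "FOREIGN TABLE", "EXTERNAL TABLE"}
SUPPORTED_WITH_SHARDED_COPY = {"TABLE"}


def is_object_transferred(object_type: str, sharded_copy: bool,
                          replicated: bool = False) -> bool:
    """Return True if an object of this type is moved by the transfer.

    `replicated` marks a table created with DISTRIBUTED REPLICATED.
    Assumption: such tables are skipped regardless of the copy mode.
    """
    if object_type == "TABLE" and replicated:
        return False
    if sharded_copy:
        return object_type in SUPPORTED_WITH_SHARDED_COPY
    return object_type in SUPPORTED_WITHOUT_SHARDED_COPY


if __name__ == "__main__":
    for kind in ("TABLE", "VIEW", "FOREIGN TABLE", "EXTERNAL TABLE"):
        print(kind, is_object_transferred(kind, sharded_copy=True))
```

A check like this can help you review which source objects to expect on the target before enabling sharded copy.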

Snapshot consistency

When starting a transfer with sharded copy disabled (default), the service creates the copy by working only with the Greenplum® cluster's master host. The tables being copied are accessed in ACCESS SHARE lock mode. Snapshot consistency is achieved through Greenplum® mechanisms.

When starting a transfer with sharded copy enabled, the service creates the copy by working with both the Greenplum® cluster's master host and its segment hosts in utility mode. The tables being copied are locked in ACCESS SHARE or SHARE mode, depending on the Snapshot consistency setting.

To guarantee snapshot consistency, a transfer with sharded copy enabled requires that the data in the tables being transferred remain static. With ACCESS SHARE locks (default), the service does not guarantee that the data remains static: this must be ensured externally. With SHARE locks, Greenplum® mechanisms guarantee that the data in the source tables remains static.
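
The combinations described in this section can be written down as a lookup. This is a minimal sketch for reference, not part of the service: the CopyPlan type and plan_copy function are hypothetical names, and since the text mentions the Snapshot consistency setting only for sharded copy, the sketch ignores it when sharded copy is disabled.

```python
from typing import NamedTuple


class CopyPlan(NamedTuple):
    hosts: str                   # hosts the service works with
    lock_mode: str               # lock taken on the tables being copied
    consistency_ensured_by: str  # who keeps the snapshot consistent


def plan_copy(sharded_copy: bool, snapshot_consistency: bool) -> CopyPlan:
    """Map the two settings to the behavior described in the text above."""
    if not sharded_copy:
        # Default mode: master host only, ACCESS SHARE locks.
        return CopyPlan("master host", "ACCESS SHARE", "Greenplum mechanisms")
    hosts = "master host and segment hosts (utility mode)"
    if snapshot_consistency:
        # SHARE locks keep the source tables static during the copy.
        return CopyPlan(hosts, "SHARE", "Greenplum mechanisms")
    # ACCESS SHARE locks do not stop writes; the data must be kept static externally.
    return CopyPlan(hosts, "ACCESS SHARE", "external measures")


if __name__ == "__main__":
    for sharded in (False, True):
        for consistent in (False, True):
            print(sharded, consistent, plan_copy(sharded, consistent))
```

The only combination that leaves consistency to you is sharded copy with Snapshot consistency disabled, so writes to the source tables need to be paused by other means in that case.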

Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc in the United States and/or other countries.
