Creating a Greenplum® cluster
A Managed Service for Greenplum® cluster consists of master hosts that accept client queries and segment hosts that provide data processing and storage capability.
Available disk types depend on the selected host class.
For more information, see Resource relationships.
How to create a Managed Service for Greenplum® cluster
- In the management console, select the folder where you want to create a database cluster.
- Select Managed Service for Greenplum.
- Click Create cluster.
- Enter a name for the cluster. It must be unique within the folder.
- (Optional) Enter a cluster description.
- Select the environment where you want to create the cluster (you cannot change the environment once the cluster is created):
  - PRODUCTION: For stable versions of your apps.
  - PRESTABLE: For testing purposes. The prestable environment is similar to the production environment and is likewise covered by the SLA, but it is the first to get new features, improvements, and bug fixes. In the prestable environment, you can test the compatibility of new versions with your application.
- Select the Greenplum® version.
- (Optional) Select groups of dedicated hosts to host the cluster on.
  Alert
  You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
- Under Network settings:
  - Select the cloud network for the cluster.
  - In the Security groups parameter, specify the security group that contains the rules allowing all incoming and outgoing traffic over any protocol from any IP address.
    Alert
    For a Managed Service for Greenplum® cluster to work properly, at least one of its security groups must have rules allowing all incoming and outgoing traffic from any IP address.
  - Select the availability zone and subnet for the cluster. To create a new subnet, click Create new next to the availability zone.
  - Select Public access to allow access to the cluster from the internet.
- (Optional) For clusters with Greenplum® version 6.25 or higher, enable the Hybrid storage option.
  It activates the Yezzey extension from Yandex Cloud. This extension is used to export append-optimized (AO) and append-optimized column-oriented (AOCO) tables from disks within the Managed Service for Greenplum® cluster to cold storage in Object Storage. The data is then stored in a service bucket in compressed and encrypted form, which is a more cost-efficient storage method.
  You cannot disable this option after you save your cluster settings.
  Note
  This functionality is at the Preview stage and is free of charge.
- Specify the admin user settings. This special user is required for managing the cluster and cannot be deleted. For more information, see Users and roles in Managed Service for Greenplum®.
  - Username may contain Latin letters, numbers, hyphens, and underscores, and may not start with a hyphen. It must be from 1 to 32 characters long.
    Note
    The names admin, gpadmin, mdb_admin, mdb_replication, monitor, none, postgres, public, and repl are reserved for Managed Service for Greenplum®. You cannot create users with these names.
  - Password must be from 8 to 128 characters long.
- Configure additional cluster settings, if required:
  - Backup start time (UTC): Time interval during which the cluster backup starts. Time is specified in 24-hour UTC format. The default time is 22:00 - 23:00 UTC.
  - Maintenance window: Maintenance window settings:
    - To enable maintenance at any time, select arbitrary (default).
    - To specify the preferred maintenance start time, select by schedule and specify the desired day of the week and UTC hour. For example, you can choose a time when the cluster is least loaded.
    Maintenance operations are carried out both on enabled and disabled clusters. They may include updating the DBMS, applying patches, and so on.
  - DataLens access: Allows you to analyze cluster data in Yandex DataLens.
  - Data Transfer access: Enable this option to allow access to the cluster from Yandex Data Transfer in Serverless mode.
    This will enable you to connect to Yandex Data Transfer running in Kubernetes via a special network. It will also cause other operations to run faster, such as transfer launch and deactivation.
  - Deletion protection: Manages cluster protection from accidental deletion by a user.
    Deletion protection does not prevent a user from connecting to the cluster manually and deleting the contents of a database.
- (Optional) Configure the operating mode and connection pooler parameters under Connection pooler:
  - Mode: SESSION (default) or TRANSACTION.
  - Size: Maximum number of client connections.
  - Client Idle Timeout: Client idle time (in ms), after which the connection will be terminated.
- (Optional) Under Managing background processes, edit the parameters of scheduled maintenance operations:
  - Start time (UTC): VACUUM start time. The default value is 19:00 UTC. Once the VACUUM operation is completed, the ANALYZE operation starts.
  - VACUUM timeout: Maximum VACUUM execution time, in seconds. Valid values: from 7,200 to 86,399, with 36,000 by default. As soon as this period expires, VACUUM will be forced to terminate.
  - ANALYZE timeout: Maximum ANALYZE execution time, in seconds. Valid values: from 7,200 to 86,399, with 36,000 by default. As soon as this period expires, the ANALYZE operation will be forced to terminate.
  The combined VACUUM and ANALYZE execution time may not exceed 24 hours.
- Specify the master host parameters on the Master tab. For the recommended configuration, see Calculating the cluster configuration.
  - Host class: Defines technical properties of the virtual machines on which the cluster master hosts will be deployed.
  - Under Storage:
    - Select the disk type.
      The selected type determines the increment in which you can change your storage size:
      - Non-replicated SSD storage: In increments of 93 GB.
      - Local SSD storage:
        - For Intel Cascade Lake: In increments of 100 GB.
        - For Intel Ice Lake: In increments of 368 GB.
      - Network SSD and HDD storage: In increments of 1 GB.
- Specify the parameters of segment hosts on the Segment tab. For the recommended configuration, see Calculating the cluster configuration.
  - Number of segment hosts.
  - Number of segments per host. The maximum value of this parameter depends on the host class.
  - Host class: Defines technical properties of the virtual machines on which the cluster segment hosts will be deployed.
  - Under Storage:
    - Select the disk type.
      The selected type determines the increment in which you can change your storage size:
      - Non-replicated SSD storage: In increments of 93 GB.
      - Local SSD storage:
        - For Intel Cascade Lake: In increments of 100 GB.
        - For Intel Ice Lake: In increments of 368 GB.
      - Network SSD and HDD storage: In increments of 1 GB.
- If required, configure DBMS cluster-level settings.
- Click Create.
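When the same values come from a script rather than the console form, it can help to check them before sending a request. The following is a minimal Bash sketch of such checks, based only on the limits listed in the steps above. The function names and the platform labels cascade-lake and ice-lake are hypothetical, and reading the storage increment as "the size must be a multiple of the step" is an assumption.

```bash
#!/usr/bin/env bash
# Pre-flight checks for cluster settings. Function names are hypothetical;
# the limits are the ones listed in the steps above.

# Names reserved by Managed Service for Greenplum®.
RESERVED_NAMES="admin gpadmin mdb_admin mdb_replication monitor none postgres public repl"

# Username: 1 to 32 Latin letters, digits, hyphens, or underscores,
# no leading hyphen, and not a reserved name.
valid_username() {
  local LC_ALL=C name="$1" reserved
  [[ "$name" =~ ^[A-Za-z0-9_][A-Za-z0-9_-]{0,31}$ ]] || return 1
  for reserved in $RESERVED_NAMES; do
    [[ "$name" != "$reserved" ]] || return 1
  done
}

# Password: 8 to 128 characters.
valid_password() {
  local length=${#1}
  (( length >= 8 && length <= 128 ))
}

# VACUUM and ANALYZE timeouts: each 7200..86399 seconds,
# and no more than 24 hours (86400 seconds) combined.
valid_timeouts() {
  local vacuum="$1" analyze="$2" t
  for t in "$vacuum" "$analyze"; do
    [[ "$t" =~ ^[0-9]+$ ]] || return 1
    (( 10#$t >= 7200 && 10#$t <= 86399 )) || return 1
  done
  (( 10#$vacuum + 10#$analyze <= 86400 ))
}

# Storage size in GB: assumed to be a positive multiple of the increment.
# For local-ssd, pass the platform as the third argument (hypothetical labels).
valid_disk_size() {
  local type="$1" size="$2" platform="${3:-}" step
  case "$type" in
    network-ssd-nonreplicated) step=93 ;;
    network-ssd|network-hdd)   step=1 ;;
    local-ssd)
      case "$platform" in
        cascade-lake) step=100 ;;
        ice-lake)     step=368 ;;
        *) return 1 ;;
      esac
      ;;
    *) return 1 ;;
  esac
  [[ "$size" =~ ^[0-9]+$ ]] && (( 10#$size > 0 && 10#$size % step == 0 ))
}
```

Each function returns a zero exit status for an acceptable value, so it can guard a later command, for example `valid_username "$name" || exit 1`. Note that the default timeouts (36,000 seconds each) pass the combined check, while two near-maximum values do not.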
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a cluster:
- Check whether the folder has any subnets for the cluster hosts:
  yc vpc subnet list
  If there are no subnets in the folder, create the required subnets in VPC.
- View a description of the create cluster CLI command:
  yc managed-greenplum cluster create --help
- Specify cluster parameters in the create command (the list of supported parameters in the example is not exhaustive):
  yc managed-greenplum cluster create <cluster_name> \
    --greenplum-version=<Greenplum_version> \
    --environment=<environment> \
    --network-name=<network_name> \
    --user-name=<username> \
    --user-password=<user_password> \
    --master-config resource-id=<host_class>,disk-size=<storage_size_GB>,disk-type=<disk_type> \
    --segment-config resource-id=<host_class>,disk-size=<storage_size_GB>,disk-type=<disk_type> \
    --zone-id=<availability_zone> \
    --subnet-id=<subnet_ID> \
    --assign-public-ip=<public_access_to_hosts> \
    --security-group-ids=<list_of_security_group_IDs> \
    --deletion-protection=<cluster_deletion_protection>
  Note
  The cluster name must be unique within a folder. It may contain Latin letters, numbers, hyphens, and underscores. The name may be up to 63 characters long.
  Where:
  - --greenplum-version: Greenplum® version, for example, 6.19.
  - --environment: Environment:
    - PRODUCTION: For stable versions of your apps.
    - PRESTABLE: For testing purposes. The prestable environment is similar to the production environment and is likewise covered by the SLA, but it is the first to get new features, improvements, and bug fixes. In the prestable environment, you can test the compatibility of new versions with your application.
  - --network-name: Network name.
  - --user-name: Username. It may contain Latin letters, numbers, hyphens, and underscores, and must start with a letter, a number, or an underscore. It must be from 1 to 32 characters long.
  - --user-password: Password. It must be from 8 to 128 characters long.
  - --master-config and --segment-config: Master and segment host configurations:
    - resource-id: Host class.
    - disk-size: Storage size in GB.
    - disk-type: Disk type:
      - network-hdd
      - network-ssd
      - local-ssd
      - network-ssd-nonreplicated
  - --zone-id: Availability zone.
  - --subnet-id: Subnet ID. Specify it if two or more subnets are created in the selected availability zone.
  - --assign-public-ip: Flag to be set if public access to the hosts is needed, true or false.
  - --security-group-ids: List of security group IDs.
  - --deletion-protection: Cluster deletion protection, true or false.
    Deletion protection does not prevent a user from connecting to the cluster manually and deleting the contents of a database.
- To set a backup start time, provide the required value in HH:MM:SS format in --backup-window-start:
  yc managed-greenplum cluster create <cluster_name> \
    ...
    --backup-window-start=<backup_start_time>
- To create a cluster hosted on groups of dedicated hosts, specify their IDs separated by commas in the --host-group-ids parameter:
  yc managed-greenplum cluster create <cluster_name> \
    ...
    --host-group-ids=<IDs_of_groups_of_dedicated_hosts>
  Alert
  You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
- To set up a maintenance window (including for disabled clusters), provide the required value in the --maintenance-window parameter when creating your cluster:
  yc managed-greenplum cluster create <cluster_name> \
    ...
    --maintenance-window type=<maintenance_type>,day=<day_of_week>,hour=<hour>
  Where type is the maintenance type:
  - anytime (default): Any time.
  - weekly: On a schedule. If setting this value, specify the day of week and the hour:
    - day: Day of week in DDD format: MON, TUE, WED, THU, FRI, SAT, or SUN.
    - hour: Hour (UTC) in HH format: 1 to 24.
- To allow access from Yandex DataLens or Yandex Data Transfer, provide the true value in the corresponding parameters when creating a cluster:
  yc managed-greenplum cluster create <cluster_name> \
    ...
    --datalens-access=<access_from_DataLens> \
    --datatransfer-access=<access_from_Data_Transfer>
  Where:
  - --datalens-access: Access from Yandex DataLens, true or false.
  - --datatransfer-access: Access from Yandex Data Transfer, true or false.
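The value formats for --backup-window-start and --maintenance-window described above can be checked and assembled in a script before the create command runs. The following is a rough Bash sketch; the function names are hypothetical, and it assumes the backup start time uses hours 00 to 23.

```bash
#!/usr/bin/env bash
# Helpers for two optional flags of `yc managed-greenplum cluster create`.
# Function names are hypothetical.

# Succeeds if the argument is a valid HH:MM:SS time for --backup-window-start.
valid_backup_start() {
  [[ "$1" =~ ^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$ ]]
}

# Prints the value for --maintenance-window, or fails on invalid input.
maintenance_window_arg() {
  local type="$1" day="${2:-}" hour="${3:-}"
  case "$type" in
    anytime)
      echo "type=anytime"
      ;;
    weekly)
      case "$day" in
        MON|TUE|WED|THU|FRI|SAT|SUN) ;;
        *) echo "day must be one of MON..SUN" >&2; return 1 ;;
      esac
      if ! [[ "$hour" =~ ^[0-9]{1,2}$ ]] || (( 10#$hour < 1 || 10#$hour > 24 )); then
        echo "hour must be between 1 and 24" >&2
        return 1
      fi
      echo "type=weekly,day=${day},hour=$(( 10#$hour ))"
      ;;
    *)
      echo "type must be anytime or weekly" >&2
      return 1
      ;;
  esac
}
```

For example, `--maintenance-window "$(maintenance_window_arg weekly MON 3)"` expands to `--maintenance-window type=weekly,day=MON,hour=3`.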
To create a cluster:
- Using the command line, navigate to the directory that will contain the Terraform configuration files with an infrastructure plan. Create the directory if it does not exist.
- If you do not have Terraform, install it and configure the Yandex Cloud provider.
- Create a configuration file describing the cloud network and subnets.
  The cluster is hosted on a cloud network. If you already have a suitable network, you do not need to describe it again.
  Cluster hosts are located on subnets of the selected cloud network. If you already have suitable subnets, you do not need to describe them again.
  Example structure of a configuration file that describes a cloud network with a single subnet:
  resource "yandex_vpc_network" "<network_name_in_Terraform>" { name = "<network_name>" }

  resource "yandex_vpc_subnet" "<subnet_name_in_Terraform>" {
    name           = "<subnet_name>"
    zone           = "<availability_zone>"
    network_id     = yandex_vpc_network.<network_name_in_Terraform>.id
    v4_cidr_blocks = ["<subnet>"]
  }
- Create a configuration file with a description of the cluster and its hosts.
  Here is an example of the configuration file structure:
  resource "yandex_mdb_greenplum_cluster" "<cluster_name_in_Terraform>" {
    name                = "<cluster_name>"
    environment         = "<environment>"
    network_id          = yandex_vpc_network.<network_name_in_Terraform>.id
    zone                = "<availability_zone>"
    subnet_id           = yandex_vpc_subnet.<subnet_name_in_Terraform>.id
    assign_public_ip    = <public_access_to_cluster_hosts>
    deletion_protection = <cluster_deletion_protection>
    version             = "<Greenplum_version>"
    master_host_count   = <number_of_master_hosts>
    segment_host_count  = <number_of_segment_hosts>
    segment_in_host     = <number_of_segments_per_host>

    master_subcluster {
      resources {
        resource_preset_id = "<host_class>"
        disk_size          = <storage_size_in_GB>
        disk_type_id       = "<disk_type>"
      }
    }

    segment_subcluster {
      resources {
        resource_preset_id = "<host_class>"
        disk_size          = <storage_size_in_GB>
        disk_type_id       = "<disk_type>"
      }
    }

    pxf_config {
      connection_timeout             = <read_request_timeout>
      upload_timeout                 = <write_request_timeout>
      max_threads                    = <maximum_number_of_Apache_Tomcat®_threads>
      pool_allow_core_thread_timeout = <whether_timeout_for_streaming_threads_is_permitted>
      pool_core_size                 = <number_of_streaming_threads>
      pool_queue_capacity            = <capacity_of_pool_queue_for_streaming_threads>
      pool_max_size                  = <maximum_number_of_streaming_threads>
      xmx                            = <maximum_size_of_JVM_heap>
      xms                            = <initial_size_of_JVM_heap>
    }

    user_name     = "<username>"
    user_password = "<password>"

    security_group_ids = ["<list_of_security_group_IDs>"]
  }
  Where:
  - assign_public_ip: Public access to cluster hosts, true or false.
  - deletion_protection: Cluster deletion protection, true or false.
  - version: Greenplum® version.
  - master_host_count: Number of master hosts, 1 or 2.
  - segment_host_count: Number of segment hosts, between 2 and 32.
  - pxf_config: Greenplum® Platform Extension Framework (PXF) settings. PXF is a software platform for accessing data in external DBMSs. The pxf_config settings match those in the Greenplum® pxf-application.properties configuration file, which describes the PXF features. To configure them, use the Yandex Cloud tools rather than edit the file.
    PXF settings:
    - connection_timeout: Timeout for connection to the Apache Tomcat® server when making read requests, in seconds. The values may range from 5 to 600.
    - upload_timeout: Timeout for connection to the Apache Tomcat® server when making write requests, in seconds. The values may range from 5 to 600.
    - max_threads: Maximum number of the Apache Tomcat® threads. The values may range from 1 to 1024.
      To prevent requests from getting stuck or failing because of memory exhaustion or a malfunctioning Java garbage collector, specify the number of the Apache Tomcat® threads. For more information on how to adjust the number of threads, see the VMware Greenplum® Platform Extension Framework documentation.
    - pool_allow_core_thread_timeout: Determines whether a timeout for core streaming threads is permitted. The default value is false.
    - pool_core_size: Number of core streaming threads per pool. The parameter takes positive integer values.
    - pool_queue_capacity: Maximum number of requests you can add to a pool queue for core streaming threads. The values may range from zero upward. If 0, no pool queue is generated.
    - pool_max_size: Maximum allowed number of core streaming threads. The values may range from 1 to 1024.
    - xmx: Maximum size of the JVM heap for the PXF daemon. The values may range from 64 to 16384.
    - xms: Initial size of the JVM heap for the PXF daemon. The values may range from 64 to 16384.
  Deletion protection does not prevent a user from connecting to the cluster manually and deleting the contents of a database.
  For more information about the resources you can create with Terraform, see the provider documentation.
- Check the Terraform configuration files for errors:
  - Using the command line, navigate to the directory that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Create a cluster:
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
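The validate, plan, and apply steps above can be chained so that a failure at any step stops the sequence. The following is a minimal Bash sketch; tf_deploy is a hypothetical name, and the function assumes that the current directory holds the configuration files and has already been initialized for the Yandex Cloud provider.

```bash
#!/usr/bin/env bash
# Runs the validate -> plan -> apply sequence and stops at the first failure.
# Assumes the current directory contains the Terraform configuration files
# and is already initialized. tf_deploy is a hypothetical name.
tf_deploy() {
  terraform validate || { echo "validation failed" >&2; return 1; }
  terraform plan     || { echo "plan failed" >&2; return 1; }
  # terraform apply shows the planned changes again and asks for confirmation.
  terraform apply
}
```

Calling `tf_deploy` from the configuration directory still leaves the final confirmation to the interactive prompt of terraform apply, so nothing is created without review.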
To create a cluster, use the create REST API method for the Cluster resource or the ClusterService/Create gRPC API call and provide the following in the request:
- ID of the folder where the cluster should be placed, in the folderId parameter.
- Cluster name in the name parameter.
- Cluster environment in the environment parameter.
- Greenplum® version in the config.version parameter.
- Username in the userName parameter.
- User password in the userPassword parameter.
- Network ID in the networkId parameter.
- Security group IDs in the securityGroupIds parameter.
- Configuration of master hosts in the masterConfig parameter.
- Configuration of segment hosts in the segmentConfig parameter.
Provide additional cluster settings, if required:
- Public access settings in the assignPublicIp parameter.
- Backup window settings in the config.backupWindowStart parameter.
- Settings for access from Yandex DataLens, in the config.access.dataLens parameter.
- Settings for access from Yandex Data Transfer, in the config.access.dataTransfer parameter.
- Settings for the maintenance window (including those for disabled clusters) in the maintenanceWindow parameter.
- DBMS settings in the configSpec.greenplumConfig_<version> parameter.
- Scheduled maintenance operations settings in the configSpec.backgroundActivities.analyzeAndVacuum parameter.
- Cluster deletion protection settings in the deletionProtection parameter.
  Deletion protection does not prevent a user from connecting to the cluster manually and deleting the contents of a database.
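To illustrate how the dotted parameter names above nest inside a request body, here is a rough Bash sketch that prints a JSON skeleton. It uses only the field names listed above and assumes that a dotted name such as config.version denotes a nested object. The function and variable names are hypothetical, and masterConfig and segmentConfig are left empty because their inner structure is not described here, so this is not a complete request.

```bash
#!/usr/bin/env bash
# Prints a JSON skeleton for the create request.
# Hypothetical helper: values come from environment variables and are
# inserted as-is, so they must not contain quotes or backslashes.
create_request_body() {
  cat <<EOF
{
  "folderId": "${FOLDER_ID}",
  "name": "${CLUSTER_NAME}",
  "environment": "${ENVIRONMENT}",
  "config": {
    "version": "${GP_VERSION}"
  },
  "userName": "${GP_USER}",
  "userPassword": "${GP_PASSWORD}",
  "networkId": "${NETWORK_ID}",
  "securityGroupIds": ["${SECURITY_GROUP_ID}"],
  "masterConfig": {},
  "segmentConfig": {}
}
EOF
}
```

The host configurations must be filled in according to the API reference before the body can be sent.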
Examples
Creating a cluster
Create a Managed Service for Greenplum® cluster with the following test characteristics:
- Name: gp-cluster
- Version: 6.19
- Environment: PRODUCTION
- Network: default
- User: user1
- Password: user1user1
- With master and segment hosts:
  - Class: s2.medium
  - Local SSD storage (local-ssd): 100 GB
- Availability zone: ru-central1-a; subnet: b0rcctk2rvtr8efcch64
- Public access to hosts: Allowed
- Security group: enp6saqnq4ie244g67sb
- Protection against accidental cluster deletion: Enabled
Run the following command:
yc managed-greenplum cluster create \
--name=gp-cluster \
--greenplum-version=6.19 \
--environment=PRODUCTION \
--network-name=default \
--user-name=user1 \
--user-password=user1user1 \
--master-config resource-id=s2.medium,disk-size=100,disk-type=local-ssd \
--segment-config resource-id=s2.medium,disk-size=100,disk-type=local-ssd \
--zone-id=ru-central1-a \
--subnet-id=b0rcctk2rvtr8efcch64 \
--assign-public-ip=true \
--security-group-ids=enp6saqnq4ie244g67sb \
--deletion-protection=true
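When the example needs to be repeated with different host settings, the command can be assembled from variables. The following is a minimal Bash sketch that uses only the flags from the command above. The function name, the environment variable names, and the dry-run switch RUN are illustrative, and the defaults repeat the test values from this example.

```bash
#!/usr/bin/env bash
# Parameterized variant of the example command. Function and variable names
# are illustrative; defaults repeat the test values from the example.
create_cluster() {
  local name="${CLUSTER_NAME:-gp-cluster}"
  local host_config="resource-id=${HOST_CLASS:-s2.medium},disk-size=${DISK_SIZE:-100},disk-type=${DISK_TYPE:-local-ssd}"
  local cmd=(
    yc managed-greenplum cluster create
    --name="$name"
    --greenplum-version=6.19
    --environment=PRODUCTION
    --network-name=default
    --user-name=user1
    --user-password="${GP_PASSWORD:-user1user1}"
    --master-config "$host_config"
    --segment-config "$host_config"
    --zone-id=ru-central1-a
    --subnet-id=b0rcctk2rvtr8efcch64
    --assign-public-ip=true
    --security-group-ids=enp6saqnq4ie244g67sb
    --deletion-protection=true
  )
  if [[ "${RUN:-0}" == "1" ]]; then
    "${cmd[@]}"                 # create the cluster
  else
    printf '%s\n' "${cmd[*]}"   # dry run: only print the command
  fi
}
```

By default, `create_cluster` only prints the assembled command; `RUN=1 create_cluster` executes it. Keeping the command in an array avoids quoting problems when a value contains spaces.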
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.