Expanding a cluster
You can expand a Managed Service for Greenplum® cluster by adding segment hosts to it. You need to add at least two hosts.
When a cluster is being expanded, its data is automatically redistributed across new segments. This is done for each table in sequence during the timeout specified in the expansion settings. The table is not available for read and write operations during data redistribution.
The gp_expand
utility is used to expand a cluster. For more information about the utility and its modes of operation, see the Greenplum® documentation
Add segment hosts
-
Go to the folder page
and select Managed Service for Greenplum. -
Select a cluster and open the
Hosts tab. -
Click Expand cluster in the top-right corner.
-
Specify cluster expansion settings:
- Number of new segment hosts: Number of segment hosts to add. The minimum value is
2
. - Add segments to host: Number of segments being added per host. The maximum value depends on the host class.
- Redistribution timeout: Timeout for data redistribution across the new segments, in seconds. For
0
(recommended value), the timeout will be selected automatically based on cluster configuration and the amount of data.
- Number of new segment hosts: Number of segment hosts to add. The minimum value is
-
Click Save.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name
or --folder-id
parameter.
To add segment hosts to a Greenplum® cluster:
-
View the description of the CLI cluster expand command:
yc managed-greenplum cluster expand --help
-
Specify the cluster expansion parameters in the command:
yc managed-greenplum cluster expand <cluster_name_or_ID> \ --segment-host-count <number_of_added_segment_hosts> \ --add-segments-per-host-count <number_of_segments_added_to_host> \ --duration-seconds <data_redistribution_timeout_in_seconds>
Where:
--segment-host-count
: Number of segment hosts to add. The minimum (default) value is2
.--add-segments-per-host-count
: Number of segments added per host. The maximum value depends on the host class. The default value is0
.--duration-seconds
: Timeout for data redistribution across the new segments, in seconds. For0
(recommended value set by default), the timeout will be selected automatically based on cluster configuration and the amount of data.
You can get the cluster ID and name with a list of clusters in the folder.
To add segment hosts, use the expand REST API method for the Cluster resource or the ClusterService/Expand gRPC API call and provide the following in the request:
- Cluster ID in the
clusterId
parameter. - Number of segment hosts to add in
segmentHostCount
. - Number of segments per host to add in
addSegmentsPerHostCount
. - The data redistribution timeout (in seconds) in the
duration
parameter. For0
(recommended value set by default), the timeout will be selected automatically based on cluster configuration and the amount of data.
You can get the cluster ID with a list of clusters in the folder.
Warning
A low redistribution timeout value (less than two hours) may be insufficient to redistribute the data of all cluster tables. In this case, restart redistribution.
Data redistribution monitoring
To monitor data redistribution to the new segments, connect to the postgres
database and execute the following statement on behalf of the user with the mdb_admin
role:
SELECT dbname, fq_name, status, expansion_started, source_bytes FROM gpexpand.status_detail;
Result:
dbname | fq_name | status | expansion_started | source_bytes
-----------+-------------------------------------+-------------+----------------------------+-------------
diskquota | diskquota_namespace.database_list | NOT STARTED | | 0
postgres | public.rnd_nocomp_distrnd_ao_res3 | NOT STARTED | | 52558742480
postgres | public.rnd_nocomp_distrnd_ao1 | COMPLETED | 2022-09-06 12:44:36.71759 | 13013536
postgres | public.rnd_nocomp_distrnd_ao_res2 | IN PROGRESS | 2022-09-06 13:03:29.231359 | 63070490912
(4 rows)
The current redistribution status will be specified in the status
column.
Table redistribution priority
To specify the tables whose data should be redistributed first of all, increase their priority. To do this, connect to the postgres
database and run the following query as the user with the mdb_admin
role:
UPDATE gpexpand.status_detail SET rank=1 WHERE fq_name IN (<table_list>);
Starting data redistribution
Unknown error: Partially Distributed Data
in cluster runtime logs means that some table data was not redistributed to all new segments. This may occur if the specified redistribution timeout expired before completing redistribution of data from all tables. To fix this error:
-
Find the tables that were redistributed partially:
SELECT count(*) FROM gp_distribution_policy WHERE numsegments != <segment_count>;
Where
segment_count
is the total number of segments in all Greenplum® cluster segment hosts. -
Start data redistribution:
-
For ordinary tables:
ALTER TABLE ONLY <table_name> EXPAND TABLE;
-
For partitioned tables:
ALTER TABLE <table_name> SET WITH (REORGANIZE=true) <distribution_policy>;
Where
distribution_policy
is the Greenplum® distribution policy.To get the Greenplum® distribution policy for the selected table's partition, call the embedded function:
SELECT pg_get_table_distributedby(<partition_OID>) AS distribution_policy;
-
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.