Creating Data Proc clusters
In the management console, select the folder where you want to create a cluster.
Click Create resource and select Data Proc cluster from the drop-down list.
Enter the cluster name in the Cluster name field. The cluster name must be unique within the folder.
Select a relevant image version and the components you want to use in the cluster.
Note that some components require other components to work. For example, to use Spark, you need YARN.
Enter the public part of your SSH key in the Public key field. For information about how to generate and use SSH keys, see the Yandex Compute Cloud documentation.
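If you don't have a key pair yet, one way to generate it is with `ssh-keygen`; the file path and comment below are arbitrary example values, not required names:

```shell
# Generate an ed25519 key pair; path and comment are example values
ssh-keygen -t ed25519 -f ~/.ssh/dataproc_key -N "" -C "dataproc"

# Print the public part to paste into the Public key field
cat ~/.ssh/dataproc_key.pub
```

Paste the full contents of the `.pub` file (one line starting with `ssh-ed25519`) into the Public key field; keep the private key file on your machine.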
Select or create a service account that you want to grant access to the cluster.
Select the availability zone for the cluster.
If necessary, set the properties of Hadoop and its components, for example:
- hdfs:dfs.replication : 2
- hdfs:dfs.blocksize : 1073741824
- spark:spark.driver.cores : 1
The available properties are listed in the official documentation for each component.
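As a quick sanity check on the `hdfs:dfs.blocksize` value used in the example above: 1073741824 bytes is exactly 1 GiB.

```python
# dfs.blocksize value from the example above, in bytes
blocksize = 1073741824

# 1 GiB = 2**30 bytes
assert blocksize == 1 << 30

print(blocksize // (1024 ** 2))  # → 1024 (block size in MiB)
```

Specifying sizes in bytes avoids any ambiguity between decimal (GB) and binary (GiB) units in the configuration.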
Select or create a network for the cluster.
Configure subclusters: no more than one main subcluster with a Master host and subclusters for data storage or computing.
The DATANODE and COMPUTENODE subcluster roles are different: you can deploy data storage components on DATANODE subclusters and data processing components on COMPUTENODE subclusters. Storage on a COMPUTENODE subcluster is only used to temporarily store processed files.
For each subcluster, you can configure:
- The number of hosts.
- The host class: the platform and computing resources available to the host.
- Storage size and type.
- The subnet of the network where the cluster is located.
Once you have configured all of the subclusters you need, click Create cluster.
Data Proc runs the create cluster operation. After the cluster status changes to Running, you can connect to the hosts of any active subcluster using the specified SSH key.
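A connection might then look like the following sketch; the host FQDN and the `ubuntu` user name are placeholder assumptions, so substitute the actual host address from your cluster page and the user name your image expects:

```shell
# Hypothetical host FQDN and user name; replace both with your cluster's values
ssh -i ~/.ssh/dataproc_key ubuntu@rc1b-dataproc-m-example.mdb.yandexcloud.net
```

The `-i` flag points `ssh` at the private half of the key pair whose public part you entered when creating the cluster.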