Creating Data Proc clusters

    Management console
    1. In the management console, select the folder where you want to create a cluster.

    2. Click Create resource and select Data Proc cluster from the drop-down list.

    3. Enter the cluster name in the Cluster name field. The cluster name must be unique within the folder.

    4. Select a relevant image version and the components you want to use in the cluster.

      Note

      Some components depend on other components. For example, Spark requires YARN.

    5. Enter the public part of your SSH key in the Public key field. For information about how to generate and use SSH keys, see the Yandex Compute Cloud documentation.

    6. Select or create a service account that you want to grant access to the cluster.

    7. Select the availability zone for the cluster.

    8. If necessary, set the properties of Hadoop and its components, for example:

      hdfs:dfs.replication : 2
      hdfs:dfs.blocksize : 1073741824
      spark:spark.driver.cores : 1
      

      The available properties are listed in the official documentation for the components:

      • Hadoop
      • HDFS
      • YARN
      • MapReduce
      • Spark
      • Flume 1.8.0
      • HBase
      • Hive
      • Sqoop
      • Tez 0.9.1
      • Zeppelin 0.7.3
      • ZooKeeper 3.4.6
    9. Select or create a network for the cluster.

    10. Enable the UI Proxy option to access the web interfaces of the Data Proc components.

    11. Configure subclusters: at most one main subcluster with a Master host, plus subclusters for data storage or computing.

      The roles of the Compute and Data subclusters are different: you can deploy data storage components on Data and data processing components on Compute subclusters. Storage on a Compute subcluster is only used to temporarily store processed files.

    12. For each subcluster, you can configure:

      • The number of hosts.
      • The host class: the platform and computing resources available to the host.
      • Storage size and type.
      • The subnet of the network where the cluster is located.
    13. For Compute subclusters, you can specify the auto scaling parameters.

    14. After you configure all the subclusters you need, click Create cluster.

    Data Proc starts the cluster creation operation. Once the cluster status changes to Running, you can connect to any active subcluster using the specified SSH key.
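
The component properties and the subcluster rules in the steps above can be checked before you fill in the form. The following is a minimal local sketch and not part of Data Proc or its API: the function names, the dictionary keys (`name`, `role`, `autoscaling`), and the role values (`master`, `data`, `compute`) are all hypothetical. It assumes only what the steps state: properties are written as `component:key : value`, a cluster has no more than one main subcluster, and auto scaling is described for Compute subclusters.

```python
from collections import defaultdict


def group_properties(lines):
    """Group 'component:key : value' lines by their component prefix."""
    grouped = defaultdict(dict)
    for line in lines:
        # Split the property name from its value at the ' : ' separator.
        name, _, value = line.partition(" : ")
        # Split the component prefix (e.g. 'hdfs') from the property key.
        component, _, key = name.strip().partition(":")
        if not component or not key or not value.strip():
            raise ValueError(f"malformed property line: {line!r}")
        grouped[component][key] = value.strip()
    return dict(grouped)


def check_subclusters(subclusters):
    """Return a list of problems found in a hypothetical subcluster plan."""
    problems = []
    # Assumption from step 11: no more than one main subcluster.
    masters = [s for s in subclusters if s["role"] == "master"]
    if len(masters) > 1:
        problems.append("more than one main subcluster")
    # Assumption from step 13: auto scaling applies to Compute subclusters.
    for s in subclusters:
        if s.get("autoscaling") and s["role"] != "compute":
            problems.append(f"{s['name']}: auto scaling on a non-Compute subcluster")
    return problems


# The example properties from step 8.
PROPERTIES = [
    "hdfs:dfs.replication : 2",
    "hdfs:dfs.blocksize : 1073741824",
    "spark:spark.driver.cores : 1",
]

# A hypothetical plan: one main, one Data, and one Compute subcluster.
PLAN = [
    {"name": "main", "role": "master"},
    {"name": "storage", "role": "data"},
    {"name": "workers", "role": "compute", "autoscaling": True},
]

grouped = group_properties(PROPERTIES)
problems = check_subclusters(PLAN)
print(grouped)
print(problems)
```

Grouping by prefix makes it easier to look up each property in the matching component's documentation. For instance, the `dfs.blocksize` value in the example is 1024³ bytes, that is, 1 GiB.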

    © 2021 Yandex.Cloud LLC