
Connecting to Data Proc clusters


After you create a Data Proc cluster, you can connect to the host of the main subcluster.

Cluster hosts cannot be assigned a public IP address, so connect to them from a VM in the same cloud network.

  1. Create a new VM if necessary.
  2. Connect to the VM via SSH.
  3. From the VM, connect to the host of the main subcluster via SSH.
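
The steps above can also be collapsed into a single command with the OpenSSH `-J` (jump host) option, which tunnels the connection through the VM. The following is a minimal sketch, not part of the original instructions: it only builds and prints the command. The VM login and address (`user@vm.example.com`) are placeholders, and it assumes OpenSSH 7.3 or later on the local machine. The host name is the example one used later on this page.

```shell
#!/bin/sh
# Build a one-line SSH command that reaches a Data Proc host through a jump VM.
# Assumptions: OpenSSH 7.3+ (for -J); the jump VM login and address are placeholders.
build_jump_command() {
    jump="$1"     # login and address of the VM, e.g. user@vm.example.com (hypothetical)
    target="$2"   # FQDN of the Data Proc host
    printf 'ssh -J %s root@%s\n' "$jump" "$target"
}

# Print the command for the example host; review it, then run it yourself.
build_jump_command "user@vm.example.com" "rc1b-dataproc-m-fh4y4nur0i0uqqkz.mdb.yandexcloud.net"
```

With a jump host, authentication to the Data Proc host is performed by the local SSH client, so in this variant the key would not need to be copied to the VM.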

Connecting to a Data Proc host via SSH

To connect to a Data Proc host from your VM, make sure the VM has access to the SSH key that you specified when creating the Data Proc cluster. You can either copy the key to the VM or connect to the VM using an SSH agent.

  1. Run the SSH agent locally:

    $ eval `ssh-agent -s`
    
  2. Add the required key to the agent:

    $ ssh-add ~/.ssh/example-key
    
  3. Open an SSH connection to the Data Proc host as the root user, for example:

    $ ssh root@rc1b-dataproc-m-fh4y4nur0i0uqqkz.mdb.yandexcloud.net
    
    root@rc1b-dataproc-m-fh4y4nur0i0uqqkz:~#
    
  4. Check that Hadoop commands run on the host, for example:

    ~# hadoop version
    
    Hadoop 2.8.5
    Subversion https://github.yandex-team.ru/mdb/bigtop.git -r 78508f2a4b4f3dc8b3d295ccb50a45a4d24e81b5
    Compiled by robot-pgaas-ci on 2019-04-16T10:35Z
    Compiled with protoc 2.5.0
    From source with checksum 9942ca5c745417c14e318835f420733
    This command was run using /usr/lib/hadoop/hadoop-common-2.8.5.jar
    
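
The check in step 4 can be scripted. The sketch below is an illustration rather than an official tool: `hadoop_version_from_output` and `check_hadoop` are hypothetical helper names, and the script assumes that the first line of `hadoop version` output has the form `Hadoop <version>`, as in the sample above.

```shell
#!/bin/sh
# Extract the version number from the first line of `hadoop version` output.
# Prints nothing if the first line does not start with "Hadoop".
hadoop_version_from_output() {
    printf '%s\n' "$1" | awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Hypothetical wrapper: run the check on a Data Proc host over SSH.
# Requires network access to the host and a usable key (see steps 1-3).
check_hadoop() {
    host="$1"
    output=$(ssh "root@$host" hadoop version) || return 1
    version=$(hadoop_version_from_output "$output")
    [ -n "$version" ] && echo "Hadoop $version is available on $host"
}

# Offline example using the first line of the sample output:
hadoop_version_from_output "Hadoop 2.8.5"
```

The parsing is kept separate from the SSH call so that it can be tried without a cluster. To check a real host, call `check_hadoop` with the host's FQDN from the VM.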
© 2021 Yandex.Cloud LLC