Connecting to Data Proc clusters
After you create a Data Proc cluster, you can connect to the host of the main subcluster.
Cluster hosts cannot be assigned a public IP address, so use a VM from the same cloud network to connect to them.
- Create a new VM if necessary.
- Connect to the VM via SSH.
- From the VM, connect to the host of the main subcluster, also via SSH.
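The two hops above can also be combined into a single command. The sketch below is one possible approach, not part of the original procedure: it relies on the standard OpenSSH `-J` (ProxyJump) option, available in OpenSSH 7.3 and later. The function name `dataproc_jump_command`, the VM login `ubuntu`, and the address `203.0.113.10` are hypothetical placeholders. Only the Data Proc host name comes from the example further down this page.

```shell
# Build an ssh command that reaches a Data Proc host through an intermediate VM.
# The function name and all argument values are illustrative placeholders.
dataproc_jump_command() {
  vm_user=$1        # login on the intermediate VM (assumption)
  vm_address=$2     # public address of the VM (assumption)
  dataproc_host=$3  # FQDN of the Data Proc host
  # -J (ProxyJump) makes OpenSSH tunnel through the VM before reaching the host.
  printf 'ssh -J %s@%s root@%s\n' "$vm_user" "$vm_address" "$dataproc_host"
}

# Print the command for review; run it yourself once the values are correct.
dataproc_jump_command ubuntu 203.0.113.10 rc1b-dataproc-m-fh4y4nur0i0uqqkz.mdb.yandexcloud.net
```

With `-J`, authentication to the Data Proc host is performed from the local machine through the tunnel, so the private key does not have to be copied to the VM.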
Connecting to a Data Proc host via SSH
To connect to a Data Proc host from your VM, make sure that the SSH key you specified when creating the Data Proc cluster is available on the VM. You can either copy the key to the VM or connect to the VM with an SSH agent that holds the key.
- Run the SSH agent locally:
  $ eval `ssh-agent -s`
- Add the required key to the list of those available to the agent:
  $ ssh-add ~/.ssh/example-key
- Open an SSH connection to the Data Proc host for the root user, for example:
  $ ssh root@rc1b-dataproc-m-fh4y4nur0i0uqqkz.mdb.yandexcloud.net
  root@rc1b-dataproc-m-fh4y4nur0i0uqqkz:~#
- Make sure that Hadoop commands are executed, for example:
  ~# hadoop version
  Hadoop 2.8.5
  Subversion https://github.yandex-team.ru/mdb/bigtop.git -r 78508f2a4b4f3dc8b3d295ccb50a45a4d24e81b5
  Compiled by robot-pgaas-ci on 2019-04-16T10:35Z
  Compiled with protoc 2.5.0
  From source with checksum 9942ca5c745417c14e318835f420733
  This command was run using /usr/lib/hadoop/hadoop-common-2.8.5.jar
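If you want to run the final check from a script, one option is to extract the version number from the first line of the `hadoop version` output. The helper below is a minimal sketch. The function name `hadoop_version_from_output` is hypothetical, and it assumes the first output line has the form `Hadoop <version>`, as in the example above.

```shell
# Extract the version number from `hadoop version` output read on stdin.
# Prints nothing and returns non-zero if the first line is not "Hadoop <version>".
hadoop_version_from_output() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2; found = 1 } END { exit !found }'
}

# Usage on the Data Proc host:
#   hadoop version | hadoop_version_from_output
```

For the sample output shown above, the helper prints `2.8.5`. A non-zero exit status means the command did not produce the expected first line.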