Working with component network interfaces
Data Proc enables you to create clusters accessible from the internet or only from a cloud network. However, we recommend making service component interfaces inaccessible from outside Yandex Cloud in any configuration. To connect externally to components like HDFS NameNode and YARN ResourceManager, you can route traffic via an intermediate VM with a public IP address.
Port forwarding
To access a component's network interface from the internet, create an intermediate virtual machine in Yandex Compute Cloud.
Requirements for an intermediate VM:
- It has a public IP address assigned.
- It is hosted in the same network as the Data Proc cluster you need to reach.
- Its security group settings allow traffic exchange with the cluster on the ports of the relevant components.
For step-by-step instructions on how to configure security groups for port forwarding, see Connecting to Data Proc clusters.
To connect to the desired Data Proc host port, run the following command:
ssh -A -J <VM public IP address> -L <port number>:<FQDN of Data Proc host>:<port number> root@<FQDN of Data Proc host>
You can find the FQDN of the Data Proc host on the Data Proc cluster page, in the Hosts tab, under the Hostname column.
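As an illustration of the command above, the following sketch fills in the placeholders and prints the resulting command instead of running it. The IP address `203.0.113.10`, the host name `dataproc-master.example.internal`, and the helper name `build_tunnel_cmd` are hypothetical values chosen for this example; substitute the public IP address of your VM and the FQDN from the Hosts tab.

```shell
#!/bin/sh
# Sketch: assemble the port-forwarding command from the text.
# All concrete values below are hypothetical placeholders.

# Arguments: <VM public IP address> <FQDN of Data Proc host> <port number>
build_tunnel_cmd() {
  # -A forwards the SSH agent, -J jumps through the intermediate VM,
  # -L forwards the local port to the same port on the Data Proc host.
  printf 'ssh -A -J %s -L %s:%s:%s root@%s\n' "$1" "$3" "$2" "$3" "$2"
}

# Example: forward port 8088 (YARN Resource Manager in the table of ports).
build_tunnel_cmd "203.0.113.10" "dataproc-master.example.internal" 8088
```

While such a tunnel is open, the forwarded port should be reachable on the local machine, for example at `http://localhost:8088`. If the login on the intermediate VM differs from your local user name, the `-J` option also accepts the `user@host` form.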
The port numbers used for Data Proc components are given below.
Components and ports
| Service | Port |
|---|---|
| HBase Master | 16010 |
| HBase REST | 8085 |
| HDFS Name Node | 9870 |
| Hive Server2 | 10002 |
| MapReduce Application History | 19888 |
| Oozie | 11000 |
| Spark History | 18080 |
| YARN Application History | 8188 |
| YARN Resource Manager | 8088 |
| Zeppelin | 8890 |
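If you open these interfaces often, a small lookup can save you from retyping port numbers. The sketch below mirrors the table above; the short keys such as `yarn-rm` and the function name `component_port` are hypothetical names invented for this example and are not part of Data Proc.

```shell
#!/bin/sh
# Sketch: map a short component key (hypothetical names) to the port
# listed in the table of components and ports.
component_port() {
  case "$1" in
    hbase-master)      echo 16010 ;;
    hbase-rest)        echo 8085 ;;
    hdfs-namenode)     echo 9870 ;;
    hive-server2)      echo 10002 ;;
    mapreduce-history) echo 19888 ;;
    oozie)             echo 11000 ;;
    spark-history)     echo 18080 ;;
    yarn-history)      echo 8188 ;;
    yarn-rm)           echo 8088 ;;
    zeppelin)          echo 8890 ;;
    *)
      # Unknown key: report on stderr and signal failure to the caller.
      echo "unknown component: $1" >&2
      return 1
      ;;
  esac
}

# Example lookup for the YARN Resource Manager interface.
port="$(component_port yarn-rm)"
echo "YARN Resource Manager port: $port"
```

The returned value can then be used as the port number in the port-forwarding command.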