Component interfaces and ports in Data Proc
The web interfaces of some Data Proc components, such as Hadoop, Spark, YARN, and Zeppelin, are available on the cluster's master host. You can use these interfaces:
- To manage and monitor cluster resources: YARN Resource Manager and HDFS Name Node.
- To view job statuses and debug jobs: Spark History and JobHistory.
- For collaboration, experiments, or ad-hoc operations: Apache Zeppelin.
Data Proc lets you create clusters that are accessible either from the internet or only from within a cloud network. Regardless of the configuration, we recommend keeping component web interfaces inaccessible from outside Yandex Cloud. To connect to Data Proc component interfaces, use either UI Proxy or an intermediate virtual machine.
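When connecting through an intermediate virtual machine, one common approach is SSH local port forwarding: the VM sits in the same cloud network as the cluster, and traffic sent to a local port is relayed through the VM to a port on the master host. The minimal Python sketch below only builds the OpenSSH command line and does not run it. It assumes the VM is reachable over SSH from your machine and can reach the master host; the function name `ssh_tunnel_argv`, the user `vm-user`, the VM address `203.0.113.10`, and the host name `master.example.internal` are hypothetical placeholders.

```python
import shlex
from typing import Optional


def ssh_tunnel_argv(
    vm_user: str,
    vm_address: str,
    master_fqdn: str,
    remote_port: int,
    local_port: Optional[int] = None,
) -> list[str]:
    """Build an OpenSSH command that forwards a local port to the master host.

    Hypothetical helper: traffic goes local machine -> intermediate VM (SSH)
    -> master_fqdn:remote_port inside the cloud network.
    """
    if local_port is None:
        local_port = remote_port  # reuse the same port number locally
    for port in (remote_port, local_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
    return [
        "ssh",
        "-N",  # no remote command, only port forwarding
        "-L", f"{local_port}:{master_fqdn}:{remote_port}",
        f"{vm_user}@{vm_address}",
    ]


if __name__ == "__main__":
    # Placeholder values; 8088 is the YARN Resource Manager port.
    argv = ssh_tunnel_argv("vm-user", "203.0.113.10", "master.example.internal", 8088)
    print(shlex.join(argv))
```

With the tunnel running, the interface would then be opened locally, for example at `http://localhost:8088`. Returning an argument list rather than a single string avoids shell-quoting issues if you later pass it to `subprocess.run`.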
UI Proxy is a mechanism that proxies cluster component interfaces, encrypting HTTP traffic and authenticating users through Yandex Cloud IAM. To access the interfaces, a user must be logged in to Yandex Cloud and have both permission to view the cluster and the dataproc.user role.
UI Proxy is disabled by default. To use it, enable it when creating or updating a cluster, then view the list of web interfaces available for connection.
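The access rules for UI Proxy combine one cluster-side condition with several user-side conditions, and all of them must hold at once. The Python sketch below models that conjunction. `AccessContext` and `can_open_component_ui` are hypothetical names used only for illustration; they are not part of any Yandex Cloud API, and the real check is performed by Yandex Cloud IAM.

```python
from dataclasses import dataclass, field


@dataclass
class AccessContext:
    """Hypothetical model of the inputs to the access decision (not a real API type)."""

    ui_proxy_enabled: bool  # cluster setting; UI Proxy is off by default
    logged_in: bool  # the user is logged in to Yandex Cloud
    can_view_cluster: bool  # the user has permission to view this cluster
    roles: set[str] = field(default_factory=set)  # roles granted to the user


REQUIRED_ROLE = "dataproc.user"


def can_open_component_ui(ctx: AccessContext) -> bool:
    """Return True only if every condition described in the text holds."""
    return (
        ctx.ui_proxy_enabled
        and ctx.logged_in
        and ctx.can_view_cluster
        and REQUIRED_ROLE in ctx.roles
    )
```

For example, a logged-in user who can view the cluster but lacks the `dataproc.user` role is denied, as is any user while UI Proxy is still disabled on the cluster.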
Warning
To use UI Proxy, you may also need to configure security groups.
Security groups are in the Preview stage. If they are not available on your network, all incoming and outgoing traffic is allowed for its resources, and no additional setup is required.
To enable security groups, request access to this feature from technical support.
Components and ports
Service | Port |
---|---|
HBase Master | 16010 |
HBase REST | 8085 |
HDFS Name Node | 9870 |
HiveServer2 | 10002 |
MapReduce Application History | 19888 |
Oozie | 11000 |
Spark History | 18080 |
YARN Application History | 8188 |
YARN Resource Manager | 8088 |
Zeppelin | 8890 |
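If you script access to these interfaces, the table can be kept as a simple lookup map. The minimal Python sketch below does that; `COMPONENT_PORTS` and `interface_url` are hypothetical names, and both the plain `http://` scheme and the host you pass in (the master host FQDN, or `localhost` when the port is forwarded over an SSH tunnel) are assumptions rather than documented Data Proc behavior.

```python
# Web interface ports of Data Proc components, taken from the table above.
COMPONENT_PORTS: dict[str, int] = {
    "HBase Master": 16010,
    "HBase REST": 8085,
    "HDFS Name Node": 9870,
    "HiveServer2": 10002,
    "MapReduce Application History": 19888,
    "Oozie": 11000,
    "Spark History": 18080,
    "YARN Application History": 8188,
    "YARN Resource Manager": 8088,
    "Zeppelin": 8890,
}


def interface_url(component: str, host: str) -> str:
    """Build a URL for a component's web interface (hypothetical helper).

    Assumes plain HTTP; `host` is a placeholder for the master host FQDN
    or for `localhost` when the port is forwarded through a tunnel.
    """
    try:
        port = COMPONENT_PORTS[component]
    except KeyError:
        raise ValueError(f"unknown component: {component!r}") from None
    return f"http://{host}:{port}"
```

For instance, `interface_url("Spark History", "localhost")` yields `http://localhost:18080`. Raising `ValueError` for unknown names surfaces typos early instead of producing a URL with a wrong port.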