© 2022 Yandex.Cloud LLC

Configuring networks for Data Proc

Written by
Yandex Cloud
  • Egress NAT
  • Using a NAT instance and static routes

Under the Yandex Cloud network model, Data Proc cluster hosts without public IP addresses can't reach resources outside their VPC network. For such a host to reach hosts in other networks, Yandex Cloud service interfaces, or the internet, either assign the host a public IP address or route the subnet's outbound traffic through NAT.

Egress NAT

Egress NAT for VPC subnets is at the Preview stage. If the feature isn't available to you yet, you can request access to it from the management console.

To enable egress NAT for a subnet:

  1. Log in to the management console.
  2. In the appropriate folder, select VPC.
  3. Find the row of the subnet you need.
  4. Open the menu in that subnet's row and select Enable egress NAT.

Using a NAT instance and static routes

When using a NAT instance, all traffic passes through an additional VM (the NAT instance):

  • You can monitor all outbound traffic, and when necessary, you can also deploy a VPN between the Yandex Cloud subnet and the resources you need.
  • You pay for the NAT instance, and the bandwidth limit of a single VM port can become a bottleneck.

To use a NAT configuration, you need two virtual subnets: one subnet hosts the Data Proc clusters and the second hosts the VM with the public IP address.

Examples of subnets:

  • dataproc-net — Subnet hosting the Data Proc clusters with CIDR 192.168.1.0/24.
  • dataproc-nat-net — Subnet for the NAT instance with CIDR 192.168.100.0/24.
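
Because the static route added later must point at an address inside dataproc-nat-net, it can help to confirm which subnet an address belongs to. The following is a minimal sketch in plain shell arithmetic, not part of Yandex Cloud tooling. The function names ip_to_int and in_cidr and the address 192.168.100.5 are hypothetical, and the sketch assumes octets are written without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (for example 192.168.1.0/24).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Netmask with the top $bits bits set and the rest cleared.
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Hypothetical internal address of the NAT instance.
nat_ip=192.168.100.5
if in_cidr "$nat_ip" 192.168.100.0/24; then
    echo "$nat_ip is in dataproc-nat-net"
fi
if in_cidr "$nat_ip" 192.168.1.0/24; then
    echo "$nat_ip is in dataproc-net"
fi
```

With the example CIDRs above, only the first message is printed, since the two subnets don't overlap.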

To give hosts in the dataproc-net subnet access to external resources:

  1. In the dataproc-nat-net subnet, create a virtual machine with a public IP address from the NAT instance image.

  2. Copy the internal IP address of this VM.

  3. On the page of the network that contains the subnets, create a routing table named nat.

  4. Add a static route to the routing table:

    • Destination prefix — 0.0.0.0/0
    • Next hop — Internal IP address of the NAT instance.
  5. On the network page, link the nat routing table to the dataproc-net subnet.

  6. Enable routing on the NAT instance by adding the following lines to the file /etc/sysctl.conf:

    net.ipv4.ip_forward = 1
    net.ipv4.conf.all.accept_redirects = 1
    net.ipv4.conf.all.send_redirects = 1
    
  7. To enable execution of /etc/rc.local on boot, use the commands:

    $ sudo systemctl enable rc-local
    $ sudo touch /etc/rc.local
    $ sudo chmod 755 /etc/rc.local
    
  8. Add the following code to /etc/rc.local. The #!/bin/sh line must be the first line of the file:

    #!/bin/sh
    
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    
  9. Restart the VM:

    $ sudo reboot -f
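
Steps 6 to 8 can also be collected into one script. The sketch below is one possible way to do that, not an official tool. The function name configure_nat_files and its optional root-directory argument are assumptions introduced here so the result can be inspected in a scratch directory first. Unlike step 8, the sketch overwrites rc.local rather than appending to it, which assumes the file is new, as in step 7.

```shell
#!/bin/sh
# Apply steps 6-8 to the filesystem rooted at $1 (default: the real root).
configure_nat_files() {
    root=${1:-/}
    root=${root%/}              # "/" becomes "", so paths resolve to /etc/...
    mkdir -p "$root/etc"

    # Step 6: forwarding and redirect settings, copied from the guide.
    for setting in \
        "net.ipv4.ip_forward = 1" \
        "net.ipv4.conf.all.accept_redirects = 1" \
        "net.ipv4.conf.all.send_redirects = 1"
    do
        # Append only if the exact line is missing, so reruns are harmless.
        grep -qxF "$setting" "$root/etc/sysctl.conf" 2>/dev/null ||
            echo "$setting" >> "$root/etc/sysctl.conf"
    done

    # Steps 7-8: write an executable rc.local that masquerades outbound
    # traffic. This replaces any existing rc.local.
    cat > "$root/etc/rc.local" <<'EOF'
#!/bin/sh

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
EOF
    chmod 755 "$root/etc/rc.local"
}

# Dry run: write the files under a scratch directory and show the result.
scratch=$(mktemp -d)
configure_nat_files "$scratch"
cat "$scratch/etc/sysctl.conf" "$scratch/etc/rc.local"
```

On the NAT instance itself you would call configure_nat_files with no argument as root, then still run sudo systemctl enable rc-local and reboot as described in steps 7 and 9.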
    

To check whether you configured NAT correctly, run the following command on the NAT instance:

$ curl ifconfig.co

If the configuration is correct, the command outputs the public IP address of the NAT instance.
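
The same check can be scripted so that the reported address is compared with the one you expect. This is a minimal sketch. The function names is_ipv4 and check_nat are hypothetical, and 203.0.113.10 is a placeholder from the documentation address range, not a real NAT instance address. It also rests on an assumption that goes beyond the text above: a host in dataproc-net with no public IP address of its own should report the NAT instance's address once the static route and masquerading work, so running the check from such a host exercises the route as well.

```shell
#!/bin/sh
# Succeed if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Compare the address reported by an external service ($2) with the
# NAT instance's public address ($1) and print a verdict.
# Returns 0 on a match, 1 on a mismatch, 2 if no valid address was reported.
check_nat() {
    expected=$1
    reported=$2
    if ! is_ipv4 "$reported"; then
        echo "no valid address reported"
        return 2
    fi
    if [ "$reported" = "$expected" ]; then
        echo "ok: traffic leaves through $expected"
    else
        echo "mismatch: expected $expected, got $reported"
        return 1
    fi
}

# Example call (needs network access, so it is left commented out):
# check_nat 203.0.113.10 "$(curl -s --max-time 10 ifconfig.co)"
```

Keeping the comparison separate from the curl call means the logic can be tried with fixed values before it is run against the live service.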
