© 2022 Yandex.Cloud LLC

Runtime environment

Written by
Yandex Cloud
  • Current images
  • Deprecated images

When creating a Data Proc cluster, you choose an image version. The image version determines the versions of the components installed on the cluster.

The tables below list the current and deprecated Data Proc images. Each image version includes the conda and pip package managers for Python and a collection of pre-installed libraries.

Note

Data Proc does not support automatic OS or software updates. For stable and reliable cluster performance, check for and install updates manually on a regular basis. This requires connecting to your cluster hosts over SSH.
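Because updates are installed manually, the libraries on a host can drift from the versions listed for its image. The following is a minimal sketch of how such a check might look when run on a cluster host. It assumes Python 3.8 or later (so it suits image 2.0 rather than earlier images), and the distribution names used as dictionary keys are assumptions about how these libraries are registered with pip, not names taken from this article. The function names are hypothetical.

```python
from importlib import metadata  # part of the standard library in Python 3.8+

# Expected versions for image 2.0, copied from the table of current images.
# The distribution names (keys) are assumptions, not taken from the article.
EXPECTED_IMAGE_2_0 = {
    "pyarrow": "1.0.1",
    "ipykernel": "5.3.4",
    "PyHive": "0.6.1",
    "scikit-learn": "0.23.2",
    "pandas": "1.1.3",
    "ipython": "7.19.0",
    "matplotlib": "3.2.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(EXPECTED_IMAGE_2_0).items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

The `lookup` parameter is there only so the comparison can be exercised without touching the real environment, for example by passing `some_dict.get`.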

Current images

Components        Image 1.4    Image 2.0

Component versions
Hadoop            2.10.0       3.2.2
Tez               0.9.2        0.10.0
Hive              2.3.6        3.1.2
Zookeeper         3.4.14       3.4.14
HBase             1.3.5        2.2.7
Sqoop             1.4.7        -
Oozie             5.2.0        5.2.1
Spark             2.4.6        3.0.2
Flume             1.9.0        -
Zeppelin          0.8.2        0.9.0
Livy              0.7.0        0.8.0

Versions of Python and machine learning libraries
Python            3.7.9        3.8.10
PyArrow           0.13.0       1.0.1
ipykernel         5.1.3        5.3.4
TensorFlow        1.15.0       -
CatBoost          0.20.2       -
PyHive            0.6.1        0.6.1
LightGBM          2.3.0        -
XGBoost           0.90         -
scikit-learn      0.21.3       0.23.2
pandas            0.25.3       1.1.3
IPython           7.9.0        7.19.0
Matplotlib        3.1.1        3.2.2
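When choosing between the two current images, it can help to see at a glance what differs. The sketch below encodes the table above as two dictionaries and reports which components have no version listed in image 2.0 and which change version. It assumes that a dash in the table means the component is not included in that image; the function name `diff_images` is hypothetical.

```python
# Versions copied from the table of current images.
# None stands for a dash in the table (assumed to mean "not included").
IMAGE_1_4 = {
    "Hadoop": "2.10.0", "Tez": "0.9.2", "Hive": "2.3.6", "Zookeeper": "3.4.14",
    "HBase": "1.3.5", "Sqoop": "1.4.7", "Oozie": "5.2.0", "Spark": "2.4.6",
    "Flume": "1.9.0", "Zeppelin": "0.8.2", "Livy": "0.7.0", "Python": "3.7.9",
    "PyArrow": "0.13.0", "ipykernel": "5.1.3", "TensorFlow": "1.15.0",
    "CatBoost": "0.20.2", "PyHive": "0.6.1", "LightGBM": "2.3.0",
    "XGBoost": "0.90", "scikit-learn": "0.21.3", "pandas": "0.25.3",
    "IPython": "7.9.0", "Matplotlib": "3.1.1",
}
IMAGE_2_0 = {
    "Hadoop": "3.2.2", "Tez": "0.10.0", "Hive": "3.1.2", "Zookeeper": "3.4.14",
    "HBase": "2.2.7", "Sqoop": None, "Oozie": "5.2.1", "Spark": "3.0.2",
    "Flume": None, "Zeppelin": "0.9.0", "Livy": "0.8.0", "Python": "3.8.10",
    "PyArrow": "1.0.1", "ipykernel": "5.3.4", "TensorFlow": None,
    "CatBoost": None, "PyHive": "0.6.1", "LightGBM": None,
    "XGBoost": None, "scikit-learn": "0.23.2", "pandas": "1.1.3",
    "IPython": "7.19.0", "Matplotlib": "3.2.2",
}


def diff_images(old, new):
    """Return (removed, changed) when moving from image `old` to image `new`.

    removed: sorted names present in `old` but with no version in `new`.
    changed: {name: (old_version, new_version)} for versions that differ.
    """
    removed = sorted(
        name for name, ver in old.items()
        if ver is not None and new.get(name) is None
    )
    changed = {
        name: (ver, new[name])
        for name, ver in old.items()
        if ver is not None and new.get(name) is not None and ver != new[name]
    }
    return removed, changed


if __name__ == "__main__":
    removed, changed = diff_images(IMAGE_1_4, IMAGE_2_0)
    print("No version listed in image 2.0:", ", ".join(removed))
    for name, (before, after) in changed.items():
        print(f"{name}: {before} -> {after}")
```

Keeping the data as plain dictionaries makes it easy to extend the comparison to other image versions by adding another dictionary.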

Deprecated images

Note

These images are deprecated. We recommend using the latest image versions.
Existing clusters will continue to run, but you won't be able to create new clusters with deprecated versions.
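A provisioning script might guard against requesting a deprecated image before it attempts to create a cluster. The following is a minimal sketch of such a guard, using only the image versions listed in this article; the function names are hypothetical, and the sketch does not call any Data Proc API.

```python
# Image versions as listed in this article.
CURRENT_IMAGES = ("1.4", "2.0")
DEPRECATED_IMAGES = ("1.0", "1.1", "1.2", "1.3")


def image_status(version):
    """Classify an image version as 'current', 'deprecated', or 'unknown'."""
    if version in CURRENT_IMAGES:
        return "current"
    if version in DEPRECATED_IMAGES:
        return "deprecated"
    return "unknown"


def can_create_cluster(version):
    """New clusters can only be created with a current image version."""
    return image_status(version) == "current"


if __name__ == "__main__":
    for version in ("1.2", "2.0"):
        print(version, image_status(version), can_create_cluster(version))
```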

Components        Image 1.0    Image 1.1    Image 1.2    Image 1.3

Component versions
Hadoop            2.8.5        2.10.0       2.10.0       2.10.0
Tez               0.9.1        0.9.2        0.9.2        0.9.2
Hive              2.3.4        2.3.6        2.3.6        2.3.6
Zookeeper         3.4.6        3.4.14       3.4.14       3.4.14
HBase             1.3.3        1.3.5        1.3.5        1.3.5
Sqoop             1.4.6        1.4.7        1.4.7        1.4.7
Oozie             4.3.0        4.3.1        5.2.0        5.2.0
Spark             2.2.1        2.4.4        2.4.6        2.4.6
Flume             1.8.0        1.8.0        1.9.0        1.9.0
Zeppelin          0.7.3        0.8.2        0.8.2        0.8.2
Livy              -            -            -            0.7.0

Versions of Python and machine learning libraries
Python            3.7          3.7.5        3.7.7        3.7.9
PyArrow           0.11.1       0.13.0       0.13.0       0.13.0
ipykernel         5.1.0        5.1.3        5.1.3        5.1.3
TensorFlow        1.13.1       1.15.0       1.15.0       1.15.0
CatBoost          0.14.2       0.20         0.20         0.20.2
PyHive            -            0.6.1        0.6.1        0.6.1
LightGBM          2.2.3        2.3.0        2.3.0        2.3.0
XGBoost           0.82         0.90         0.90         0.90
scikit-learn      0.21.1       0.21.3       0.21.3       0.21.3
pandas            0.24.2       0.25.3       0.25.3       0.25.3
IPython           7.5.0        7.9.0        7.9.0        7.9.0
Matplotlib        3.0.3        3.1.1        3.1.1        3.1.1
