Yandex Data Proc release notes
Updated at January 24, 2024
This section contains Yandex Data Proc release notes.
Labels next to each change indicate the interfaces that support it: the management console, CLI, API, Terraform, or SQL.
Q2 2023
Q3 2022
- Added support for new settings in the DataprocCreateClusterOperator Airflow operator.
- Added cpu-optimized host classes with 2 GB of RAM per vCPU (a 2:1 RAM-to-vCPU ratio). The new configurations are only available on the Intel Ice Lake platform.
- Published a guide on using initialization scripts to set up GeeseFS.
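To make the 2:1 ratio concrete, here is a minimal Python sketch that computes the RAM implied for a cpu-optimized host with a given number of vCPUs. The function name ram_gb_for and the sample vCPU counts are hypothetical and do not correspond to actual host class names or presets.

```python
# Sketch: RAM implied by the 2:1 RAM-to-vCPU ratio of cpu-optimized host classes.
# The vCPU counts used below are illustrative, not the actual list of presets.

RAM_GB_PER_VCPU = 2  # 2 GB of RAM per vCPU, per the release note


def ram_gb_for(vcpus: int) -> int:
    """Return the RAM in GB for a cpu-optimized host with the given vCPU count."""
    if vcpus <= 0:
        raise ValueError("vCPU count must be positive")
    return vcpus * RAM_GB_PER_VCPU


if __name__ == "__main__":
    for vcpus in (4, 8, 16):
        print(f"{vcpus} vCPU -> {ram_gb_for(vcpus)} GB RAM")
```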
Q2 2022
- Image version 2.1 is now available.
- Added the ability to enable public internet access for subclusters of all types.
Management console
CLI
API
- Lightweight Spark is available starting with image version 2.0.39. You can now create a cluster without data storage subclusters, since the YARN and SPARK services no longer depend on HDFS.
- Added support for initialization scripts in the CLI.
CLI
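The lightweight Spark condition combines a minimum image version with a service selection. The Python sketch below expresses it as a single check, assuming the rule reduces to three conditions: image version 2.0.39 or later, YARN and SPARK selected, and HDFS not selected. The function names are hypothetical; this is not the validation logic Yandex Data Proc itself uses.

```python
from __future__ import annotations

# Minimum image version for lightweight Spark, per the release note.
MIN_LIGHTWEIGHT_VERSION = (2, 0, 39)


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted image version such as "2.0.39" into integers."""
    return tuple(int(part) for part in version.split("."))


def is_lightweight_spark(image_version: str, services: set[str]) -> bool:
    """Hypothetical check: does this cluster run Spark on YARN without HDFS,
    and therefore without data storage subclusters?"""
    return (
        parse_version(image_version) >= MIN_LIGHTWEIGHT_VERSION
        and {"YARN", "SPARK"} <= services
        and "HDFS" not in services
    )


if __name__ == "__main__":
    print(is_lightweight_spark("2.0.39", {"YARN", "SPARK"}))          # True
    print(is_lightweight_spark("2.0.4", {"YARN", "SPARK"}))           # False
    print(is_lightweight_spark("2.1.0", {"HDFS", "YARN", "SPARK"}))   # False
```

Comparing versions as integer tuples rather than strings matters here: as plain strings, "2.0.4" sorts after "2.0.39", even though build 4 predates build 39.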
Q1 2022
- You can now create clusters on non-replicated network drives of up to 8 TB. Non-replicated drives have a simpler design than standard network SSDs, which makes them several times faster.
- Added the ability to cancel a job.
Management console
CLI
- Added the build number to the Yandex Data Proc image version.
- Added the ability to pass the packages, repositories, and exclude_packages parameters for Spark and PySpark jobs. By using these parameters, you can download additional dependencies and packages from external repositories.
Management console
CLI
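Spark's own spark-submit tool has --packages, --repositories, and --exclude-packages options with the same purpose. The Python sketch below assumes, for illustration only, that the three job parameters map one to one onto those options; the helper name spark_submit_dependency_args and the com.example coordinates are hypothetical.

```python
from __future__ import annotations


def spark_submit_dependency_args(
    packages: list[str] | None = None,
    repositories: list[str] | None = None,
    exclude_packages: list[str] | None = None,
) -> list[str]:
    """Build spark-submit dependency options from the three job parameters.

    Assumption: each parameter corresponds to the spark-submit option of the
    same name; this mapping is illustrative, not Data Proc's documented behavior.
    """
    args: list[str] = []
    if packages:
        # Maven coordinates in groupId:artifactId:version form
        args += ["--packages", ",".join(packages)]
    if repositories:
        # Additional remote repositories to search for the --packages coordinates
        args += ["--repositories", ",".join(repositories)]
    if exclude_packages:
        # groupId:artifactId pairs to exclude while resolving --packages dependencies
        args += ["--exclude-packages", ",".join(exclude_packages)]
    return args


if __name__ == "__main__":
    print(spark_submit_dependency_args(
        packages=["com.example:lib-a:1.0.0"],
        repositories=["https://repo.example.com/maven2"],
        exclude_packages=["com.example:unwanted"],
    ))
```

Empty or omitted parameters produce no options at all, so a job without extra dependencies is submitted unchanged.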