Yandex Data Proc releases
Labels next to each release description show which interfaces support the change: the management console, CLI, API, Terraform, or SQL.
- Added support for new configurations in the cpu-optimized host classes with a 2:1 GB RAM to vCPU ratio. The new configurations are only available for Intel Ice Lake.
- Published a guide for using initialization scripts to set up GeeseFS.
- Image version 2.1 is now available.
- Added the ability to enable public internet access for subclusters of all types.
- Lightweight Spark is available starting with image version 2.0.39. You can now create a cluster without data storage subclusters because the YARN and Spark services no longer depend on HDFS.
- Added support for initialization scripts in the CLI.
- You can now create clusters on non-replicated network drives up to 8 TB. Non-replicated drives are much simpler than standard network SSD storage, which makes them perform several times faster.
- Added the ability to cancel a job.
- Added the build number to the Data Proc image version.
- Added the ability to pass the exclude_packages parameters for Spark and PySpark jobs. By using these parameters, you can download additional dependencies and packages from external repositories.
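
To illustrate what dependency parameters such as exclude_packages do for a Spark or PySpark job, here is a minimal Python sketch that assembles a plain spark-submit command line. It uses the standard spark-submit options --packages, --repositories, and --exclude-packages. The helper name build_spark_submit_args, the packages and repositories arguments, the bucket path, and the sample coordinates are illustrative assumptions. How Data Proc passes its job parameters to Spark internally is not described here, so treat this as an analogy rather than the service's actual behavior.

```python
from typing import List, Sequence


def build_spark_submit_args(
    app: str,
    packages: Sequence[str] = (),
    repositories: Sequence[str] = (),
    exclude_packages: Sequence[str] = (),
) -> List[str]:
    """Assemble a spark-submit command line with dependency options.

    Hypothetical helper for illustration; not part of any Data Proc SDK.
    """
    args = ["spark-submit"]
    if packages:
        # Maven coordinates in groupId:artifactId:version form.
        args += ["--packages", ",".join(packages)]
    if repositories:
        # Extra remote repositories to search for the coordinates above.
        args += ["--repositories", ",".join(repositories)]
    if exclude_packages:
        # groupId:artifactId pairs to skip while resolving --packages.
        args += ["--exclude-packages", ",".join(exclude_packages)]
    args.append(app)
    return args


if __name__ == "__main__":
    # Sample values are assumptions chosen only to show the shape of the call.
    cmd = build_spark_submit_args(
        "s3a://my-bucket/job.py",
        packages=["org.apache.spark:spark-avro_2.12:3.0.3"],
        repositories=["https://repo1.maven.org/maven2"],
        exclude_packages=["org.slf4j:slf4j-api"],
    )
    print(" ".join(cmd))
```

Options that are left empty are omitted from the command, so a job with no extra dependencies reduces to spark-submit followed by the application path.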