Added the ability to keep user-defined properties of the Zeppelin interpreter when restarting a cluster. The values of the spark.submit.deployMode, spark.driver.cores, spark.driver.memory, spark.executor.cores, spark.executor.memory, spark.files, spark.jars, and spark.jars.packages properties are not preserved, they are overwritten from Spark properties.

2.0.56

Optimized requests to the metadata service when interacting with s3.

2.0.55

Improved logging in the initialization scripts.

2.0.54

Fixed errors in the TEZ component configuration.

2.0.53

Fixed an error that occurred in the cores/memory configuration for Spark/Yarn when specifying the spark:spark.submit.deployMode cluster property.
Fixed the spark-defaults.yaml configuration file update when updating the cluster properties.

2.0.52

Added a script to hosts for adjusting the initialization script status manually.

2.0.50

The results of running user scenarios are now sent to the masternode by default.

2.0.49

Fixed an error when user-defined settings were ignored in Hive Metastore Server.

2.0.48

Added the ability to use Apache Spark Thrift Server. For more information, see Using Apache Spark Thrift Server.
Corrected the YandexMetadataCredentialsProvider does not implement AWSCredentialsProvider error that might have appeared on lightweight Apache Spark configurations.

2.0.47

Corrected a TCP session leak with the metadata service on high-load clusters. The leak could have resulted in an IAM token not updating for authorization in Object Storage and other services.
Corrected the YandexMetadataCredentialsProvider does not implement AWSCredentialsProvider error that caused tables from Hive Metastore not to load.

2.0.46

Some Spark properties are also used in Zeppelin, such as spark.submit.deployMode, spark.driver.cores, spark.driver.memory, spark.executor.cores, spark.executor.memory, spark.files, spark.jars, and spark.jars.packages.

2.0.45

Fixed an error with the MapReduce Application History Server not being hosted on the cluster master host.
Enabled the HIVE configuration without YARN.
Allowed running HiveServer2 with MapReduce only.

2.0.43

Unified cores/memory calculations for Spark/YARN.

2.0.42

Upgraded Apache Spark to version 3.0.3 and built it with the hadoop-cloud profile to use Magic Committer and Parquet format.
Fixed an error when the hive.metastore.uris settings were ignored for Spark while using external Hive metastore.

2.0.41

Added hive-site.xml to classpath for Spark apps.
Fixed an error when system Python was used instead of a Conda environment while running PySpark.

2.0.40

Fixed an error when user scenarios failed to run.

2.0.39

Added support for lightweight clusters (without HDFS and data storage subclusters).

2.0.38 and 1.4.35

Adapted images to be used in subnets with a user-defined DNS zone.

2.0.37

Added the YC CLI to PATH for initialization scripts.

2.0.36

The YC CLI is installed on all cluster hosts by default.
Added the following values to environment variables for initialization scripts: CLUSTER_ID, S3_BUCKET, ROLE, CLUSTER_SERVICES, MIN_WORKER_COUNT, and MAX_WORKER_COUNT.

2.0.35

Added support for cluster initialization scripts.

2.0

Base components

The following components have been updated:

HBase — 2.2.7.
Hadoop — 3.2.2.
Hive — 3.1.2.
Livy — 0.8.0.
Oozie — 5.2.1.
Spark — 3.0.2.
Tez — 0.10.0.
Zeppelin — 0.9.0.

Deprecated components have been removed:

Flume
Sqoop

Python and machine learning libraries

Python has been updated to version 3.8.10.

The following libraries have been updated:

IPython — 7.19.0.
ipykernel — 5.3.4.
Matplotlib — 3.2.2.
pandas — 1.1.3.
PyArrow — 1.0.1.
PyHive — 0.6.1.
scikit-learn — 0.23.2.

The following libraries have been deleted:

CatBoost
LightGBM
TensorFlow
XGBoost

1.4

Base components

The following components have been updated:

HBase — 1.3.5.
Hadoop — 2.10.0.
Hive — 2.3.6.
Flume — 1.9.0.
Livy — 0.7.0.
Oozie — 5.2.0.
Spark — 2.4.6.
Sqoop — 1.4.7.
Tez — 0.9.2.
Zeppelin — 0.8.2.
Zookeeper — 3.4.14.

Python and machine learning libraries

Python has been updated to version 3.7.9.

The following libraries have been updated:

CatBoost — 0.20.2.
IPython — 7.9.0.
ipykernel — 5.1.3.
LightGBM — 2.3.0.
Matplotlib — 3.1.1.
pandas — 0.25.3.
PyArrow — 0.13.0.
PyHive — 0.6.1.
scikit-learn — 0.21.3.
TensorFlow— 1.15.0.
XGBoost — 0.90.

Yandex Data Proc image release notes

2.0.692.0.69

2.0.662.0.66

2.0.642.0.64

2.0.622.0.62

2.0.612.0.61

2.0.592.0.59

2.0.582.0.58

2.0.562.0.56

2.0.552.0.55

2.0.542.0.54

2.0.532.0.53

2.0.522.0.52

2.0.502.0.50

2.0.492.0.49

2.0.482.0.48

2.0.472.0.47

2.0.462.0.46

2.0.452.0.45

2.0.432.0.43

2.0.422.0.42

2.0.412.0.41

2.0.402.0.40

2.0.392.0.39

2.0.38 and 1.4.352.0.38 and 1.4.35

2.0.372.0.37

2.0.362.0.36

2.0.352.0.35

2.02.0

Base componentsBase components

Python and machine learning librariesPython and machine learning libraries

1.41.4

Base componentsBase components

Python and machine learning librariesPython and machine learning libraries

Was the article helpful?

2.0.69

2.0.66

2.0.64

2.0.62

2.0.61

2.0.59

2.0.58

2.0.56

2.0.55

2.0.54

2.0.53

2.0.52

2.0.50

2.0.49

2.0.48

2.0.47

2.0.46

2.0.45

2.0.43

2.0.42

2.0.41

2.0.40

2.0.39

2.0.38 and 1.4.35

2.0.37

2.0.36

2.0.35

2.0

Base components

Python and machine learning libraries

1.4

Base components

Python and machine learning libraries