Subcluster runtime environment
When creating a Data Proc cluster, you can choose an image version, which determines the versions of Hadoop and the additional components installed on the cluster. Each image version also includes Conda (a Python environment management system) and a set of machine learning tools.
Version 1.1
Hadoop and component versions:
- Hadoop 2.10.0
- Tez 0.9.2
- Hive 2.3.6
- ZooKeeper 3.4.14
- HBase 1.3.5
- Sqoop 1.4.7
- Oozie 4.3.1
- Spark 2.4.4
- Flume 1.8.0
- Zeppelin 0.8.2
Python and machine learning library versions:
- Python 3.7.5
- PyArrow 0.13.0
- ipykernel 5.1.3
- TensorFlow 1.15.0
- CatBoost 0.20
- PyHive 0.6.1
- LightGBM 2.3.0
- XGBoost 0.90
- scikit-learn 0.21.3
- pandas 0.25.3
- IPython 7.9.0
- Matplotlib 3.1.1
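To confirm that a cluster host actually has the library versions listed above for image 1.1, a small check script can be handy. The following is a minimal sketch, not an official tool. The distribution names in `EXPECTED_1_1` (for example `scikit-learn`, `PyHive`) are assumed to match how the packages are registered in the image's Python environment, and `installed_version` and `find_mismatches` are hypothetical helper names introduced for illustration.

```python
# Expected library versions for image 1.1, copied from the list above.
# Assumption: keys are the distribution names used in the environment.
EXPECTED_1_1 = {
    "pyarrow": "0.13.0",
    "ipykernel": "5.1.3",
    "tensorflow": "1.15.0",
    "catboost": "0.20",
    "PyHive": "0.6.1",
    "lightgbm": "2.3.0",
    "xgboost": "0.90",
    "scikit-learn": "0.21.3",
    "pandas": "0.25.3",
    "ipython": "7.9.0",
    "matplotlib": "3.1.1",
}


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        # Available in the standard library from Python 3.8.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for Python 3.7, which the images ship; needs setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in sorted(find_mismatches(EXPECTED_1_1).items()):
        print("{}: expected {}, found {}".format(package, wanted, found))
```

The comparison is an exact string match, so a package that reports `0.20.0` where the list says `0.20` would show up as a mismatch; treat such output as a prompt to look closer rather than as an error.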
Version 1.0
Hadoop and component versions:
- Hadoop 2.8.5
- Tez 0.9.1
- Hive 2.3.4
- ZooKeeper 3.4.6
- HBase 1.3.3
- Sqoop 1.4.6
- Oozie 4.3.0
- Spark 2.2.1
- Flume 1.8.0
- Zeppelin 0.7.3
Python and machine learning library versions:
- Python 3.7
- PyArrow 0.11.1
- ipykernel 5.1.0
- TensorFlow 1.13.1
- CatBoost 0.14.2
- LightGBM 2.2.3
- XGBoost 0.82
- scikit-learn 0.21.1
- pandas 0.24.2
- IPython 7.5.0
- Matplotlib 3.0.3
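When deciding between the two images, it can help to see which Hadoop ecosystem components differ. The sketch below simply encodes the two component lists from this page as dictionaries and prints the entries whose versions changed; `changed_components` is a hypothetical helper name, and nothing here queries a real cluster.

```python
# Component versions copied from the Version 1.0 and Version 1.1 lists.
IMAGE_1_0 = {
    "Hadoop": "2.8.5", "Tez": "0.9.1", "Hive": "2.3.4",
    "ZooKeeper": "3.4.6", "HBase": "1.3.3", "Sqoop": "1.4.6",
    "Oozie": "4.3.0", "Spark": "2.2.1", "Flume": "1.8.0",
    "Zeppelin": "0.7.3",
}
IMAGE_1_1 = {
    "Hadoop": "2.10.0", "Tez": "0.9.2", "Hive": "2.3.6",
    "ZooKeeper": "3.4.14", "HBase": "1.3.5", "Sqoop": "1.4.7",
    "Oozie": "4.3.1", "Spark": "2.4.4", "Flume": "1.8.0",
    "Zeppelin": "0.8.2",
}


def changed_components(old, new):
    """Return {name: (old_version, new_version)} for components that differ."""
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


if __name__ == "__main__":
    for name, (before, after) in changed_components(IMAGE_1_0, IMAGE_1_1).items():
        print("{}: {} -> {}".format(name, before, after))
```

Flume is the only component with the same version (1.8.0) in both images, so it is the only one the script leaves out.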