# Creating an Apache Airflow™ cluster

Every Managed Service for Apache Airflow™ cluster consists of a set of Apache Airflow™ components, each of which can run as multiple instances. The instances may reside in different availability zones.
## Before creating a cluster

- In the folder where you want to create a cluster, create a service account with the `storage.viewer` role.
- Create a static access key for the service account.
- Create a Yandex Object Storage bucket to store DAG files.
## Create a cluster

- In the management console, select the folder where you want to create a cluster.
- Select Managed Service for Apache Airflow™.
- Click Create a cluster.
- Under Basic parameters:
  - Enter a name for the cluster. The name must be unique within the folder.
  - (Optional) Enter a cluster description.
  - (Optional) Create labels:
    - Click Add label.
    - Enter a label in `key: value` format.
    - Press Enter.
- Under Access settings, set a password for the admin user. The password must be at least 8 characters long and contain at least:

  - One uppercase letter
  - One lowercase letter
  - One digit
  - One special character

  Note

  Save the password locally or memorize it. The service does not show passwords after the cluster is created.
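As an illustration only (this helper is not part of the service, and its name is hypothetical), the password rules above can be expressed as a short check:

```python
import re
import string


def is_valid_admin_password(password: str) -> bool:
    """Check a candidate admin password against the rules above:
    at least 8 characters, containing an uppercase letter, a
    lowercase letter, a digit, and a special character."""
    return (
        len(password) >= 8
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"\d", password) is not None
        and any(c in string.punctuation for c in password)
    )
```

For example, `is_valid_admin_password("Airflow-2024")` passes all four rules, while `"weakpass"` fails the uppercase, digit, and special-character requirements.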
- Under Network settings, select:

  - Availability zones for the cluster
  - Cloud network
  - A subnet in each of the selected availability zones
  - Security group for the cluster network traffic

  Security group settings do not affect access to the Apache Airflow™ web interface.
- Set the number of instances and resources for the Managed Service for Apache Airflow™ components:

  - Web server
  - Scheduler
  - Workers

    Note

    If the minimum and maximum numbers of workers are the same, a fixed number of workers is created. If the minimum is smaller than the maximum, the cluster runs the minimum number of workers while the job queue is empty and adds workers under load, never exceeding the specified maximum.

  - (Optional) Triggerer services
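The worker-scaling rule described in the note above can be sketched as a small function. This is an illustration of the described behavior under stated assumptions (one worker per queued job), not the service's actual implementation, and the function name is hypothetical:

```python
def desired_workers(queued_jobs: int, min_workers: int, max_workers: int) -> int:
    """Sketch of the scaling rule: a fixed pool when min == max;
    otherwise the minimum while the queue is empty, growing with
    demand but never exceeding the maximum."""
    if min_workers == max_workers:
        return min_workers  # fixed-size worker pool
    if queued_jobs == 0:
        return min_workers  # idle: stay at the minimum
    # illustrative assumption: one worker per queued job, capped at the maximum
    return min(max_workers, max(min_workers, queued_jobs))
```

For example, with a minimum of 2 and a maximum of 8 workers, an empty queue keeps 2 workers running, while 50 queued jobs scale the pool up to the cap of 8.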
- (Optional) Under Dependencies, specify the names of pip packages to install additional libraries and applications for running DAG files in the cluster.

  To specify multiple packages, click Add.

  If required, you can restrict the versions of the installed packages, for example:

      pandas==2.0.2
      scikit-learn>=1.0.0
      clickhouse-driver~=0.2.0

  The package name and version format is defined by the `pip install` command for pip packages.

  Warning

  To install pip packages from public repositories, specify a network with egress NAT configured under Network settings.
- Under DAG file storage, specify:

  - The name of the previously created bucket that will store DAG files.
  - The parameters of a static access key for the service account.
- (Optional) Under Advanced settings, enable cluster deletion protection.
- (Optional) Under Airflow configuration, specify additional Apache Airflow™ properties, e.g., `api.maximum_page_limit` as a key and `150` as its value. Fill out the fields manually or import the settings from a configuration file (see the sample configuration file).
- Click Create.
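For comparison, the dotted key names follow Apache Airflow™'s own `section.option` configuration scheme: in a standalone installation, the `api.maximum_page_limit` example above corresponds to the following `airflow.cfg` fragment:

```
[api]
maximum_page_limit = 150
```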