Data Platform announcements and launches

Introducing Yandex Query, a new version of Yandex DataSphere, more features in Yandex Data Transfer, and updates to Managed Database services.

With service consumption increasing 2.5X in just the first half of 2022, Data Platform continues to be one of our fastest growing services. Managed platform services help create analytical dashboards, train ML models, and handle complex data transfer, processing, and storage scenarios.

In addition to the launch of Yandex Query and Yandex DataSphere updates, we introduced a number of improvements to our managed database services. DataLens, our service for visualizing and analyzing data, also saw the arrival of new features like new connectors and graphs. With each new update, Data Transfer streamlines and simplifies the delivery and migration of cloud-based data.

Yandex Query

Yandex Query is a new managed Yandex Cloud service for running analytical and streaming queries on unstructured and semi-structured data. With Yandex Query, businesses can streamline and accelerate the entire process of working with massive data sets. Combining analytics and stream processing in one service further simplifies real-time data analysis. The service is now in preview mode and can be used for free.

Yandex DataSphere

We have released a new version of Yandex DataSphere, a full-featured integrated workspace for data scientists. The release goes beyond a simple interface update and extends the functionality of industry-standard tools like JupyterLab. Data scientists can now use the service to collaborate and to save and reproduce their research results. The interface was designed using data from user experience research and customer feedback.

Yandex Data Transfer

We continue to develop Data Transfer, our service for logical data transfer between DBMSs, object storage, and message brokers. We are pleased to announce the release of new Data Transfer features, source-target pairs, scenarios, and transfer types.

New Features:

  • Support for security groups.

  • A separate role for managing transfers via the internet.

  • Partitioned tables migration support.

  • Snapshot acceleration. Sharded activations.

  • Data transformation.

  • Regular snapshots and incremental loading.

New pairs

New scenarios

Data Transfer uses change data capture (CDC) to transfer data. This functionality is now available to all users. You can now use the public Debezium format within the service to receive change event streams from your MySQL and PostgreSQL databases in Yandex Data Streams (YDS) or Kafka topics. Your applications can subscribe to these topics and use the events for their own production purposes.
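To make the idea of a change event stream more concrete, here is a minimal sketch of how an application might consume Debezium-style events. It relies only on the standard Debezium envelope fields (`op`, `before`, `after`). The helper name `apply_change_event`, the `id` primary key, and the sample events are assumptions made for illustration and are not part of Data Transfer.

```python
import json


def apply_change_event(state, raw_event):
    """Apply one Debezium-style change event to an in-memory table.

    Assumption: rows are keyed by a hypothetical primary key column `id`.
    """
    event = json.loads(raw_event)
    # With schemas enabled, the envelope is wrapped in a "payload" field.
    payload = event.get("payload", event)
    op = payload["op"]  # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    if op in ("c", "u", "r"):
        row = payload["after"]
        state[row["id"]] = row
    elif op == "d":
        row = payload["before"]
        state.pop(row["id"], None)
    return state


# Hypothetical events, shaped like messages read from a topic.
sample_events = [
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 1, "name": "alice"}}}),
    json.dumps({"payload": {"op": "u", "before": {"id": 1, "name": "alice"},
                            "after": {"id": 1, "name": "alicia"}}}),
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 2, "name": "bob"}}}),
    json.dumps({"payload": {"op": "d", "before": {"id": 2, "name": "bob"},
                            "after": None}}),
]

table = {}
for raw in sample_events:
    apply_change_event(table, raw)
print(table)
```

In a real consumer the events would come from a Kafka or YDS client rather than a list, and the state would live in a database or cache instead of a dictionary.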

New Transfer Types

Previously, we had the following transfer types:

  • Copying

  • Replication

  • Copying and replication

You can now run Copying on a regular schedule and have each run add only the rows that were not included in the previous iteration. This lets you keep the target up to date with minimal delay and without access to replication logs.
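One common way to implement this kind of incremental loading is to remember the highest value of a monotonically increasing column and copy only rows above it on each run. The sketch below illustrates that general technique under this assumption. It is not a description of how Data Transfer works internally, and `incremental_copy` and the `id` cursor column are hypothetical names.

```python
def incremental_copy(source_rows, target_rows, cursor_column, last_value):
    """Copy rows whose cursor column is greater than last_value.

    Returns the new cursor value to store for the next scheduled run.
    """
    new_rows = [
        row for row in source_rows
        if last_value is None or row[cursor_column] > last_value
    ]
    new_rows.sort(key=lambda row: row[cursor_column])
    target_rows.extend(new_rows)
    if new_rows:
        last_value = new_rows[-1][cursor_column]
    return last_value


source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
target = []

# First run copies everything and remembers the highest id seen.
cursor = incremental_copy(source, target, "id", None)

# A new row arrives; the next run picks up only that row.
source.append({"id": 3, "v": "c"})
cursor = incremental_copy(source, target, "id", cursor)
print(cursor, len(target))
```

Note the trade-off of this approach: updates and deletes to rows that were already copied are not detected, which is the price of not reading replication logs.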

Transformers

The service now accepts an additional configuration option, a transformer, which allows any arbitrary function to be invoked on the changeItems stream.

You can now, for example, filter a field or rename a table.
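As an illustration of the two examples above, here is a small sketch of transformers written as plain functions applied to a stream of change items. The item layout (a dict with `table` and `columns` keys) and the helper names are assumptions made for this example. The actual transformer configuration in Data Transfer may look different.

```python
def drop_field(field):
    """Build a transformer that removes one column from every item."""
    def transform(item):
        columns = {k: v for k, v in item["columns"].items() if k != field}
        return {**item, "columns": columns}
    return transform


def rename_table(old, new):
    """Build a transformer that renames a table and leaves other tables alone."""
    def transform(item):
        if item["table"] == old:
            return {**item, "table": new}
        return item
    return transform


def apply_transformers(items, transformers):
    """Lazily pass each change item through every transformer in order."""
    for item in items:
        for transformer in transformers:
            item = transformer(item)
        yield item


# Hypothetical change items.
change_items = [
    {"table": "users", "columns": {"id": 1, "email": "a@example.com"}},
    {"table": "orders", "columns": {"id": 7, "total": 10}},
]

result = list(apply_transformers(
    change_items,
    [drop_field("email"), rename_table("users", "customers")],
))
print(result)
```

Each transformer returns a new item instead of mutating its input, so the original stream stays intact and transformers can be chained in any order.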

Sharded activations

Data Transfer can now transfer historical data with greater parallelism, which speeds up the process. MongoDB databases can be parallelized across collections, Greenplum databases across shards, and PostgreSQL databases across tables and, if certain conditions are met, within tables.
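The general idea of parallelizing a snapshot across tables can be sketched with a thread pool in which each worker copies one table. This is only an illustration of the technique. The in-memory "databases" and the names `copy_table` and `parallel_snapshot` are hypothetical and do not reflect the service's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_table(name, source, target):
    """Copy a single table and report how many rows were transferred."""
    target[name] = list(source[name])
    return name, len(target[name])


def parallel_snapshot(source, target, workers=4):
    """Copy every table of `source` into `target`, one task per table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(copy_table, name, source, target) for name in source
        ]
        # result() re-raises any exception that occurred in a worker.
        return dict(future.result() for future in futures)


source_db = {"users": [1, 2, 3], "orders": [10, 20]}
target_db = {}
row_counts = parallel_snapshot(source_db, target_db)
print(row_counts)
```

Parallelizing within a single table usually requires splitting it into non-overlapping key ranges, which is one reason such a mode tends to depend on extra conditions.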

Yandex DataLens

In the past year, the DataLens user base has grown 6.9X, with over 20,000 users within Yandex and over 25,000 active external instances.

New integrations:

  • Bitrix24

  • Yandex Query

  • YDB

  • New file connector

  • Yandex Monitoring

  • Prometheus

New chart features:

  • Combination charts

  • Split section

  • Linear indicators in tables

  • Parent-child trees

  • Conditional formatting

  • Gradient shading

  • Line shape

  • Column width in tables

  • Text wrapping in tables

  • Totals in tables

  • Totals in pivot tables

  • Map clusters

  • Query Inspector

New dashboard features:

  • Parameterization

  • Auto updates

  • Operators in selectors

  • Mobile layout setup

  • No-materialization method for publications

  • Embedded public dashboards

  • Support for night mode

Check out these and other DataLens features on the demo dashboard (in Russian).

Updates in data platform managed services

Yandex Data Proc

A service for processing multi-terabyte data sets using open source tools such as Apache Spark, Apache Hadoop®, Apache HBase, Apache Hive, Apache Zeppelin and other services in the Apache® ecosystem. This year saw the addition of:

  • Deploying a master node with a public IP address.

  • Clusters with non-replicated network disks up to 8 TB in size.

  • Ability to cancel tasks whose results are no longer relevant.

  • Lightweight Apache Spark clusters running without HDFS and DataNodes. Because these clusters deploy faster and cost less, they are well suited to machine learning tasks and data mart preparation.

  • Image version 2.1 is available for testing with Hadoop 3.3.2, Spark 3.2.1 and other component updates.

  • Initialization scripts are now supported and can be used to automatically install or update the software that tasks require.

Yandex Managed Service for Greenplum®

The service for managing massively parallel DBMS Greenplum® clusters [is now available](https://cloud.yandex.ru/blog/posts/2022/03/managed-greenplum-ga) for all users.

  • New version 6.19 with previously reported bugs fixed.

  • Improved backup process thanks to unique append-only segment management.

  • Service management via CLI and Terraform.

  • Storage size optimization, including fast local storage, and ability to add segments to the cluster.

  • Working with external data sources via PXF. Connect to tables from external sources such as Apache Hive, ClickHouse, HBase, HDFS, MySQL, Oracle, PostgreSQL, SQL Server, and Object Storage buckets.

  • Support for pgcrypto, diskquota, and other extensions.

Yandex Managed Service for ClickHouse

  • Hybrid storage lets you keep ClickHouse data in cold S3 storage, which significantly reduces system cost. Two new storage policies, “local” and “object_storage”, are now available for hybrid storage clusters. The “local” policy restricts data storage to local or network drives, and the “object_storage” policy allows only object storage. The hybrid storage option is now available to all users.

  • ClickHouse Keeper is a new service for replicating data and executing distributed DDL queries that uses a ZooKeeper-compatible client-server protocol. Unlike ZooKeeper, ClickHouse Keeper does not need separate hosts and runs on the ClickHouse hosts themselves, which makes the configuration both fault-tolerant and more cost-efficient.

  • ClickHouse updates: 22.3LTS, 22.5.

  • Restore an entire sharded cluster from a backup copy.

Yandex Managed Service for PostgreSQL

  • New extensions: pg_cron, pgcompacttable, clickhouse_fdw, orafce, pg_qualstats, and hypopg.

  • New PostgreSQL 14 and 14-1C versions, plus updates to previous versions.

  • Release of Odyssey 1.3 and the ability to deploy a new database from a template.

Yandex Managed Service for MySQL®

  • Host priorities for backups.

  • In the event of a master host change, you can now prioritize the selection of a new master host to take its place.

  • Manage the performance diagnostic service settings in all interfaces: CLI, Terraform, and UI console.

  • Faster restoration of replicas from backups thanks to multithreaded compression and encryption of backups.

Yandex Managed Service for Apache Kafka®

  • Kafka 3.0 and 3.1 updates.

  • Topic management with Terraform.

  • New connector: S3 Sink.

Yandex Managed Service for MongoDB

  • MongoDB 5.0 support.

Yandex Managed Service for Redis

  • Management of persistence settings. If persistence is disabled, a cluster’s performance is higher but so is the risk of losing data.
