© 2022 Yandex.Cloud LLC

Re2

Written by
Yandex Cloud
  • Re2::Grep / Re2::Match
  • Re2::Capture
  • Re2::FindAndConsume
  • Re2::Replace
  • Re2::Count
  • Re2::Options

List of functions

  • Re2::Grep(String) -> (String?) -> Bool
  • Re2::Match(String) -> (String?) -> Bool
  • Re2::Capture(String) -> (String?) -> Struct<_1:String?,foo:String?,...>
  • Re2::FindAndConsume(String) -> (String?) -> List<String>
  • Re2::Replace(String) -> (String?, String) -> String?
  • Re2::Count(String) -> (String?) -> Uint32
  • Re2::Options([CaseSensitive:Bool?,DotNl:Bool?,Literal:Bool?,LogErrors:Bool?,LongestMatch:Bool?,MaxMem:Uint64?,NeverCapture:Bool?,NeverNl:Bool?,OneLine:Bool?,PerlClasses:Bool?,PosixSyntax:Bool?,Utf8:Bool?,WordBoundary:Bool?]) -> Struct<CaseSensitive:Bool,DotNl:Bool,Literal:Bool,LogErrors:Bool,LongestMatch:Bool,MaxMem:Uint64,NeverCapture:Bool,NeverNl:Bool,OneLine:Bool,PerlClasses:Bool,PosixSyntax:Bool,Utf8:Bool,WordBoundary:Bool>

Pire imposes certain limitations to keep regular expression matching efficient, so some tasks are too complex or even impossible to solve with it. For such cases, there is another regular expression module based on google::RE2. It offers a broader range of features (see the official documentation).

By default, UTF-8 mode is enabled automatically if the regular expression is a valid UTF-8-encoded string but not a valid ASCII string. To control the settings of the re2 library manually, pass the result of the Re2::Options function as the second argument to the other module functions, after the regular expression.

Warning

Make sure to double all the backslashes in your regular expressions if they are written within a quoted string: standard string literals are treated as C-escaped strings in SQL. You can also write regular expressions as raw strings, @@regexp@@, in which case the backslashes do not need to be doubled.
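As a small illustration of this warning, the two patterns below are meant to be equivalent: the first doubles the backslash inside a quoted string, the second uses a raw string. The input value and the column names are made up for the example; treat it as a sketch rather than output copied from a live database.

```sql
$value = "12345";

-- Quoted literal: the backslash is doubled so that the regex engine receives \d.
$quoted = Re2::Match("\\d+");
-- Raw literal: the pattern is written exactly as the regex engine sees it.
$raw = Re2::Match(@@\d+@@);

SELECT
  $quoted($value) AS quoted_match, -- expected: true
  $raw($value) AS raw_match;       -- expected: true
```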

Examples

$value = "xaaxaaxaa";
$options = Re2::Options(false AS CaseSensitive);
$match = Re2::Match("[ax]+\\d");
$grep = Re2::Grep("a.*");
$capture = Re2::Capture(".*(?P<foo>xa?)(a{2,}).*");
$replace = Re2::Replace("x(a+)x");
$count = Re2::Count("a", $options);

SELECT
  $match($value) AS match,
  $grep($value) AS grep,
  $capture($value) AS capture,
  $capture($value)._1 AS capture_member,
  $replace($value, "b\\1z") AS replace,
  $count($value) AS count;

/*
- match: `false`
- grep: `true`
- capture: `(_0: 'xaaxaaxaa', _1: 'aa', foo: 'x')`
- capture_member: `"aa"`
- replace: `"baazaaxaa"`
- count: `6`
*/

Re2::Grep / Re2::Match

Apart from implementation details and regular expression syntax, these functions behave the same as their counterparts in the Pire module. All else being equal, and if you have no specific preference, we recommend using Pire::Grep or Pire::Match.

Re2::Capture

Unlike Pire::Capture, Re2::Capture supports multiple and named capturing groups.
Result type: a structure with the fields of the type String?.

  • Each field corresponds to a capturing group with the applicable name.
  • For unnamed groups, the following names are generated: _1, _2, etc.
  • The result always includes the _0 field containing the entire substring matching the regular expression.

For more information about working with structures in YQL, see the section on containers.
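The field naming rules above can be sketched with a pattern that has one unnamed and one named group. The input value, the group name month_num, and the column aliases are illustrative assumptions, not part of the module.

```sql
$value = "2022-05";

-- One unnamed group (the year) followed by one named group (month_num).
$capture = Re2::Capture(@@(\d{4})-(?P<month_num>\d{2})@@);

SELECT
  $capture($value)._0 AS whole_match,       -- expected: "2022-05"
  $capture($value)._1 AS year_part,         -- expected: "2022"
  $capture($value).month_num AS month_part; -- expected: "05"
```

Each field has the type String?, so a field is expected to be NULL when its group did not take part in the match.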

Re2::FindAndConsume

Searches for all occurrences of the regular expression in the passed text and, for each occurrence, returns the value matched by the parenthesized part (capturing group) of the regular expression. The values are returned as a list.
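A minimal sketch of this behavior, assuming a pattern with a single capturing group; the input value and the column alias are made up for the example.

```sql
$value = "a1b22c333";

-- The capturing group (\d+) defines what is returned for each occurrence.
$find = Re2::FindAndConsume(@@(\d+)@@);

SELECT
  $find($value) AS numbers; -- expected: ["1", "22", "333"]
```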

Re2::Replace

Works as follows:

  • In the input string (first argument), all the non-overlapping substrings matching the regular expression are replaced by the specified string (second argument).
  • In the replacement string, you can use the contents of capturing groups from the regular expression using back-references in the format: \\1, \\2 etc. The \\0 back-reference stands for the whole substring that matches the regular expression.
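The two rules above can be sketched as follows. The input value and the column aliases are illustrative; the first query swaps two capturing groups, and the second wraps every match in brackets using the \\0 back-reference.

```sql
$value = "10-20, 30-40";

$swap = Re2::Replace("(\\d+)-(\\d+)");
$wrap = Re2::Replace("\\d+");

SELECT
  $swap($value, "\\2-\\1") AS swapped, -- expected: "20-10, 40-30"
  $wrap($value, "[\\0]") AS wrapped;   -- expected: "[10]-[20], [30]-[40]"
```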

Re2::Count

Returns the number of non-overlapping substrings of the input string that have matched the regular expression.
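To illustrate what non-overlapping means here, the sketch below counts the pattern aa in a string of five letters. The input value and the column alias are assumptions made for the example.

```sql
$value = "aaaaa";

$count = Re2::Count("aa");

SELECT
  $count($value) AS pairs; -- expected: 2 (positions 0-1 and 2-3; the last "a" is left over)
```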

Re2::Options

Notes on Re2::Options from the official repository

| Parameter | Default | Comments |
| --- | --- | --- |
| CaseSensitive:Bool? | true | match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode) |
| DotNl:Bool? | false | let . match \n |
| Literal:Bool? | false | interpret string as literal, not regexp |
| LogErrors:Bool? | true | log syntax and execution errors to ERROR |
| LongestMatch:Bool? | false | search for longest match, not first match |
| MaxMem:Uint64? | - | approx. max memory footprint of RE2 (details are in the official RE2 notes) |
| NeverCapture:Bool? | false | parse all parens as non-capturing |
| NeverNl:Bool? | false | never match \n, even if it is in regexp |
| PosixSyntax:Bool? | false | restrict regexps to POSIX egrep syntax |
| Utf8:Bool? | true | text and pattern are UTF-8; otherwise Latin-1 |

The following options are only consulted when PosixSyntax == true. When PosixSyntax == false, these features are always enabled and cannot be turned off; to perform multi-line matching in that case, begin the regexp with (?m).

| Parameter | Default | Comments |
| --- | --- | --- |
| PerlClasses:Bool? | false | allow Perl's \d \s \w \D \S \W |
| WordBoundary:Bool? | false | allow Perl's \b \B (word boundary and not) |
| OneLine:Bool? | false | ^ and $ only match beginning and end of text |

We do not recommend using Re2::Options in your code: most parameters can be replaced with regular expression flags.
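As a hedged comparison of the two approaches, the sketch below expresses case-insensitive matching once through Re2::Options and once through the (?i) flag. The input value and the names $with_options and $with_flag are made up for the example.

```sql
$value = "Foo bar";

-- Options struct passed as the second argument, after the pattern.
$options = Re2::Options(false AS CaseSensitive);
$with_options = Re2::Grep("foo", $options);

-- The same intent expressed with an inline flag inside the pattern.
$with_flag = Re2::Grep("(?i)foo");

SELECT
  $with_options($value) AS with_options, -- expected: true
  $with_flag($value) AS with_flag;       -- expected: true
```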

Flag usage examples

$value = "Foo bar FOO"u;
-- enable case-insensitive mode
$capture = Re2::Capture(@@(?i)(foo)@@);

SELECT
    $capture($value) AS capture;

$capture = Re2::Capture(@@(?i)(?P<vasya>FOO).*(?P<banan>bar)@@);

SELECT
    $capture($value) AS capture;

In both cases, the (?i) flag lets the pattern match the word foo regardless of its letter case. Using the raw string @@regexp@@ lets you avoid doubling the backslashes.
