© 2022 Yandex.Cloud LLC

Pire

Written by
Yandex Cloud
  • Call syntax
  • Grep
  • Match
  • MultiGrep/MultiMatch
  • Capture
  • Replace

List of functions

  • Pire::Grep(String) -> (String?) -> Bool
  • Pire::Match(String) -> (String?) -> Bool
  • Pire::MultiGrep(String) -> (String?) -> Tuple<Bool, Bool, ...>
  • Pire::MultiMatch(String) -> (String?) -> Tuple<Bool, Bool, ...>
  • Pire::Capture(String) -> (String?) -> String?
  • Pire::Replace(String) -> (String?, String) -> String?

One way to match regular expressions in YQL is Pire (Perl Incompatible Regular Expressions), a very fast regular expression library developed at Yandex. At the low level, it scans the input string once, without any lookahead or backtracking, spending 5 machine instructions per character (on x86 and x86_64).

This speed is achieved by accepting some reasonable restrictions:

  • Pire is primarily focused on checking whether a string matches a regular expression.
  • The matching substring can also be returned (by Capture), but with restrictions: only a match for a single group is returned.

By default, all functions work in the single-byte mode. However, if the regular expression is a valid UTF-8 string but is not a valid ASCII string, the UTF-8 mode is enabled automatically.

To force Unicode mode for a regular expression that is otherwise pure ASCII, add a single non-ASCII character followed by the ? operator, for example: \\w+я?.
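
The mode switch described above can be sketched as follows. This is a minimal illustration that reuses the pattern from the text; the variable names and the sample string are assumptions chosen for the example, and no expected results are given because they depend on how each mode treats multi-byte characters.

```sql
-- Pure ASCII pattern: compiled in the default single-byte mode.
$single_byte = Pire::Match("\\w+");
-- Same pattern plus an optional non-ASCII character: compiled in UTF-8 mode.
$utf8 = Pire::Match("\\w+я?");

SELECT
  $single_byte("привет") AS single_byte_result,
  $utf8("привет") AS utf8_result;
```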

Call syntax

To avoid compiling the regular expression for each table row, wrap the function call in a named expression:

$re = Pire::Grep("\\d+"); -- create a callable value to match a specific regular expression
SELECT * FROM table WHERE $re(key); -- use it to filter the table

Alert

When escaping special characters in a regular expression, be sure to double the backslash: standard string literals in SQL are treated as C-escaped strings, and \d is not a valid escape sequence (even if it were, it wouldn't search for digits as intended).

To enable case-insensitive mode, put the (?i) flag at the beginning of the regular expression.

Examples

$value = "xaaxaaxaa";
$match = Pire::Match("a.*");
$grep = Pire::Grep("axa");
$insensitive_grep = Pire::Grep("(?i)axa");
$multi_match = Pire::MultiMatch(@@a.*
.*a.*
.*a
.*axa.*@@);
$capture = Pire::Capture(".*x(a).*");
$capture_many = Pire::Capture(".*x(a+).*");
$replace = Pire::Replace(".*x(a).*");

SELECT
  $match($value) AS match,
  $grep($value) AS grep,
  $insensitive_grep($value) AS insensitive_grep,
  $multi_match($value) AS multi_match,
  $multi_match($value).0 AS some_multi_match,
  $capture($value) AS capture,
  $capture_many($value) AS capture_many,
  $replace($value, "b") AS replace;

/*
- match: `false`
- grep: `true`
- insensitive_grep: `true`
- multi_match: `(false, true, true, true)`
- some_multi_match: `false`
- capture: `"a"`
- capture_many: `"aa"`
- replace: `"xaaxaaxba"`
*/

Grep

Checks whether any part of the string (an arbitrary substring) matches the regular expression.

Match

Checks whether the whole string matches the regular expression.
To get a result similar to Grep (where a substring match is enough), add .* on both sides of the regular expression. For example, use .*foo.* instead of foo.
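
The difference between the two functions can be sketched with a short query. The sample value and variable names are assumptions made for illustration; the results in the comments follow from the whole-string and substring semantics described above.

```sql
$value = "xfoox";
$grep = Pire::Grep("foo");             -- substring match
$match_exact = Pire::Match("foo");     -- whole string must match
$match_wrapped = Pire::Match(".*foo.*"); -- whole-string match that behaves like Grep

SELECT
  $grep($value) AS grep,                   -- true
  $match_exact($value) AS match_exact,     -- false
  $match_wrapped($value) AS match_wrapped; -- true
```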

MultiGrep/MultiMatch

Pire lets you match against multiple regular expressions in a single pass through the text and get a separate result for each expression.
Use the MultiGrep/MultiMatch functions to speed up query execution, but do so with care: the size of the state machine used for matching grows exponentially with the number of regular expressions.

  • If you want to match a string against any of the listed expressions (the results are joined with "or"), it would be much more efficient to combine the query parts in a single regular expression with | and match it using regular Grep or Match.
  • Pire has a limit on the size of the state machine (YQL uses the default value set in the library). If you exceed the limit, the error is raised at the start of the query: Failed to glue up regexes, probably the finite state machine appeared to be too large.

When you call MultiGrep/MultiMatch, regular expressions are passed one per line using multiline string literals:

Examples

$multi_match = Pire::MultiMatch(@@a.*
.*x.*
.*axa.*@@);

SELECT
    $multi_match("a") AS a,
    $multi_match("axa") AS axa;

/*
- a: `(true, false, false)`
- axa: `(true, true, true)`
*/
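
When only an "or" over several expressions is needed, the recommendation above to combine them with | can be sketched like this. The patterns and sample strings are assumptions made for illustration.

```sql
-- One combined expression instead of three separate ones in MultiGrep.
$any = Pire::Grep("foo|bar|baz");

SELECT
  $any("xxbarxx") AS has_any, -- true: "bar" occurs in the string
  $any("qux") AS has_none;    -- false: none of the alternatives occur
```

Unlike MultiGrep, this returns a single Bool rather than a tuple, so it fits only when you don't need to know which expression matched.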

Capture

If the string matches the specified regular expression, Capture returns the substring that matches the group enclosed in parentheses in the regular expression.
Capture is non-greedy: the shortest possible substring is returned.

Alert

The expression must contain only one group in parentheses. NULL (empty Optional) is returned in case of no match.

If the above limitations and features are unacceptable for some reason, we recommend that you consider Re2::Capture.
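
Because Capture returns an Optional, a query usually has to handle the no-match case. A minimal sketch, reusing the pattern and value from the examples earlier on this page; the string "bbb" and the empty-string default are assumptions made for illustration.

```sql
$capture = Pire::Capture(".*x(a).*");

SELECT
  $capture("xaaxaaxaa") AS matched,               -- "a", as in the earlier example
  $capture("bbb") AS not_matched,                 -- NULL: the string doesn't match
  COALESCE($capture("bbb"), "") AS with_default;  -- "": NULL replaced by a default
```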

Replace

Pire doesn't support replacement based on a regular expression. Pire::Replace implemented in YQL is a simplified emulation using Capture. It may run incorrectly if the substring occurs more than once in the source string.

As a rule, it's better to use Re2::Replace instead.
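
A minimal sketch of the recommended alternative, assuming the Re2 module is available in your YQL environment and that Re2::Replace follows the same two-step call pattern as the Pire functions (compile the expression first, then call it with the input string and the replacement). The pattern and values are chosen for illustration only.

```sql
$replace = Re2::Replace("xa"); -- compile the regular expression once

SELECT
  $replace("xaaxaaxaa", "b") AS replaced; -- input string, then replacement string
```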
