PQL regex() function
regex() — Determines what is returned in the result.
If the regex() contains a capture such as something in brackets like
(\d+), then that will be used as the result. If there is no
capture, then the entire match is returned. If there are multiple captures,
then they are returned as a list.
Syntax
The regex() function returns a
string, and takes 2 parameters, a tree and a string or regular expression:
string regex(tree input, string input or regular expression)
Return value
The regex() function returns a tree.
Description
PQL supports the use of regular expressions (REGEX) based on the syntax
of the PCRE library version 7.7. The
regex() function is useful for extracting fields from the data.
In PQL, regular expressions are surrounded by question marks because the forward slash is reserved for paths.
The second argument to the regex() function can be either a
regular expression, or a string that is compiled to a regular expression,
so you can use a value in the database as the regular expression. However,
specifying regular expressions as strings is much less efficient because
the regular expression must be recompiled each time the code executes,
rather than just once at parse time.
You can use the regex() function to test every leaf node in the
entire tree against the regex. Note that this is far slower that doing a search,
but this is the only method possible in PQL, and you do get the full power of
regular expressions.
SELECT regex(/, ?(v|V)irus (p|P)attern?) from /
Examples
The following examples use the same basic query with a different capture to demonstrate that a regex() contains a capture uses the capture as the result, a regex() without a capture returns the entire match, and a regex() with multiple captures returns them as a list.
To try the following examples yourself, run this MERGE statement
in your sandbox to establish some data:
MERGE into / values {val[true] => '192.168.10.12'}
Example 1: No capture
Query: This query contains no capture.
select regex(val, ?\d+\.\d+\.\d+\.\d+?) from /
Results: This query returns the entire match.
row regex(val, /\d+\.\d+\.\d+\.\d+/) 192.168.10.12
Example 2: One capture
Query: This query contains one capture.
SELECT regex(val, ?\d+\.(\d+)\.\d+\.\d+?) from /
Results: This query uses the capture as the result.
row regex(val, /\d+\.(\d+)\.\d+\.\d+/) 168
Example 3: Multiple captures
Query: This query contains multiple captures.
SELECT regex(val, ?(\d+)\.(\d+)\.(\d+)\.(\d+)?) FROM /
Results: This query returns them as a list.
row
regex(val, /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
match_1 192
match_2 168
match_3 10
match_4 12