
DQL Definition


The following is the definition of the DataFlux Query Language (DQL). As support for additional syntax is gradually added, this document will be updated accordingly.

Global constraints are as follows:

  • Non-keywords (such as metric names and tag names) are case-sensitive; keywords and function names are case-insensitive;

  • # is the line comment character; inline comments are not supported;

  • Supported operators:

    • + - Addition
    • - - Subtraction
    • * - Multiplication
    • / - Division
    • % - Modulo
    • = - Equal to
    • != - Not equal to
    • <= - Less than or equal to
    • < - Less than
    • >= - Greater than or equal to
    • > - Greater than
    • ^ - Exponentiation
    • && - Logical AND
    • || - Logical OR
  • Supported keywords:

AND AS ASC AUTO
BY DESC FALSE
LIMIT LINEAR
NIL OFFSET OR PREVIOUS
SLIMIT SOFFSET TRUE
  • Identifiers: Identifiers have several forms to facilitate compatibility with various variable naming conventions:

    • Normal variable names can only contain characters [_a-zA-Z0-9], and the first character cannot be a number. For example, _abc, _abc123, _123ab.
    • Other forms of variable names:
      • this+is-a*xx/yy^zz?variable and by must be written in backticks, as `this+is-a*xx/yy^zz?variable` and `by`: the former contains operator characters, and the latter collides with the DQL keyword by;
      • Supports Chinese and other UTF8 identifiers, such as M::cpu:(usage AS 使用率) [5m].
        • Supports emojis: M::cpu:(usage AS 使用率👍) [5m].
      • Variables containing a backtick, such as this`is-a-variable, should be written as identifier("this`is-a-variable").
  • String values can use double quotes and single quotes: "this is a string" and 'this is a string' are equivalent.

  • Special strings:

    • base64 strings: DQL supports base64 strings and can automatically decode them to the original string during a query. They are written as follows:

      • b64`some-base64-string`
      • b64'some-base64-string'
      • b64"some-base64-string"
    • Regular expression strings: the original re('xxx') form is deprecated; the following forms are recommended for regular expression strings.

      • re`some-regexp` (recommended)
      • re'some-regexp'
      • re"some-regexp"
  • Supported data types:

    • Floating point numbers (123.4, 5.67E3)
    • Integers (123, -1)
    • Strings ('张三', "hello world")
    • Boolean (true, false)
    • Duration (1y, 1w, 1d, 1h, 1m, 1s, 1ms, 1us, 1ns representing 1 year/week/day/hour/minute/second/millisecond/microsecond/nanosecond respectively)
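
As a sketch of how the special string forms are used in practice (the log source nginx and the fields client_token and request_path here are hypothetical), a filter might combine them like this:

# Hypothetical filter using the special string forms: the b64 value is
# decoded before matching, and the re value is treated as a regular expression
L::nginx { client_token = b64`c2VjcmV0`, request_path = re`^/api/` } [5m]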

Query

Queries follow the syntax paradigm below. Note that the relative order of the parts cannot be changed; for example, time-expr cannot appear before filter-clause.

namespace::
    data-source
    target-clause
    filter-clause
    time-expr
    by-clause
    having-clause
    order-by-clause
    limit-clause
    offset-clause
    sorder-by-clause
    slimit-clause
    soffset-clause

From a syntax perspective, data-source is mandatory (similar to the FROM clause in SQL), while the other parts are optional. However, certain constraints are applied during actual query execution (for example, time-expr may not cover too large a time span).

Examples:

# Get all fields data of the Measurement cpu in the last 5 minutes
M::cpu [5m]

# Find data for all Measurements whose names match the regular expression *db in the last 5 minutes
M::re('*db') [5m]

# Get all fields data of the Measurement cpu from 10 minutes ago to 5 minutes ago
M::cpu [10m:5m]

# Get all fields data of the Measurement cpu from 10 minutes ago to 5 minutes ago, aggregated at 1-minute intervals
M::cpu:(usage_idle) [10m:5m:1m]

# Query the time series data Measurement cpu for the last 5 minutes, for the two fields time_active, time_guest_nice,
# Filtered by the tags host and cpu, and grouped by host and cpu to display the results.
M:: cpu:(time_active, time_guest_nice)
        { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

# Get the top 10 sorted by height in descending order
O::human:(height, age) { age > 100, sex = "女性" } ORDER BY height desc LIMIT 10

M::cpu,mem:(time_active, time_guest_nice, host) { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

Note: :: and : may have whitespace characters on either side; the following statements are equivalent:

M::cpu:(time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M   ::cpu : (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M   :: cpu :   (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

Examples of Queries for Each Data Type

  • M::Measurement Name:(Aggregation Function(Metric Name)) { Tag Name = 'Tag Value' } BY Tag Name

    • For example: M::cpu:(last(usage_system)) {host=xxx} BY host_ip
  • L::Log Source:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: L::datakit:(COUNT(*)) { index = 'default' } BY host
  • O::Category Name:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: O::HOST:(COUNT(*)) { class = 'HOST' } BY host_ip
  • E::Event Source:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: E::monitor:(COUNT(create_time)) { create_time = 1688708829409 } BY df_event_id
  • T::Service Name:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: T::mysqli:(COUNT(resource)) { status = 'ok' } BY status
  • R::Data Source:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: R::error:(LAST(browser)) { city = 'unknown' } BY city
  • S::Classification Name:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: S::storage:(LAST(host)) { level = re('warn') } BY level
  • N::Network Source:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: N::httpflow:(FIRST(direction)) { http_version = '1.1' } BY dst_ip_type
  • P::Profiling Name:(Aggregation Function(Attribute Name)) { Attribute Name = 'Attribute Value' } BY Attribute Name

    • For example: P::mysqli:(COUNT(resource)) { status = 'ok' } BY status

Further examples (the aggregation function may also be omitted):

  • Count the number of containers in different namespaces:

O::docker_containers:(COUNT(*)) BY namespace

  • Query all fields of containers and return the latest 10 records:

O::docker_containers {host=xxx} limit 10

Statements

namespace

Semantically, the following data sources are currently supported:

  • M/metric - Time series metrics data
  • O/object - Object data
  • CO/custom_object - User Resource Catalog data
  • L/logging - Log data
  • E/event - Event data
  • T/tracing - Tracing data
  • R/rum - RUM data
  • F/func - Func function calculation
  • N/network - Network eBPF data lookup

From a syntax perspective, data sources are not constrained. The data source syntax is as follows:

data-source ::
    # Specific query details...

In a query, if no namespace is specified, it defaults to metric (M); that is, time-series metrics are the default data source in DQL.
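
As a sketch of that default (assuming the parser accepts a query with the namespace prefix omitted entirely), the following two queries would be equivalent:

# With metric as the default namespace, these two queries are equivalent
M::cpu [5m]
cpu [5m]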

target-clause

The target clause lists the result columns returned by the query:

M::cpu:(time_active, system_usage) {host="biz_prod"} [5m]

# Calculations between different metrics on the same Measurement are supported (their types should be broadly compatible)
M::cpu:(time_active+1, time_active/time_guest_nice) [5m]
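
Result columns can also be renamed with the AS keyword, as the aggregation examples later show; a minimal sketch:

# Rename a result column with AS
M::cpu:(time_active AS active) [5m]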

filter-clause

The filter clause is used to filter the result data, similar to the where condition in SQL:

# Query the height of centenarian females in the population object (__class=human)
O::human:(height) { age > 100, sex = "女性" }

# Filter with regular expressions
O::human:(height) { age > 100, sex != re("女") }

# Filter with calculation expressions
O::human:(height) { (age + 1)/2 > 31, sex != re("女") }

# Filter with OR operation expressions
O::human:(height) { age > 31 || sex != re("女"), weight > 70}

# Result column with aggregation
M::cpu:(avg(time_active) AS time_active_avg) [1d::1h]

# Result column with aggregation filling
M::cpu:(fill(avg(time_active) AS time_active_avg, 0.1)) [1d::1h]

# IN-list query: the options in the list are combined with logical OR, and the list may only contain numbers or strings
O::human:(height) { age in [30, 40, 50], weight > 70}

About filling:

  • Fixed-value filling: e.g. cpu:(fill(f1, 123), fill(f2, "foo bar"), fill(f3, 123.456))
  • Linear filling: e.g. cpu:(fill(f1, LINEAR))
  • Previous-value filling: e.g. cpu:(fill(f1, PREVIOUS))

Note: multiple filter conditions separated by commas are combined with AND by default; to express an OR relationship, use the || operator. The following two statements are equivalent:

O::human:(height) { age > 31, sex != re("女") }
O::human:(height) { age > 31 && sex != re("女") }

Here is a complex filter expression:

M::some_metric {(a>123.45 && b!=re("abc")) || (z!="abc"), c=re("xyz")} [1d::30m]

time-expr

All DataFlux data carries a time attribute, so time is expressed in a dedicated clause:

time-expr consists of up to 4 parts, [start_time:end_time:interval:rollup]:

  1. start_time (optional) - Start time of the time filter, e.g. 1672502400000 / 1672502400
  2. end_time (optional) - End time of the time filter, e.g. 1672588800000 / 1672588800
  3. interval (optional) - Time aggregation window; generally used together with aggregation or rollup aggregation; supports time units such as s, m, h, d, which can be combined, e.g. 1s/1m/1h
  4. rollup (optional) - Rollup aggregation function name; currently supported: avg, sum, min, max, count, first, last, stddev, e.g. last

Note:

start_time and end_time support 3 formats:

  • Numeric values with time units, e.g. 1m
  • Unix timestamps in seconds, e.g. 1672502400
  • Unix timestamps in milliseconds, e.g. 1672502400000

interval supports the following time units:

  • ns - Nanoseconds
  • us - Microseconds
  • ms - Milliseconds
  • s - Seconds
  • m - Minutes
  • h - Hours
  • d - Days
  • w - Weeks
  • y - Years, treated as 365d (leap years are not distinguished)

The rollup (rolling aggregation) function names currently supported are:

  • avg: Average
  • sum: Sum
  • min: Minimum
  • max: Maximum
  • count: Count
  • first: First
  • last: Last
  • deriv: Rate of change per second, estimated by subtracting the first value from the last value and dividing by the time interval
  • rate: Rate of change per second, similar to deriv but does not return negative results, logic is consistent with PromQL
  • irate: Instantaneous rate of change, estimated by the difference between the last two values divided by the time interval, logic is consistent with PromQL
  • p99, p95, p90, p75: Percentile calculation, supports using p followed by any number to express the percentage
  • median: Median, equivalent to p50
  • stddev: Standard deviation

rollup rolling aggregation splits a single time series into different time periods according to the given interval, and performs aggregation on each time period. When rollup is empty, it means no rolling aggregation is performed.

Common examples:

  • [5m] - Last 5 minutes
  • [10m:5m] - Last 10 minutes to last 5 minutes
  • [10m:5m:1m] - Last 10 minutes to last 5 minutes, and the results are aggregated at 1-minute intervals
  • [1672502400000:1672588800000] - 2023-01-01 00:00:00 to 2023-01-02 00:00:00 time range
  • [1672502400:1672588800] - 2023-01-01 00:00:00 to 2023-01-02 00:00:00 time range
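
All four parts of time-expr can be combined; for example, using the avg rollup listed above (usage_idle is a field name borrowed from earlier examples):

# 2023-01-01 00:00:00 to 2023-01-02 00:00:00, windowed at 1 hour, averaging each window
M::cpu:(usage_idle) [1672502400:1672588800:1h:avg]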

by-clause Statement

The BY clause is used to classify and aggregate the results, similar to GROUP BY in MySQL.
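
For example, averaging a field per group over the last 5 minutes (usage_idle is borrowed from earlier examples):

# Average usage_idle over the last 5 minutes, grouped by the host tag
M::cpu:(avg(usage_idle)) [5m] BY host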

having-clause Statement

The HAVING clause is used to filter the results after aggregation, similar to HAVING in MySQL.

# Get all hosts with CPU usage greater than 80%
M::cpu:(max(`usage_total`) as `max_usage`) by host having max_usage > 80

order-by-clause Statement

The ORDER BY clause sorts the results, similar to ORDER BY in MySQL.

Note: 1. Metric data only supports sorting by the time field. 2. When the query contains a BY grouping, order-by does not take effect; use sorder-by for sorting instead.

# Get all host CPU usage, sorted by time in descending order
M::cpu:(`usage_total`) order by time desc
# Get all log data, sorted by response time in ascending order
L::`*`:(`*`) order by response_time asc

sorder-by-clause Statement

The SORDER BY clause sorts the groups.

# Get the maximum CPU usage of different hosts, sorted by host name in descending order
M::cpu:(max(`usage_total`)) by host sorder by host desc
# Get the maximum CPU usage of different hosts, sorted by maximum CPU usage in ascending order
M::cpu:(max(`usage_total`) as m) by host sorder by m
# Get the CPU usage of different hosts, sorted by the latest CPU usage in ascending order
M::cpu:(`usage_total`) sorder by usage_total

limit Statement

Used to specify the number of rows to return.

Note:

For time-series data, if the DQL statement contains both a BY clause and a LIMIT clause, LIMIT constrains the number of rows returned within each aggregation group.

# Return three cpu records
M::cpu:() limit 3

# Return three cpu records for each host
M::cpu:() by host limit 3

slimit Statement

Used to specify the number of groups to return.

# Return CPU usage information for three hosts
M::cpu:(last(usage_total)) by host slimit 3

# Return CPU information for three hosts, where each host returns three records
M::cpu:() by host limit 3 slimit 3

SHOW Statement

SHOW_xxx is used to browse data (function names are case-insensitive):

  • SHOW_MEASUREMENT() - View the list of Measurements, supports filter-clause, limit and offset statements
  • SHOW_OBJECT_CLASS() - View the list of object classifications
  • SHOW_CUSTOM_OBJECT_SOURCE() - View the list of Resource Catalog data types
  • SHOW_EVENT_SOURCE() - View the list of event sources
  • SHOW_LOGGING_SOURCE() - View the list of log sources
  • SHOW_TRACING_SERVICE() - View the list of tracing sources
  • SHOW_RUM_TYPE() - View the list of RUM data types
  • SHOW_NETWORK_SOURCE() - View the list of network eBPF data types
  • SHOW_SECURITY_SOURCE() - View the list of Security Check data types
  • SHOW_WORKSPACES() - View the current workspace and its authorized workspace information
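
Since SHOW_MEASUREMENT() supports the limit and offset statements, paging through the measurement list can be sketched as:

# First page of measurements, then the next page
SHOW_MEASUREMENT() LIMIT 10
SHOW_MEASUREMENT() LIMIT 10 OFFSET 10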

For more show functions, see Function Documentation

Nested Queries and Statement Blocks

Use () to separate a subquery from its outer query. For example, two-layer nesting:

M::(
# Subquery
M::cpu:(usage_total) {host='kind'}
):(last(usage_total))  # Outer query target column
{}  # Outer query filter condition

Three-layer nesting:

M::(M::(M::cpu:(usage_total) {host='kind'}):(usage_total) {usage_total > 0} ):(last(usage_total))

In principle, there is no limit on the nesting depth, but multiple parallel subqueries are not allowed at the same nesting level. For example:

object::(     # Second layer query
        object::( # Third layer query
                object::a:(f1,f2,f3) {host="foo"}
            ):(f1,f2),

        object::( # Parallel third layer query: not supported
                object::b:(f1,f2,f3) {host="foo"}
            ):(f1,f2)
    ):(f1)

Special Usage

If the message field is of JSON type (currently supported only for logs), DQL can extract fields from it directly, in the following form:

L::nginx { @abc.def = "xyz" }

This is equivalent to the following query; that is, @ is shorthand for message@json.

L::nginx { `message@json.abc.def` = "xyz" }

Function Description