
DQL Definition


Global constraints are as follows:

  • Non-keywords (such as metric names and tag names) are case-sensitive; keywords and function names are case-insensitive.

  • # is the line-comment character; inline (trailing) comments are not supported.

  • Supported operators:

    • + - Addition
    • - - Subtraction
    • * - Multiplication
    • / - Division
    • % - Modulus
    • = - Equal to
    • != - Not equal to
    • <= - Less than or equal to
    • < - Less than
    • >= - Greater than or equal to
    • > - Greater than
    • ^ - Exponentiation
    • && - Logical AND
    • || - Logical OR
  • Supported keywords:

AND AS ASC AUTO
BY DESC FALSE
LIMIT LINEAR
NIL OFFSET OR PREVIOUS
SLIMIT SOFFSET TRUE
  • Identifiers: Several forms are supported, for compatibility with various variable naming conventions:

    • Normal variable names can only contain [_a-zA-Z0-9] characters, and the first character cannot be a digit. For example, _abc, _abc123, _123ab.
    • Other forms of variable names:
      • Variable names containing operator characters or keywords must be wrapped in backticks: this+is-a*xx/yy^zz?variable should be written as `this+is-a*xx/yy^zz?variable`, and by as `by`, since the former contains operators and the latter is a DQL keyword.
      • Supports Chinese and other UTF-8 identifiers, such as M::cpu:(usage AS 使用率) [5m].
        • Emojis are also supported: M::cpu:(usage AS 使用率👍) [5m].
      • Variables containing a backtick, such as this`is-a-variable, should be written as identifier("this`is-a-variable").
  • String values can use double quotes and single quotes: "this is a string" and 'this is a string' are equivalent.

  • Special strings:

    • base64 strings: DQL supports base64 strings and can automatically decode them to the original string during the query. They are written as:

      • b64`some-base64-string`
      • b64'some-base64-string'
      • b64"some-base64-string"
    • Regular expression strings: the original re('xxx') form is deprecated; the following forms are recommended for regular expression strings.

      • re`some-regexp` (recommended)
      • re'some-regexp'
      • re"some-regexp"
  • Supported data types:

    • Floating point (123.4, 5.67E3)
    • Integer (123, -1)
    • String ('Zhang San', "hello world")
    • Boolean (true, false)
    • Duration (1y, 1w, 1d, 1h, 1m, 1s, 1ms, 1us, 1ns representing 1 year/week/day/hour/minute/second/millisecond/microsecond/nanosecond respectively)
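The duration units above all reduce to a common base. As an illustrative sketch (the names `UNITS` and `parse_duration` are hypothetical, not part of DQL), the following Python snippet converts a single-unit DQL duration literal into nanoseconds, with 1y fixed at 365d as stated above:

```python
import re

# Nanoseconds per DQL duration unit; 1y is fixed at 365d per the spec.
UNITS = {
    "ns": 1,
    "us": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3600 * 10**9,
    "d": 86400 * 10**9,
    "w": 7 * 86400 * 10**9,
    "y": 365 * 86400 * 10**9,
}

def parse_duration(text):
    """Parse a single-unit DQL duration literal such as '1h' into nanoseconds."""
    match = re.fullmatch(r"(\d+)(ns|us|ms|s|m|h|d|w|y)", text)
    if not match:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return int(value) * UNITS[unit]
```

For example, parse_duration("5m") yields the same number of nanoseconds as parse_duration("300s").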

Query

Queries follow the syntax paradigm below. Note that the relative order of the parts is fixed; for example, time-expr cannot appear before filter-clause.

namespace::
    data-source
    target-clause
    filter-clause
    time-expr
    by-clause
    having-clause
    order-by-clause
    limit-clause
    offset-clause
    sorder-by-clause
    slimit-clause
    soffset-clause

From a syntax perspective, only data-source is mandatory (similar to the FROM clause in SQL); the other parts are optional. However, actual query execution imposes certain constraints, such as not allowing time-expr to cover too large a time span.

Examples:

# Get all field data of the metric set cpu for the last 5 minutes
M::cpu [5m]

# Find all metric data matching the regular expression *db for the last 5 minutes
M::re('*db') [5m]

# Get all field data of the metric set cpu from 10 minutes ago to 5 minutes ago
M::cpu [10m:5m]

# Get all field data of the metric set cpu from 10 minutes ago to 5 minutes ago, aggregated at 1-minute intervals
M::cpu:(usage_idle) [10m:5m:1m]

# Query the time series data of the metric set cpu for the last 5 minutes for two fields time_active, time_guest_nice,
# filtered by host and cpu tags, and grouped by host and cpu.
M:: cpu:(time_active, time_guest_nice)
        { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

# Get the top 10 sorted by height in descending order
O::human:(height, age) { age > 100, sex = "female" } ORDER BY height desc LIMIT 10

M::cpu,mem:(time_active, time_guest_nice, host) { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

Note: :: and : may be surrounded by whitespace; the following statements are equivalent:

M::cpu:(time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M   ::cpu : (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M   :: cpu :   (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

Examples of Queries for Each Data Type

  • M::metric set name:(aggregation function(metric name)) { tag name = 'tag value' } BY tag name

    • For example: M::cpu:(last(usage_system)) {host=xxx} BY host_ip
  • L::log source:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: L::datakit:(COUNT(*)) { index = 'default' } BY host
  • O::category name:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: O::HOST:(COUNT(*)) { class = 'HOST' } BY host_ip
  • E::event source:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: E::monitor:(COUNT(create_time)) { create_time = 1688708829409 } BY df_event_id
  • T::service name:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: T::mysqli:(COUNT(resource)) { status = 'ok' } BY status
  • R::data source:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: R::error:(LAST(browser)) { city = 'unknown' } BY city
  • S::classification name:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: S::storage:(LAST(host)) { level = re('warn') } BY level
  • N::network source:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: N::httpflow:(FIRST(direction)) { http_version = '1.1' } BY dst_ip_type
  • P::Profiling name:(aggregation function(attribute name)) { attribute name = 'attribute value' } BY attribute name

    • For example: P::mysqli:(COUNT(resource)) { status = 'ok' } BY status

More examples:

  • Count the number of containers in each namespace:

O::docker_containers:(COUNT(*)) BY namespace

  • Query all fields of containers and return the latest 10 records:

O::docker_containers {host=xxx} limit 10

Statements

namespace

Semantically, the following data sources are currently supported:

  • M/metric - Time series metric data
  • O/object - Object data
  • CO/custom_object - User Resource Catalog data
  • L/logging - Log data
  • E/event - Event data
  • T/tracing - Tracing data
  • R/rum - RUM data
  • F/func - Func function calculation
  • N/network - Network eBPF data lookup

Syntactically, data sources are not constrained. The data source syntax is as follows:

data-source ::
    # Specific query details...

In a query, if no namespace is specified, it defaults to metric (i.e. M); time series metrics are DQL's default data source.
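As a sketch of this default (assuming the namespace prefix can simply be omitted; the field name is illustrative), the following two queries would be equivalent:

cpu:(usage_idle) [5m]
M::cpu:(usage_idle) [5m]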

target-clause

The target clause specifies the result columns of the query:

M::cpu:(time_active, system_usage) {host="biz_prod"} [5m]

# Calculations between different metrics of the same metric set are supported (their types should be broadly compatible)
M::cpu:(time_active+1, time_active/time_guest_nice) [5m]

filter-clause

The filter clause is used to filter the result data, similar to the where condition in SQL:

# Query the height of centenarian females in the population object (__class=human)
O::human:(height) { age > 100, sex = "female" }

# Filter with regular expressions
O::human:(height) { age > 100, sex != re("female") }

# Filter with calculation expressions
O::human:(height) { (age + 1)/2 > 31, sex != re("female") }

# Filter with OR expressions
O::human:(height) { age > 31 || sex != re("female"), weight > 70}

# Result column with aggregation
M::cpu:(avg(time_active) AS time_active_avg) [1d::1h]

# Result column with aggregation filling
M::cpu:(fill(avg(time_active) AS time_active_avg, 0.1)) [1d::1h]

# Query with an in-list: the list members are related by logical OR, and the list may only contain numbers or strings
O::human:(height) { age in [30, 40, 50], weight > 70}

About filling:

  • Numerical filling: Such as cpu:(fill(f1, 123), fill(f2, "foo bar"), fill(f3, 123.456))
  • Linear filling: Such as cpu:(fill(f1, LINEAR))
  • Previous value filling: Such as cpu:(fill(f1, PREVIOUS))
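The three fill strategies can be illustrated with a small Python sketch (an illustration of the semantics described above, not the DQL engine itself; the function name `fill` is hypothetical):

```python
def fill(series, method):
    """Fill None gaps in a series: a constant value, LINEAR interpolation,
    or PREVIOUS-value carry-forward, mirroring the three DQL fill modes."""
    out = list(series)
    if method == "PREVIOUS":
        # carry the last seen value forward into each gap
        for i in range(1, len(out)):
            if out[i] is None:
                out[i] = out[i - 1]
    elif method == "LINEAR":
        # interpolate between the nearest known neighbours
        for i, v in enumerate(out):
            if v is None:
                lo = next((j for j in range(i - 1, -1, -1) if out[j] is not None), None)
                hi = next((j for j in range(i + 1, len(out)) if out[j] is not None), None)
                if lo is not None and hi is not None:
                    out[i] = out[lo] + (out[hi] - out[lo]) * (i - lo) / (hi - lo)
    else:
        # numeric (or string) constant fill
        out = [method if v is None else v for v in out]
    return out
```

For example, fill([1, None, 3], "LINEAR") interpolates the gap to 2.0, while fill([1, None, None], "PREVIOUS") carries the 1 forward.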

Note: Multiple filter conditions default to an AND relationship; to express OR, use the || operator. The following two statements are equivalent:

O::human:(height) { age > 31, sex != re("female") }
O::human:(height) { age > 31 && sex != re("female") }

A complex filter expression:

M::some_metric {(a>123.45 && b!=re("abc")) || (z!="abc"), c=re("xyz")} [1d::30m]

time-expr

DataFlux data always carries a time attribute, so DQL expresses the time filter as a dedicated clause.

time-expr consists of up to 4 parts, [start_time:end_time:interval:rollup]:

| Sequence | Name | Mandatory | Description | Example |
| --- | --- | --- | --- | --- |
| 1 | start_time | No | Start time of the time filter | 1672502400000 / 1672502400 |
| 2 | end_time | No | End time of the time filter | 1672588800000 / 1672588800 |
| 3 | interval | No | Time aggregation period; generally used together with (rolling) aggregation. Supports s, m, h, d and other time units, which can be combined | 1s / 1m / 1h |
| 4 | rollup | No | Rolling aggregation function name; currently avg, sum, min, max, count, first, last, stddev are supported | last |

Note:

start_time and end_time support 3 formats:

  • Numeric value with a time unit, e.g.: 1m
  • Timestamp in seconds, e.g.: 1672502400
  • Timestamp in milliseconds, e.g.: 1672502400000

interval time units support the following:

  • ns - Nanosecond
  • us - Microsecond
  • ms - Millisecond
  • s - Second
  • m - Minute
  • h - Hour
  • d - Day
  • w - Week
  • y - Year, fixed at 365d; leap years are not distinguished.

rollup rolling aggregation function names, currently supported aggregation functions are:

  • avg: Average
  • sum: Sum
  • min: Minimum
  • max: Maximum
  • count: Count
  • first: First
  • last: Last
  • deriv: Per-second rate of change, estimated as the last value minus the first value, divided by the time interval
  • rate: Per-second rate of change; similar to deriv but never returns negative results, consistent with PromQL
  • irate: Instantaneous per-second rate of change, estimated from the difference between the last two values divided by their time interval, consistent with PromQL
  • p99, p95, p90, p75: Percentile calculation, supports using p followed by any number to express the percentage
  • median: Median, equivalent to p50
  • stddev: Standard deviation

Rolling aggregation (rollup) splits a single time series into windows of the given interval and aggregates each window separately. When rollup is empty, no rolling aggregation is performed.
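The windowing step can be sketched in Python (an illustration of the bucketing semantics, not the DQL engine; `rollup` and its arguments are hypothetical names):

```python
def rollup(points, interval, func):
    """Bucket (timestamp, value) points into windows of `interval` seconds,
    then aggregate each window with `func` -- a sketch of rolling aggregation."""
    buckets = {}
    for ts, val in points:
        # align each point to the start of its window
        buckets.setdefault(ts - ts % interval, []).append(val)
    return {start: func(vals) for start, vals in sorted(buckets.items())}

# avg rollup with a 60-second interval over four points
points = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
result = rollup(points, 60, lambda vs: sum(vs) / len(vs))
```

Here the first window (0-59s) averages to 2.0 and the second (60-119s) to 6.0.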

Common examples:

  • [5m] - Last 5 minutes
  • [10m:5m] - Last 10 minutes to last 5 minutes
  • [10m:5m:1m] - Last 10 minutes to last 5 minutes, and the results are aggregated at 1-minute intervals
  • [1672502400000:1672588800000] - 2023-01-01 00:00:00 to 2023-01-02 00:00:00 time range
  • [1672502400:1672588800] - 2023-01-01 00:00:00 to 2023-01-02 00:00:00 time range

by-clause Statement

The BY clause is used to classify and aggregate the results, similar to GROUP BY in MySQL.
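For example, to average CPU usage per host over the last 5 minutes (field and tag names follow the examples above):

M::cpu:(avg(usage_total)) [5m] BY host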

having-clause Statement

The HAVING clause is used to filter the results after aggregation, similar to HAVING in MySQL.

# Get all hosts with CPU usage greater than 80%
M::cpu:(max(`usage_total`) as `max_usage`) by host having max_usage > 80

order-by-clause Statement

The ORDER BY clause sorts the results, similar to ORDER BY in MySQL.

Note: 1. Metric data only supports sorting by the time field; 2. When the query contains a BY clause, order-by does not take effect; use sorder-by for sorting instead.

# Get all host CPU usage, sorted by time in descending order
M::cpu:(`usage_total`) order by time desc
# Get all log data, sorted by response time in ascending order
L::`*`:(`*`) order by response_time asc

sorder-by-clause Statement

The SORDER BY clause sorts the groups.

# Get the maximum CPU usage of different hosts, sorted by host name in descending order
M::cpu:(max(`usage_total`)) by host sorder by host desc
# Get the maximum CPU usage of different hosts, sorted by maximum CPU usage in ascending order
M::cpu:(max(`usage_total`) as m) by host sorder by m
# Get the CPU usage of different hosts, sorted by the latest CPU usage in ascending order
M::cpu:(`usage_total`) sorder by usage_total

limit Statement

Used to specify the number of rows to return.

Note:

For time series data, if the DQL statement contains both a BY clause and a LIMIT clause, LIMIT constrains the number of records returned within each aggregation group.

# Return three cpu records
M::cpu:() limit 3

# Return three cpu records for each host
M::cpu:() by host limit 3

slimit Statement

Used to specify the number of groups.

# Return CPU usage information for three hosts
M::cpu:(last(usage_total)) by host slimit 3

# Return CPU information for three hosts, where each host returns three records
M::cpu:() by host limit 3 slimit 3

SHOW Statement

SHOW_xxx is used to browse data (function names are case-insensitive):

  • SHOW_MEASUREMENT() - View the list of metric sets, supports filter-clause, limit and offset statements
  • SHOW_OBJECT_CLASS() - View the list of object classifications
  • SHOW_CUSTOM_OBJECT_SOURCE() - View the list of Resource Catalog data types
  • SHOW_EVENT_SOURCE() - View the list of event sources
  • SHOW_LOGGING_SOURCE() - View the list of log sources
  • SHOW_TRACING_SERVICE() - View the list of tracing sources
  • SHOW_RUM_TYPE() - View the list of RUM data types
  • SHOW_NETWORK_SOURCE() - View the list of network eBPF data types
  • SHOW_WORKSPACES() - View the current workspace and its authorized workspace information

For more SHOW functions, see the Function Documentation.
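For instance, since SHOW_MEASUREMENT() supports the limit statement, listing the first 10 metric sets would look like:

SHOW_MEASUREMENT() LIMIT 10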

Nested Queries and Statement Blocks

Use parentheses () to separate a subquery from the outer query, for example a two-level nesting:

M::(
# Subquery
M::cpu:(usage_total) {host='kind'}
):(last(usage_total))  # Outer query target column
{}  # Outer query filter condition

Three-level nesting:

M::(M::(M::cpu:(usage_total) {host='kind'}):(usage_total) {usage_total > 0} ):(last(usage_total))

In principle, there is no limit on nesting depth, but multiple parallel subqueries at the same nesting level are not allowed, for example:

object::(     # Second-level query
        object::( # Third-level query
                object::a:(f1,f2,f3) {host="foo"}
            ):(f1,f2),

        object::( # Parallel third-level query: not supported
                object::b:(f1,f2,f3) {host="foo"}
            ):(f1,f2)
    ):(f1)

Special Usage

If the message field is of JSON type (currently only supported for logs), DQL can extract fields from it directly, in the following form:

L::nginx { @abc.def = "xyz" }

It is equivalent to the following query; that is, @ is shorthand for message@json.

L::nginx { `message@json.abc.def` = "xyz" }

Function Description