DQL Definition¶
The following is the definition of the DataFlux Query Language (DQL). As more syntax is supported over time, this document will be updated accordingly.
Global constraints are as follows:
- Non-keywords (such as metric names, tag names, etc.) are case-sensitive; keywords and function names are case-insensitive.
- `#` is the line comment character; inline comments are not supported.
- Supported operators:
    - `+` - Addition
    - `-` - Subtraction
    - `*` - Multiplication
    - `/` - Division
    - `%` - Modulo
    - `=` - Equal to
    - `!=` - Not equal to
    - `<=` - Less than or equal to
    - `<` - Less than
    - `>=` - Greater than or equal to
    - `>` - Greater than
    - `^` - Exponentiation
    - `&&` - Logical AND
    - `||` - Logical OR
- Supported keywords include those used by the clauses below, such as `AS`, `BY`, `HAVING`, `ORDER BY`, `LIMIT`, `OFFSET`, `SLIMIT` and `SOFFSET`.
- Identifiers: identifiers take several forms to stay compatible with various variable naming conventions:
    - Normal variable names may only contain the characters `[_a-zA-Z0-9]`, and the first character cannot be a digit, e.g. `_abc`, `_abc123`, `_123ab`.
    - Other forms of variable names, such as `this+is-a*xx/yy^zz?variable` and `by`, must be written with backtick quoting: `` `this+is-a*xx/yy^zz?variable` `` and `` `by` ``. The former contains operators; the latter (`by`) is a DQL keyword.
    - Chinese and other UTF8 identifiers are supported, e.g. `M::cpu:(usage AS 使用率) [5m]`.
    - Emojis are supported, e.g. `M::cpu:(usage AS 使用率👍) [5m]`.
    - A variable containing a backtick, such as `` this`is-a-variable ``, must be written as `` identifier("this`is-a-variable") ``.
- String values may be double- or single-quoted: `"this is a string"` and `'this is a string'` are equivalent.
- Special strings:
    - base64 strings: DQL supports base64 strings and automatically decodes them back to the original string during the query. They are written as:
        - `` b64`some-base64-string` ``
        - `b64'some-base64-string'`
        - `b64"some-base64-string"`
    - Regular expression strings: the original `re('xxx')` form is deprecated; the following forms are recommended:
        - `` re`some-regexp` `` (recommended)
        - `re'some-regexp'`
        - `re"some-regexp"`
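The decoding that the `b64` forms trigger server-side can be illustrated in Python (a sketch of the equivalence, not DQL internals):

```python
import base64

# A value written as b64`aGVsbG8gd29ybGQ=` matches the same data as the
# plain string 'hello world': DQL decodes the base64 payload before use.
encoded = base64.b64encode(b"hello world").decode("ascii")
print(encoded)  # aGVsbG8gd29ybGQ=

decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # hello world
```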
- Supported data types:
    - Floating point numbers (`123.4`, `5.67E3`)
    - Integers (`123`, `-1`)
    - Strings (`'张三'`, `"hello world"`)
    - Booleans (`true`, `false`)
    - Durations (`1y`, `1w`, `1d`, `1h`, `1m`, `1s`, `1ms`, `1us`, `1ns`, representing 1 year/week/day/hour/minute/second/millisecond/microsecond/nanosecond respectively)
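The duration units map onto fixed nanosecond multiples; a minimal Python sketch of parsing such literals (`parse_duration` and the combined form `1h30m` are illustrative, not a DQL API; `1y` is fixed at 365 days, matching the note under time-expr):

```python
import re

# Nanoseconds per unit; 1y is fixed at 365d (leap years are not distinguished).
NS = {"ns": 1, "us": 10**3, "ms": 10**6, "s": 10**9,
      "m": 60 * 10**9, "h": 3600 * 10**9, "d": 86400 * 10**9,
      "w": 7 * 86400 * 10**9, "y": 365 * 86400 * 10**9}

def parse_duration(text: str) -> int:
    """Parse a DQL-style duration like '5m' or '1h30m' into nanoseconds."""
    total = 0
    for num, unit in re.findall(r"(\d+)(ns|us|ms|s|m|h|d|w|y)", text):
        total += int(num) * NS[unit]
    return total

print(parse_duration("5m"))     # 300000000000
print(parse_duration("1h30m"))  # 5400000000000
```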
Query¶
Queries follow the syntax paradigm below. The relative order of the parts cannot be changed; for example, `time-expr` cannot appear before `filter-clause`.
```
namespace::
    data-source
    target-clause
    filter-clause
    time-expr
    by-clause
    having-clause
    order-by-clause
    limit-clause
    offset-clause
    sorder-by-clause
    slimit-clause
    soffset-clause
```
From a syntax perspective, `data-source` is mandatory (similar to the `FROM` clause in SQL), while the other parts are optional. However, certain constraints apply during actual query execution (for example, `time-expr` may not cover too large a time span).
Examples:
```
# Get all field data for the measurement cpu over the last 5 minutes
M::cpu [5m]

# Find all metrics matching the regular expression *db over the last 5 minutes
M::re('*db') [5m]

# Get all field data for the measurement cpu from 10 minutes ago to 5 minutes ago
M::cpu [10m:5m]

# Get the usage_idle field of the measurement cpu from 10 minutes ago to 5 minutes ago, aggregated at 1-minute intervals
M::cpu:(usage_idle) [10m:5m:1m]

# Query the last 5 minutes of time series data for the measurement cpu, for the two fields time_active and time_guest_nice,
# filtered by the tags host and cpu, and grouped by host and cpu.
M::cpu:(time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

# Get the top 10 sorted by height in descending order
O::human:(height, age) { age > 100, sex = "女性" } ORDER BY height desc LIMIT 10

M::cpu,mem:(time_active, time_guest_nice, host) { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu
```
Note: `::` and `:` may have whitespace characters on either side; the following statements are equivalent:

```
M::cpu:(time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M ::cpu : (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M :: cpu : (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]
```
Examples of Queries for Each Data Type¶
- M::`Measurement Name`:(Aggregation Function(`Metric Name`)) { `Tag Name` = 'Tag Value' } BY `Tag Name`
    - For example: `M::cpu:(last(usage_system)) { host = xxx } BY host_ip`
- L::`Log Source`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `L::datakit:(COUNT(*)) { index = 'default' } BY host`
- O::`Category Name`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `O::HOST:(COUNT(*)) { class = 'HOST' } BY host_ip`
- E::`Event Source`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `E::monitor:(COUNT(create_time)) { create_time = 1688708829409 } BY df_event_id`
- T::`Service Name`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `T::mysqli:(COUNT(resource)) { status = 'ok' } BY status`
- R::`Data Source`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `R::error:(LAST(browser)) { city = 'unknown' } BY city`
- S::`Classification Name`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `S::storage:(LAST(host)) { level = re('warn') } BY level`
- N::`Network Source`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `N::httpflow:(FIRST(direction)) { http_version = '1.1' } BY dst_ip_type`
- P::`Profiling Name`:(Aggregation Function(`Attribute Name`)) { `Attribute Name` = 'Attribute Value' } BY `Attribute Name`
    - For example: `P::mysqli:(COUNT(resource)) { status = 'ok' } BY status`
Further examples:

- Count the number of containers in each namespace: `O::docker_containers:(COUNT(*)) BY namespace`
- Query all fields of containers (no aggregation function) and return the latest 10 records: `O::docker_containers { host = xxx } limit 10`
Statements¶
namespace¶
Semantically, the following data sources are currently supported:

- M/metric - Time series metric data
- O/object - Object data
- CO/custom_object - Resource Catalog data
- L/logging - Log data
- E/event - Event data
- T/tracing - Tracing data
- R/rum - RUM data
- F/func - Func function computation
- N/network - Network eBPF data
- S/security - Security Check data
- P/profiling - Profiling data
From a syntax perspective, data sources are not constrained. In a specific query, if no data source is specified, the default is `metric` (i.e. `M`): time series metrics are DQL's default data source.
target-clause¶
The result list of the query:

```
M::cpu:(time_active, system_usage) {host="biz_prod"} [5m]

# Calculations between different metrics of the same measurement are supported (their types should be compatible)
M::cpu:(time_active+1, time_active/time_guest_nice) [5m]
```
filter-clause¶
The filter clause filters the result data, similar to the `where` condition in SQL:

```
# Query the height of centenarian females in the population object (__class=human)
O::human:(height) { age > 100, sex = "女性" }

# Filter with regular expressions
O::human:(height) { age > 100, sex != re("女") }

# Filter with calculation expressions
O::human:(height) { (age + 1)/2 > 31, sex != re("女") }

# Filter with OR expressions
O::human:(height) { age > 31 || sex != re("女"), weight > 70 }

# Result column with aggregation
M::cpu:(avg(time_active) AS time_active_avg) [1d::1h]

# Result column with aggregation and filling
M::cpu:(fill(avg(time_active) AS time_active_avg, 0.1)) [1d::1h]

# Query with an in-list; the options in the list are ORed together, and the list may only contain numbers or strings
O::human:(height) { age in [30, 40, 50], weight > 70 }
```
About filling:

- Numeric filling, e.g. `cpu:(fill(f1, 123), fill(f2, "foo bar"), fill(f3, 123.456))`
- Linear filling, e.g. `cpu:(fill(f1, LINEAR))`
- Previous-value filling, e.g. `cpu:(fill(f1, PREVIOUS))`
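The fill modes can be sketched in Python (a minimal illustration of the semantics, not DQL internals; `None` stands in for a missing aggregation window):

```python
# Numeric filling (e.g. fill(f1, 123)) would simply replace None with 123.

def fill_previous(values):
    """PREVIOUS: carry the last seen value forward into gaps."""
    out, prev = [], None
    for v in values:
        prev = v if v is not None else prev
        out.append(prev)
    return out

def fill_linear(values):
    """LINEAR: interpolate a gap between its nearest non-empty neighbors."""
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo = max(j for j in range(i) if values[j] is not None)
            hi = min(j for j in range(i + 1, len(values)) if values[j] is not None)
            out[i] = values[lo] + (values[hi] - values[lo]) * (i - lo) / (hi - lo)
    return out

print(fill_previous([1.0, None, 3.0, None]))  # [1.0, 1.0, 3.0, 3.0]
print(fill_linear([1.0, None, 3.0]))          # [1.0, 2.0, 3.0]
```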
Note: multiple filter conditions default to an `AND` relationship; to express an `OR` relationship, use the `||` operator. Comma-separated conditions and conditions joined explicitly with `&&` are equivalent.
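The default-AND behavior can be illustrated as follows (a sketch reusing the `human` object example from the filter clause section):

```
# Comma-separated conditions default to AND:
O::human:(height) { age > 31, weight > 70 }

# Equivalent explicit form:
O::human:(height) { age > 31 && weight > 70 }
```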
time-expr¶
All DataFlux data carries a time attribute, so time is expressed by a dedicated clause.

`time-expr` consists of up to 4 parts, `[start_time:end_time:interval:rollup]`:
| Number | Name | Mandatory | Description | Example |
|---|---|---|---|---|
| 1 | `start_time` | No | Start time of the time filter | `1672502400000` / `1672502400` |
| 2 | `end_time` | No | End time of the time filter | `1672588800000` / `1672588800` |
| 3 | `interval` | No | Time aggregation period; generally used together with aggregation or rolling aggregation; supports `s`, `m`, `h`, `d` and other time units, which can be combined | `1s` / `1m` / `1h` etc. |
| 4 | `rollup` | No | Rolling aggregation function name; currently supported aggregation functions are `avg`, `sum`, `min`, `max`, `count`, `first`, `last`, `stddev` | `last` |
Note: `start_time` and `end_time` support 3 formats:

- Numeric values with time units, e.g. `1m`
- Timestamps in seconds, e.g. `1672502400`
- Timestamps in milliseconds, e.g. `1672502400000`
`interval` supports the following time units:

- `ns` - Nanoseconds
- `us` - Microseconds
- `ms` - Milliseconds
- `s` - Seconds
- `m` - Minutes
- `h` - Hours
- `d` - Days
- `w` - Weeks
- `y` - Years, fixed at 365d (leap years are not distinguished)
`rollup` rolling aggregation function names currently supported:

- `avg`: Average
- `sum`: Sum
- `min`: Minimum
- `max`: Maximum
- `count`: Count
- `first`: First value
- `last`: Last value
- `deriv`: Per-second rate of change, estimated by subtracting the first value from the last and dividing by the time interval
- `rate`: Per-second rate of change, similar to `deriv` but never returns negative results; logic is consistent with PromQL
- `irate`: Instantaneous per-second rate of change, estimated from the difference between the last two values divided by their time interval; logic is consistent with PromQL
- `p99`, `p95`, `p90`, `p75`: Percentiles; `p` followed by any number expresses the percentage
- `median`: Median, equivalent to `p50`
- `stddev`: Standard deviation
`rollup` rolling aggregation splits a single time series into windows of the given `interval` and aggregates each window separately. When `rollup` is empty, no rolling aggregation is performed.
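The rate-style rollups can be sketched in Python (an illustration of the estimation rules described above, assuming `(timestamp_seconds, value)` points within one window; not DQL internals):

```python
def deriv(points):
    """Per-second rate: (last value - first value) / elapsed time."""
    (t0, v0), (tn, vn) = points[0], points[-1]
    return (vn - v0) / (tn - t0)

def irate(points):
    """Instantaneous per-second rate: difference of the last two points."""
    (t1, v1), (t2, v2) = points[-2], points[-1]
    return (v2 - v1) / (t2 - t1)

pts = [(0, 10.0), (10, 30.0), (20, 35.0)]
print(deriv(pts))  # (35 - 10) / 20 = 1.25
print(irate(pts))  # (35 - 30) / 10 = 0.5
```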
Common examples:

- `[5m]` - Last 5 minutes
- `[10m:5m]` - From 10 minutes ago to 5 minutes ago
- `[10m:5m:1m]` - From 10 minutes ago to 5 minutes ago, aggregated at 1-minute intervals
- `[1672502400000:1672588800000]` - Time range from `2023-01-01 00:00:00` to `2023-01-02 00:00:00` (millisecond timestamps)
- `[1672502400:1672588800]` - Time range from `2023-01-01 00:00:00` to `2023-01-02 00:00:00` (second timestamps)
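The timestamp ranges above can be cross-checked with a quick conversion; judging by the values, the wall-clock times in the examples are in UTC+8 (an assumption), and the sketch below is illustrative, not part of DQL:

```python
from datetime import datetime, timezone, timedelta

TZ = timezone(timedelta(hours=8))  # UTC+8, assumed from the example values

start = datetime.fromtimestamp(1672502400, tz=TZ)
end = datetime.fromtimestamp(1672588800, tz=TZ)
print(start.strftime("%Y-%m-%d %H:%M:%S"))  # 2023-01-01 00:00:00
print(end.strftime("%Y-%m-%d %H:%M:%S"))    # 2023-01-02 00:00:00
```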
by-clause Statement¶
The `BY` clause groups and aggregates the results, similar to `GROUP BY` in MySQL.
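A minimal sketch of a `BY` clause (the measurement and field names reuse earlier examples):

```
# Average CPU active time over the last 10 minutes, grouped by host
M::cpu:(avg(time_active)) [10m] BY host
```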
having-clause Statement¶
The `HAVING` clause filters the results after aggregation, similar to `HAVING` in MySQL.

```
# Get all hosts with CPU usage greater than 80%
M::cpu:(max(`usage_total`) as `max_usage`) by host having max_usage > 80
```
order-by-clause Statement¶
The `ORDER BY` clause sorts the results, similar to `ORDER BY` in MySQL.

Note: 1. metric data only supports sorting by the `time` field; 2. when the query contains a `BY` clause, `order by` does not take effect; use `sorder by` for sorting instead.

```
# Get all host CPU usage, sorted by time in descending order
M::cpu:(`usage_total`) order by time desc

# Get all log data, sorted by response time in ascending order
L::`*`:(`*`) order by response_time asc
```
sorder-by-clause Statement¶
The `SORDER BY` clause sorts the groups.

```
# Get the maximum CPU usage of each host, sorted by host name in descending order
M::cpu:(max(`usage_total`)) by host sorder by host desc

# Get the maximum CPU usage of each host, sorted by that maximum in ascending order
M::cpu:(max(`usage_total`) as m) by host sorder by m

# Get the CPU usage of each host, sorted by the latest CPU usage in ascending order
M::cpu:(`usage_total`) sorder by usage_total
```
limit Statement¶
Used to specify the number of rows to return.

Note: for time series data, if the DQL statement contains both a `by` clause and a `limit` clause, `limit` constrains the number of rows returned within each aggregation group.

```
# Return three cpu records
M::cpu:() limit 3

# Return three cpu records for each host
M::cpu:() by host limit 3
```
slimit Statement¶
Used to specify the number of groups to return.

```
# Return CPU usage for three hosts
M::cpu:(last(usage_total)) by host slimit 3

# Return CPU information for three hosts, with three records per host
M::cpu:() by host limit 3 slimit 3
```
SHOW Statement¶
`SHOW_xxx` is used to browse data (function names are case-insensitive):

- `SHOW_MEASUREMENT()` - View the list of measurements; supports the `filter-clause`, `limit` and `offset` statements
- `SHOW_OBJECT_CLASS()` - View the list of object classes
- `SHOW_CUSTOM_OBJECT_SOURCE()` - View the list of Resource Catalog data types
- `SHOW_EVENT_SOURCE()` - View the list of event sources
- `SHOW_LOGGING_SOURCE()` - View the list of log sources
- `SHOW_TRACING_SERVICE()` - View the list of tracing services
- `SHOW_RUM_TYPE()` - View the list of RUM data types
- `SHOW_NETWORK_SOURCE()` - View the list of network eBPF data types
- `SHOW_SECURITY_SOURCE()` - View the list of Security Check data types
- `SHOW_WORKSPACES()` - View the current workspace and the workspaces it is authorized to access

For more SHOW functions, see the function documentation.
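Since `SHOW_MEASUREMENT()` supports the `filter-clause` and `limit` statements, browsing can be sketched as (an illustration, not from the source):

```
# List at most 10 measurements
SHOW_MEASUREMENT() LIMIT 10
```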
Nested Queries and Statement Blocks¶
Use `()` to separate subqueries from outer queries, e.g. two-layer nesting:

```
M::(
    # Subquery
    M::cpu:(usage_total) {host='kind'}
):(last(usage_total)) # Outer query target column
{}                    # Outer query filter condition
```
In principle, there is no limit on nesting depth, but multiple parallel subqueries are not allowed at the same nesting level. For example, the following three-layer nesting is invalid:

```
object::( # Second-layer query
    object::( # Third-layer query
        object::a:(f1,f2,f3) {host="foo"}
    ):(f1,f2),
    object::( # Parallel third-layer query: not supported
        object::b:(f1,f2,f3) {host="foo"}
    ):(f1,f2)
):(f1)
```
Special Usage¶
If the `message` field is of JSON type (currently only supported for logs), DQL can extract its fields directly: `@` is shorthand for `message@json`, so an `@`-prefixed field reference is equivalent to the corresponding `message@json` reference.
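A hedged sketch of the shorthand (the log source `nginx` and the field path `client.status` are hypothetical; the syntax is inferred from the `message@json` equivalence stated above):

```
# Shorthand form using @
L::nginx { @client.status = 'ok' }

# Equivalent full form
L::nginx { message@json.client.status = 'ok' }
```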