DQL Definition¶
Global constraints are as follows:

- Non-keywords (such as metric names and tag names) are case-sensitive; keywords and function names are case-insensitive.
- `#` is the line comment character; inline comments are not supported.
- Supported operators:
    - `+` Addition
    - `-` Subtraction
    - `*` Multiplication
    - `/` Division
    - `%` Modulus
    - `=` Equal to
    - `!=` Not equal to
    - `<=` Less than or equal to
    - `<` Less than
    - `>=` Greater than or equal to
    - `>` Greater than
    - `^` Exponentiation
    - `&&` Logical AND
    - `||` Logical OR
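As a sketch of how these operators combine (the metric set `cpu` and the field/tag names here are illustrative, not prescribed by this section), arithmetic operators may appear in the target list while comparison and logical operators appear in the filter:

```
M::cpu:(time_active + time_guest, usage_total / 100) { usage_idle >= 50 && host != 'foo' } [5m]
```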
- Supported keywords, e.g. `AS`, `BY`.
- Identifiers: identifiers come in several forms to be compatible with various variable naming conventions:
    - Normal variable names may only contain the characters `[_a-zA-Z0-9]`, and the first character must not be a digit, e.g. `_abc`, `_abc123`, `_123ab`.
    - Other forms of variable names: `this+is-a*xx/yy^zz?variable` and `by` must be written with backticks, as `` `this+is-a*xx/yy^zz?variable` `` and `` `by` ``; the former contains operators, and the latter (`by`) is a DQL keyword.
    - Chinese and other UTF-8 identifiers are supported, e.g. `M::cpu:(usage AS 使用率) [5m]`.
    - Emojis are supported, e.g. `M::cpu:(usage AS 使用率👍) [5m]`.
    - Variables containing a backtick, such as ``this`is-a-variable``, must be written as ``identifier("this`is-a-variable")``.
- String values may use double or single quotes: `"this is a string"` and `'this is a string'` are equivalent.
- Special strings:
    - base64 strings: DQL can automatically decode a base64 string into the original string during the query. They are written as:
        - b64`some-base64-string`
        - b64'some-base64-string'
        - b64"some-base64-string"
    - Regular expression strings: the original `re('xxx')` form is deprecated; the following forms are recommended for regular expression strings:
        - re`some-regexp` (recommended)
        - re'some-regexp'
        - re"some-regexp"
- Supported data types:
    - Floating point (`123.4`, `5.67E3`)
    - Integer (`123`, `-1`)
    - String (`'Zhang San'`, `"hello world"`)
    - Boolean (`true`, `false`)
    - Duration (`1y`, `1w`, `1d`, `1h`, `1m`, `1s`, `1ms`, `1us`, `1ns`, representing 1 year/week/day/hour/minute/second/millisecond/microsecond/nanosecond respectively)
Query¶
Queries follow the syntax paradigm below. Note that the relative order of the parts cannot be changed; for example, `time-expr` cannot appear before `filter-clause`.

    namespace::data-source
        target-clause
        filter-clause
        time-expr
        by-clause
        having-clause
        order-by-clause
        limit-clause
        offset-clause
        sorder-by-clause
        slimit-clause
        soffset-clause

From a syntax perspective, `data-source` is mandatory (similar to the `FROM` clause in SQL), while all other parts are optional. However, certain constraints are imposed when queries actually execute (for example, `time-expr` may not cover too large a time span).
Examples:
    # Get all field data of the metric set cpu for the last 5 minutes
    M::cpu [5m]

    # Find all metric data matching the regular expression *db for the last 5 minutes
    M::re('*db') [5m]

    # Get all field data of the metric set cpu from 10 minutes ago to 5 minutes ago
    M::cpu [10m:5m]

    # Get the usage_idle data of the metric set cpu from 10 minutes ago to 5 minutes ago,
    # aggregated at 1-minute intervals
    M::cpu:(usage_idle) [10m:5m:1m]

    # Query the time series of the two fields time_active and time_guest_nice of the metric set cpu
    # for the last 5 minutes, filtered by the host and cpu tags, and grouped by host and cpu
    M::cpu:(time_active, time_guest_nice)
        { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

    # Get the top 10 sorted by height in descending order
    O::human:(height, age) { age > 100, sex = "female" } ORDER BY height desc LIMIT 10

    # Query the metric sets cpu and mem at the same time
    M::cpu,mem:(time_active, time_guest_nice, host) { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu
Note: `::` and `:` may have whitespace characters on either side; the following statements are equivalent:

    M::cpu:(time_active, time_guest_nice) { host = "host-name", cpu = "cpu0" } [5m]

    M ::cpu : (time_active, time_guest_nice) { host = "host-name", cpu = "cpu0" } [5m]

    M :: cpu : (time_active, time_guest_nice) { host = "host-name", cpu = "cpu0" } [5m]
Examples of Queries for Each Data Type¶
- M::`metric set name`:(`aggregation function`(`metric name`)) { `tag name` = 'tag value' } BY `tag name`
    - For example: `M::cpu:(last(usage_system)) { host = xxx } BY host_ip`
- L::`log source`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `L::datakit:(COUNT(*)) { index = 'default' } BY host`
- O::`category name`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `O::HOST:(COUNT(*)) { class = 'HOST' } BY host_ip`
- E::`event source`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `E::monitor:(COUNT(create_time)) { create_time = 1688708829409 } BY df_event_id`
- T::`service name`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `T::mysqli:(COUNT(resource)) { status = 'ok' } BY status`
- R::`data source`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `R::error:(LAST(browser)) { city = 'unknown' } BY city`
- S::`classification name`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `S::storage:(LAST(host)) { level = re('warn') } BY level`
- N::`network source`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `N::httpflow:(FIRST(direction)) { http_version = '1.1' } BY dst_ip_type`
- P::`profiling name`:(`aggregation function`(`attribute name`)) { `attribute name` = 'attribute value' } BY `attribute name`
    - For example: `P::mysqli:(COUNT(resource)) { status = 'ok' } BY status`
Examples without using aggregation functions:

- Count the number of containers in each namespace: `O::docker_containers:(COUNT(*)) BY namespace`
- Query all fields of containers and return the latest 10 records: `O::docker_containers { host = xxx } limit 10`
Statements¶
namespace¶
Semantically, the following data sources are currently supported:
- M/metric - Time series metric data
- O/object - Object data
- CO/custom_object - User Resource Catalog data
- L/logging - Log data
- E/event - Event data
- T/tracing - Tracing data
- R/rum - RUM data
- F/func - Func function calculation
- N/network - Network eBPF data lookup
Syntactically, data sources are not constrained. In a specific query, if no data source is specified, it defaults to `metric` (i.e. `M`): time series metrics are the default data source for DQL.
target-clause¶
The result list of the query:

    M::cpu:(time_active, system_usage) {host="biz_prod"} [5m]

    # Calculations between different metrics (whose types should basically match) of the same metric set are supported
    M::cpu:(time_active+1, time_active/time_guest_nice) [5m]
filter-clause¶
The filter clause filters the result data, similar to the `where` condition in SQL:

    # Query the height of centenarian females in the population object (__class=human)
    O::human:(height) { age > 100, sex = "female" }

    # Filter with regular expressions
    O::human:(height) { age > 100, sex != re("female") }

    # Filter with calculation expressions
    O::human:(height) { (age + 1)/2 > 31, sex != re("female") }

    # Filter with OR expressions
    O::human:(height) { age > 31 || sex != re("female"), weight > 70 }

    # Result column with aggregation
    M::cpu:(avg(time_active) AS time_active_avg) [1d::1h]

    # Result column with aggregation and filling
    M::cpu:(fill(avg(time_active) AS time_active_avg, 0.1)) [1d::1h]

    # Query with an in-list; the options in the in-list are ORed together,
    # and the list may only contain numbers or strings
    O::human:(height) { age in [30, 40, 50], weight > 70 }
About filling:

- Numerical filling: e.g. `cpu:(fill(f1, 123), fill(f2, "foo bar"), fill(f3, 123.456))`
- Linear filling: e.g. `cpu:(fill(f1, LINEAR))`
- Previous-value filling: e.g. `cpu:(fill(f1, PREVIOUS))`
Note: multiple filter conditions are combined with `AND` by default; to express an `OR` relationship, use the `||` operator. The following two statements are equivalent:

    O::human:(height) { age > 31, sex != re("female") }

    O::human:(height) { age > 31 && sex != re("female") }
A complex filter expression:
time-expr¶
DataFlux data always carries a time attribute, so the time expression is represented by a separate clause.

`time-expr` consists of four parts, `[start_time:end_time:interval:rollup]`:

| Sequence | Name | Mandatory | Description | Example |
| --- | --- | --- | --- | --- |
| 1 | `start_time` | No | Start time of the time filter | `1672502400000` / `1672502400` |
| 2 | `end_time` | No | End time of the time filter | `1672588800000` / `1672588800` |
| 3 | `interval` | No | Time aggregation period; generally used together with aggregation or rolling aggregation; supports time units such as `s`, `m`, `h`, `d`, which can be combined | `1s` / `1m` / `1h` etc. |
| 4 | `rollup` | No | Rolling aggregation function name; currently supported: `avg`, `sum`, `min`, `max`, `count`, `first`, `last`, `stddev` | `last` |
Note: `start_time` and `end_time` support three formats:

- A numeric value with a time unit, e.g. `1m`
- A timestamp in seconds, e.g. `1672502400`
- A timestamp in milliseconds, e.g. `1672502400000`
`interval` supports the following time units:

- `ns` - Nanosecond
- `us` - Microsecond
- `ms` - Millisecond
- `s` - Second
- `m` - Minute
- `h` - Hour
- `d` - Day
- `w` - Week
- `y` - Year, fixed at 365 days, without distinguishing leap years
`rollup` rolling aggregation function names currently supported:

- `avg`: Average
- `sum`: Sum
- `min`: Minimum
- `max`: Maximum
- `count`: Count
- `first`: First value
- `last`: Last value
- `deriv`: Per-second rate of change, estimated by subtracting the first value from the last value and dividing by the time interval
- `rate`: Per-second rate of change, similar to `deriv` but never returns negative results; logic consistent with PromQL
- `irate`: Instantaneous per-second rate of change, estimated from the difference between the last two values divided by their time interval; logic consistent with PromQL
- `p99`, `p95`, `p90`, `p75`: Percentile calculation; `p` followed by any number expresses the percentile
- `median`: Median, equivalent to `p50`
- `stddev`: Standard deviation
`rollup` rolling aggregation splits a single time series into windows of the given `interval` and aggregates each window separately. When `rollup` is empty, no rolling aggregation is performed.
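Putting the four-part time expression together with a rollup: the sketch below (the field name `usage_idle` is illustrative) splits the last hour of each series into 5-minute windows and keeps the last value in each window:

```
M::cpu:(usage_idle) [1h::5m:last]
```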
Common examples:

- `[5m]` - Last 5 minutes
- `[10m:5m]` - From 10 minutes ago to 5 minutes ago
- `[10m:5m:1m]` - From 10 minutes ago to 5 minutes ago, with results aggregated at 1-minute intervals
- `[1672502400000:1672588800000]` - The time range `2023-01-01 00:00:00` to `2023-01-02 00:00:00`, in milliseconds
- `[1672502400:1672588800]` - The same time range, in seconds
by-clause Statement¶
The BY
clause is used to classify and aggregate the results, similar to GROUP BY
in MySQL.
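A minimal sketch (assuming a metric set `cpu` carrying a `host` tag and a `usage_total` field): compute the average of `usage_total` over the last 10 minutes, grouped per host:

```
M::cpu:(avg(usage_total)) [10m] BY host
```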
having-clause Statement¶
The HAVING
clause is used to filter the results after aggregation, similar to HAVING
in MySQL.
    # Get all hosts with CPU usage greater than 80%
    M::cpu:(max(`usage_total`) as `max_usage`) by host having max_usage > 80
order-by-clause Statement¶
The ORDER BY
clause sorts the results, similar to ORDER BY
in MySQL.
Note: 1. Metric data only supports sorting by the `time` field. 2. When the query has a `BY` clause, `ORDER BY` does not take effect; use `SORDER BY` for sorting instead.
    # Get all host CPU usage, sorted by time in descending order
    M::cpu:(`usage_total`) order by time desc

    # Get all log data, sorted by response time in ascending order
    L::`*`:(`*`) order by response_time asc
sorder-by-clause Statement¶
The SORDER BY
clause sorts the groups.
    # Get the maximum CPU usage of different hosts, sorted by host name in descending order
    M::cpu:(max(`usage_total`)) by host sorder by host desc

    # Get the maximum CPU usage of different hosts, sorted by maximum CPU usage in ascending order
    M::cpu:(max(`usage_total`) as m) by host sorder by m

    # Get the CPU usage of different hosts, sorted by the latest CPU usage in ascending order
    M::cpu:(`usage_total`) sorder by usage_total
limit Statement¶
Used to specify the number of rows to return.
Note: for time series data, if the DQL statement contains both a `BY` clause and a `LIMIT` clause, `LIMIT` constrains the number of records returned within each aggregation group.

    # Return three cpu records
    M::cpu:() limit 3

    # Return three cpu records for each host
    M::cpu:() by host limit 3
slimit Statement¶
Used to specify the number of groups.
    # Return CPU usage information for three hosts
    M::cpu:(last(usage_total)) by host slimit 3

    # Return CPU information for three hosts, with three records per host
    M::cpu:() by host limit 3 slimit 3
SHOW Statement¶
`SHOW_xxx` is used to browse data (function names are case-insensitive):

- `SHOW_MEASUREMENT()` - View the list of metric sets; supports `filter-clause`, `limit` and `offset` statements
- `SHOW_OBJECT_CLASS()` - View the list of object classifications
- `SHOW_CUSTOM_OBJECT_SOURCE()` - View the list of Resource Catalog data types
- `SHOW_EVENT_SOURCE()` - View the list of event sources
- `SHOW_LOGGING_SOURCE()` - View the list of log sources
- `SHOW_TRACING_SERVICE()` - View the list of tracing services
- `SHOW_RUM_TYPE()` - View the list of RUM data types
- `SHOW_NETWORK_SOURCE()` - View the list of network eBPF data types
- `SHOW_WORKSPACES()` - View the current workspace and its authorized workspace information
For more SHOW functions, see the Function Documentation.
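Since `SHOW_MEASUREMENT()` supports `limit` and `offset`, the metric-set list can be paged through; a sketch:

```
# First page of 10 metric sets
SHOW_MEASUREMENT() LIMIT 10

# Next 10
SHOW_MEASUREMENT() LIMIT 10 OFFSET 10
```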
Nested Queries and Statement Blocks¶
Use `()` to separate a subquery from its outer query, for example a two-level nesting:

    M::(
        # Subquery
        M::cpu:(usage_total) {host='kind'}
    ):(last(usage_total)) # Outer query target column
    {}                    # Outer query filter condition
Three or more levels of nesting work the same way; in principle there is no limit to the nesting depth. However, multiple parallel subqueries are not allowed at any given nesting level, for example:

    object::( # Second-level query
        object::( # Third-level query
            object::a:(f1,f2,f3) {host="foo"}
        ):(f1,f2),
        object::( # Parallel third-level query: not supported
            object::b:(f1,f2,f3) {host="foo"}
        ):(f1,f2)
    ):(f1)
Special Usage¶
If the `message` field is of JSON type (currently only logs are supported), DQL supports extracting its fields directly via the `@` shorthand, where `@` stands for `message@json`.
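A sketch of the shorthand (the log source `nginx` and the JSON field `level` are assumptions for illustration, not from this document):

```
# Shorthand: filter on a field inside the JSON message
L::nginx { @level = 'error' }

# Equivalent full form, expanding @ to message@json
L::nginx { message@json.level = 'error' }
```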