# DQL
DQL (Debug Query Language) is the core query language of the TrueWatch platform, designed for efficient querying and analysis of time series data, log data, event data, and more. DQL combines the semantic expressiveness of SQL with the syntactic structure of PromQL, aiming to provide users with a flexible and powerful query tool. This document will help you quickly understand the basic syntax and design philosophy of DQL, and demonstrates how to write DQL queries through examples.
## Basic Query Structure
The basic query structure of DQL is as follows:
```
namespace[index]::datasource[:select-clause] [time-expr] [{where-clause}] [group-by-clause] [having-clause] [order-by-clause] [limit-clause] [offset-clause] [sorder-by-clause] [slimit-clause] [soffset-clause]
```
Execution order:

1. Filter data based on `namespace::datasource`, `time-expr`, and `where-clause`;
2. If `time-expr` includes a rollup, execute the rollup logic first;
3. Execute `group-by-clause` grouping and evaluate the aggregate functions in `select-clause` within each group;
4. Execute `having-clause` to filter the aggregated groups;
5. Evaluate the non-aggregate functions in `select-clause`;
6. Execute `order-by-clause`, `limit-clause`, and `offset-clause` to sort and paginate data within groups;
7. Execute `sorder-by-clause`, `slimit-clause`, and `soffset-clause` to sort and paginate the groups themselves.
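A sketch of a query that exercises several of these clauses; the `cpu` measurement and the `usage_total`/`host` field names are illustrative, not part of the grammar:

```
M::cpu:(max(usage_total)) { host = 'web-01' } by host limit 10
```

Here the brace-delimited where-clause filters rows first, `by host` groups them, `max()` aggregates within each group, and `limit 10` paginates the per-group results.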
## Namespace
DQL supports querying all business data types in TrueWatch, with each data type corresponding to a namespace as follows:
| Namespace | Description |
| --- | --- |
| M | Metric, time series data |
| L | Logging, log data |
| O | Object, infrastructure object data |
| OH | History object, infrastructure object historical data |
| CO | Custom object, infrastructure custom object data |
| COH | History custom object, infrastructure custom object historical data |
| N | Network, network data (collected via the eBPF collector) |
| T | Trace, trace data |
| P | Profile, profiling data |
| R | RUM (Real User Monitoring), user access data |
| E | Event, event data |
| UE | Unrecovered event, unrecovered event data |
## Index
An index can be understood as a table in a traditional database. Within a single namespace, data may be further split based on source or data volume to improve query performance. The most typical example is logs, which natively support multiple indices, allowing data to be routed to different indices based on various conditions in the console.
When `index` is not explicitly specified, the `default` index is used. Multiple index names can be joined with commas, in which case data from all of the listed indices is queried simultaneously. Index names must be explicitly declared; wildcards and regular expressions are not supported.
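A sketch of both cases, following the `namespace[index]::datasource` structure above; the `sys_log` and `app_log` index names are illustrative:

```
L::nginx:(message)
L[sys_log, app_log]::nginx:(message)
```

The first query reads the `default` index; the second reads the `sys_log` and `app_log` indices at the same time.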
For historical reasons, specifying indices via the `index` column name in the `where` clause is also supported, but its continued use is not recommended:
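A sketch of the legacy form, assuming the brace-delimited `where` clause from the query structure above; the `sys_log` index name is illustrative:

```
L::nginx:(message) { index = 'sys_log' }
```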
## Data Source
The data source is similar to the FROM clause in SQL, used to specify the source of the specific data to be queried. The data source can be a specific dataset, a filter condition for a measurement, or a subquery.
### Basic Data Source
The definition of data sources varies across different namespaces:
| Namespace | Data Source |
| --- | --- |
| M | Measurement |
| L | Source |
| O | Basic object classification, enum values: `host`, `host_processes`, `docker_containers` |
| OH | Infrastructure object classification, enum values same as O |
| CO | Custom object classification, configurable via the console UI or specified in the reported object data |
| COH | Infrastructure custom object classification, same as CO |
| N | Enum values: `netflow`, `dnsflow`, `httpflow`, `bash` |
| T | Service name (`service`) |
| P | Service name (`service`) |
| R | Enum values: `session`, `view`, `resource`, `long_task`, `action`, `error` |
| E | Enum values: `monitor`, `audit`, `system` |
Three forms are supported (see the sketch after this list):

- Directly write the data source name, e.g. to query log data from the `nginx` source;
- Use `*` to represent all data sources, which is useful when the data source name is uncertain;
- Use a regular expression to match data sources, e.g. to query data from all sources whose names start with `backend`.
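The following sketch shows the three forms in order; the `nginx` and `backend` source names are illustrative, and the `re()` wrapper is assumed here as DQL's regular-expression form:

```
L::nginx:(message)
L::*:(message)
L::re('backend.*'):(message)
```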
### Subquery Data Source
A subquery is a special type of data source. The inner subquery is executed first to generate a temporary table, which is then used as the data source for the outer query.
Example:
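A sketch of such a nested query; the outer `count(app_id)` aggregation is inferred from the description below:

```
L::(L::*:(count(error)) by app_id):(count(app_id))
```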
In this example:

- The inner subquery `L::*:(count(error)) by app_id` calculates the number of error logs per application, grouped by `app_id`;
- The outer query uses this result set as its data source to count the number of affected apps.
## Select Clause
The select clause is used to specify the fields or expressions to be returned by the query. It is one of the most basic and important parts of a DQL query, determining which data is included in the query results.
### Selecting Data Fields
You can specify the field names to be returned from the data source. Field names can be written in three forms:

- Directly write the name;
- When the field name contains special characters or keywords, wrap it in backticks (`` ` ``) to avoid ambiguity;
- Use `*` to represent all field names.

Note, however, that values wrapped in single or double quotes are strings, not field names:

- ✅ `message`
- ✅ `` `message` ``, `` `limit` ``
- ❌ `'message'`
- ❌ `"message"`
### Extracting JSON Key-Values
When the content of a data field is in JSON format, we can extract data from internal fields using JSON Path.
The structure is as follows:
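A sketch of the shape; the `@` separator between the field name and the JSON path is an assumption about DQL's extraction syntax:

```
field-name@json-path
```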
You can specify the field name explicitly; when it is omitted, `message` is used as the default value.
Currently, only a subset of JSON Path is supported:

- `.` : dot notation to access object keys;
- `["key"]` : square brackets to access object keys, commonly used when the key contains spaces or dots and needs escaping;
- `[1]` : square brackets to access array elements.
Example:
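A reconstruction of the example, assuming the `field-name@json-path` form sketched above:

```
L::nginx:(message@request_body, message@user.id)
```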
This statement returns the `request_body` and `user.id` key-values from the `message` field in nginx log data.
### Calculated Fields
Two forms of calculation are supported (see the sketches after this list):

- Expression Calculation: for example, returning the `response_time` field in nginx log data divided by 1000;
- Function Calculation: for example, returning the maximum value of the `response_time` field in nginx log data, grouped by `host`.
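Sketches of the two calculations, in order; the `nginx` source and the `response_time`/`host` fields come from the descriptions above:

```
L::nginx:(response_time / 1000)
L::nginx:(max(response_time)) by host
```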
### Aliases
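A sketch, assuming DQL's `as` keyword for aliasing; the names mirror the description below:

```
L::nginx:(max(response_time) as max_response_time) by host
```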
This will return the maximum value of the `response_time` field in nginx log data, named `max_response_time` and grouped by `host`.
Aliases are not always necessary; in most cases, we can directly use the underlying column name to access the generated column data.