DQL


DQL (Debug Query Language) is the core query language of the TrueWatch platform, designed for efficient querying and analysis of time series data, log data, event data, and more. DQL combines the semantic expression of SQL with the syntax structure of PromQL, aiming to provide users with a flexible and powerful query tool. This document will help you quickly understand the basic syntax and design philosophy of DQL, and demonstrate how to write DQL queries through examples.

Basic Query Structure

The basic query structure of DQL is as follows:

namespace[index]::datasource[:select-clause] [time-expr] [{where-clause}] [group-by-clause] [having-clause] [order-by-clause] [limit-clause] [offset-clause] [sorder-by-clause] [slimit-clause] [soffset-clause]

Execution order:

  1. Filter data based on namespace::datasource, time-expr, and where-clause;
  2. If time-expr includes rollup, execute the rollup logic first;
  3. Execute group-by-clause grouping and perform aggregate functions in select-clause within each group;
  4. Execute having-clause to filter aggregate groups;
  5. Execute non-aggregate functions in select-clause;
  6. Execute order-by-clause, limit-clause, and offset-clause to sort and paginate data within groups;
  7. Execute sorder-by-clause, slimit-clause, and soffset-clause to sort and paginate groups.
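
For illustration, here is a query that combines several of these clauses. The nginx source and its status and host fields are assumed example names, not part of DQL itself:

L::`nginx`:(count(`status`)) [1h] { `status` = "error" } by `host` limit 10

Here L::`nginx` specifies the namespace and data source, (count(`status`)) is the select clause with an aggregate function, [1h] is the time expression, { `status` = "error" } is the where clause, by `host` is the group-by clause, and limit 10 paginates the data within each group.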

Namespace

DQL supports querying all business data types in TrueWatch, with each data type corresponding to a namespace as follows:

Namespace   Description
M           Metric, time series data
L           Logging, log data
O           Object, infrastructure object data
OH          History object, infrastructure object historical data
CO          Custom object, infrastructure custom object data
COH         History custom object, infrastructure custom object historical data
N           Network, network data (collected via the eBPF collector)
T           Trace, trace data
P           Profile, profiling data
R           RUM (Real User Monitoring), user access data
E           Event, event data
UE          Unrecovered Event, unrecovered event data
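
For example, switching the namespace prefix targets a different data type with the same query shape. The cpu measurement and nginx source below are illustrative names, while host is one of the object classifications listed later in this document:

M::`cpu` limit 10
L::`nginx` limit 10
O::host limit 10

The first query returns up to 10 metric points from the cpu measurement, the second returns up to 10 nginx log entries, and the third returns up to 10 host objects.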

Index

An index can be understood as a table in a traditional database. Within a single namespace, data may be further split by source or data volume to improve query performance. The most typical example is logs, which natively support multiple indices; in the console, log data can be routed to different indices based on configurable conditions.

When no index is explicitly specified, the index named default is used. Multiple index names can be listed, separated by commas, in which case data from all of the listed indices is queried at once. Index names must be written out explicitly; wildcards and regular expressions are not supported.

L("default")::`nginx` is equivalent to L::`nginx`
L("index1", "index2")::`nginx`

For historical reasons, specifying the index via the index column in the where clause is also supported, but this usage is no longer recommended:

L::`nginx` { index = "index1" }
L::`nginx` { index IN ["index1", "index2"] }

Data Source

The data source is similar to the FROM clause in SQL, used to specify the source of the specific data to be queried. The data source can be a specific dataset, a filter condition for a measurement, or a subquery.

Basic Data Source

The definition of data sources varies across different namespaces:

Namespace   Data Source
M           Measurement
L           Source
O           Basic object classification, enum values: host, host_processes, docker_containers
OH          Infrastructure object classification, enum values (same as the O definition)
CO          Custom object classification, configurable via the console UI or specified in the reported object data
COH         Infrastructure custom object classification (same as the CO definition)
N           Enum values: netflow, dnsflow, httpflow, bash
T           Service name (service)
P           Service name (service)
R           Enum values: session, view, resource, long_task, action, error
E           Enum values: monitor, audit, system

Three forms are supported:

  1. Directly write the data source name

    L::`nginx`
    
    This queries log data from the nginx source.

  2. Use * to represent all data sources

    L::*
    
    This queries log data from all sources, useful when the data source name is uncertain.

  3. Use regular expressions to match data sources

    L::re("backend.*")
    

    This queries data from sources that start with "backend".

Subquery Data Source

A subquery is a special type of data source. The inner subquery is executed first to generate a temporary table, which is then used as the data source for the outer query.

Example:

L::( L::*:(count(error)) by app_id ):(count_distinct(app_id))

In this example:

  1. The inner subquery L::*:(count(error)) by app_id calculates the number of error logs per application and groups them by app_id;
  2. The outer query uses this result set as the data source to continue counting the number of affected Apps.

Select Clause

The select clause is used to specify the fields or expressions to be returned by the query. It is one of the most basic and important parts of a DQL query, determining which data is included in the query results.

Selecting Data Fields

You can specify the field names to be returned from the data source. Field names can be written in three forms:

  1. Directly write the name;
  2. When the field name contains special characters or keywords, wrap the field name with backticks ` to avoid ambiguity;
  3. Use * to represent all field names.

Note, however, that anything wrapped in single or double quotes is treated as a string, not a field name.

  • message: a field name written directly
  • `message`, `limit`: field names wrapped in backticks
  • 'message': a string, not a field name
  • "message": a string, not a field name

Extracting JSON Key-Values

When the content of a data field is in JSON format, we can extract data from internal fields using JSON Path.

The structure is as follows:

[field-name]@json-path

The field name is optional; when it is omitted, message is used by default. Currently, only a subset of JSON Path is supported:

  1. .key: use dot notation to access object keys
  2. ["key"]: use square brackets to access object keys, commonly used when the key contains spaces or dots and needs escaping
  3. [1]: use square brackets to access array elements

Example:

L::`nginx`:(`message@request_body`, `@request_header["user.id"]`) limit 1

This statement will return the request_body and user.id key-values from the message field in nginx log data.
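
Array elements can be extracted in the same way. For example, assuming the message field contains a tags array (a hypothetical field name):

L::`nginx`:(`@tags[1]`) limit 1

Since the field name before @ is omitted, the default message field is used, and the square brackets select one element of its tags array.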

Calculated Fields

  • Expression Calculation

    L::`nginx`:(`response_time` / 1000) limit 1

    This will return the value of the response_time field in nginx log data divided by 1000.

  • Function Calculation

    L::`nginx`:(max(`response_time`)) by `host`

    This will return the maximum value of the response_time field in nginx log data, grouped by host.

Aliases

L::`nginx`:(max(`response_time`) as `max_response_time`) by `host`

This will return the maximum value of the response_time field in nginx log data, named max_response_time, and grouped by host.

Aliases are not always necessary; in most cases, we can directly use the underlying column name to access the generated column data.

L::`nginx`:(max