Data Forwarding


For data that needs long-term storage but is rarely updated (such as logs), the data forwarding feature can automatically save it to object storage or forward it in real time to external systems such as Kafka. The feature selects the data you need according to rules you define, which enables low-cost long-term archiving and keeps the data available for later processing.

Once a rule takes effect, you can quickly search the stored data by setting the query time range and the rule on the data forwarding page.

How It Works

When data is forwarded to object storage, the workflow is as follows. User-reported data that matches the rules is first written line by line to a temporary file on the server's local disk. When this temporary file reaches a preset size (e.g., 256 MB) or has been written to continuously for longer than a set duration (e.g., 1 hour), the system closes it and creates a new temporary file to continue receiving data.

Simultaneously, a background service continuously scans these closed temporary files, compresses them using the gzip format to reduce their size, and then uploads them to your specified object storage location according to predefined path rules. When you need to search this stored data, the system locates the relevant files in the object storage based on the same path rules, downloads and decompresses them, and matches each line against your search criteria.
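The rotation step described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual implementation: the class name RotatingBuffer, its parameters, and the in-memory list that stands in for temporary files on local disk are all hypothetical. Only the example thresholds (256 MB, 1 hour) come from the text.

```python
import time


class RotatingBuffer:
    """Hypothetical in-memory stand-in for the temporary file on local disk."""

    def __init__(self, max_bytes=256 * 1024 * 1024, max_age_seconds=3600,
                 clock=time.monotonic):
        self.max_bytes = max_bytes              # size threshold (e.g., 256 MB)
        self.max_age_seconds = max_age_seconds  # write-duration threshold (e.g., 1 hour)
        self.clock = clock                      # injectable so the sketch is testable
        self.closed = []                        # closed "files" awaiting compression and upload
        self._open_new()

    def _open_new(self):
        self.lines = []
        self.size = 0
        self.opened_at = self.clock()

    def write(self, line):
        """Append one record as one line, then rotate if a threshold is reached."""
        data = line.rstrip("\n") + "\n"
        self.lines.append(data)
        self.size += len(data.encode("utf-8"))
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.opened_at >= self.max_age_seconds
        if too_big or too_old:
            self.rotate()

    def rotate(self):
        """Close the current file and start a new one."""
        if self.lines:
            self.closed.append("".join(self.lines))
        self._open_new()
```

In this sketch the age check only runs when a record is written. A real service would presumably also close idle files on a timer, and a separate worker would gzip and upload each entry in closed.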

File Format Specification

Files stored in object storage are gzip-compressed. After decompression, each file is plain text with one record per line, and each line is a complete original data record in JSON format. The system automatically ignores any empty lines in the file.

The date field is mandatory and identifies the time of the data entry as a Unix timestamp in milliseconds. The message field contains the log content itself.

A typical data forwarding file for log-type data looks similar to the following:

{"__docid":"L_1750649205520_d1cciupkac7k1683bhq0","__namespace":"backup_log","date":1750649205520,"date_ns":168000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/gin.log","host":"X.local","log_read_lines":2,"message":"[GIN] 2025/06/23 - 11:26:43 | 200 | 1.012923708s |       127.0.0.1 | GET     \"/metrics\"","message_length":87,"service":"default","source":"default","status":"unknown"}

{"__docid":"L_1750649205516_d1cciupkac7k1683bhqg","__namespace":"backup_log","date":1750649205516,"date_ns":897000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/gin.log","host":"X.local","log_read_lines":1,"message":"[GIN] 2025/06/23 - 11:26:38 | 200 | 1.012696542s |       127.0.0.1 | GET     \"/metrics\"","message_length":87,"service":"default","source":"default","status":"unknown"}

{"__docid":"L_1750649206520_d1cciupkac7k1683bhr0","__namespace":"backup_log","date":1750649206520,"date_ns":948000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":150,"message":"2025-06-23T11:26:46.520+0800\tWARN\thost_processes\tprocess/input.go:332\tprocess: {\"pid\":411}, proc.PageFaults(): not implemented yet","message_length":130,"service":"default","source":"default","status":"unknown"}

{"__docid":"L_1750649205520_d1cciupkac7k1683bhrg","__namespace":"backup_log","date":1750649205520,"date_ns":419000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":9,"message":"2025-06-23T11:26:43.876+0800\tWARN\tcontainer\tcontainer/impl.go:254\tendpoint unix:///var/run/crio/crio.sock does not exist, maybe it is not running, skip","message_length":151,"service":"default","source":"default","status":"unknown"}

{"__docid":"L_1750649205517_d1cciupkac7k1683bhs0","__namespace":"backup_log","date":1750649205517,"date_ns":79000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":1,"message":"2025-06-23T11:26:38.365+0800\tWARN\thttp\thttpapi/http.go:494\tlistener.Close failed: close tcp [::]:9529: use of closed network connection","message_length":135,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205517_d1cciupkac7k1683bhsg","__namespace":"backup_log","date":1750649205517,"date_ns":80000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":2,"message":"2025-06-23T11:26:38.365+0800\tWARN\thttp\thttpapi/http.go:494\tlistener.Close failed: close tcp [::]:9529: use of closed network connection","message_length":135,"service":"default","source":"default","status":"unknown"}
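A file in this format can be read with a few lines of standard-library Python. The sketch below is illustrative only: the function names read_records and record_time are hypothetical, and raising an error for a record without date is one possible way to handle the mandatory field, not documented behavior.

```python
import gzip
import io
import json
from datetime import datetime, timedelta, timezone


def read_records(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed forwarding file."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as text:
        for line in text:
            line = line.strip()
            if not line:
                continue  # empty lines are ignored
            record = json.loads(line)
            if "date" not in record:
                raise ValueError("record is missing the mandatory 'date' field")
            yield record


def record_time(record):
    """Convert the millisecond Unix timestamp in 'date' to a UTC datetime."""
    ms = record["date"]
    base = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return base + timedelta(milliseconds=ms % 1000)


if __name__ == "__main__":
    # Two records separated by an empty line, compressed in memory.
    raw = gzip.compress(
        b'{"date":1750649205520,"message":"first"}\n\n'
        b'{"date":1750649205516,"message":"second"}\n'
    )
    for rec in read_records(io.BytesIO(raw)):
        print(record_time(rec).isoformat(), rec["message"])
```

read_records accepts either a file path or a binary file object, because gzip.open supports both.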

File Naming and Storage Path

[{$path_prefix}/]{$workspace_uuid}/[{$data_type}/]{$rule_name}/{$year}/{$month}/{$day}/{$hour}/{$time}-{$hostname}.gz

Parts enclosed in [] are optional. The variables are described below:

| Variable | Description | Example | Notes |
| --- | --- | --- | --- |
| $path_prefix | Path prefix | path/to/backup | Optional; corresponds to the storage path option when creating a backup rule. Object storage does not support keys starting with /, so do not start the prefix with / |
| $workspace_uuid | Workspace ID | wksp_d9a1851859e040469d290409bc17cceb | |
| $data_type | Backup data type. Possible values: logging (LOG), rum (RUM), tracing (APM), event (Event), audit_event (Audit Event) | tracing | Since LOG is the default data type, for LOG-type data the {$data_type}/ part (i.e., logging/) should be omitted |
| $rule_name | Rule name | backup_logging_for_test | Corresponds to the rule name option when creating a rule. It is recommended to use English |
| $year | Year of log occurrence time, 4 digits | 2025 | UTC timezone |
| $month | Month of log occurrence time, 2 digits | 03 | UTC timezone |
| $day | Day of log occurrence time, 2 digits | 01 | UTC timezone |
| $hour | Hour of log occurrence time, 2 digits | 22 | UTC timezone |
| $time | Occurrence time of the last log in the file. Format: HHMMSS + 3-digit milliseconds | 220607889 | UTC timezone |
| $hostname | First 16 characters of the MD5 hash of the hostname | c6a92aafa992599c | When constructing files manually, you can use the crc64 of the current file, or generate a random 64-bit number and convert it to hexadecimal |

Path examples:

wksp_d9a1851859e040469d290409bc17cceb/backup_logging_for_test/2025/05/06/17/175950000-c6a92aafa992599c.gz

path/to/backup/wksp_d9a1851859e040469d290409bc17cceb/tracing/test-minio/2025/05/06/17/175950000-c6a92aafa992599c.gz
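The path rule can be expressed as a small helper. The sketch below is an illustration, not the service's code: the function name object_key and its parameters are hypothetical. It also assumes that the hostname is UTF-8 encoded before hashing and that the MD5 hash is written as lowercase hexadecimal, neither of which the text specifies.

```python
import hashlib
from datetime import datetime, timezone


def object_key(workspace_uuid, rule_name, last_record_ms, hostname,
               data_type="logging", path_prefix=""):
    """Build an object key from the documented path variables.

    last_record_ms is the 'date' value (Unix milliseconds) of the last
    record in the file; all time components are rendered in UTC.
    """
    ts = datetime.fromtimestamp(last_record_ms // 1000, tz=timezone.utc)
    millis = last_record_ms % 1000

    parts = []
    if path_prefix:
        # Object storage keys must not start with "/".
        parts.append(path_prefix.strip("/"))
    parts.append(workspace_uuid)
    if data_type != "logging":
        # logging is the default type, so its directory is omitted.
        parts.append(data_type)
    parts.append(rule_name)
    parts.append(ts.strftime("%Y/%m/%d/%H"))

    # Assumption: UTF-8 hostname, lowercase hex digest, first 16 characters.
    host_id = hashlib.md5(hostname.encode("utf-8")).hexdigest()[:16]
    parts.append(f"{ts.strftime('%H%M%S')}{millis:03d}-{host_id}.gz")
    return "/".join(parts)
```

For example, calling object_key("wksp_d9a1851859e040469d290409bc17cceb", "backup_logging_for_test", ms, "X.local"), where ms is the millisecond timestamp for 2025-05-06 17:59:50.000 UTC, produces a key that begins with wksp_d9a1851859e040469d290409bc17cceb/backup_logging_for_test/2025/05/06/17/175950000-.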

File Splitting Rules

• Time boundary: A single file contains logs from one hour only and never spans two hours.
• Size boundary: The uncompressed original file is kept between 256 MB and 512 MB. After gzip compression, it is typically tens of MB up to about a hundred MB. Files that are too large or too small reduce search efficiency.

You can also upload external files to object storage, as long as they follow the format and path rules used by the data forwarding feature. The console searches and displays them in the same way.
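When preparing external files by hand, the time boundary can be respected by grouping records by UTC hour before compressing them. The following is a minimal sketch under stated assumptions: the function names split_by_hour and encode_file are hypothetical, and the size boundary is not enforced here, so large hours would still need to be split further.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone


def split_by_hour(records):
    """Group records by the UTC hour of their 'date' field (Unix milliseconds).

    Returns a dict that maps "YYYY/MM/DD/HH" to the records of that hour,
    so that no output file crosses an hour boundary.
    """
    buckets = defaultdict(list)
    for record in records:
        ts = datetime.fromtimestamp(record["date"] // 1000, tz=timezone.utc)
        buckets[ts.strftime("%Y/%m/%d/%H")].append(record)
    return dict(buckets)


def encode_file(records):
    """Serialize records as one JSON object per line and gzip the result."""
    body = "".join(
        json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        for record in records
    )
    return gzip.compress(body.encode("utf-8"))
```

Each bucket key matches the {$year}/{$month}/{$day}/{$hour} portion of the storage path, and the largest date value in a bucket supplies the $time component of the file name.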

Getting Started

Create Forwarding Rules

Create data forwarding rules for the archive type that fits your business needs:

• AWS S3
• Huawei Cloud OBS
• Alibaba Cloud OSS
• Kafka Message Queue
• Volcengine TOS

Manage Forwarding Rules

In the data forwarding rules list, you can perform a series of operations to manage the rules.