Data Forwarding
For data that needs to be stored for a long time but is rarely updated (such as logs), the data forwarding feature can automatically save it to object storage or forward it in real time to external systems such as Kafka. The feature selects the data you need according to filter rules, which enables low-cost long-term archiving and supports later secondary processing.
Once the rules take effect, you can quickly retrieve the stored data by setting the query period and rules on the data forwarding page.
How It Works
When data is forwarded to object storage, the workflow is as follows: User-reported data that meets the rules is first written line by line to a temporary file on the server's local disk. When the size of this temporary file accumulates to a preset value (e.g., 256MB) or the continuous writing time exceeds the set duration (e.g., 1 hour), the system automatically closes the current file and creates a new temporary file to continue receiving data.
At the same time, a background service continuously scans these closed temporary files, compresses them with gzip to reduce their size, and then uploads them to the object storage location you specified, following the predefined path rules. When you search the stored data, the system locates the relevant files in object storage using the same path rules, downloads and decompresses them, and matches your search criteria line by line.
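The rotation and compression steps described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the function names `should_rotate` and `compress_closed_file` are hypothetical, and the thresholds simply reuse the example values from the text (256 MB, 1 hour).

```python
import gzip
import time

# Example thresholds taken from the text; the real values are configurable presets.
MAX_BYTES = 256 * 1024 * 1024
MAX_AGE_SECONDS = 60 * 60


def should_rotate(size_bytes, opened_at, now=None,
                  max_bytes=MAX_BYTES, max_age=MAX_AGE_SECONDS):
    """Return True when the current temporary file should be closed.

    Hypothetical helper: rotate once the file has reached the size
    threshold or has been open for longer than the allowed duration.
    """
    if now is None:
        now = time.time()
    return size_bytes >= max_bytes or (now - opened_at) > max_age


def compress_closed_file(raw: bytes) -> bytes:
    """Gzip the content of a closed temporary file before upload (hypothetical helper)."""
    return gzip.compress(raw)
```

For instance, `should_rotate(100, opened_at=0, now=3601)` is true because the file has been open for more than one hour, even though it is still small.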
File Format Description
Files stored in object storage are gzip-compressed. After decompression, the content is multiple lines of text; each line is one raw data record saved in JSON format. Blank lines in the file are automatically ignored by the system.
A typical data forwarding file for log data looks similar to the following. The `date` field is required and identifies the time of the data as a millisecond-level Unix timestamp; the `message` field holds the log content itself.
{"__docid":"L_1750649205520_d1cciupkac7k1683bhq0","__namespace":"backup_log","date":1750649205520,"date_ns":168000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/gin.log","host":"X.local","log_read_lines":2,"message":"[GIN] 2025/06/23 - 11:26:43 | 200 | 1.012923708s | 127.0.0.1 | GET \"/metrics\"","message_length":87,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205516_d1cciupkac7k1683bhqg","__namespace":"backup_log","date":1750649205516,"date_ns":897000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/gin.log","host":"X.local","log_read_lines":1,"message":"[GIN] 2025/06/23 - 11:26:38 | 200 | 1.012696542s | 127.0.0.1 | GET \"/metrics\"","message_length":87,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649206520_d1cciupkac7k1683bhr0","__namespace":"backup_log","date":1750649206520,"date_ns":948000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":150,"message":"2025-06-23T11:26:46.520+0800\tWARN\thost_processes\tprocess/input.go:332\tprocess: {\"pid\":411}, proc.PageFaults(): not implemented yet","message_length":130,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205520_d1cciupkac7k1683bhrg","__namespace":"backup_log","date":1750649205520,"date_ns":419000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":9,"message":"2025-06-23T11:26:43.876+0800\tWARN\tcontainer\tcontainer/impl.go:254\tendpoint unix:///var/run/crio/crio.sock does not exist, maybe it is not running, skip","message_length":151,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205517_d1cciupkac7k1683bhs0","__namespace":"backup_log","date":1750649205517,"date_ns":79000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":1,"message":"2025-06-23T11:26:38.365+0800\tWARN\thttp\thttpapi/http.go:494\tlistener.Close failed: close tcp [::]:9529: use of closed network connection","message_length":135,"service":"default","source":"default","status":"unknown"}
{"__docid":"L_1750649205517_d1cciupkac7k1683bhsg","__namespace":"backup_log","date":1750649205517,"date_ns":80000,"df_metering_size":-9223372036854775808,"filepath":"/var/log/datakit/log","host":"X.local","log_read_lines":2,"message":"2025-06-23T11:26:38.365+0800\tWARN\thttp\thttpapi/http.go:494\tlistener.Close failed: close tcp [::]:9529: use of closed network connection","message_length":135,"service":"default","source":"default","status":"unknown"}
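To make the format concrete, the sketch below reads such a file in Python: it decompresses the gzip payload, skips blank lines, parses each remaining line as JSON, and converts the millisecond `date` field to a UTC datetime. The function names `iter_records` and `record_time` are illustrative assumptions and are not part of the product.

```python
import gzip
import io
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iter_records(gz_bytes):
    """Yield one dict per non-blank line of a gzip-compressed forwarding file."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # blank lines are ignored, as described above
                continue
            yield json.loads(line)


def record_time(record):
    """Convert the millisecond `date` field to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=record["date"])
```

Applied to the first sample record above, `record_time` gives 2025-06-23 03:26:45.520 UTC, which matches the `+0800` local timestamps visible in the log messages.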
File Naming and Storage Path
[{$path_prefix}/]{$workspace_uuid}/[{$data_type}/]{$rule_name}/{$year}/{$month}/{$day}/{$hour}/{$time}-{$hostname}.gz
The parts enclosed in `[]` are optional; see the following description for details:
Variable | Description | Example | Notes |
---|---|---|---|
`$path_prefix` | Path prefix | `path/to/backup` | Optional. Corresponds to the storage path option when creating a new backup rule. Object storage does not support keys starting with `/`, so the prefix must not start with `/`. |
`$workspace_uuid` | Workspace ID | `wksp_d9a1851859e040469d290409bc17cceb` | |
`$data_type` | Backup data type. Possible values: `logging` (Logs), `rum` (RUM), `tracing` (Tracing), `event` (Events), `audit_event` (Audit Events) | `tracing` | Logs are the default data type, so for log data the `{$data_type}/` part (i.e., `logging/`) is omitted. |
`$rule_name` | Rule name | `backup_logging_for_test` | Corresponds to the rule name option when creating a new rule. An English name is recommended. |
`$year` | Year of the log occurrence time, 4 digits | `2025` | UTC timezone |
`$month` | Month of the log occurrence time, 2 digits | `03` | UTC timezone |
`$day` | Day of the log occurrence time, 2 digits | `01` | UTC timezone |
`$hour` | Hour of the log occurrence time, 2 digits | `22` | UTC timezone |
`$time` | Occurrence time of the last log in the file. Format: `HHMMSS` + 3-digit milliseconds | `220607889` | UTC timezone |
`$hostname` | First 16 hexadecimal characters of the MD5 hash of the hostname | `c6a92aafa992599c` | When constructing the file yourself, you can use the CRC64 of the current file, or generate a 64-bit random number, and convert it to hexadecimal. |
Path examples:
wksp_d9a1851859e040469d290409bc17cceb/backup_logging_for_test/2025/05/06/17/175950000-c6a92aafa992599c.gz
path/to/backup/wksp_d9a1851859e040469d290409bc17cceb/tracing/test-minio/2025/05/06/17/175950000-c6a92aafa992599c.gz
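As an illustration of the path rules, the following sketch assembles an object key from the pieces described above. It is a hedged example rather than the service's own code: `host_token` and `object_key` are hypothetical names, and hashing the UTF-8 bytes of the hostname is an assumption about how the MD5 input is encoded.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def host_token(hostname):
    """First 16 hex characters of the MD5 of the hostname (UTF-8 encoding assumed)."""
    return hashlib.md5(hostname.encode("utf-8")).hexdigest()[:16]


def object_key(workspace_uuid, rule_name, last_date_ms, hostname,
               data_type="logging", path_prefix=""):
    """Build the object key for a file whose last record has time `last_date_ms`."""
    t = EPOCH + timedelta(milliseconds=last_date_ms)  # all path parts use UTC
    parts = []
    if path_prefix:
        parts.append(path_prefix.strip("/"))  # keys must not start with "/"
    parts.append(workspace_uuid)
    if data_type != "logging":  # the logging/ segment is omitted for logs
        parts.append(data_type)
    parts.append(rule_name)
    parts += [f"{t:%Y}", f"{t:%m}", f"{t:%d}", f"{t:%H}"]
    # $time is HHMMSS followed by 3-digit milliseconds
    filename = f"{t:%H%M%S}{t.microsecond // 1000:03d}-{host_token(hostname)}.gz"
    return "/".join(parts + [filename])
```

When building files by hand, `host_token` could be swapped for a random 64-bit value rendered as hexadecimal, as the table notes allow.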
File Splitting Rules
- Time boundary: A single file only contains logs from the same hour, never crossing hours.
- Size boundary: The uncompressed file is kept between 256 MB and 512 MB, which usually corresponds to tens to hundreds of MB after gzip compression. Files that are too large or too small reduce retrieval efficiency.
You can also upload external files to object storage, provided they follow the same file format and path rules that data forwarding uses; the console will search and display them in the same way.
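When preparing external files by hand, one way to respect the time-boundary rule is to group records by UTC hour before writing each group to its own file. The sketch below assumes each record is a dict with a millisecond `date` field, as in the file format above; `group_by_utc_hour` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def group_by_utc_hour(records):
    """Group records by (year, month, day, hour) in UTC so no file crosses an hour."""
    groups = defaultdict(list)
    for rec in records:
        t = EPOCH + timedelta(milliseconds=rec["date"])
        groups[(t.year, t.month, t.day, t.hour)].append(rec)
    return dict(groups)
```

Each resulting group can then be serialized as one JSON record per line, gzip-compressed, and uploaded under the matching `{$year}/{$month}/{$day}/{$hour}/` path. A group whose uncompressed size would exceed the size boundary would still need to be split further.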