
Log Collection

TrueWatch offers comprehensive log collection capabilities, primarily divided into host log collection and K8S container log collection. The installation methods for DataKit differ between the two, and the log collection methods also vary. The collected log data is uniformly aggregated to TrueWatch for centralized storage, search, and analysis, helping us quickly identify and resolve issues.

This article mainly introduces how to collect logs in a host environment. For log collection in a K8S environment, refer to the best practices Log Collection in Kubernetes Clusters.

Prerequisites

Install DataKit.

Alternatively, you can log in to TrueWatch, go to Integration > DataKit, and select Linux, Windows, or macOS according to the host operating system to get the DataKit installation instructions and steps.

Log Collector Configuration

After installing DataKit, you can enable standard or custom log collection to gather log data from system logs and from application logs such as Nginx, Redis, Docker, ES, and more.

Navigate to the conf.d/log directory under the DataKit installation directory, copy logging.conf.sample and rename it to logging.conf for configuration. After configuration, restart DataKit to apply the changes.
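
For example, assuming you want to collect an Nginx error log, a minimal logging.conf might look like the sketch below. The file path, source name, and pipeline script name are illustrative placeholders; the sample file documents the full set of options:

[[inputs.logging]]
  # Log files to collect; wildcards are supported (the path shown is an example)
  logfiles = ["/var/log/nginx/error.log"]
  # Source name attached to the collected logs
  source = "nginx"
  # Optional Pipeline script used to extract fields such as time and status
  pipeline = "nginx.p"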

For details, refer to Host Log Collection.

By enabling the standard log collectors supported by TrueWatch, such as Nginx, Redis, ES, etc., you can start log collection with one click.

Note

When configuring the log collector, you need to enable the Pipeline function to extract the log time (time) and log level (status) fields:

  • time: The generation time of the log. If the time field is not extracted or fails to parse, the current system time is used by default;
  • status: The log level. If the status field is not extracted, the status is set to unknown by default.

For more details, refer to the documentation Pipeline Configuration and Usage.
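
As an illustration, for a log line like 2024-05-20T10:00:00.000+08:00 ERROR failed to connect to upstream, a minimal Pipeline script could extract both fields as follows (a sketch only; the grok pattern must be adapted to your actual log format):

# Extract the timestamp and log level into the time and status fields
grok(_, "%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:status} %{GREEDYDATA:msg}")
# Use the extracted time field as the log timestamp; if parsing fails, the current time is used instead
default_time(time)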

Log Data Storage

After configuring the log collector, restart DataKit, and the log data will be uniformly reported to the TrueWatch workspace.

  • For users with a large amount of log data, we can configure Log Index or Log Blacklist to save data storage costs;
  • For users who need long-term log storage, we can use Log Backup to preserve log data.

Writing Data with Large Time Deviations

❓ Writing data whose timestamps deviate significantly from the current time hurts the effectiveness of the min-max index on the storage engine's data blocks: even a query over a small time range may then have to scan a large number of data blocks, severely degrading query performance.

❗ To mitigate this, the system by default filters out data points whose timestamps deviate more than 12 hours from the current time at write time (only the out-of-range data points are discarded, not the entire data packet). This keeps the index effective and improves query efficiency.

Kodo-X adds the following configuration:

kodo-x.yaml adds three parameters:

  • enable_discard_expired_data: whether to discard data whose timestamps deviate too far from the current time; enabled by default
  • discard_expired_seconds: the threshold for deciding that a timestamp deviates too far; defaults to 12 hours (43200 seconds)

global:
  enable_discard_expired_data: true
  discard_expired_seconds: 43200   # 12 * 3600, i.e. 12 hours

  • discard_data_type: the data types to which discarding applies; default value:

DiscardDataType: map[string]bool{
    // "metering": true,
    // "TAE":      true,
    // "AE":       true,
    "B":  true,
    "CO": true,
    "D":  true,
    "E":  true,
    "EL": true,
    "L":  true,
    // "NM": true,
    "N":  true,
    "OH": true,
    "O":  true,
    "P":  true,
    // "RM": true,
    "R":  true,
    "S":  true,
    // "TM": true,
    "T":  true,
},