Log Collection¶
TrueWatch offers comprehensive log collection capabilities, primarily divided into host log collection and K8S container log collection. The DataKit installation methods differ between the two, as do the log collection methods. The collected log data is uniformly aggregated in TrueWatch for centralized storage, search, and analysis, helping you quickly identify and resolve issues.
This article mainly introduces how to collect logs in a host environment. For log collection in a K8S environment, refer to the best practices Log Collection in Kubernetes Clusters.
Prerequisites¶
You can log in to TrueWatch, go to Integration > DataKit, and select Linux, Windows, or macOS based on the host operating system to get the DataKit installation instructions and steps.
Log Collector Configuration¶
After installing DataKit, you can enable standard log collection or custom log collection to collect system logs as well as application logs from Nginx, Redis, Docker, Elasticsearch (ES), and more.
Navigate to the `conf.d/log` directory under the DataKit installation directory, copy `logging.conf.sample` and rename it to `logging.conf`, then edit it as needed. After configuration, restart DataKit to apply the changes. For details, refer to Host Log Collection.
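Below is a minimal sketch of what a `logging.conf` might look like for collecting Nginx access logs; the file path, `source` name, and `pipeline` script are illustrative values to be adapted to your environment.

```toml
[[inputs.logging]]
  ## Log files to collect; glob patterns are supported (illustrative path)
  logfiles = ["/var/log/nginx/access.log"]

  ## Source name for the collected logs, shown in TrueWatch
  source = "nginx"

  ## Optional Pipeline script used to parse each log line
  pipeline = "nginx.p"
```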
Note

When configuring the log collector, you need to enable the Pipeline function to extract the log time (`time`) and log level (`status`) fields:

- `time`: the generation time of the log. If the `time` field is not extracted or fails to parse, the current system time is used by default;
- `status`: the log level. If the `status` field is not extracted, `status` is set to `unknown` by default.

For more details, refer to the documentation Pipeline Configuration and Usage.
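As a rough illustration, a Pipeline script (e.g. the `nginx.p` referenced in the collector sketch above) that extracts both fields could look like the following, assuming log lines such as `2024-01-02 15:04:05 ERROR connection refused`; the grok pattern and field layout are assumptions to be adjusted for your actual log format.

```
# Extract the timestamp, level, and remaining message (illustrative pattern)
grok(_, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:status} %{GREEDYDATA:msg}")

# Use the extracted `time` field as the log's timestamp
default_time(time)
```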
Log Data Storage¶
After configuring the log collector, restart DataKit, and the log data will be uniformly reported to the TrueWatch workspace.
- If you have a large volume of log data, you can configure Log Index or Log Blacklist to save on data storage costs;
- If you need long-term log retention, you can use Log Backup to preserve log data.
Large Time Deviation Data Writing
When writing data with timestamps significantly deviating from the current time, it can harm the efficiency of the min-max index in the storage engine's data blocks. This may result in scanning a large number of data blocks even for a small time range query, severely degrading query performance.
To mitigate this issue, the system defaults to filtering out data points with timestamps deviating more than 12 hours from the current time during writing (only discarding the timed-out data points, not the entire data packet). This mechanism helps maintain index effectiveness and improves query efficiency.
Kodo-X adds the following configuration in `kodo-x.yaml` (3 new parameters), with an illustrative sketch after this list:

- `enable_discard_expired_data`: whether to discard data with large time deviations; enabled by default
- `discard_expired_seconds`: the threshold for determining a large time deviation; the default value is 12 hours
- `discard_data_type`: the type of data to discard; default value: