DataKit Pipeline Offload
The Pipeline Offload feature of DataKit hands data processing to a downstream data processor, which reduces the data latency and host load that processing would otherwise cause.
Configuration Method
Pipeline Offload must be enabled in the main configuration file datakit.conf; see the configuration below. The currently supported target receiver options are datakit-http and ploffload, and multiple DataKit addresses can be configured to achieve load balancing.
Note:
- Currently, only processing tasks for the LOG data category can be offloaded;
- Do not enter the current DataKit's own address in the addresses configuration item; otherwise a loop forms and the data never leaves the current DataKit;
- Ensure that the DataWay configuration of the target DataKit is consistent with that of the current DataKit; otherwise the receiving party will send data to its own DataWay address;
- If receiver is set to ploffload, the ploffload collector must be enabled on the receiving DataKit.
Please check if the target network address is accessible from this machine. If the target listens on a loopback address, it will not be accessible.
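The checks described in the notes above can be sketched as a small pre-flight script. This is only an illustration and is not part of DataKit: the function names, the 10.0.0.5 host, and the use of port 9529 are assumptions made for the example. It flags an unsupported receiver, an empty address list, and addresses that point back at the local machine, and it offers a plain TCP connection test for reachability.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# The two receiver options named in this document.
SUPPORTED_RECEIVERS = {"datakit-http", "ploffload"}


def is_loopback(host: str) -> bool:
    """Return True if host is 'localhost' or a loopback IP literal."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname; this sketch does not resolve it


def check_offload_config(receiver, addresses, local_hosts=()):
    """Return a list of problems found; an empty list means none were found.

    local_hosts is a hypothetical parameter: the IPs or hostnames of the
    machine running the current DataKit, supplied by the caller.
    """
    problems = []
    if receiver not in SUPPORTED_RECEIVERS:
        problems.append(f"unsupported receiver: {receiver!r}")
    if not addresses:
        problems.append("no target addresses configured")
    for addr in addresses:
        host = urlsplit(addr).hostname
        if host is None:
            problems.append(f"cannot parse host from {addr!r}")
        elif is_loopback(host) or host in local_hosts:
            problems.append(f"{addr} points at this machine (possible loop)")
    return problems


def can_connect(addr: str, timeout: float = 3.0) -> bool:
    """Try a plain TCP connection to the host and port in addr."""
    parts = urlsplit(addr)
    try:
        with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical target address, for illustration only.
    targets = ["http://10.0.0.5:9529"]
    for problem in check_offload_config("datakit-http", targets):
        print("config problem:", problem)
    for target in targets:
        print(target, "reachable" if can_connect(target) else "NOT reachable")
```

A successful TCP connection only shows that something is listening on that port; it does not confirm that the listener is a DataKit or that the ploffload collector is enabled on it.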
Reference configuration:
[pipeline]
# Offload data processing tasks to post-level data processors.
[pipeline.offload]
receiver = "datakit-http"
addresses = [
# "http://<ip>:<port>"
]
If the receiving end DataKit has the ploffload collector enabled, it can be configured as follows:
[pipeline]
# Offload data processing tasks to post-level data processors.
[pipeline.offload]
receiver = "ploffload"
addresses = [
# "http://<ip>:<port>"
]
Working Principle
After DataKit locates the Pipeline data processing script, it determines whether it is a remote script from TrueWatch. If so, the data will be forwarded to the post-level data processor (such as DataKit) for processing. The load balancing method is round-robin.
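The routing decision and the round-robin selection can be pictured with a minimal sketch. It is not DataKit's implementation: the class name, the route function, the is_remote_script flag, and the example addresses are all hypothetical and only model the behavior described above.

```python
from itertools import cycle


class RoundRobinOffloader:
    """Hand out target addresses in a fixed, repeating order."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one target address is required")
        self._targets = cycle(addresses)

    def next_target(self) -> str:
        """Return the next address in the rotation."""
        return next(self._targets)


def route(is_remote_script: bool, offloader: RoundRobinOffloader) -> str:
    """Return where a batch would be processed: a target address or 'local'.

    is_remote_script stands in for the check of whether the matched
    Pipeline script is a remote one.
    """
    return offloader.next_target() if is_remote_script else "local"


if __name__ == "__main__":
    # Hypothetical receiver addresses, for illustration only.
    offloader = RoundRobinOffloader(["http://10.0.0.5:9529", "http://10.0.0.6:9529"])
    for _ in range(3):
        print(route(True, offloader))
```

With two addresses, consecutive offloaded batches alternate between them, so each receiver gets roughly half of the batches regardless of how expensive each batch is to process.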
Deploy Post-Level Data Processor
There are several ways to deploy the data processor (DataKit) used for receiving computational tasks:
- Host Deployment
Currently, there is no support for a DataKit specifically dedicated to data processing. For host deployment of DataKit, refer to the documentation.
- Container Deployment
The environment variables ENV_DATAWAY and ENV_HTTP_LISTEN must be set. The DataWay address should match the one configured in the DataKit that has the Pipeline Offload function enabled. It is recommended to map the port that the DataKit inside the container listens on to the host machine.
Reference command: