Injecting logfwd via DataKit Operator

Operator injection of logfwd is mainly used to collect logs inside a Pod, that is, logs that are not written to container stdout. The Operator injects a Sidecar container into the Pod, and this Sidecar reads logs from the specified paths inside the container and sends them to DataKit.

Combined with the ClusterLoggingConfig CRD, the collection settings of target Pods can be adjusted dynamically without restarting the Pods.

sequenceDiagram
autonumber

box User pod
participant container as Business Container
participant logfwd as logfwd Sidecar
end

participant opr as DataKit Operator
participant crd as ClusterLoggingConfig

box DataKit
participant logfwds as logfwd Server
end

opr ->> logfwd: Inject logfwd
opr ->> crd: Watch CRD changes
opr ->> opr: Cache if any
logfwd ->> opr: Periodically poll for CRD changes (1min)

alt CRD changed
logfwd ->> logfwd: Update collection config
end

logfwd ->> container: Collect logs
logfwd ->> logfwds: Collect and report logs

Prerequisites

  1. The logfwdserver collector is enabled in DataKit and listens on the default port 9533.
  2. The DataKit Service exposes port 9533 so that other Pods can reach datakit-service.datakit.svc:9533.
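
The second prerequisite can be checked from inside any Pod with a plain TCP connection test. The following is a minimal sketch, not part of DataKit; the helper name can_reach is made up for illustration, and the address is the one given above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling can_reach("datakit-service.datakit.svc", 9533) from a Pod in the cluster should return True once both prerequisites are met. A False result only shows that the port is unreachable, not why.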

Usage Instructions

For Operator versions <= v1.6.0, refer to the documentation of the earlier logfwd injection method.

Starting from Version-1.7.0, use the ClusterLoggingConfig CRD for centralized log collection management:

  • Centralized Collection Configuration Management: The Operator watches the Kubernetes ClusterLoggingConfig CRD and exposes the matching results for the logfwd sidecar to poll. By default the sidecar sends an HTTP request to the Operator every 60 seconds (requires logfwd Version-1.86.0).
  • Hot Updates & Granular Matching: Changes to the CRD selector (Namespace/Pod/Label/Container) take effect without rebuilding Workloads; the sidecar picks them up on its next poll.
  • Simplified Configuration: Log collection configuration is fully managed via the CRD; overriding it via Annotation is no longer supported.
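
The poll-and-update behavior described above can be pictured with a small sketch. It is not the logfwd implementation: the fetch and apply callables are hypothetical stand-ins for "ask the Operator for the matching configuration" and "reload the collectors", and change detection by digest is an assumption made for illustration.

```python
import hashlib
import json
from typing import Callable, Optional


def config_digest(config: list) -> str:
    """Stable digest of a collection config, used to detect changes."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Poller:
    """Applies a config only when its content differs from the last one seen."""

    def __init__(self, fetch: Callable[[], list], apply: Callable[[list], None]):
        self.fetch = fetch  # hypothetical: returns the configs matched for this Pod
        self.apply = apply  # hypothetical: updates the sidecar's collection config
        self.last_digest: Optional[str] = None

    def poll_once(self) -> bool:
        """Fetch once; return True if a changed config was applied."""
        config = self.fetch()
        digest = config_digest(config)
        if digest == self.last_digest:
            return False
        self.apply(config)
        self.last_digest = digest
        return True
```

A real loop would call poll_once() and then sleep for the polling interval (60 seconds by default, per the text above). Comparing digests keeps an unchanged CRD from triggering a reload on every poll.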

If you are not yet familiar with how a ClusterLoggingConfig is defined and written, read the Container Log Collection CRD Configuration document first.

Operation Flow:

  1. Register ClusterLoggingConfig CRD (as described in DataKit documentation).
  2. Upgrade/Install DataKit Operator v1.7.0 and add RBAC read permissions for the CRD.
  3. Set the logfwds array in DataKit Operator configuration, configuring namespace_selectors/label_selectors matching rules and the log_configs field.
  4. (Optional) Add Annotation admission.datakit/logfwd.enabled: "true" to the target Pod to allow injection (if set to "false", injection will be rejected).
  5. Create ClusterLoggingConfig resource, and the logfwd sidecar will periodically (default 60 seconds) pull the collection configuration.

The latest datakit-operator.yaml already includes the necessary permissions. If you maintain your own manifests, refer to the following minimal example:

Minimal Example
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datakit-operator
rules:
- apiGroups: ["logging.datakits.io"]
  resources: ["clusterloggingconfigs"]
  verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datakit-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datakit-operator
subjects:
- kind: ServiceAccount
  name: datakit-operator
  namespace: datakit

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datakit-operator
  namespace: datakit

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datakit-operator
  namespace: datakit
  labels:
    app: datakit-operator
spec:
  replicas: 1  # Do not change the replica count!
  selector:
    matchLabels:
      app: datakit-operator
  template:
    metadata:
      labels:
        app: datakit-operator
    spec:
      serviceAccountName: datakit-operator
      containers:
      - name: operator
        # other..

CRD Configuration

ClusterLoggingConfig Example:

apiVersion: logging.datakits.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: nginx-logs
spec:
  selector:
    namespaceRegex: "^(middleware)$"
    podLabelSelector: "app=logging"
  podTargetLabels:
    - app
    - env
  configs: # The following configuration corresponds one-to-one with log_configs in ConfigMap
    - type: file
      source: nginx-access
      service: nginx
      path: /var/log/nginx/access.log
      pipeline: nginx-access.p
      storage_index: app-logs
      multiline_match: "^\\d{4}-\\d{2}-\\d{2}"
      tags:
        team: web

After the above resource is applied:

  1. DataKit Operator listens for Deployment creation events and injects the datakit-logfwd Sidecar container.
  2. DataKit Operator matches Pods against the ClusterLoggingConfig selector and continuously maintains the matching results for the Sidecar to read when polling.
  3. Once started, the Sidecar contacts the Operator via LOGFWD_DATAKIT_OPERATOR_ENDPOINT, pulls the CRD configuration every 60 seconds, and forwards the collected logs to the DataKit logfwdserver.
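
To make the selector step concrete, here is a rough sketch of how a Pod could be tested against namespaceRegex and podLabelSelector. It is an illustration under assumptions, not the Operator's matching code: only comma-separated key=value equality terms are handled, and the function names are invented.

```python
import re
from typing import Dict, Mapping


def parse_label_selector(selector: str) -> Dict[str, str]:
    """Parse "k1=v1,k2=v2" into a dict (equality terms only)."""
    terms: Dict[str, str] = {}
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"unsupported selector term: {part!r}")
        terms[key.strip()] = value.strip()
    return terms


def pod_matches(namespace: str, labels: Mapping[str, str],
                namespace_regex: str, pod_label_selector: str) -> bool:
    """True if the namespace matches the regex and all label terms are present."""
    if namespace_regex and not re.search(namespace_regex, namespace):
        return False
    wanted = parse_label_selector(pod_label_selector)
    return all(labels.get(k) == v for k, v in wanted.items())
```

With the example resource above, pod_matches("middleware", {"app": "logging"}, "^(middleware)$", "app=logging") is True, while the same Pod in the default namespace does not match.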

Log Collection Configuration

To inject logfwd via Operator, add a configuration structure like the following to the Operator's ConfigMap:

{
    "admission_inject_v2": {         // Injection configuration v2
        "logfwds": [
            // Supports multiple logfwd configurations here
            { ... }, // Single logfwd configuration
            { ... }  // Another logfwd configuration
        ]
    }
}

The fields supported by a single logfwd configuration are as follows:

| Field | Type | Description | Required | Example Value |
| --- | --- | --- | --- | --- |
| envs | object | Environment variable configuration | Y | See example below |
| image | string | logfwd image address | Y | See example below |
| label_selectors | array | Label selectors | Y | ["logs-enabled=true"] |
| log_configs | string | Log configuration [1] | Y | "[{\"type\":\"file\"...}]" |
| log_volume_paths | array | Log volume mount paths | Y | ["/var/log/app"] |
| namespace_selectors | array | Namespace selectors | Y | ["default"] |
| resources | object | Resource limit configuration | N | See example below |
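
Because log_configs is a string rather than a nested array, the JSON list has to be serialized and escaped before it is embedded. The sketch below lets json.dumps do the escaping instead of writing backslashes by hand. The selector values come from the table above; the volume path and the choice of fields are illustrative assumptions.

```python
import json

# The collection rules as ordinary Python data.
log_configs = [
    {
        "type": "file",
        "source": "nginx-access",
        "path": "/var/log/nginx/access.log",
        "multiline_match": r"^\d{4}-\d{2}-\d{2}",
    }
]

operator_config = {
    "admission_inject_v2": {
        "logfwds": [
            {
                "image": "pubrepo.truewatch.com/datakit/logfwd:1.90.0",
                "envs": {"LOGFWD_DATAKIT_PORT": "9533"},
                "namespace_selectors": ["default"],
                "label_selectors": ["logs-enabled=true"],
                "log_volume_paths": ["/var/log/nginx"],  # illustrative path
                # log_configs is a string, so the list is serialized first.
                "log_configs": json.dumps(log_configs),
            }
        ]
    }
}

# The outer dump escapes the inner quotes and backslashes automatically.
rendered = json.dumps(operator_config, indent=4)
print(rendered)
```

Reading the value back takes two decoding steps: parse the outer document, then parse the log_configs string itself.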

Environment Variable Configuration

logfwd injection requires an image and several environment variables, which are configured in the datakit-operator-config ConfigMap:

"logfwds": [
    {
        "image": "pubrepo.truewatch.com/datakit/logfwd:1.90.0",
        "envs": {
            "LOGFWD_DATAKIT_HOST":              "{fieldRef:status.hostIP}",
            "LOGFWD_DATAKIT_PORT":              "9533",
            "LOGFWD_DATAKIT_OPERATOR_ENDPOINT": "datakit-operator.datakit.svc:443",
            "LOGFWD_GLOBAL_SERVICE":            "{fieldRef:metadata.labels['app']}",
            "LOGFWD_POD_NAME":                  "{fieldRef:metadata.name}",
            "LOGFWD_POD_NAMESPACE":             "{fieldRef:metadata.namespace}",
            "LOGFWD_POD_IP":                    "{fieldRef:status.podIP}"
        },
        "log_configs": "",
        "log_volume_paths": []
    }
]

envs has the following options:

| Environment Variable | Meaning |
| --- | --- |
| LOGFWD_DATAKIT_HOST | DataKit instance address (IP or resolvable domain name) |
| LOGFWD_DATAKIT_PORT | DataKit logfwdserver listening port, e.g., 9533 |
| LOGFWD_DATAKIT_OPERATOR_ENDPOINT | DataKit Operator endpoint used to query the CRD configuration, e.g., datakit-operator.datakit.svc:443 or https://datakit-operator.datakit.svc:443. If empty, no pull is attempted. The https:// prefix is added automatically when omitted |
| LOGFWD_GLOBAL_SOURCE | Global source; takes priority over the source field of an individual configuration |
| LOGFWD_GLOBAL_SERVICE | Global service; used when an individual configuration does not specify service. If the global value is also empty, the value falls back to source |
| LOGFWD_GLOBAL_STORAGE_INDEX | Global storage_index; takes priority over the storage_index field of an individual configuration |
| LOGFWD_POD_NAME | Written automatically as the pod_name tag; usually injected via the Downward API |
| LOGFWD_POD_NAMESPACE | Written automatically as the namespace tag |
| LOGFWD_POD_IP | Written automatically as the pod_ip tag, which helps locate the container instance |
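
The precedence rules in the table are easy to misread, so here is a small sketch that spells them out. It reflects one reading of the table rather than the logfwd source: the function names are invented, the fallback for service is assumed to use the effective source, and an endpoint that already carries a scheme is assumed to be left unchanged.

```python
from typing import Dict, Mapping


def normalize_endpoint(endpoint: str) -> str:
    """Add the https:// prefix when no scheme is given; keep empty as-is."""
    endpoint = endpoint.strip()
    if not endpoint or "://" in endpoint:
        return endpoint
    return "https://" + endpoint


def effective_fields(env: Mapping[str, str], config: Mapping[str, str]) -> Dict[str, str]:
    """Resolve source, service and storage_index for one collection config."""
    # Global source and storage_index override the per-config values.
    source = env.get("LOGFWD_GLOBAL_SOURCE") or config.get("source", "")
    storage_index = env.get("LOGFWD_GLOBAL_STORAGE_INDEX") or config.get("storage_index", "")
    # Service works the other way round: per-config first, then global, then source.
    service = config.get("service") or env.get("LOGFWD_GLOBAL_SERVICE") or source
    return {"source": source, "service": service, "storage_index": storage_index}
```

For example, with no global variables set and a configuration that only defines source as nginx-access, the resolved service is also nginx-access.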

Log Settings

log_configs is used for debugging or overriding CRD content. If log_configs is empty, logfwd injection will be skipped. Structure example:

[
  {
    "type": "file",
    "disable": false,
    "source": "nginx-access",
    "service": "nginx",
    "path": "/var/log/nginx/access.log",
    "pipeline": "nginx-access.p",
    "storage_index": "app-logs",
    "multiline_match": "^\\d{4}-\\d{2}-\\d{2}",
    "remove_ansi_escape_codes": false,
    "from_beginning": false,
    "character_encoding": "utf-8",
    "tags": {
      "env": "production",
      "team": "backend"
    }
  }
]

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| type | string | Y | Collection type; logfwd supports only "file" | "file" |
| source | string | Y | Log source identifier, used to distinguish different log streams | "nginx-access" |
| path | string | Y | Log file path (supports glob patterns); required when type=file | "/var/log/nginx/*.log" |
| disable | boolean | N | Whether to disable this collection configuration | false |
| service | string | N | Service the log belongs to; defaults to the log source (source) | "nginx" |
| multiline_match | string | N | Regular expression matching the first line of a multi-line log; backslashes must be escaped in JSON | "^\\d{4}-\\d{2}-\\d{2}" |
| pipeline | string | N | Name of the log parsing pipeline file (must be configured on the DataKit side) | "nginx-access.p" |
| storage_index | string | N | Index name for log storage | "app-logs" |
| remove_ansi_escape_codes | boolean | N | Whether to remove ANSI escape codes (color codes, etc.) from log data | false |
| from_beginning | boolean | N | Whether to collect from the beginning of the file (default is from the end) | false |
| from_beginning_threshold_size | int | N | When a file is discovered and its size is smaller than this value, it is collected from the beginning. Unit: bytes; default 20MB | 1000 |
| character_encoding | string | N | Character encoding; supports utf-8, utf-16le, utf-16be, gbk, gb18030, or an empty string (auto-detect). Default is empty | "utf-8" |
| tags | object | N | Additional tag key-value pairs, appended to each log record | {"env": "prod"} |
| logfiles | array | Y | List of files to collect. Deprecated in Version-1.7.0 | ["<your-logfile-path>"] |
| ignore | array | Y | List of files to ignore. Deprecated in Version-1.7.0 | ["<your-logfile-path>"] |
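
A quick pre-flight check can catch the most common mistakes before a log_configs entry is deployed. The sketch below is a hypothetical helper, not a DataKit API. It encodes the required fields from the table and one reading of the from_beginning rules, assuming that "20MB" means 20 * 1024 * 1024 bytes.

```python
from typing import Any, List, Mapping

REQUIRED_FIELDS = ("type", "source", "path")
DEFAULT_THRESHOLD_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB


def validate_entry(entry: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing required field: {field}")
    if entry.get("type") and entry["type"] != "file":
        problems.append('type must be "file"')
    for field in ("logfiles", "ignore"):
        if field in entry:
            problems.append(f"{field} is deprecated")
    return problems


def reads_from_beginning(entry: Mapping[str, Any], file_size: int) -> bool:
    """Decide whether a newly found file would be read from its start."""
    if entry.get("from_beginning", False):
        return True
    threshold = entry.get("from_beginning_threshold_size", DEFAULT_THRESHOLD_BYTES)
    return file_size < threshold
```

An entry with type, source and path set, and type equal to "file", yields an empty problem list.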

Mount Path Settings

log_volume_paths: List of paths (string array) to mount so that the sidecar can access the actual log files, e.g., ["/var/log", "/data/log"]. Do not list both a parent path and one of its child paths, because this causes Volume conflicts.
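
The parent/child rule can be checked mechanically before the configuration is applied. This is a minimal sketch with an invented helper name; it treats the entries as POSIX paths and also reports exact duplicates.

```python
from itertools import combinations
from pathlib import PurePosixPath
from typing import List, Tuple


def find_nested_paths(paths: List[str]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs where one path contains or equals another."""
    conflicts = []
    for a, b in combinations(paths, 2):
        pa, pb = PurePosixPath(a), PurePosixPath(b)
        if pa == pb or pa in pb.parents:
            conflicts.append((a, b))
        elif pb in pa.parents:
            conflicts.append((b, a))
    return conflicts
```

The comparison works on path components, so /var/log and /var/logs are not reported as nested even though one string is a prefix of the other.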

Annotation Support

Operator logfwd injection supports adding the following Annotations to application Pods:

  • admission.datakit/logfwd.enabled: Controls whether injection is allowed. Value "false" rejects injection; value "true" or unset allows injection (but requires matching rules and log_configs field to actually trigger injection).
  • admission.datakit/logfwd.log_configs: Removed in Version-1.7.0. Log collection configuration should be fully managed via the ClusterLoggingConfig CRD.
  • admission.datakit/logfwd.volume_paths: Removed in Version-1.7.0. Log collection configuration should be fully managed via the ClusterLoggingConfig CRD.

Warning

If the log_configs field in the Operator configuration is empty, logfwd injection is skipped, even if the Pod carries the Annotation admission.datakit/logfwd.enabled: "true" and matches the selector rules. Make sure log_configs is not empty for injection to succeed.
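
Taken together, the annotation, the selectors and log_configs form a simple decision. The sketch below restates the rules from this section in code. It is an illustration only: the function name is invented, and selector evaluation is reduced to a boolean supplied by the caller.

```python
from typing import Mapping, Optional

ANNOTATION = "admission.datakit/logfwd.enabled"


def should_inject(annotations: Mapping[str, str],
                  selectors_match: bool,
                  log_configs: Optional[str]) -> bool:
    """Return True only if every condition for injection holds."""
    # An explicit "false" always rejects injection.
    if annotations.get(ANNOTATION) == "false":
        return False
    # "true" or unset allows injection, but the matching rules must still pass.
    if not selectors_match:
        return False
    # An empty log_configs skips injection regardless of the annotation.
    return bool(log_configs and log_configs.strip())
```

Note that setting the annotation to "true" does not force injection; it only avoids the explicit rejection.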

Injection Example

Below is a Deployment example configuring log collection using the CRD method:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-demo
  namespace: middleware
  labels:
    app: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logging
  template:
    metadata:
      labels:
        app: logging
      annotations:
        admission.datakit/logfwd.enabled: "true"
    spec:
      containers:
      - name: log-app
        image: nginx:1.25

A corresponding ClusterLoggingConfig resource must also be created to configure the log collection rules.

Create resources using the yaml file:

$ kubectl apply -f logging.yaml
...

Verify as follows:

$ kubectl get pod -n middleware

NAME                                   READY   STATUS    RESTARTS      AGE
logging-deployment-5d48bf9995-vt6bb       1/1     Running   0             4s

$ kubectl get pod logging-deployment-5d48bf9995-vt6bb -n middleware -o=jsonpath='{.spec.containers[*].name}'
log-container datakit-logfwd
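
The same check can be scripted against the JSON form of the Pod, for example the output of kubectl get pod <name> -o json. This is a small sketch with invented helper names; the sidecar name datakit-logfwd is taken from the output above.

```python
import json
from typing import List


def container_names(pod_json: str) -> List[str]:
    """Extract container names from a Pod manifest given as a JSON string."""
    pod = json.loads(pod_json)
    return [c["name"] for c in pod["spec"]["containers"]]


def has_logfwd_sidecar(pod_json: str, sidecar_name: str = "datakit-logfwd") -> bool:
    """True if the Pod spec lists a container with the sidecar's name."""
    return sidecar_name in container_names(pod_json)
```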

Finally, you can check on the TrueWatch log platform whether logs are being collected.


  1. log_configs is a complex JSON string; it must be escaped when embedded.