# Injecting logfwd via DataKit Operator
Operator injection of logfwd primarily collects Pod-internal logs (logs not written to container stdout). It works by injecting a sidecar container into the Pod; this sidecar collects the specified log files inside the container and sends them to DataKit.
Combined with the corresponding CRD configuration, the logfwd method allows collection settings for target Pods to be adjusted dynamically, without restarting them.
```mermaid
sequenceDiagram
  autonumber
  box User Pod
    participant container as Business Container
    participant logfwd as logfwd Sidecar
  end
  participant opr as DataKit Operator
  participant crd as ClusterLoggingConfig
  box DataKit
    participant logfwds as logfwd Server
  end
  opr ->> logfwd: Inject logfwd
  opr ->> crd: Watch CRD changes
  opr ->> opr: Cache if any
  logfwd ->> opr: Periodically poll for CRD changes (1min)
  alt CRD changed
    logfwd ->> logfwd: Update collection config
  end
  logfwd ->> container: Collect logs
  logfwd ->> logfwds: Collect and report logs
```
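The sidecar's poll-and-update cycle (steps 4–5 above) can be sketched as a small loop. This is a minimal illustration, not DataKit code; `fetch_config` and `apply_config` are hypothetical stand-ins for the HTTP request to the Operator and the sidecar's internal config reload:

```python
import time

def poll_loop(fetch_config, apply_config, interval=60, max_rounds=None):
    """Poll the Operator for CRD-derived config; apply only on change.

    fetch_config / apply_config are hypothetical stand-ins for the
    sidecar's HTTP request and its internal config reload.
    """
    last = None
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        cfg = fetch_config()
        if cfg != last:  # CRD changed: update the collection config
            apply_config(cfg)
            last = cfg
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(interval)
    return last
```

Unchanged responses are skipped, so a selector or `log_configs` edit in the CRD reaches the sidecar within one polling interval (60 seconds by default).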
## Prerequisites
- DataKit enables the `logfwdserver` collector, listening on the default port `9533`.
- The DataKit Service must expose port `9533` so that other Pods can access `datakit-service.datakit.svc:9533`.
## Usage Instructions
For Operator versions <= v1.6.0, please refer here for logfwd injection usage.
Use the `ClusterLoggingConfig` CRD for centralized log collection management (Version-1.7.0):

- Centralized collection configuration management: supports watching the Kubernetes `ClusterLoggingConfig` CRD and exposing the matching results for logfwd sidecar polling (the sidecar makes an HTTP request to the Operator every 60 seconds by default; logfwd requires Version-1.86.0).
- Hot updates and granular matching: CRD selector (Namespace/Pod/Label/Container) changes take effect immediately without rebuilding Workloads.
- Simplified configuration: log collection configuration is fully managed via the CRD; overriding configuration via Annotation is no longer supported.
If you are not yet familiar with how to define and write a `ClusterLoggingConfig`, please read the Container Log Collection CRD Configuration document first.
Operation flow:

1. Register the `ClusterLoggingConfig` CRD (as described in the DataKit documentation).
2. Upgrade/install DataKit Operator v1.7.0 and add RBAC read permissions for the CRD.
3. Set the `logfwds` array in the DataKit Operator configuration, configuring the `namespace_selectors`/`label_selectors` matching rules and the `log_configs` field.
4. (Optional) Add the Annotation `admission.datakit/logfwd.enabled: "true"` to the target Pod to allow injection (if set to `"false"`, injection is rejected).
5. Create the `ClusterLoggingConfig` resource; the logfwd sidecar will periodically (default 60 seconds) pull the collection configuration.
Installing the latest `datakit-operator.yaml` includes the necessary permissions; alternatively, refer to the following minimal example:

**Minimal Example**
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datakit-operator
rules:
  - apiGroups: ["logging.datakits.io"]
    resources: ["clusterloggingconfigs"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datakit-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datakit-operator
subjects:
  - kind: ServiceAccount
    name: datakit-operator
    namespace: datakit
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datakit-operator
  namespace: datakit
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datakit-operator
  namespace: datakit
  labels:
    app: datakit-operator
spec:
  replicas: 1  # Do not change the ReplicaSet number!
  selector:
    matchLabels:
      app: datakit-operator
  template:
    metadata:
      labels:
        app: datakit-operator
    spec:
      serviceAccountName: datakit-operator
      containers:
        - name: operator
          # other..
```
## CRD Configuration
A `ClusterLoggingConfig` example:

```yaml
apiVersion: logging.datakits.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: nginx-logs
spec:
  selector:
    namespaceRegex: "^(middleware)$"
    podLabelSelector: "app=logging"
    podTargetLabels:
      - app
      - env
  configs:  # The following configuration corresponds one-to-one with log_configs in the ConfigMap
    - type: file
      source: nginx-access
      service: nginx
      path: /var/log/nginx/access.log
      pipeline: nginx-access.p
      storage_index: app-logs
      multiline_match: "^\\d{4}-\\d{2}-\\d{2}"
      tags:
        team: web
```
After applying the above resource, DataKit Operator will:

- Listen for Deployment creation events and inject the `datakit-logfwd` sidecar container.
- Match Pods based on the `ClusterLoggingConfig` selector, continuously maintaining the matching results for the sidecar to read during polling.
- After the sidecar starts, it interacts with the Operator via `LOGFWD_DATAKIT_OPERATOR_ENDPOINT`, pulling the CRD configuration every 60 seconds and forwarding tasks to the DataKit `logfwd` server.
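The selector matching the Operator maintains can be illustrated with a small sketch. This is not the Operator's actual code; it models only `namespaceRegex` and `podLabelSelector` (comma-separated `key=value` pairs), under the assumption that the regex is applied unanchored as in the example:

```python
import re

def pod_matches(spec_selector, pod_namespace, pod_labels):
    """Check a Pod against a ClusterLoggingConfig selector (sketch).

    Only namespaceRegex and podLabelSelector (key=value pairs) are
    modeled; the real matching is performed by the Operator.
    """
    ns_regex = spec_selector.get("namespaceRegex")
    if ns_regex and not re.search(ns_regex, pod_namespace):
        return False
    label_sel = spec_selector.get("podLabelSelector", "")
    for pair in filter(None, label_sel.split(",")):
        key, _, value = pair.partition("=")
        if pod_labels.get(key.strip()) != value.strip():
            return False
    return True
```

With the example CRD above, a Pod in namespace `middleware` carrying label `app=logging` matches; any other namespace or label value does not.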
## Log Collection Configuration
To inject logfwd via Operator, add a configuration structure like the following to the Operator's ConfigMap:
```json
{
  "admission_inject_v2": {      // Injection configuration v2
    "logfwds": [
      // Multiple logfwd configurations are supported here
      { ... },                  // One logfwd configuration
      { ... }                   // Another logfwd configuration
    ]
  }
}
```
The fields supported by a single logfwd configuration are as follows:
| Field | Type | Description | Required | Example Value |
|---|---|---|---|---|
| `envs` | object | Environment variable configuration | Y | See example below |
| `image` | string | logfwd image address | Y | See example below |
| `label_selectors` | array | Label selectors | Y | `["logs-enabled=true"]` |
| `log_configs` | string | Log configuration[^1] | Y | `"[{\"type\":\"file\"...}]"` |
| `log_volume_paths` | array | Log volume mount paths | Y | `["/var/log/app"]` |
| `namespace_selectors` | array | Namespace selectors | Y | `["default"]` |
| `resources` | object | Resource limit configuration | N | See example below |
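As a rough illustration of the table above, a single logfwd entry could be checked for the required fields like this (a sketch; field names are taken from the table, the validation helper itself is hypothetical):

```python
# Required fields of a single logfwd configuration, per the table above.
REQUIRED_FIELDS = ("envs", "image", "label_selectors",
                   "log_configs", "log_volume_paths", "namespace_selectors")

def missing_fields(logfwd_cfg):
    """Return the required fields absent from one logfwd entry (sketch)."""
    return [f for f in REQUIRED_FIELDS if f not in logfwd_cfg]
```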
### Environment Variable Configuration
logfwd injection adds several environment variables and image version requirements, which can be configured in the `datakit-operator-config` ConfigMap:
"logfwds": [
{
"image": "pubrepo.truewatch.com/datakit/logfwd:1.90.0",
"envs": {
"LOGFWD_DATAKIT_HOST": "{fieldRef:status.hostIP}",
"LOGFWD_DATAKIT_PORT": "9533",
"LOGFWD_DATAKIT_OPERATOR_ENDPOINT": "datakit-operator.datakit.svc:443",
"LOGFWD_GLOBAL_SERVICE": "{fieldRef:metadata.labels['app']}",
"LOGFWD_POD_NAME": "{fieldRef:metadata.name}",
"LOGFWD_POD_NAMESPACE": "{fieldRef:metadata.namespace}",
"LOGFWD_POD_IP": "{fieldRef:status.podIP}"
},
"log_configs": "",
"log_volume_paths": []
}
]
`envs` has the following options:

| Environment Variable | Meaning |
|---|---|
| `LOGFWD_DATAKIT_HOST` | DataKit instance address (IP or resolvable domain name) |
| `LOGFWD_DATAKIT_PORT` | DataKit `logfwdserver` listening port, e.g. `9533` |
| `LOGFWD_DATAKIT_OPERATOR_ENDPOINT` | DataKit Operator endpoint, e.g. `datakit-operator.datakit.svc:443` or `https://datakit-operator.datakit.svc:443`, used to query the CRD configuration; leave empty to skip pulling. The `https://` prefix is added automatically if missing |
| `LOGFWD_GLOBAL_SOURCE` | Global source; takes priority over the `source` field in an individual configuration |
| `LOGFWD_GLOBAL_SERVICE` | Global service; if `service` is not specified in an individual configuration, the global value is used; if the global value is also empty, falls back to `source` |
| `LOGFWD_GLOBAL_STORAGE_INDEX` | Global `storage_index`; takes priority over the `storage_index` field in an individual configuration |
| `LOGFWD_POD_NAME` | Automatically writes the `pod_name` tag, usually injected via the Downward API |
| `LOGFWD_POD_NAMESPACE` | Automatically writes the `namespace` tag |
| `LOGFWD_POD_IP` | Automatically writes the `pod_ip` tag, making it easier to locate container instances |
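The `service` fallback described for `LOGFWD_GLOBAL_SERVICE` can be written out as a one-liner (a sketch of the documented precedence, not logfwd's actual code):

```python
def resolve_service(individual_service, global_service, source):
    """service resolution per LOGFWD_GLOBAL_SERVICE: the per-config
    service wins; else the global value; else fall back to source."""
    return individual_service or global_service or source
```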
### Log Settings
`log_configs` is used for debugging or overriding the CRD content. If `log_configs` is empty, logfwd injection is skipped. Structure example:
```json
[
  {
    "type": "file",
    "disable": false,
    "source": "nginx-access",
    "service": "nginx",
    "path": "/var/log/nginx/access.log",
    "pipeline": "nginx-access.p",
    "storage_index": "app-logs",
    "multiline_match": "^\\d{4}-\\d{2}-\\d{2}",
    "remove_ansi_escape_codes": false,
    "from_beginning": false,
    "character_encoding": "utf-8",
    "tags": {
      "env": "production",
      "team": "backend"
    }
  }
]
```
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
| `type` | string | Y | logfwd collection type; can only be `"file"` | `"file"` |
| `source` | string | Y | Log source identifier, used to distinguish different log streams | `"nginx-access"` |
| `path` | string | Y | Log file path (supports glob patterns); required when `type=file` | `"/var/log/nginx/*.log"` |
| `disable` | boolean | N | Whether to disable this collection configuration | `false` |
| `service` | string | N | Service the log belongs to; defaults to the log source (`source`) | `"nginx"` |
| `multiline_match` | string | N | Regular expression for the start line of multi-line logs; note that backslashes must be escaped in JSON | `"^\\d{4}-\\d{2}-\\d{2}"` |
| `pipeline` | string | N | Log parsing Pipeline configuration file name (must be configured on the DataKit side) | `"nginx-access.p"` |
| `storage_index` | string | N | Index name for log storage | `"app-logs"` |
| `remove_ansi_escape_codes` | boolean | N | Whether to remove ANSI escape codes (color codes, etc.) from log data | `false` |
| `from_beginning` | boolean | N | Whether to start collecting from the beginning of the file (default is from the end) | `false` |
| `from_beginning_threshold_size` | int | N | When a file is discovered, collect from the beginning if its size is smaller than this value. Unit: bytes; default 20MB | `1000` |
| `character_encoding` | string | N | Character encoding; supports `utf-8`, `utf-16le`, `utf-16be`, `gbk`, `gb18030`, or an empty string (auto-detect). Default is empty | `"utf-8"` |
| `tags` | object | N | Additional tag key-value pairs, appended to each log record | `{"env": "prod"}` |
| `logfiles` | array | Y | List of files to collect | `["<your-logfile-path>"]` Version-1.7.0 Deprecated |
| `ignore` | array | Y | List of files to ignore | `["<your-logfile-path>"]` Version-1.7.0 Deprecated |
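The `multiline_match` behavior can be illustrated with a short sketch: a line matching the regular expression starts a new record, and non-matching lines are appended to the previous record. This mirrors the documented semantics but is not logfwd's implementation:

```python
import re

def group_multiline(lines, multiline_match):
    """Group raw lines into records: a line matching the pattern starts
    a new record; non-matching lines attach to the previous one (sketch)."""
    pattern = re.compile(multiline_match)
    records = []
    for line in lines:
        if pattern.search(line) or not records:
            records.append(line)
        else:
            records[-1] += "\n" + line
    return records
```

With the example pattern `^\d{4}-\d{2}-\d{2}`, a stack trace indented under a dated first line is collected as one record.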
### Mount Path Settings
`log_volume_paths`: a list of host paths (string array) to mount so that the sidecar can access the real log files, e.g. `["/var/log", "/data/log"]`. Avoid listing both a parent path and one of its child paths, which would cause Volume conflicts.
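The parent/child restriction can be checked mechanically. A sketch (not part of the Operator; `has_nested_paths` is a hypothetical helper):

```python
from pathlib import PurePosixPath

def has_nested_paths(volume_paths):
    """Detect parent/child pairs in log_volume_paths, which would cause
    Volume mount conflicts (sketch)."""
    norm = [PurePosixPath(p) for p in volume_paths]
    for i, a in enumerate(norm):
        for b in norm[i + 1:]:
            if a == b or a in b.parents or b in a.parents:
                return True
    return False
```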
## Annotation Support
Operator logfwd injection supports adding the following Annotations to application Pods:
- `admission.datakit/logfwd.enabled`: controls whether injection is allowed. The value `"false"` rejects injection; `"true"` or unset allows it (matching rules and a non-empty `log_configs` field are still required to actually trigger injection).
- `admission.datakit/logfwd.log_configs`: Version-1.7.0 Removed; log collection configuration should be fully managed via the `ClusterLoggingConfig` CRD.
- `admission.datakit/logfwd.volume_paths`: Version-1.7.0 Removed; log collection configuration should be fully managed via the `ClusterLoggingConfig` CRD.
Warning

If the `log_configs` field in the configuration is empty, logfwd injection is skipped. Even if the Pod carries the Annotation `admission.datakit/logfwd.enabled: "true"` and matches the selector rules, the `log_configs` field must be non-empty for injection to succeed.
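The overall injection decision described above can be summarized in a sketch (`should_inject` is a hypothetical helper modeling only the annotation, selector-match, and `log_configs` checks):

```python
def should_inject(annotations, selector_matched, log_configs):
    """Injection decision (sketch): the annotation value "false" rejects;
    otherwise a selector match and non-empty log_configs are required."""
    if annotations.get("admission.datakit/logfwd.enabled") == "false":
        return False
    return selector_matched and bool(log_configs)
```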
## Injection Example
Below is a Deployment example configuring log collection using the CRD method:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-demo
  namespace: middleware
  labels:
    app: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logging
  template:
    metadata:
      labels:
        app: logging
      annotations:
        admission.datakit/logfwd.enabled: "true"
    spec:
      containers:
        - name: log-app
          image: nginx:1.25
```
A corresponding `ClusterLoggingConfig` resource must also be created to configure the log collection rules.
Create the resources from the YAML file, then verify as follows:
```shell
$ kubectl get pod
NAME                                  READY   STATUS    RESTARTS   AGE
logging-deployment-5d48bf9995-vt6bb   1/1     Running   0          4s

$ kubectl get pod logging-deployment-5d48bf9995-vt6bb -o=jsonpath='{.spec.containers[*].name}'
log-container datakit-logfwd
```
Finally, you can check on the TrueWatch log platform whether logs are being collected.
[^1]: A complex JSON string; needs escaping when embedded.