Journald

The Journald collector collects logs from the systemd journal (journald) on Linux systems. It uses an external binary wrapper to interface with libsystemd and collect structured log entries from the journal.

Prerequisites

Linux only: Requires systemd and journald
libsystemd: The external binary requires the libsystemd shared library (libsystemd.so.0)
Permissions: DataKit needs read access to journal files (typically by joining the systemd-journal group)

System Requirements Check

Before deploying the journald collector, verify your system meets the requirements:

Quick check with one-liner:

systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"

Comprehensive pre-flight check script:

journald-prereq-check.sh

#!/bin/bash
# journald-prereq-check.sh - Verify systemd requirements

echo "=== Journald Collector Prerequisites Check ==="
echo

# 1. Check if systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ Found - $VERSION"
else
    echo "❌ NOT FOUND - systemctl not installed"
    exit 1
fi

# 2. Check libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ Found - $LIBPATH"
else
    echo "❌ NOT FOUND - libsystemd.so.0 missing"
    exit 1
fi

# 3. Check journalctl access
echo -n "3. journalctl access: "
if journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - Can read journal"
else
    echo "⚠️  LIMITED - journalctl exists but no read access"
fi

# 4. Check journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n "   $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ Exists and readable"
        else
            echo "⚠️  Exists but NOT readable"
        fi
    else
        echo "❌ NOT FOUND"
    fi
done

# 5. Check systemd version
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | grep -oP 'systemd \K\d+' || echo "0")
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets minimum v205)"
else
    echo "⚠️  v$SYSTEMD_VERSION (older than recommended v205)"
fi

echo
echo "=== Check Complete ==="

Save as journald-prereq-check.sh and run:

chmod +x journald-prereq-check.sh
./journald-prereq-check.sh

Expected output:

=== Journald Collector Prerequisites Check ===

1. systemctl command: ✅ Found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ Found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - Can read journal
4. Journal directories:
   /var/log/journal: ✅ Exists and readable
   /run/log/journal: ✅ Exists and readable
5. systemd version: ✅ v257 (meets minimum v205)

=== Check Complete ===

Common issues and solutions:

Issue	Solution
`systemctl: command not found`	Install systemd or use alternative log collection
`libsystemd.so.0: cannot open`	Install systemd-libs: `apt install libsystemd0` or `yum install systemd-libs`
`journalctl: no read access`	Add user to `systemd-journal` group: `usermod -aG systemd-journal $USER`
`/var/log/journal not found`	Enable persistent journal: `mkdir -p /var/log/journal && systemd-tmpfiles --create --prefix /var/log/journal`

Configuration

Collector Configuration

After successfully installing and starting DataKit, enable the Journald collector by copying the configuration file:

Host Installation

Go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and name the copy journald.conf. For example:

# Collect systemd journal logs using external binary
[[inputs.journald]]
  ## Name of the collector
  name = 'journald'

  ## Run as daemon (required for journald collection)
  daemon = true

  http_endpoint = "http://localhost:9529"
  log_level = "info"
  log_path = "/usr/local/datakit/externals/journald.log"

  ## Path to datakit-journald binary
  ## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
  # cmd = "/usr/local/datakit/externals/datakit-journald"

  ## Interval to check external process (for non-daemon mode)
  # interval = "10s"

  ## Rootfs mount point for container/Kubernetes mode only
  ## DataKit uses this as the host root prefix when auto-prefixing absolute paths
  ## and preparing host-side systemd libraries (copy_node_libs).
  mount_dir = "/rootfs"

  ## Journal directory paths
  ## Host installation: use default paths
  ## Container/Kubernetes: DataKit auto-prefixes absolute paths with mount_dir.
  paths = [
    "/var/log/journal",      # Persistent storage
    "/run/log/journal",      # Runtime storage
  ]

  ## Filter by systemd unit names (supports glob patterns)
  ## Empty = all units
  # units = ["*.service", "docker.service", "kubelet.service"]

  ## Filter by priority levels
  ## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
  ## Empty = all priorities
  # priorities = ["err", "warning", "crit", "alert", "emerg"]

  ## Field selection - collect all by default, exclude specific fields
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
  ]

  ## Collection behavior
  ## tail_only=true: Only collect new entries (cursor not needed)
  ## tail_only=false: Read from last position (cursor required)
  tail_only = true
  max_entries_per_batch = 1000

  ## Cursor management (only used when tail_only=false)
  # save_cursor = true
  # cursor_file = "/usr/local/datakit/cache/journald.cursor"

  ## Environment variables for external binary
  # envs = [
  #   "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
  # ]

  ## Host-side systemd library prepare:
  ## - Container/Kubernetes (Docker or Kubernetes): auto forced to true.
  ## - Non-container host: disabled by default. If enabled manually, set copy_node_libs_files explicitly.
  ## - In container/kubernetes mode, when copy_node_libs_files is empty, DataKit first copies
  ##   libsystemd.so* then runs "LD_LIBRARY_PATH=<dst> ldd libsystemd.so.0"
  ##   style dependency probing and copies missing .so files automatically.
  # copy_node_libs = true
  ## Optional override file list. If set, only these patterns/files are copied.
  # copy_node_libs_files = [
  #   "libsystemd.so*",
  #   "liblz4.so*",
  #   "libzstd.so*",
  #   "liblzma.so*",
  #   "libcap.so*",
  #   "libgcrypt.so*",
  #   "libgpg-error.so*",
  #   "libselinux.so*",
  #   "libmount.so*",
  #   "libblkid.so*",
  #   "libacl.so*",
  #   "libpcre2-8.so*",
  #   "libpcre.so*",
  # ]

  ## Additional arguments for external binary
  # args = []

  [inputs.journald.tags]
    # Add custom tags as needed
    # environment = "production"
    # cluster = "k8s-cluster-1"

After configuration, restart DataKit.

Kubernetes: the collector can be turned on by injecting the collector configuration through a ConfigMap or by setting ENV_DATAKIT_INPUTS.

Configuration Options

Option	Type	Default	Description
`paths`	[]string	`["/var/log/journal", "/run/log/journal"]`	Journal directory paths
`units`	[]string	`[]`	Filter by systemd unit names (supports glob patterns, e.g., `*.service`)
`priorities`	[]string	`[]`	Filter by priority levels: `emerg`, `alert`, `crit`, `err`, `warning`, `notice`, `info`, `debug`
`exclude_fields`	[]string	`[]`	Journal fields to exclude from collection (e.g., `_BOOT_ID`, `_MACHINE_ID`)
`tail_only`	bool	`true`	Only collect new entries (skip historical logs on startup)
`max_entries_per_batch`	int	`1000`	Maximum number of entries to collect per batch
`save_cursor`	bool	`true`	Persist read position to resume after restart
`cursor_file`	string	`/usr/local/datakit/cache/journald.pos`	Path to store cursor position
`mount_dir`	string	`"/rootfs"`	Rootfs mount directory used in container/Kubernetes mode only. DataKit uses this prefix for absolute `paths` and as source root for host-side library prepare
`copy_node_libs`	bool	`false` (automatically forced to `true` in container or Kubernetes mode)	Whether to copy host-side dynamic libraries from the mount dir into DataKit-managed `external-libs` before starting the external collector. In container or Kubernetes environments (`datakit.Docker || config.IsKubernetes()`), DataKit auto-enables this
`copy_node_libs_files`	[]string	`[]`	Dynamic library file names or glob patterns to copy. If configured, only these are copied. If empty in container/Kubernetes auto mode, DataKit first copies `libsystemd.so*`, then runs `LD_LIBRARY_PATH=/usr/local/datakit/externals/systemd-libs ldd libsystemd.so.0`-style dependency probing and copies the missing `.so` files automatically. If empty outside container/Kubernetes mode while `copy_node_libs=true`, startup fails with a configuration error
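The `units` option accepts glob patterns, and an empty list selects all units. The sketch below illustrates those two rules with shell `case` patterns. `unit_matches` is a hypothetical helper, not part of DataKit, and the glob dialect the collector actually implements may differ from the shell's.

```shell
#!/bin/bash
# unit_matches UNIT [PATTERN...]
# Returns 0 when UNIT matches any glob PATTERN, or when no pattern is given
# (an empty units list means "all units"). Hypothetical helper for illustration.
unit_matches() {
    local unit=$1
    shift
    # No patterns configured: every unit passes the filter.
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    local pattern
    for pattern in "$@"; do
        # An unquoted $pattern inside `case` is interpreted as a glob.
        case $unit in
            $pattern) return 0 ;;
        esac
    done
    return 1
}
```

For example, `unit_matches nginx.service '*.service'` succeeds, while `unit_matches docker.socket '*.service'` fails.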

Log Fields

`journald`

Systemd journal logs. Note: field availability varies by systemd version; refer to the version hints (e.g., v188+, v205+) in each field description.

Tags & Fields	Description
host (`tag`)	Hostname (from `_HOSTNAME`, v188+)
service (`tag`)	Service identifier (from `SYSLOG_IDENTIFIER`, `_SYSTEMD_UNIT`, or `_COMM`)
CODE_FILE	Source code filename for debugging (v188+) Type: string Unit: N/A
CODE_FUNC	Function name for debugging (v188+) Type: string Unit: N/A
CODE_LINE	Source code line number for debugging (v188+) Type: int Unit: N/A
COREDUMP_CMDLINE	Full command line at crash time (v188+) Type: string Unit: N/A
COREDUMP_CWD	Current working directory at crash time (v188+) Type: string Unit: N/A
COREDUMP_EXE	Executable path of crashed binary (v188+) Type: string Unit: N/A
COREDUMP_GID	Crashed process GID (v188+) Type: int Unit: N/A
COREDUMP_HOSTNAME	Hostname at crash time (v188+) Type: string Unit: N/A
COREDUMP_PID	Crashed process PID (v188+) Type: int Unit: N/A
COREDUMP_ROOT	Root directory, usually / (v188+) Type: string Unit: N/A
COREDUMP_SIGNAL	Signal number that caused crash (v188+) Type: int Unit: N/A
COREDUMP_STACKTRACE	Full stack trace backtrace (v188+) Type: string Unit: N/A
COREDUMP_TIMESTAMP	Crash timestamp in microseconds (v188+) Type: int Unit: time,μs
COREDUMP_UID	Crashed process UID (v188+) Type: int Unit: N/A
COREDUMP_UNIT	System unit that crashed (v198+) Type: string Unit: N/A
COREDUMP_USER_UNIT	User unit that crashed (v198+) Type: string Unit: N/A
DOCUMENTATION	Documentation URL http/https/file/man/info (v246+) Type: string Unit: N/A
ERRNO	Unix error number associated with message (v188+) Type: int Unit: N/A
INVOCATION_ID	Invocation ID for systemd code messages (v245+) Type: string Unit: N/A
MESSAGE_ID	128-bit message identifier (`UUID` format, v188+) Type: string Unit: N/A
OBJECT_AUDIT_LOGINUID	Target login UID (v205+) Type: int Unit: N/A
OBJECT_AUDIT_SESSION	Target audit session ID (v205+) Type: int Unit: N/A
OBJECT_CMDLINE	Target process full command line (v205+) Type: string Unit: N/A
OBJECT_COMM	Target process comm (v205+) Type: string Unit: N/A
OBJECT_EXE	Target process executable path (v205+) Type: string Unit: N/A
OBJECT_GID	Target process GID (v205+) Type: int Unit: N/A
OBJECT_PID	Target process PID, requires UID 0 to set (v205+) Type: int Unit: N/A
OBJECT_SYSTEMD_CGROUP	Target cgroup path (v205+) Type: string Unit: N/A
OBJECT_SYSTEMD_INVOCATION_ID	Target invocation ID (v235+) Type: string Unit: N/A
OBJECT_SYSTEMD_OWNER_UID	Target session owner UID (v205+) Type: int Unit: N/A
OBJECT_SYSTEMD_SESSION	Target session ID (v205+) Type: string Unit: N/A
OBJECT_SYSTEMD_UNIT	Target unit name (v205+) Type: string Unit: N/A
OBJECT_SYSTEMD_USER_UNIT	Target user unit name (v205+) Type: string Unit: N/A
OBJECT_UID	Target process UID (v205+) Type: int Unit: N/A
SYSLOG_FACILITY	Syslog facility 0-23 (v188+) Type: int Unit: N/A
SYSLOG_PID	Client PID from syslog, may differ from `_PID` (v188+) Type: int Unit: N/A
SYSLOG_RAW	Original syslog line if `MESSAGE` modified or timestamp lost (v240+) Type: string Unit: N/A
SYSLOG_TIMESTAMP	Original syslog timestamp as received (v188+) Type: string Unit: N/A
TID	Thread ID numeric (v247+) Type: int Unit: N/A
UNIT	Unit name user-provided alternative to `_SYSTEMD_UNIT` (v251+) Type: string Unit: N/A
USER_INVOCATION_ID	User invocation ID for user manager messages (v245+) Type: string Unit: N/A
USER_UNIT	User unit user-provided alternative to `_SYSTEMD_USER_UNIT` (v251+) Type: string Unit: N/A
_AUDIT_LOGINUID	Login UID from kernel audit (v188+) Type: int Unit: N/A
_AUDIT_SESSION	Audit session ID from kernel (v188+) Type: int Unit: N/A
_BOOT_ID	Boot ID 128-bit hex `UUID` (v188+) Type: string Unit: N/A
_CAP_EFFECTIVE	Effective capabilities bitmask (v206+) Type: int Unit: N/A
_CMDLINE	Full command line, most complete process info (v188+) Type: string Unit: N/A
_COMM	Command name truncated to 15 chars (v188+) Type: string Unit: N/A
_CONTAINER_ID	Container ID for nspawn/containers (v205+) Type: string Unit: N/A
_CONTAINER_IMAGE	Container image for nspawn/containers (v205+) Type: string Unit: N/A
_CONTAINER_NAME	Container name for nspawn/containers (v205+) Type: string Unit: N/A
_EXE	Executable path, full path (v188+) Type: string Unit: N/A
_GID	Group ID, trusted (v188+) Type: int Unit: N/A
_KERNEL_DEVICE	Kernel device name format: `bM:N`, `cM:N`, `nN`, `+subsys:name` (v189+) Type: string Unit: N/A
_KERNEL_SUBSYSTEM	Kernel subsystem e.g. `block`, `net` (v189+) Type: string Unit: N/A
_LINE_BREAK	Line termination info: `nul`, `line-max`, `eof`, `pid-change` (v235+) Type: string Unit: N/A
_MACHINE_ID	Machine ID from `/etc/machine-id` (v188+) Type: string Unit: N/A
_NAMESPACE	Journal namespace ID (v245+) Type: string Unit: N/A
_RUNTIME_SCOPE	Runtime scope: `initrd`, `system`, or `user` (v252+) Type: string Unit: N/A
_SELINUX_CONTEXT	SELinux security context label (v188+) Type: string Unit: N/A
_SOURCE_BOOTTIME_TIMESTAMP	Boottime timestamp in microseconds `CLOCK_BOOTTIME` (v257+) Type: int Unit: time,μs
_SOURCE_REALTIME_TIMESTAMP	Source timestamp in microseconds `CLOCK_REALTIME` (v188+) Type: int Unit: time,μs
_STREAM_ID	Stream connection ID 128-bit `UUID` for stdout streams (v235+) Type: string Unit: N/A
_SYSTEMD_CGROUP	Control group path (v188+) Type: string Unit: N/A
_SYSTEMD_INVOCATION_ID	Unit invocation ID unique per unit start (v233+) Type: string Unit: N/A
_SYSTEMD_OWNER_UID	Session owner UID (v188+) Type: int Unit: N/A
_SYSTEMD_SESSION	Login session ID (v188+) Type: string Unit: N/A
_SYSTEMD_SLICE	Slice unit name e.g. `system.slice` (v188+) Type: string Unit: N/A
_SYSTEMD_UNIT	Unit name e.g. `sshd.service` (v188+) Type: string Unit: N/A
_SYSTEMD_USER_SLICE	User slice name e.g. `user.slice` (v188+) Type: string Unit: N/A
_SYSTEMD_USER_UNIT	User unit name for user sessions (v188+) Type: string Unit: N/A
_TRANSPORT	How entry was received: `audit`, `driver`, `syslog`, `journal`, `stdout`, `kernel` (v205+) Type: string Unit: N/A
_UDEV_DEVLINK	Symlinks to device, can appear multiple times (v189+) Type: string Unit: N/A
_UDEV_DEVNODE	Device node in /dev/ full path (v189+) Type: string Unit: N/A
_UDEV_SYSNAME	Device name in /sys/ (v189+) Type: string Unit: N/A
_UID	User ID, trusted cannot be spoofed (v188+) Type: int Unit: N/A
__CURSOR	Entry cursor, address field export only (v188+) Type: string Unit: N/A
__MONOTONIC_TIMESTAMP	Monotonic timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs
__REALTIME_TIMESTAMP	Reception timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs
__SEQNUM	Sequence number, address field export only (v254+) Type: int Unit: N/A
__SEQNUM_ID	Sequence ID, address field export only (v254+) Type: string Unit: N/A
journald_timestamp	Journal entry timestamp in nanoseconds (from `_SOURCE_REALTIME_TIMESTAMP` or `__REALTIME_TIMESTAMP`, v188+) Type: int Unit: time,ns
message	Log message content (from `MESSAGE`, v188+) Type: string Unit: N/A
pid	Process ID (from `_PID` or `SYSLOG_PID`, v188+) Type: int Unit: N/A
priority	Numeric priority level 0-7 (from `PRIORITY`, v188+) Type: int Unit: N/A
status	Log status level mapped from priority: `error`, `warn`, `critical`, `notice`, `info`, `debug`, `unknown` Type: string Unit: N/A
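The table lists `journald_timestamp` in nanoseconds while its source fields (`_SOURCE_REALTIME_TIMESTAMP`, `__REALTIME_TIMESTAMP`) are in microseconds. Assuming the collector performs a plain unit conversion (the table does not spell this out), the relationship can be sketched as follows. `us_to_ns` is a hypothetical helper name.

```shell
#!/bin/bash
# us_to_ns MICROSECONDS
# Converts a journal timestamp in microseconds to nanoseconds.
# Assumption: journald_timestamp is the source value multiplied by 1000.
us_to_ns() {
    echo $(( $1 * 1000 ))
}
```

To inspect the raw microsecond fields of the latest entry on a host, `journalctl -n 1 -o json` prints them alongside the other journal fields.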

Common Use Cases¶

Collect logs from specific services

[[inputs.journald]]
  units = ["nginx.service", "mysql.service", "docker.service"]
  priorities = ["err", "crit", "alert", "emerg"]
  tail_only = true
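Before enabling a filter like this, it can help to preview roughly which entries it would select. The sketch below builds a journalctl command for a set of units and a maximum priority. `preview_cmd` is a hypothetical helper, and journalctl's own filtering only approximates the collector's. `journalctl -p err` selects `err` and every more severe level, which corresponds to the priority list above.

```shell
#!/bin/bash
# preview_cmd MAX_PRIORITY UNIT...
# Prints a journalctl command line that shows the last 20 entries at
# MAX_PRIORITY or more severe for the given units. Hypothetical helper.
preview_cmd() {
    local max_priority=$1
    shift
    local cmd="journalctl --no-pager -n 20 -p $max_priority"
    local unit
    for unit in "$@"; do
        # Repeating -u matches entries from any of the listed units.
        cmd="$cmd -u $unit"
    done
    echo "$cmd"
}
```

Calling `preview_cmd err nginx.service mysql.service docker.service` prints the command to run on the host.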

Exclude verbose fields

[[inputs.journald]]
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
    "_AUDIT_SESSION",
    "_AUDIT_LOGINUID",
  ]

Kubernetes node journal collection (auto mode)

[[inputs.journald]]
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true

Notes:

The collector resolves candidate directories in configuration order and opens the first readable journal directory
In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit auto-enables journald rootfs mode
In container/Kubernetes mode, absolute paths are automatically prefixed with mount_dir (default "/rootfs")
If the configured path is a journal root such as <mount_dir>/var/log/journal, the collector automatically descends into the machine-id subdirectory before opening it
In containerized node environments such as kind or k3d, validate logger and journalctl inside the node container rather than on the outer host
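The path handling described in these notes can be sketched as below. This is an illustration under stated assumptions, not DataKit's implementation: `resolve_journal_dir` is a hypothetical name, and reading the machine ID from `<mount_dir>/etc/machine-id` is an assumption about how the machine-id subdirectory is located.

```shell
#!/bin/bash
# resolve_journal_dir MOUNT_DIR PATH
# Prefixes an absolute PATH with MOUNT_DIR and, when a machine-id
# subdirectory exists below it, descends into that subdirectory.
resolve_journal_dir() {
    local mount_dir=${1%/}   # drop a trailing slash, if any
    local path=$2
    # Only absolute paths are prefixed with the rootfs mount point.
    case $path in
        /*) path="$mount_dir$path" ;;
    esac
    # Assumption: the machine ID is read from <mount_dir>/etc/machine-id.
    local id_file="$mount_dir/etc/machine-id"
    if [ -r "$id_file" ]; then
        local machine_id
        machine_id=$(tr -d '[:space:]' < "$id_file")
        if [ -n "$machine_id" ] && [ -d "$path/$machine_id" ]; then
            path="$path/$machine_id"
        fi
    fi
    echo "$path"
}
```

With `mount_dir = "/rootfs"`, `resolve_journal_dir /rootfs /var/log/journal` prints `/rootfs/var/log/journal/<machine-id>` when that subdirectory exists, and `/rootfs/var/log/journal` otherwise.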

Kubernetes node journal collection with host-side systemd library prepare

[[inputs.journald]]
  mount_dir = "/rootfs"
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true
  copy_node_libs = true
  copy_node_libs_files = [
    "libsystemd.so*",
    "liblz4.so*",
    "libzstd.so*",
    "liblzma.so*",
    "libcap.so*",
    "libgcrypt.so*",
    "libgpg-error.so*",
    "libselinux.so*",
    "libmount.so*",
    "libblkid.so*",
    "libacl.so*",
    "libpcre2-8.so*",
    "libpcre.so*",
  ]

Collect all logs (debugging)

[[inputs.journald]]
  tail_only = false
  max_entries_per_batch = 500
  exclude_fields = []

Troubleshooting

Permission errors

Ensure DataKit has read access to journal files:

# Add datakit user to systemd-journal group
sudo usermod -aG systemd-journal datakit

# Restart DataKit
sudo systemctl restart datakit
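To confirm that the group change took effect, the membership can be checked before restarting. The sketch below assumes DataKit runs as a user named `datakit`, as in the command above. `user_in_group` is a hypothetical helper.

```shell
#!/bin/bash
# user_in_group USER GROUP
# Returns 0 when USER is a member of GROUP according to `id -nG`.
user_in_group() {
    local user=$1 group=$2 g
    # Compare whole group names to avoid partial matches such as
    # systemd-journal vs systemd-journal-remote.
    for g in $(id -nG "$user" 2>/dev/null); do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `user_in_group datakit systemd-journal && echo "membership OK"`.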

No logs collected

Verify journald is running:

systemctl status systemd-journald

Check journal files exist:

ls -la /var/log/journal/
ls -la /run/log/journal/

If the container does not ship journalctl, rely on the DataKit compatibility warning and probe result instead. If journalctl is available in the current environment, use it for extra validation:

journalctl -n 10

If startup logs report reason=unsupported-format, the collector runtime is older than the target journal file format. In this case DataKit keeps the journald collector inactive and logs a warning instead of collecting partial or misleading results.

This can happen in Kubernetes when DataKit collects journal files from the node while the container image ships an older libsystemd than the host journal format requires. Typical symptoms are:

If journalctl is installed inside the Pod, it may report unsupported feature
DataKit starts, but the journald collector stays inactive after the compatibility warning
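A quick way to check for this condition is to search the external collector's log for the reason token. The sketch below matches only the `reason=unsupported-format` string quoted above. The rest of the log line layout is not assumed, and the default path is taken from `log_path` in the sample configuration. `journald_unsupported_format` is a hypothetical helper.

```shell
#!/bin/bash
# journald_unsupported_format [LOG_FILE]
# Returns 0 when LOG_FILE contains the unsupported-format reason token.
# The default path follows log_path in the sample configuration.
journald_unsupported_format() {
    local log_file=${1:-/usr/local/datakit/externals/journald.log}
    grep -q 'reason=unsupported-format' "$log_file" 2>/dev/null
}
```

For example, `journald_unsupported_format && echo "journal format is newer than the collector runtime"`.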

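In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit already auto-enables host-side systemd library prepare. To use this behavior on a non-container host, enable it and list the libraries to copy explicitly, because an empty copy_node_libs_files is a configuration error outside container/Kubernetes mode:

[[inputs.journald]]
  copy_node_libs = true
  copy_node_libs_files = ["libsystemd.so*"]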

When enabled, DataKit copies dynamic libraries from candidate system library directories under mount_dir (default "/rootfs") into its own external-libs directory, then prepends that directory to LD_LIBRARY_PATH automatically.

Copy behavior details:

If copy_node_libs_files is configured and non-empty, DataKit copies only that list.
If copy_node_libs_files is empty in container/Kubernetes auto mode, DataKit first copies libsystemd.so*, then probes missing dependencies with ldd libsystemd.so.0 under the copied library path, and copies the missing .so files automatically.
If copy_node_libs_files is empty on non-container and non-Kubernetes hosts while copy_node_libs=true, DataKit reports a configuration error and keeps the collector inactive.
If library prepare fails while copy_node_libs is enabled, the journald collector stays inactive (other DataKit collectors are not affected).
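The explicit-list case can be pictured with the sketch below. It is not DataKit's implementation: `copy_libs_sketch` is a hypothetical name, the candidate directory list is an assumption (the actual candidates are not documented here), and treating "nothing copied" as a failure is also an assumption.

```shell
#!/bin/bash
# copy_libs_sketch SRC_ROOT DEST PATTERN...
# Copies files matching each glob PATTERN from candidate library
# directories under SRC_ROOT (e.g. /rootfs) into DEST.
copy_libs_sketch() {
    local src_root=${1%/} dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    local copied=0 dir pattern file
    # Hypothetical candidate directories, chosen for illustration only.
    for dir in /usr/lib64 /usr/lib /lib64 /lib \
               /usr/lib/x86_64-linux-gnu /lib/x86_64-linux-gnu; do
        for pattern in "$@"; do
            # The unquoted $pattern lets the shell expand the glob.
            for file in "$src_root$dir"/$pattern; do
                [ -e "$file" ] || continue   # the glob matched nothing
                # -a keeps symlinks such as libsystemd.so.0 intact.
                cp -a "$file" "$dest/" && copied=$((copied + 1))
            done
        done
    done
    # Assumption: copying nothing counts as a failed preparation.
    [ "$copied" -gt 0 ]
}
```

For example, `copy_libs_sketch /rootfs /tmp/journald-libs 'libsystemd.so*' 'liblz4.so*'` copies only files that match those two patterns.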

After the collector opens the journal successfully, it also logs the effective libsystemd path in the external collector's journald.log, for example:

loaded libsystemd paths: [/usr/local/datakit/externals/systemd-libs/libsystemd.so.0.35.0]

Constraints:

The host libsystemd is not guaranteed to be compatible with the journald external binary currently shipped in DataKit
If the host libsystemd is too old, the external binary may fail during dynamic linking because of missing symbols or version mismatches
If the host libsystemd is newer, it may still fail later with an unsupported feature error when reading journal files
Therefore, copy_node_libs is only a preparation mechanism, not a guarantee that the copied libraries are compatible; the final result still needs to be verified from startup logs and probe results
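One way to spot dynamic-linking gaps before reading the startup logs is to run `ldd` against the copied libsystemd and list the entries it reports as `not found`. The sketch below only parses `ldd` output. `missing_libs` is a hypothetical helper, and the library directory in the usage line is a placeholder to replace with the directory reported in journald.log.

```shell
#!/bin/bash
# missing_libs
# Reads `ldd` output on stdin and prints the names of libraries that
# could not be resolved (lines of the form "libfoo.so.1 => not found").
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, `LD_LIBRARY_PATH=<libs-dir> ldd <libs-dir>/libsystemd.so.0 | missing_libs`, where `<libs-dir>` is the directory holding the copied libraries. An empty result means every dependency resolved, but it does not prove that the library versions are compatible.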

Do not point LD_LIBRARY_PATH at the entire host /usr/lib64 directory. That can also pull incompatible glibc components into the collector process and create a less predictable failure mode.

If startup logs contain:

resolved journal directory: target=...
opening journal from directory: ...

the collector is using directory-based journal opening, which is the recommended path for live journals. Avoid configuring individual .journal files as the primary input path.

Cursor file issues

If the cursor file becomes corrupted (e.g., after host reboot), the collector automatically falls back to tail mode and creates a new cursor. To manually reset:

# Remove the cursor file (default path shown; use your cursor_file value if you changed it)
sudo rm /usr/local/datakit/cache/journald.pos

# Restart DataKit
sudo systemctl restart datakit

High memory usage

Default batch size is 1000 entries. If memory usage is a concern, reduce the batch size:

[[inputs.journald]]
  max_entries_per_batch = 100