
Journald


The Journald collector is used to collect logs from the systemd journal (journald) on Linux systems. It uses an external binary wrapper to interface with libsystemd and efficiently collects structured log entries from the journal.

Prerequisites

  • Linux only: Requires systemd and journald
  • libsystemd: The external binary links against the libsystemd shared library (libsystemd.so.0)
  • Permissions: DataKit needs read access to journal files (typically by adding the DataKit user to the systemd-journal group)

System Requirements Check

Before deploying the journald collector, verify your system meets the requirements:

Quick check with one-liner:

systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"

Comprehensive pre-flight check script:

#!/bin/bash
# journald-prereq-check.sh - Verify systemd requirements

echo "=== Journald Collector Prerequisites Check ==="
echo

# 1. Check if systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ Found - $VERSION"
else
    echo "❌ NOT FOUND - systemctl not installed"
    exit 1
fi

# 2. Check libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ Found - $LIBPATH"
else
    echo "❌ NOT FOUND - libsystemd.so.0 missing"
    exit 1
fi

# 3. Check journalctl access
echo -n "3. journalctl access: "
if journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - Can read journal"
else
    echo "⚠️  LIMITED - journalctl exists but no read access"
fi

# 4. Check journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n "   $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ Exists and readable"
        else
            echo "⚠️  Exists but NOT readable"
        fi
    else
        echo "❌ NOT FOUND"
    fi
done

# 5. Check systemd version
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | grep -oP 'systemd \K\d+' || echo "0")
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets minimum v205)"
else
    echo "⚠️  v$SYSTEMD_VERSION (older than recommended v205)"
fi

echo
echo "=== Check Complete ==="

Save as journald-prereq-check.sh and run:

chmod +x journald-prereq-check.sh
./journald-prereq-check.sh

Example output on a healthy host:

=== Journald Collector Prerequisites Check ===

1. systemctl command: ✅ Found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ Found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - Can read journal
4. Journal directories:
   /var/log/journal: ✅ Exists and readable
   /run/log/journal: ✅ Exists and readable
5. systemd version: ✅ v257 (meets minimum v205)

=== Check Complete ===

Common issues and fixes:

| Issue | Solution |
|---|---|
| systemctl: command not found | Install systemd, or use an alternative log collection method |
| libsystemd.so.0: cannot open | Install the systemd runtime library: apt install libsystemd0 (Debian/Ubuntu) or yum install systemd-libs (RHEL/CentOS) |
| journalctl: no read access | Add the user to the systemd-journal group: usermod -aG systemd-journal $USER (for DataKit itself, use the user DataKit runs as) |
| /var/log/journal not found | Enable persistent journal storage: mkdir -p /var/log/journal && systemd-tmpfiles --create --prefix /var/log/journal, then restart systemd-journald |

Configuration

Collector Configuration

After installing and starting DataKit, enable the Journald collector: go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and name the copy journald.conf. An example configuration:

# Collect systemd journal logs using external binary
[[inputs.journald]]
  ## Name of the collector
  name = 'journald'

  ## Run as daemon (required for journald collection)
  daemon = true

  http_endpoint = "http://localhost:9529"
  log_level = "info"
  log_path = "/usr/local/datakit/externals/journald.log"

  ## Path to datakit-journald binary
  ## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
  # cmd = "/usr/local/datakit/externals/datakit-journald"

  ## Interval to check external process (for non-daemon mode)
  # interval = "10s"

  ## Rootfs mount point for container/Kubernetes mode only
  ## DataKit uses this as the host root prefix when auto-prefixing absolute paths
  ## and preparing host-side systemd libraries (copy_node_libs).
  mount_dir = "/rootfs"

  ## Journal directory paths
  ## Host installation: use default paths
  ## Container/Kubernetes: DataKit auto-prefixes absolute paths with mount_dir.
  paths = [
    "/var/log/journal",      # Persistent storage
    "/run/log/journal",      # Runtime storage
  ]

  ## Filter by systemd unit names (supports glob patterns)
  ## Empty = all units
  # units = ["*.service", "docker.service", "kubelet.service"]

  ## Filter by priority levels
  ## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
  ## Empty = all priorities
  # priorities = ["err", "warning", "crit", "alert", "emerg"]

  ## Field selection - collect all by default, exclude specific fields
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
  ]

  ## Collection behavior
  ## tail_only=true: Only collect new entries (cursor not needed)
  ## tail_only=false: Read from last position (cursor required)
  tail_only = true
  max_entries_per_batch = 1000

  ## Cursor management (only used when tail_only=false)
  # save_cursor = true
  # cursor_file = "/usr/local/datakit/cache/journald.cursor"

  ## Environment variables for external binary
  # envs = [
  #   "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
  # ]

  ## Host-side systemd library prepare:
  ## - Container/Kubernetes mode: automatically forced to true.
  ## - Non-container host: disabled by default. If enabled manually, set copy_node_libs_files explicitly.
  ## - In container/kubernetes mode, when copy_node_libs_files is empty, DataKit first copies
  ##   libsystemd.so* then runs "LD_LIBRARY_PATH=<dst> ldd libsystemd.so.0"
  ##   style dependency probing and copies missing .so files automatically.
  # copy_node_libs = true
  ## Optional override file list. If set, only these patterns/files are copied.
  # copy_node_libs_files = [
  #   "libsystemd.so*",
  #   "liblz4.so*",
  #   "libzstd.so*",
  #   "liblzma.so*",
  #   "libcap.so*",
  #   "libgcrypt.so*",
  #   "libgpg-error.so*",
  #   "libselinux.so*",
  #   "libmount.so*",
  #   "libblkid.so*",
  #   "libacl.so*",
  #   "libpcre2-8.so*",
  #   "libpcre.so*",
  # ]

  ## Additional arguments for external binary
  # args = []

  [inputs.journald.tags]
    # Add custom tags as needed
    # environment = "production"
    # cluster = "k8s-cluster-1"

After configuration, restart DataKit.

Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| paths | []string | ["/var/log/journal", "/run/log/journal"] | Journal directory paths |
| units | []string | [] | Filter by systemd unit names (supports glob patterns, e.g., *.service) |
| priorities | []string | [] | Filter by priority levels: emerg, alert, crit, err, warning, notice, info, debug |
| exclude_fields | []string | [] | Journal fields to exclude from collection (e.g., _BOOT_ID, _MACHINE_ID) |
| tail_only | bool | true | Only collect new entries (skip historical logs on startup) |
| max_entries_per_batch | int | 1000 | Maximum number of entries to collect per batch |
| save_cursor | bool | true | Persist the read position to resume after restart |
| cursor_file | string | /usr/local/datakit/cache/journald.pos | Path to store the cursor position |
| mount_dir | string | "/rootfs" | Rootfs mount directory, used in container/Kubernetes mode only. DataKit uses this prefix for absolute paths and as the source root for host-side library preparation |
| copy_node_libs | bool | false (automatically forced to true in container or Kubernetes mode) | Whether to copy host-side dynamic libraries from mount_dir into DataKit's managed external-libs directory before starting the external collector. In container or Kubernetes environments (when datakit.Docker or config.IsKubernetes() is true), DataKit enables this automatically |
| copy_node_libs_files | []string | [] | Dynamic library file names or glob patterns to copy. If set, only these are copied. If empty in container/Kubernetes auto mode, DataKit first copies libsystemd.so*, then runs LD_LIBRARY_PATH=/usr/local/datakit/externals/systemd-libs ldd libsystemd.so.0-style dependency probing and copies the missing .so files automatically. If empty outside container/Kubernetes mode while copy_node_libs=true, startup fails with a configuration error |
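The units option accepts glob patterns. To preview which units a pattern would select on a host, you can lean on the shell's own glob matching. The sketch below assumes the collector's matcher follows ordinary shell-style glob semantics (*, ?, [...]), which this page does not state explicitly; unit_matches is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if unit name $1 matches any glob in $2..$n.
# Assumes shell-style glob semantics; the collector's own matcher may differ.
unit_matches() {
    unit=$1
    shift
    for pattern in "$@"; do
        # An unquoted $pattern in a case label is treated as a glob.
        case $unit in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

# Usage: list loaded units that units = ["*.service", "kubelet.service"] would select.
# systemctl list-units --no-legend --plain | awk '{print $1}' |
#   while read -r u; do unit_matches "$u" '*.service' 'kubelet.service' && echo "$u"; done
```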

Log Fields

journald

Systemd journal logs. Note: field availability varies by systemd version; see the version hints (e.g., v188+, v205+) in each field description.

Tags:

  • host: Hostname (from _HOSTNAME, v188+)
  • service: Service identifier (from SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, or _COMM)

Fields (name and description, followed by type and unit):
CODE_FILE Source code filename for debugging (v188+)
Type: string
Unit: N/A
CODE_FUNC Function name for debugging (v188+)
Type: string
Unit: N/A
CODE_LINE Source code line number for debugging (v188+)
Type: int
Unit: N/A
COREDUMP_CMDLINE Full command line at crash time (v188+)
Type: string
Unit: N/A
COREDUMP_CWD Current working directory at crash time (v188+)
Type: string
Unit: N/A
COREDUMP_EXE Executable path of crashed binary (v188+)
Type: string
Unit: N/A
COREDUMP_GID Crashed process GID (v188+)
Type: int
Unit: N/A
COREDUMP_HOSTNAME Hostname at crash time (v188+)
Type: string
Unit: N/A
COREDUMP_PID Crashed process PID (v188+)
Type: int
Unit: N/A
COREDUMP_ROOT Root directory, usually / (v188+)
Type: string
Unit: N/A
COREDUMP_SIGNAL Signal number that caused crash (v188+)
Type: int
Unit: N/A
COREDUMP_STACKTRACE Stack trace (backtrace) of the crashed process (v188+)
Type: string
Unit: N/A
COREDUMP_TIMESTAMP Crash timestamp in microseconds (v188+)
Type: int
Unit: time,μs
COREDUMP_UID Crashed process UID (v188+)
Type: int
Unit: N/A
COREDUMP_UNIT System unit that crashed (v198+)
Type: string
Unit: N/A
COREDUMP_USER_UNIT User unit that crashed (v198+)
Type: string
Unit: N/A
DOCUMENTATION Documentation URL http/https/file/man/info (v246+)
Type: string
Unit: N/A
ERRNO Unix error number associated with message (v188+)
Type: int
Unit: N/A
INVOCATION_ID Invocation ID for systemd code messages (v245+)
Type: string
Unit: N/A
MESSAGE_ID 128-bit message identifier (UUID format, v188+)
Type: string
Unit: N/A
OBJECT_AUDIT_LOGINUID Target login UID (v205+)
Type: int
Unit: N/A
OBJECT_AUDIT_SESSION Target audit session ID (v205+)
Type: int
Unit: N/A
OBJECT_CMDLINE Target process full command line (v205+)
Type: string
Unit: N/A
OBJECT_COMM Target process comm (v205+)
Type: string
Unit: N/A
OBJECT_EXE Target process executable path (v205+)
Type: string
Unit: N/A
OBJECT_GID Target process GID (v205+)
Type: int
Unit: N/A
OBJECT_PID Target process PID, requires UID 0 to set (v205+)
Type: int
Unit: N/A
OBJECT_SYSTEMD_CGROUP Target cgroup path (v205+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_INVOCATION_ID Target invocation ID (v235+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_OWNER_UID Target session owner UID (v205+)
Type: int
Unit: N/A
OBJECT_SYSTEMD_SESSION Target session ID (v205+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_UNIT Target unit name (v205+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_USER_UNIT Target user unit name (v205+)
Type: string
Unit: N/A
OBJECT_UID Target process UID (v205+)
Type: int
Unit: N/A
SYSLOG_FACILITY Syslog facility 0-23 (v188+)
Type: int
Unit: N/A
SYSLOG_PID Client PID from syslog, may differ from _PID (v188+)
Type: int
Unit: N/A
SYSLOG_RAW Original syslog line if MESSAGE modified or timestamp lost (v240+)
Type: string
Unit: N/A
SYSLOG_TIMESTAMP Original syslog timestamp as received (v188+)
Type: string
Unit: N/A
TID Thread ID numeric (v247+)
Type: int
Unit: N/A
UNIT Unit name user-provided alternative to _SYSTEMD_UNIT (v251+)
Type: string
Unit: N/A
USER_INVOCATION_ID User invocation ID for user manager messages (v245+)
Type: string
Unit: N/A
USER_UNIT User unit user-provided alternative to _SYSTEMD_USER_UNIT (v251+)
Type: string
Unit: N/A
_AUDIT_LOGINUID Login UID from kernel audit (v188+)
Type: int
Unit: N/A
_AUDIT_SESSION Audit session ID from kernel (v188+)
Type: int
Unit: N/A
_BOOT_ID Boot ID 128-bit hex UUID (v188+)
Type: string
Unit: N/A
_CAP_EFFECTIVE Effective capabilities bitmask (v206+)
Type: int
Unit: N/A
_CMDLINE Full command line, most complete process info (v188+)
Type: string
Unit: N/A
_COMM Command name truncated to 15 chars (v188+)
Type: string
Unit: N/A
_CONTAINER_ID Container ID for nspawn/containers (v205+)
Type: string
Unit: N/A
_CONTAINER_IMAGE Container image for nspawn/containers (v205+)
Type: string
Unit: N/A
_CONTAINER_NAME Container name for nspawn/containers (v205+)
Type: string
Unit: N/A
_EXE Executable path, full path (v188+)
Type: string
Unit: N/A
_GID Group ID, trusted (v188+)
Type: int
Unit: N/A
_KERNEL_DEVICE Kernel device name format: bM:N, cM:N, nN, +subsys:name (v189+)
Type: string
Unit: N/A
_KERNEL_SUBSYSTEM Kernel subsystem e.g. block, net (v189+)
Type: string
Unit: N/A
_LINE_BREAK Line termination info: nul, line-max, eof, pid-change (v235+)
Type: string
Unit: N/A
_MACHINE_ID Machine ID from /etc/machine-id (v188+)
Type: string
Unit: N/A
_NAMESPACE Journal namespace ID (v245+)
Type: string
Unit: N/A
_RUNTIME_SCOPE Runtime scope: initrd, system, or user (v252+)
Type: string
Unit: N/A
_SELINUX_CONTEXT SELinux security context label (v188+)
Type: string
Unit: N/A
_SOURCE_BOOTTIME_TIMESTAMP Boottime timestamp in microseconds CLOCK_BOOTTIME (v257+)
Type: int
Unit: time,μs
_SOURCE_REALTIME_TIMESTAMP Source timestamp in microseconds CLOCK_REALTIME (v188+)
Type: int
Unit: time,μs
_STREAM_ID Stream connection ID 128-bit UUID for stdout streams (v235+)
Type: string
Unit: N/A
_SYSTEMD_CGROUP Control group path (v188+)
Type: string
Unit: N/A
_SYSTEMD_INVOCATION_ID Unit invocation ID unique per unit start (v233+)
Type: string
Unit: N/A
_SYSTEMD_OWNER_UID Session owner UID (v188+)
Type: int
Unit: N/A
_SYSTEMD_SESSION Login session ID (v188+)
Type: string
Unit: N/A
_SYSTEMD_SLICE Slice unit name e.g. system.slice (v188+)
Type: string
Unit: N/A
_SYSTEMD_UNIT Unit name e.g. sshd.service (v188+)
Type: string
Unit: N/A
_SYSTEMD_USER_SLICE User slice name e.g. user.slice (v188+)
Type: string
Unit: N/A
_SYSTEMD_USER_UNIT User unit name for user sessions (v188+)
Type: string
Unit: N/A
_TRANSPORT How entry was received: audit, driver, syslog, journal, stdout, kernel (v205+)
Type: string
Unit: N/A
_UDEV_DEVLINK Symlinks to device, can appear multiple times (v189+)
Type: string
Unit: N/A
_UDEV_DEVNODE Device node in /dev/ full path (v189+)
Type: string
Unit: N/A
_UDEV_SYSNAME Device name in /sys/ (v189+)
Type: string
Unit: N/A
_UID User ID, trusted cannot be spoofed (v188+)
Type: int
Unit: N/A
__CURSOR Entry cursor, address field export only (v188+)
Type: string
Unit: N/A
__MONOTONIC_TIMESTAMP Monotonic timestamp in microseconds, address field export only (v188+)
Type: int
Unit: time,μs
__REALTIME_TIMESTAMP Reception timestamp in microseconds, address field export only (v188+)
Type: int
Unit: time,μs
__SEQNUM Sequence number, address field export only (v254+)
Type: int
Unit: N/A
__SEQNUM_ID Sequence ID, address field export only (v254+)
Type: string
Unit: N/A
journald_timestamp Journal entry timestamp in nanoseconds (from _SOURCE_REALTIME_TIMESTAMP or __REALTIME_TIMESTAMP, v188+)
Type: int
Unit: time,ns
message Log message content (from MESSAGE, v188+)
Type: string
Unit: N/A
pid Process ID (from _PID or SYSLOG_PID, v188+)
Type: int
Unit: N/A
priority Numeric priority level 0-7 (from PRIORITY, v188+)
Type: int
Unit: N/A
status Log status level mapped from priority: error, warn, critical, notice, info, debug, unknown
Type: string
Unit: N/A
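The priority field holds the numeric syslog level, while the priorities option takes level names. The mapping follows the list in the configuration sample (emerg=0 through debug=7). The minimal sketch below converts in both directions, which can help when building a priorities filter from numbers seen in collected data. It does not reproduce the collector's status mapping, whose exact rules are not documented here; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers based on the level list in the journald.conf sample:
# emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7).

# Print the level name for a numeric PRIORITY value (0-7).
priority_name() {
    case $1 in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warning ;;
        5) echo notice ;;
        6) echo info ;;
        7) echo debug ;;
        *) return 1 ;;   # not a valid syslog priority
    esac
}

# Print the numeric value for a level name, as used in the priorities option.
priority_number() {
    for n in 0 1 2 3 4 5 6 7; do
        if [ "$(priority_name "$n")" = "$1" ]; then
            echo "$n"
            return 0
        fi
    done
    return 1
}

# Example: priority_name 3      -> err
#          priority_number crit -> 2
```

journalctl uses the same numbering: journalctl -p err (or -p 3) shows entries at that level and every more severe level, which is a quick way to preview what priorities = ["err", "crit", "alert", "emerg"] would keep.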

Common Use Cases

  • Collect logs from specific services
[[inputs.journald]]
  units = ["nginx.service", "mysql.service", "docker.service"]
  priorities = ["err", "crit", "alert", "emerg"]
  tail_only = true
  • Exclude verbose fields
[[inputs.journald]]
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
    "_AUDIT_SESSION",
    "_AUDIT_LOGINUID",
  ]
  • Kubernetes node journal collection (auto mode)
[[inputs.journald]]
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true

Notes:

  • The collector checks the configured paths in order and tries to open the first readable journal directory
  • In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit auto-enables journald rootfs mode
  • In container/Kubernetes mode, absolute paths are automatically prefixed with mount_dir (default "/rootfs")
  • If the configured path is a journal root such as <mount_dir>/var/log/journal, the collector automatically descends into the machine-id subdirectory before opening it
  • In containerized node environments such as kind or k3d, run logger and journalctl checks inside the node container rather than on the outer host, since the node container's journal is the one DataKit reads

  • Kubernetes node journal collection with host-side systemd library prepare

[[inputs.journald]]
  mount_dir = "/rootfs"
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true
  copy_node_libs = true
  copy_node_libs_files = [
    "libsystemd.so*",
    "liblz4.so*",
    "libzstd.so*",
    "liblzma.so*",
    "libcap.so*",
    "libgcrypt.so*",
    "libgpg-error.so*",
    "libselinux.so*",
    "libmount.so*",
    "libblkid.so*",
    "libacl.so*",
    "libpcre2-8.so*",
    "libpcre.so*",
  ]
  • Collect all logs (debugging)
[[inputs.journald]]
  tail_only = false
  max_entries_per_batch = 500
  exclude_fields = []
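Following the note about kind and k3d above, a quick end-to-end check is to write a tagged test message with logger and read it back with journalctl in the environment whose journal DataKit reads. The sketch wraps this in a function; the tag datakit-journald-check and the function name are hypothetical, and the kind node name in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: write a tagged message to the journal, then read it back.
# Run it where the target journal lives (e.g. inside a kind/k3d node container).
journal_roundtrip() {
    tag=${1:-datakit-journald-check}
    msg="journald roundtrip $(date +%s)"
    # logger sends the message via syslog; journald stores it with SYSLOG_IDENTIFIER=$tag.
    logger -t "$tag" "$msg" || return 1
    sleep 1
    # -t filters by SYSLOG_IDENTIFIER; -o cat prints only the MESSAGE text.
    if journalctl -t "$tag" -n 5 --no-pager -o cat | grep -qF "$msg"; then
        echo "OK: message visible in journal"
    else
        echo "FAIL: message not found" >&2
        return 1
    fi
}

# Usage on a host:  journal_roundtrip
# Usage for kind (node container name is an example):
#   docker exec kind-control-plane sh -c 'logger -t datakit-journald-check hello &&
#     journalctl -t datakit-journald-check -n 1 --no-pager -o cat'
```

logger defaults to the notice priority, so if a priorities filter such as ["err", "crit", "alert", "emerg"] is active, add -p user.err to the logger call so the test entry can pass the filter.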

Troubleshooting

Permission errors

Ensure DataKit has read access to journal files:

# Add datakit user to systemd-journal group
sudo usermod -aG systemd-journal datakit

# Restart DataKit
sudo systemctl restart datakit
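Group changes only take effect for processes started afterwards, which is why DataKit is restarted above. To confirm the membership, a minimal sketch that parses id -nG output; it assumes DataKit runs as the datakit user, as in the commands above, and user_in_group is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if user $1 is a member of group $2.
# `id -nG USER` prints the user's group names separated by spaces.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example check for the DataKit user (assumed to be "datakit", as above).
if user_in_group datakit systemd-journal; then
    echo "datakit is in systemd-journal"
else
    echo "datakit is NOT in systemd-journal (or the user does not exist)" >&2
fi
```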

No logs collected

  1. Verify journald is running:
systemctl status systemd-journald
  2. Check that journal files exist:
ls -la /var/log/journal/
ls -la /run/log/journal/
  3. If journalctl is available in the current environment, use it to confirm that entries can be read. If the container does not ship journalctl, rely on the DataKit compatibility warning and probe results instead:
journalctl -n 10

If startup logs report reason=unsupported-format, the libsystemd used by the collector is older than the format of the journal files it is reading. In this case DataKit keeps the journald collector inactive and logs a warning instead of collecting partial or misleading results.

This typically happens in Kubernetes when DataKit reads the node's journal files but the container image ships an older libsystemd than the host journal format requires. Typical symptoms:

  • If journalctl is installed inside the Pod, it may report unsupported feature
  • DataKit starts, but the journald collector stays inactive after the compatibility warning
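To see which format features the host journal files use, inspect a file header on the node. On the systemd versions we are aware of, journalctl --header prints per-file header information including an "Incompatible flags:" line; the sketch below extracts that value. The helper name is hypothetical, and the header layout may vary between systemd releases.

```shell
#!/bin/sh
# Hypothetical helper: read `journalctl --header` output on stdin and print
# the value of each "Incompatible flags:" line (one per journal file).
parse_incompatible_flags() {
    awk '/^Incompatible flags:/ { sub(/^Incompatible flags:[ \t]*/, ""); print }'
}

# Usage on the node (the path is an example; files live under the machine-id dir):
#   journalctl --header --file '/var/log/journal/*/system.journal' | parse_incompatible_flags
```

Newer flags on the host (for example COMPACT or COMPRESSED-ZSTD) combined with an older libsystemd in the container are consistent with the unsupported-format warning described above.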

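In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit already enables host-side systemd library preparation automatically. To use the same behavior on non-container hosts, enable it explicitly; outside container/Kubernetes mode, copy_node_libs_files must also be set, as described below: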

[[inputs.journald]]
  copy_node_libs = true

When enabled, DataKit copies dynamic libraries from candidate system library directories under mount_dir (default "/rootfs") into its own external-libs directory, then prepends that directory to LD_LIBRARY_PATH automatically.

Copy behavior details:

  • If copy_node_libs_files is configured and non-empty, DataKit copies only that list.
  • If copy_node_libs_files is empty in container/Kubernetes auto mode, DataKit first copies libsystemd.so*, then probes missing dependencies with ldd libsystemd.so.0 under the copied library path, and copies the missing .so files automatically.
  • If copy_node_libs_files is empty on non-container and non-Kubernetes hosts while copy_node_libs=true, DataKit reports a configuration error and keeps the collector inactive.
  • If library prepare fails while copy_node_libs is enabled, the journald collector stays inactive (other DataKit collectors are not affected).
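The dependency probing described above can be approximated by hand to see which libraries would still be missing. The sketch runs ldd with LD_LIBRARY_PATH pointed at a library directory and lists the entries ldd reports as "not found". The default directory /usr/local/datakit/externals/systemd-libs is the one named in the options table; the function names are hypothetical, and this only approximates what DataKit does internally.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin, print sonames marked "not found".
# ldd prints such lines as: "<tab>liblz4.so.1 => not found"
missing_sonames() {
    awk '/=> not found/ { print $1 }'
}

# Probe a library against a given library directory (defaults are examples).
probe_missing_deps() {
    dir=${1:-/usr/local/datakit/externals/systemd-libs}
    lib=${2:-$dir/libsystemd.so.0}
    LD_LIBRARY_PATH="$dir" ldd "$lib" | missing_sonames
}
```

An empty result from probe_missing_deps means every dependency resolved, either from the given directory or from the default library search path.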

After the collector opens the journal successfully, it also logs the effective libsystemd path in the external collector's log (journald.log, set by log_path), for example:

loaded libsystemd paths: [/usr/local/datakit/externals/systemd-libs/libsystemd.so.0.35.0]
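To pull the most recent such line out of the log, a minimal sketch; the default path is the log_path value from the sample configuration, so pass your own path if you changed it. last_loaded_libsystemd is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent "loaded libsystemd paths" line
# from the external collector log. Default path = log_path in the sample config.
last_loaded_libsystemd() {
    log=${1:-/usr/local/datakit/externals/journald.log}
    grep -F 'loaded libsystemd paths' "$log" | tail -n 1
}
```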

Constraints:

  • The host libsystemd is not guaranteed to be compatible with the journald external binary currently shipped in DataKit
  • If the host libsystemd is too old, the external binary may fail during dynamic linking because of missing symbols or version mismatches
  • If the host libsystemd is newer, it may still fail later with unsupported feature when reading journal files
  • Therefore, copy_node_libs is only a preparation mechanism, not a guarantee that the copied libraries are compatible; the final result still needs to be verified from startup logs and probe results

Do not point LD_LIBRARY_PATH at the entire host /usr/lib64 directory. That can also pull incompatible glibc components into the collector process and create a less predictable failure mode.

If startup logs contain:

resolved journal directory: target=...
opening journal from directory: ...

the collector is using directory-based journal opening, which is the recommended path for live journals. Avoid configuring individual .journal files as the primary input path.
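When a journal root such as /var/log/journal is configured, the collector descends into the machine-id subdirectory (see the notes under Common Use Cases). To check which machine-id directories exist under a root, for example under the /rootfs mount inside a DataKit Pod, a minimal sketch that lists subdirectories named with 32 lowercase hexadecimal characters, the usual machine-id format; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list subdirectories of journal root $1 whose names look
# like a machine ID (32 lowercase hexadecimal characters).
list_machine_id_dirs() {
    root=${1:-/var/log/journal}
    for d in "$root"/*/; do
        name=$(basename "$d")
        # Skip names with any non-hex character (this also skips the literal "*"
        # left behind when the glob matches nothing).
        case $name in
            *[!0-9a-f]*) continue ;;
        esac
        [ "${#name}" -eq 32 ] && echo "${d%/}"
    done
    return 0
}

# Usage: list_machine_id_dirs /rootfs/var/log/journal
# Compare the result with the host's /etc/machine-id.
```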

Cursor file issues

If the cursor file becomes corrupted (e.g., after host reboot), the collector automatically falls back to tail mode and creates a new cursor. To manually reset:

# Remove the cursor file (adjust the path if cursor_file is set differently)
rm /usr/local/datakit/cache/journald.pos

# Restart DataKit
sudo systemctl restart datakit

High memory usage

Default batch size is 1000 entries. If memory usage is a concern, reduce the batch size:

[[inputs.journald]]
  max_entries_per_batch = 100