Journald


The Journald collector collects logs from the systemd journal (journald) on Linux systems. It runs an external binary that wraps libsystemd and reads structured log entries directly from the journal.

Prerequisites

  • Linux only: Requires systemd and journald
  • libsystemd: The external binary requires the libsystemd shared library (libsystemd.so.0)
  • Permissions: DataKit needs read access to journal files (typically by adding its user to the systemd-journal group)

System Requirements Check

Before deploying the journald collector, verify your system meets the requirements:

Quick check with one-liner:

systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"

Comprehensive pre-flight check script:

journald-prereq-check.sh
#!/bin/bash
# journald-prereq-check.sh - Verify systemd requirements

echo "=== Journald Collector Prerequisites Check ==="
echo

# 1. Check if systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ Found - $VERSION"
else
    echo "❌ NOT FOUND - systemctl not installed"
    exit 1
fi

# 2. Check libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ Found - $LIBPATH"
else
    echo "❌ NOT FOUND - libsystemd.so.0 missing"
    exit 1
fi

# 3. Check journalctl access
echo -n "3. journalctl access: "
if journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - Can read journal"
else
    echo "⚠️  LIMITED - journalctl exists but no read access"
fi

# 4. Check journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n "   $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ Exists and readable"
        else
            echo "⚠️  Exists but NOT readable"
        fi
    else
        echo "❌ NOT FOUND"
    fi
done

# 5. Check systemd version
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | grep -oP 'systemd \K\d+' || echo "0")
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets minimum v205)"
else
    echo "⚠️  v$SYSTEMD_VERSION (older than recommended v205)"
fi

echo
echo "=== Check Complete ==="

Save as journald-prereq-check.sh and run:

chmod +x journald-prereq-check.sh
./journald-prereq-check.sh

Expected output:

=== Journald Collector Prerequisites Check ===

1. systemctl command: ✅ Found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ Found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - Can read journal
4. Journal directories:
   /var/log/journal: ✅ Exists and readable
   /run/log/journal: ✅ Exists and readable
5. systemd version: ✅ v257 (meets minimum v205)

=== Check Complete ===

If a check fails:

| Issue | Solution |
| --- | --- |
| systemctl: command not found | Install systemd or use alternative log collection |
| libsystemd.so.0: cannot open | Install the systemd library package: apt install libsystemd0 or yum install systemd-libs |
| journalctl: no read access | Add user to systemd-journal group: usermod -aG systemd-journal $USER |
| /var/log/journal not found | Enable persistent journal: mkdir -p /var/log/journal && systemd-tmpfiles --create |

Configuration

Collector Configuration

After installing and starting DataKit, enable the Journald collector by creating its configuration file from the bundled sample.

Go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and name the copy journald.conf. An example configuration:

# Collect systemd journal logs using external binary
[[inputs.journald]]
  ## Name of the collector
  name = 'journald'

  ## Run as daemon (required for journald collection)
  daemon = true

  http_endpoint = "http://localhost:9529"
  log_level = "info"
  log_path = "/usr/local/datakit/externals/journald.log"

  ## Path to datakit-journald binary
  ## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
  # cmd = "/usr/local/datakit/externals/datakit-journald"

  ## Interval to check external process (for non-daemon mode)
  # interval = "10s"

  ## Journal directory paths
  ## Host installation: use default paths
  ## Kubernetes: use /rootfs prefixed paths (e.g., /rootfs/var/log/journal)
  paths = [
    "/var/log/journal",      # Persistent storage
    "/run/log/journal",      # Runtime storage
  ]

  ## Filter by systemd unit names (supports glob patterns)
  ## Empty = all units
  # units = ["*.service", "docker.service", "kubelet.service"]

  ## Filter by priority levels
  ## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
  ## Empty = all priorities
  # priorities = ["err", "warning", "crit", "alert", "emerg"]

  ## Field selection - collect all by default, exclude specific fields
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
  ]

  ## Collection behavior
  ## tail_only=true: Only collect new entries (cursor not needed)
  ## tail_only=false: Read from last position (cursor required)
  tail_only = true
  max_entries_per_batch = 1000

  ## Cursor management (only used when tail_only=false)
  # save_cursor = true
  # cursor_file = "/usr/local/datakit/cache/journald.cursor"

  ## Environment variables for external binary
  # envs = [
  #   "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
  # ]

  ## Additional arguments for external binary
  # args = []

  [inputs.journald.tags]
    # Add custom tags as needed
    # environment = "production"
    # cluster = "k8s-cluster-1"

After configuration, restart DataKit.
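
The copy step described above can be sketched as a small shell helper. This is a minimal sketch, not an official installer step: the function name enable_journald_conf is hypothetical, and the installation directory /usr/local/datakit in the usage comment is an assumption inferred from the paths in the sample configuration. The helper refuses to overwrite an existing journald.conf.

```shell
# Copy journald.conf.sample to journald.conf without overwriting an existing file.
# $1: directory holding journald.conf.sample
#     (assumed to be <DataKit install dir>/conf.d/samples)
enable_journald_conf() {
    samples_dir="$1"
    src="$samples_dir/journald.conf.sample"
    dst="$samples_dir/journald.conf"

    if [ ! -f "$src" ]; then
        echo "sample not found: $src" >&2
        return 1
    fi
    if [ -f "$dst" ]; then
        # Keep any configuration that was already edited by hand.
        echo "already present: $dst"
        return 0
    fi
    cp "$src" "$dst" && echo "created: $dst"
}

# Example, run from a shell that can write to the DataKit directory:
#   enable_journald_conf /usr/local/datakit/conf.d/samples
#   systemctl restart datakit
```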

Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| paths | []string | ["/var/log/journal", "/run/log/journal"] | Journal directory paths |
| units | []string | [] | Filter by systemd unit names (supports glob patterns, e.g., *.service) |
| priorities | []string | [] | Filter by priority levels: emerg, alert, crit, err, warning, notice, info, debug |
| exclude_fields | []string | [] | Journal fields to exclude from collection (e.g., _BOOT_ID, _MACHINE_ID) |
| tail_only | bool | true | Only collect new entries (skip historical logs on startup) |
| max_entries_per_batch | int | 1000 | Maximum number of entries to collect per batch |
| save_cursor | bool | true | Persist read position to resume after restart |
| cursor_file | string | /usr/local/datakit/cache/journald.pos | Path to store cursor position |
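
To make the priorities option concrete, the following sketch maps the level names to their numeric values (taken from the sample configuration) and tests whether an entry would pass a filter list. It assumes the filter is a set of exact levels rather than a threshold, which the sample list suggests but the text does not state. The function names are hypothetical and are not part of the collector.

```shell
# Map a journald priority name to its numeric level (0-7).
priority_to_num() {
    case "$1" in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       echo "unknown priority: $1" >&2; return 1 ;;
    esac
}

# Succeed if numeric entry priority $1 is named in the space-separated list $2.
# An empty list accepts every priority, matching the documented default.
# Assumption: the list is an exact set of levels, not a minimum severity.
priority_selected() {
    entry_num="$1"
    names="$2"
    [ -z "$names" ] && return 0
    for name in $names; do
        num=$(priority_to_num "$name") || continue
        [ "$num" -eq "$entry_num" ] && return 0
    done
    return 1
}

# Example:
#   priority_selected 3 "err crit alert emerg" && echo "collected"
```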

Log Fields

journald

Systemd journal logs. Field availability varies by systemd version; see the version hint (e.g., v188+, v205+) in each field description.

Tags & Fields Description
host
(tag)
Hostname (from _HOSTNAME, v188+)
service
(tag)
Service identifier (from SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, or _COMM)
CODE_FILE Source code filename for debugging (v188+)
Type: string
Unit: N/A
CODE_FUNC Function name for debugging (v188+)
Type: string
Unit: N/A
CODE_LINE Source code line number for debugging (v188+)
Type: int
Unit: N/A
COREDUMP_CMDLINE Full command line at crash time (v188+)
Type: string
Unit: N/A
COREDUMP_CWD Current working directory at crash time (v188+)
Type: string
Unit: N/A
COREDUMP_EXE Executable path of crashed binary (v188+)
Type: string
Unit: N/A
COREDUMP_GID Crashed process GID (v188+)
Type: int
Unit: N/A
COREDUMP_HOSTNAME Hostname at crash time (v188+)
Type: string
Unit: N/A
COREDUMP_PID Crashed process PID (v188+)
Type: int
Unit: N/A
COREDUMP_ROOT Root directory, usually / (v188+)
Type: string
Unit: N/A
COREDUMP_SIGNAL Signal number that caused crash (v188+)
Type: int
Unit: N/A
COREDUMP_STACKTRACE Full stack backtrace (v188+)
Type: string
Unit: N/A
COREDUMP_TIMESTAMP Crash timestamp in microseconds (v188+)
Type: int
Unit: time,μs
COREDUMP_UID Crashed process UID (v188+)
Type: int
Unit: N/A
COREDUMP_UNIT System unit that crashed (v198+)
Type: string
Unit: N/A
COREDUMP_USER_UNIT User unit that crashed (v198+)
Type: string
Unit: N/A
DOCUMENTATION Documentation URL http/https/file/man/info (v246+)
Type: string
Unit: N/A
ERRNO Unix error number associated with message (v188+)
Type: int
Unit: N/A
INVOCATION_ID Invocation ID for systemd code messages (v245+)
Type: string
Unit: N/A
MESSAGE_ID 128-bit message identifier (UUID format, v188+)
Type: string
Unit: N/A
OBJECT_AUDIT_LOGINUID Target login UID (v205+)
Type: int
Unit: N/A
OBJECT_AUDIT_SESSION Target audit session ID (v205+)
Type: int
Unit: N/A
OBJECT_CMDLINE Target process full command line (v205+)
Type: string
Unit: N/A
OBJECT_COMM Target process comm (v205+)
Type: string
Unit: N/A
OBJECT_EXE Target process executable path (v205+)
Type: string
Unit: N/A
OBJECT_GID Target process GID (v205+)
Type: int
Unit: N/A
OBJECT_PID Target process PID, requires UID 0 to set (v205+)
Type: int
Unit: N/A
OBJECT_SYSTEMD_CGROUP Target cgroup path (v205+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_INVOCATION_ID Target invocation ID (v235+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_OWNER_UID Target session owner UID (v205+)
Type: int
Unit: N/A
OBJECT_SYSTEMD_SESSION Target session ID (v205+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_UNIT Target unit name (v205+)
Type: string
Unit: N/A
OBJECT_SYSTEMD_USER_UNIT Target user unit name (v205+)
Type: string
Unit: N/A
OBJECT_UID Target process UID (v205+)
Type: int
Unit: N/A
SYSLOG_FACILITY Syslog facility 0-23 (v188+)
Type: int
Unit: N/A
SYSLOG_PID Client PID from syslog, may differ from _PID (v188+)
Type: int
Unit: N/A
SYSLOG_RAW Original syslog line if MESSAGE modified or timestamp lost (v240+)
Type: string
Unit: N/A
SYSLOG_TIMESTAMP Original syslog timestamp as received (v188+)
Type: string
Unit: N/A
TID Thread ID numeric (v247+)
Type: int
Unit: N/A
UNIT Unit name user-provided alternative to _SYSTEMD_UNIT (v251+)
Type: string
Unit: N/A
USER_INVOCATION_ID User invocation ID for user manager messages (v245+)
Type: string
Unit: N/A
USER_UNIT User unit user-provided alternative to _SYSTEMD_USER_UNIT (v251+)
Type: string
Unit: N/A
_AUDIT_LOGINUID Login UID from kernel audit (v188+)
Type: int
Unit: N/A
_AUDIT_SESSION Audit session ID from kernel (v188+)
Type: int
Unit: N/A
_BOOT_ID Boot ID 128-bit hex UUID (v188+)
Type: string
Unit: N/A
_CAP_EFFECTIVE Effective capabilities bitmask (v206+)
Type: int
Unit: N/A
_CMDLINE Full command line, most complete process info (v188+)
Type: string
Unit: N/A
_COMM Command name truncated to 15 chars (v188+)
Type: string
Unit: N/A
_CONTAINER_ID Container ID for nspawn/containers (v205+)
Type: string
Unit: N/A
_CONTAINER_IMAGE Container image for nspawn/containers (v205+)
Type: string
Unit: N/A
_CONTAINER_NAME Container name for nspawn/containers (v205+)
Type: string
Unit: N/A
_EXE Executable path, full path (v188+)
Type: string
Unit: N/A
_GID Group ID, trusted (v188+)
Type: int
Unit: N/A
_KERNEL_DEVICE Kernel device name format: bM:N, cM:N, nN, +subsys:name (v189+)
Type: string
Unit: N/A
_KERNEL_SUBSYSTEM Kernel subsystem e.g. block, net (v189+)
Type: string
Unit: N/A
_LINE_BREAK Line termination info: nul, line-max, eof, pid-change (v235+)
Type: string
Unit: N/A
_MACHINE_ID Machine ID from /etc/machine-id (v188+)
Type: string
Unit: N/A
_NAMESPACE Journal namespace ID (v245+)
Type: string
Unit: N/A
_RUNTIME_SCOPE Runtime scope: initrd, system, or user (v252+)
Type: string
Unit: N/A
_SELINUX_CONTEXT SELinux security context label (v188+)
Type: string
Unit: N/A
_SOURCE_BOOTTIME_TIMESTAMP Boottime timestamp in microseconds CLOCK_BOOTTIME (v257+)
Type: int
Unit: time,μs
_SOURCE_REALTIME_TIMESTAMP Source timestamp in microseconds CLOCK_REALTIME (v188+)
Type: int
Unit: time,μs
_STREAM_ID Stream connection ID 128-bit UUID for stdout streams (v235+)
Type: string
Unit: N/A
_SYSTEMD_CGROUP Control group path (v188+)
Type: string
Unit: N/A
_SYSTEMD_INVOCATION_ID Unit invocation ID unique per unit start (v233+)
Type: string
Unit: N/A
_SYSTEMD_OWNER_UID Session owner UID (v188+)
Type: int
Unit: N/A
_SYSTEMD_SESSION Login session ID (v188+)
Type: string
Unit: N/A
_SYSTEMD_SLICE Slice unit name e.g. system.slice (v188+)
Type: string
Unit: N/A
_SYSTEMD_UNIT Unit name e.g. sshd.service (v188+)
Type: string
Unit: N/A
_SYSTEMD_USER_SLICE User slice name e.g. user.slice (v188+)
Type: string
Unit: N/A
_SYSTEMD_USER_UNIT User unit name for user sessions (v188+)
Type: string
Unit: N/A
_TRANSPORT How entry was received: audit, driver, syslog, journal, stdout, kernel (v205+)
Type: string
Unit: N/A
_UDEV_DEVLINK Symlinks to device, can appear multiple times (v189+)
Type: string
Unit: N/A
_UDEV_DEVNODE Device node in /dev/ full path (v189+)
Type: string
Unit: N/A
_UDEV_SYSNAME Device name in /sys/ (v189+)
Type: string
Unit: N/A
_UID User ID, trusted cannot be spoofed (v188+)
Type: int
Unit: N/A
__CURSOR Entry cursor; address field, export only (v188+)
Type: string
Unit: N/A
__MONOTONIC_TIMESTAMP Monotonic timestamp in microseconds; address field, export only (v188+)
Type: int
Unit: time,μs
__REALTIME_TIMESTAMP Reception timestamp in microseconds; address field, export only (v188+)
Type: int
Unit: time,μs
__SEQNUM Sequence number; address field, export only (v254+)
Type: int
Unit: N/A
__SEQNUM_ID Sequence ID; address field, export only (v254+)
Type: string
Unit: N/A
journald_timestamp Journal entry timestamp in nanoseconds (from _SOURCE_REALTIME_TIMESTAMP or __REALTIME_TIMESTAMP, v188+)
Type: int
Unit: time,ns
message Log message content (from MESSAGE, v188+)
Type: string
Unit: N/A
pid Process ID (from _PID or SYSLOG_PID, v188+)
Type: int
Unit: N/A
priority Numeric priority level 0-7 (from PRIORITY, v188+)
Type: int
Unit: N/A
status Log status level mapped from priority: error, warn, critical, notice, info, debug, unknown
Type: string
Unit: N/A
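
The derived fields in this list (service, journald_timestamp, status) can be illustrated with a short sketch. Several details here are assumptions rather than documented behavior: the precedence among SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, and _COMM follows the order in which they are listed, and the exact priority-to-status grouping (for example, 0-2 mapping to critical) is a guess based on the status values named above. All function names are hypothetical.

```shell
# Pick the service tag: first non-empty of SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, _COMM.
# Assumption: precedence follows the order listed in the field description.
pick_service() {
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Convert a journal timestamp from microseconds to nanoseconds,
# as needed to derive journald_timestamp from _SOURCE_REALTIME_TIMESTAMP.
us_to_ns() {
    echo $(( $1 * 1000 ))
}

# Map a numeric priority to a status string.
# Assumption: emerg, alert, and crit (0-2) are grouped as "critical".
priority_to_status() {
    case "$1" in
        0|1|2) echo critical ;;
        3)     echo error ;;
        4)     echo warn ;;
        5)     echo notice ;;
        6)     echo info ;;
        7)     echo debug ;;
        *)     echo unknown ;;
    esac
}

# Example:
#   pick_service "" "sshd.service" "sshd"
#   us_to_ns 1700000000000000
#   priority_to_status 4
```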

Common Use Cases

  • Collect logs from specific services
[[inputs.journald]]
  units = ["nginx.service", "mysql.service", "docker.service"]
  priorities = ["err", "crit", "alert", "emerg"]
  tail_only = true
  • Exclude verbose fields
[[inputs.journald]]
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
    "_AUDIT_SESSION",
    "_AUDIT_LOGINUID",
  ]
  • Kubernetes node journal collection
[[inputs.journald]]
  paths = ["/rootfs/var/log/journal", "/rootfs/run/log/journal"]
  tail_only = true
  • Collect all logs (debugging)
[[inputs.journald]]
  tail_only = false
  max_entries_per_batch = 500
  exclude_fields = []
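
To preview which units a units filter would select, the glob matching can be approximated in the shell. This sketch assumes ordinary shell glob semantics; the collector's own matcher may differ in edge cases. The function name unit_selected is hypothetical.

```shell
# Succeed if unit name $1 matches any glob in the space-separated list $2.
# An empty list accepts every unit, matching the documented default.
unit_selected() {
    unit="$1"
    patterns="$2"
    [ -z "$patterns" ] && return 0
    set -f   # keep the shell from expanding the globs against local files
    for pattern in $patterns; do
        case "$unit" in
            $pattern) set +f; return 0 ;;
        esac
    done
    set +f
    return 1
}

# Example:
#   unit_selected nginx.service "*.service" && echo "collected"
#   unit_selected user.slice "*.service" || echo "skipped"
```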

Troubleshooting

Permission errors

Ensure DataKit has read access to journal files:

# Add datakit user to systemd-journal group
sudo usermod -aG systemd-journal datakit

# Restart DataKit
sudo systemctl restart datakit
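
To confirm that the group change took effect, the membership check can be sketched as follows. The helper name has_group is hypothetical, and the datakit user name follows the command above. Group changes only apply to newly started processes, which is why the restart is needed.

```shell
# Succeed if group $2 appears in the space-separated group list $1.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Example: check the datakit user after running usermod
#   has_group "$(id -nG datakit)" systemd-journal && echo "journal access OK"
```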

No logs collected

  1. Verify journald is running:
systemctl status systemd-journald
  2. Check that journal files exist:
ls -la /var/log/journal/
ls -la /run/log/journal/
  3. Test with journalctl:
journalctl -n 10

Cursor file issues

If the cursor file becomes corrupted (e.g., after host reboot), the collector automatically falls back to tail mode and creates a new cursor. To reset it manually, remove the file at the path set by cursor_file (the default path is shown here):

# Remove cursor file
sudo rm /usr/local/datakit/cache/journald.pos

# Restart DataKit
sudo systemctl restart datakit
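
A more cautious variant of the reset moves the cursor file aside instead of deleting it, so the old position can be inspected later. This is a sketch only; the function name backup_cursor and the .bak suffix are hypothetical choices, not something the collector recognizes.

```shell
# Move a cursor file aside instead of deleting it.
# $1: cursor file path (the value of cursor_file)
backup_cursor() {
    cursor="$1"
    if [ ! -e "$cursor" ]; then
        echo "no cursor file at $cursor"
        return 0
    fi
    mv "$cursor" "$cursor.bak" && echo "moved to $cursor.bak"
}

# Example, from a root shell:
#   backup_cursor /usr/local/datakit/cache/journald.pos
#   systemctl restart datakit
```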

High memory usage

Default batch size is 1000 entries. If memory usage is a concern, reduce the batch size:

[[inputs.journald]]
  max_entries_per_batch = 100
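
When choosing a batch size, a rough estimate of the raw payload held by one batch may help. The sketch below multiplies the batch size by an average entry size. It is only an approximation of the entry data itself and says nothing about how the collector actually allocates memory; the function name batch_payload_kib is hypothetical.

```shell
# Estimate the raw payload of one batch in KiB.
# $1: max_entries_per_batch, $2: average entry size in bytes
batch_payload_kib() {
    echo $(( $1 * $2 / 1024 ))
}

# Example: measure the average JSON size of the last 100 entries,
# then estimate a 1000-entry batch.
#   total=$(journalctl -n 100 -o json | wc -c)
#   batch_payload_kib 1000 $(( total / 100 ))
```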