Journald
The Journald collector is used to collect logs from the systemd journal (journald) on Linux systems. It uses an external binary wrapper to interface with libsystemd and efficiently collects structured log entries from the journal.
Prerequisites
- Linux only: Requires `systemd` and `journald`
- libsystemd: External binary requires `libsystemd` development libraries
- Permissions: DataKit needs read access to journal files (typically requires joining the `systemd-journal` group)
System Requirements Check
Before deploying the journald collector, verify your system meets the requirements:
Quick check with one-liner:
```shell
systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"
```
Comprehensive pre-flight check script:
journald-prereq-check.sh
```shell
#!/bin/bash
# journald-prereq-check.sh - Verify systemd requirements

echo "=== Journald Collector Prerequisites Check ==="
echo

# 1. Check if systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ Found - $VERSION"
else
    echo "❌ NOT FOUND - systemctl not installed"
    exit 1
fi

# 2. Check libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ Found - $LIBPATH"
else
    echo "❌ NOT FOUND - libsystemd.so.0 missing"
    exit 1
fi

# 3. Check journalctl access
echo -n "3. journalctl access: "
if journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - Can read journal"
else
    echo "⚠️ LIMITED - journalctl exists but no read access"
fi

# 4. Check journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n " $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ Exists and readable"
        else
            echo "⚠️ Exists but NOT readable"
        fi
    else
        echo "❌ NOT FOUND"
    fi
done

# 5. Check systemd version
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | grep -oP 'systemd \K\d+' || echo "0")
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets minimum v205)"
else
    echo "⚠️ v$SYSTEMD_VERSION (older than recommended v205)"
fi

echo
echo "=== Check Complete ==="
```
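Save the script as journald-prereq-check.sh, make it executable with chmod +x journald-prereq-check.sh, and run it with ./journald-prereq-check.sh.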
Expected output:
```text
=== Journald Collector Prerequisites Check ===

1. systemctl command: ✅ Found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ Found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - Can read journal
4. Journal directories:
 /var/log/journal: ✅ Exists and readable
 /run/log/journal: ✅ Exists and readable
5. systemd version: ✅ v257 (meets minimum v205)

=== Check Complete ===
```
Possible troubleshooting solutions:
| Issue | Solution |
|---|---|
| `systemctl: command not found` | Install systemd or use alternative log collection |
| `libsystemd.so.0: cannot open` | Install systemd-libs: `apt install libsystemd0` or `yum install systemd-libs` |
| `journalctl: no read access` | Add user to systemd-journal group: `usermod -aG systemd-journal $USER` |
| `/var/log/journal not found` | Enable persistent journal: `mkdir -p /var/log/journal && systemd-tmpfiles --create` |
Configuration
Collector Configuration
After DataKit is installed and running, enable the Journald collector from its sample configuration file.
Go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and name the copy journald.conf. Example:
```toml
# Collect systemd journal logs using external binary
[[inputs.journald]]
  ## Name of the collector
  name = 'journald'

  ## Run as daemon (required for journald collection)
  daemon = true
  http_endpoint = "http://localhost:9529"
  log_level = "info"
  log_path = "/usr/local/datakit/externals/journald.log"

  ## Path to datakit-journald binary
  ## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
  # cmd = "/usr/local/datakit/externals/datakit-journald"

  ## Interval to check external process (for non-daemon mode)
  # interval = "10s"

  ## Rootfs mount point for container/Kubernetes mode only
  ## DataKit uses this as the host root prefix when auto-prefixing absolute paths
  ## and preparing host-side systemd libraries (copy_node_libs).
  mount_dir = "/rootfs"

  ## Journal directory paths
  ## Host installation: use default paths
  ## Container/Kubernetes: DataKit auto-prefixes absolute paths with mount_dir.
  paths = [
    "/var/log/journal", # Persistent storage
    "/run/log/journal", # Runtime storage
  ]

  ## Filter by systemd unit names (supports glob patterns)
  ## Empty = all units
  # units = ["*.service", "docker.service", "kubelet.service"]

  ## Filter by priority levels
  ## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
  ## Empty = all priorities
  # priorities = ["err", "warning", "crit", "alert", "emerg"]

  ## Field selection - collect all by default, exclude specific fields
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
  ]

  ## Collection behavior
  ## tail_only=true: Only collect new entries (cursor not needed)
  ## tail_only=false: Read from last position (cursor required)
  tail_only = true
  max_entries_per_batch = 1000

  ## Cursor management (only used when tail_only=false)
  # save_cursor = true
  # cursor_file = "/usr/local/datakit/cache/journald.cursor"

  ## Environment variables for external binary
  # envs = [
  #   "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
  # ]

  ## Host-side systemd library prepare:
  ## - Container/Kubernetes (Docker or Kubernetes): auto forced to true.
  ## - Non-container host: disabled by default. If enabled manually, set copy_node_libs_files explicitly.
  ## - In container/kubernetes mode, when copy_node_libs_files is empty, DataKit first copies
  ##   libsystemd.so* then runs "LD_LIBRARY_PATH=<dst> ldd libsystemd.so.0"
  ##   style dependency probing and copies missing .so files automatically.
  # copy_node_libs = true

  ## Optional override file list. If set, only these patterns/files are copied.
  # copy_node_libs_files = [
  #   "libsystemd.so*",
  #   "liblz4.so*",
  #   "libzstd.so*",
  #   "liblzma.so*",
  #   "libcap.so*",
  #   "libgcrypt.so*",
  #   "libgpg-error.so*",
  #   "libselinux.so*",
  #   "libmount.so*",
  #   "libblkid.so*",
  #   "libacl.so*",
  #   "libpcre2-8.so*",
  #   "libpcre.so*",
  # ]

  ## Additional arguments for external binary
  # args = []

  [inputs.journald.tags]
    # Add custom tags as needed
    # environment = "production"
    # cluster = "k8s-cluster-1"
```
After configuration, restart DataKit.
In Kubernetes, the collector can be enabled either by injecting the collector configuration through a ConfigMap or by setting ENV_DATAKIT_INPUTS.
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `paths` | []string | `["/var/log/journal", "/run/log/journal"]` | Journal directory paths |
| `units` | []string | `[]` | Filter by systemd unit names (supports glob patterns, e.g., `*.service`) |
| `priorities` | []string | `[]` | Filter by priority levels: emerg, alert, crit, err, warning, notice, info, debug |
| `exclude_fields` | []string | `[]` | Journal fields to exclude from collection (e.g., `_BOOT_ID`, `_MACHINE_ID`) |
| `tail_only` | bool | `true` | Only collect new entries (skip historical logs on startup) |
| `max_entries_per_batch` | int | `1000` | Maximum number of entries to collect per batch |
| `save_cursor` | bool | `true` | Persist read position to resume after restart |
| `cursor_file` | string | `/usr/local/datakit/cache/journald.pos` | Path to store cursor position |
| `mount_dir` | string | `"/rootfs"` | Rootfs mount directory used in container/Kubernetes mode only. DataKit uses this prefix for absolute paths and as source root for host-side library prepare |
| `copy_node_libs` | bool | `false` (auto forced to `true` in container or Kubernetes mode) | Whether to copy host-side dynamic libraries from mount dir into DataKit-managed external-libs before starting the external collector. In container or Kubernetes environments (`datakit.Docker \|\| config.IsKubernetes()`), DataKit auto-enables this |
| `copy_node_libs_files` | []string | `[]` | Dynamic library file names or glob patterns to copy. If configured, only these are copied. If empty in container/Kubernetes auto mode, DataKit first copies `libsystemd.so*`, then runs `LD_LIBRARY_PATH=/usr/local/datakit/externals/systemd-libs ldd libsystemd.so.0`-style dependency probing and copies missing `.so` files automatically. If empty outside container/Kubernetes mode while `copy_node_libs=true`, startup fails with a configuration error |
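The `mount_dir` prefixing described in the table, together with the standard journald layout in which a journal root contains one subdirectory named after the machine ID, can be pictured with a small shell sketch. This is only an illustration for checking by hand which directory would be opened from inside a container; `prefix_path` and `journal_dir_for_machine` are hypothetical helper names, not part of DataKit, and the assumption that the host machine ID is readable at `<mount_dir>/etc/machine-id` is ours.

```shell
#!/bin/sh
# Join a mount_dir prefix and an absolute path without doubling the slash.
prefix_path() {
    mount_dir=${1%/}
    printf '%s%s\n' "$mount_dir" "$2"
}

# Print the machine-id subdirectory of a journal root, if it exists.
# $1 = journal root, $2 = path to a machine-id file (assumed location).
journal_dir_for_machine() {
    root=$1
    machine_id_file=$2
    [ -r "$machine_id_file" ] || return 1
    mid=$(cat "$machine_id_file")
    [ -d "$root/$mid" ] || return 1
    printf '%s\n' "$root/$mid"
}

# Example: resolve the persistent journal directory below /rootfs.
root=$(prefix_path /rootfs /var/log/journal)
journal_dir_for_machine "$root" /rootfs/etc/machine-id \
    || echo "no machine-id journal directory found under $root"
```

On a host installation the same helpers can be called with an empty prefix, for example `prefix_path "" /var/log/journal`.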
Log Fields
journald
Systemd journal logs. Note: Field availability varies by systemd version; refer to the version hints (e.g., v188+, v205+) in each field description.
| Tags & Fields | Description |
|---|---|
| host (tag) | Hostname (from _HOSTNAME, v188+) |
| service (tag) | Service identifier (from SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, or _COMM) |
| CODE_FILE | Source code filename for debugging (v188+) Type: string Unit: N/A |
| CODE_FUNC | Function name for debugging (v188+) Type: string Unit: N/A |
| CODE_LINE | Source code line number for debugging (v188+) Type: int Unit: N/A |
| COREDUMP_CMDLINE | Full command line at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_CWD | Current working directory at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_EXE | Executable path of crashed binary (v188+) Type: string Unit: N/A |
| COREDUMP_GID | Crashed process GID (v188+) Type: int Unit: N/A |
| COREDUMP_HOSTNAME | Hostname at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_PID | Crashed process PID (v188+) Type: int Unit: N/A |
| COREDUMP_ROOT | Root directory, usually / (v188+) Type: string Unit: N/A |
| COREDUMP_SIGNAL | Signal number that caused crash (v188+) Type: int Unit: N/A |
| COREDUMP_STACKTRACE | Full stack trace backtrace (v188+) Type: string Unit: N/A |
| COREDUMP_TIMESTAMP | Crash timestamp in microseconds (v188+) Type: int Unit: time,μs |
| COREDUMP_UID | Crashed process UID (v188+) Type: int Unit: N/A |
| COREDUMP_UNIT | System unit that crashed (v198+) Type: string Unit: N/A |
| COREDUMP_USER_UNIT | User unit that crashed (v198+) Type: string Unit: N/A |
| DOCUMENTATION | Documentation URL http/https/file/man/info (v246+) Type: string Unit: N/A |
| ERRNO | Unix error number associated with message (v188+) Type: int Unit: N/A |
| INVOCATION_ID | Invocation ID for systemd code messages (v245+) Type: string Unit: N/A |
| MESSAGE_ID | 128-bit message identifier (UUID format, v188+) Type: string Unit: N/A |
| OBJECT_AUDIT_LOGINUID | Target login UID (v205+) Type: int Unit: N/A |
| OBJECT_AUDIT_SESSION | Target audit session ID (v205+) Type: int Unit: N/A |
| OBJECT_CMDLINE | Target process full command line (v205+) Type: string Unit: N/A |
| OBJECT_COMM | Target process comm (v205+) Type: string Unit: N/A |
| OBJECT_EXE | Target process executable path (v205+) Type: string Unit: N/A |
| OBJECT_GID | Target process GID (v205+) Type: int Unit: N/A |
| OBJECT_PID | Target process PID, requires UID 0 to set (v205+) Type: int Unit: N/A |
| OBJECT_SYSTEMD_CGROUP | Target cgroup path (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_INVOCATION_ID | Target invocation ID (v235+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_OWNER_UID | Target session owner UID (v205+) Type: int Unit: N/A |
| OBJECT_SYSTEMD_SESSION | Target session ID (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_UNIT | Target unit name (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_USER_UNIT | Target user unit name (v205+) Type: string Unit: N/A |
| OBJECT_UID | Target process UID (v205+) Type: int Unit: N/A |
| SYSLOG_FACILITY | Syslog facility 0-23 (v188+) Type: int Unit: N/A |
| SYSLOG_PID | Client PID from syslog, may differ from _PID (v188+) Type: int Unit: N/A |
| SYSLOG_RAW | Original syslog line if MESSAGE modified or timestamp lost (v240+) Type: string Unit: N/A |
| SYSLOG_TIMESTAMP | Original syslog timestamp as received (v188+) Type: string Unit: N/A |
| TID | Thread ID numeric (v247+) Type: int Unit: N/A |
| UNIT | Unit name, user-provided alternative to _SYSTEMD_UNIT (v251+) Type: string Unit: N/A |
| USER_INVOCATION_ID | User invocation ID for user manager messages (v245+) Type: string Unit: N/A |
| USER_UNIT | User unit, user-provided alternative to _SYSTEMD_USER_UNIT (v251+) Type: string Unit: N/A |
| _AUDIT_LOGINUID | Login UID from kernel audit (v188+) Type: int Unit: N/A |
| _AUDIT_SESSION | Audit session ID from kernel (v188+) Type: int Unit: N/A |
| _BOOT_ID | Boot ID 128-bit hex UUID (v188+) Type: string Unit: N/A |
| _CAP_EFFECTIVE | Effective capabilities bitmask (v206+) Type: int Unit: N/A |
| _CMDLINE | Full command line, most complete process info (v188+) Type: string Unit: N/A |
| _COMM | Command name truncated to 15 chars (v188+) Type: string Unit: N/A |
| _CONTAINER_ID | Container ID for nspawn/containers (v205+) Type: string Unit: N/A |
| _CONTAINER_IMAGE | Container image for nspawn/containers (v205+) Type: string Unit: N/A |
| _CONTAINER_NAME | Container name for nspawn/containers (v205+) Type: string Unit: N/A |
| _EXE | Executable path, full path (v188+) Type: string Unit: N/A |
| _GID | Group ID, trusted (v188+) Type: int Unit: N/A |
| _KERNEL_DEVICE | Kernel device name format: bM:N, cM:N, nN, +subsys:name (v189+) Type: string Unit: N/A |
| _KERNEL_SUBSYSTEM | Kernel subsystem e.g. block, net (v189+) Type: string Unit: N/A |
| _LINE_BREAK | Line termination info: nul, line-max, eof, pid-change (v235+) Type: string Unit: N/A |
| _MACHINE_ID | Machine ID from /etc/machine-id (v188+) Type: string Unit: N/A |
| _NAMESPACE | Journal namespace ID (v245+) Type: string Unit: N/A |
| _RUNTIME_SCOPE | Runtime scope: initrd, system, or user (v252+) Type: string Unit: N/A |
| _SELINUX_CONTEXT | SELinux security context label (v188+) Type: string Unit: N/A |
| _SOURCE_BOOTTIME_TIMESTAMP | Boottime timestamp in microseconds CLOCK_BOOTTIME (v257+) Type: int Unit: time,μs |
| _SOURCE_REALTIME_TIMESTAMP | Source timestamp in microseconds CLOCK_REALTIME (v188+) Type: int Unit: time,μs |
| _STREAM_ID | Stream connection ID 128-bit UUID for stdout streams (v235+) Type: string Unit: N/A |
| _SYSTEMD_CGROUP | Control group path (v188+) Type: string Unit: N/A |
| _SYSTEMD_INVOCATION_ID | Unit invocation ID unique per unit start (v233+) Type: string Unit: N/A |
| _SYSTEMD_OWNER_UID | Session owner UID (v188+) Type: int Unit: N/A |
| _SYSTEMD_SESSION | Login session ID (v188+) Type: string Unit: N/A |
| _SYSTEMD_SLICE | Slice unit name e.g. system.slice (v188+) Type: string Unit: N/A |
| _SYSTEMD_UNIT | Unit name e.g. sshd.service (v188+) Type: string Unit: N/A |
| _SYSTEMD_USER_SLICE | User slice name e.g. user.slice (v188+) Type: string Unit: N/A |
| _SYSTEMD_USER_UNIT | User unit name for user sessions (v188+) Type: string Unit: N/A |
| _TRANSPORT | How entry was received: audit, driver, syslog, journal, stdout, kernel (v205+) Type: string Unit: N/A |
| _UDEV_DEVLINK | Symlinks to device, can appear multiple times (v189+) Type: string Unit: N/A |
| _UDEV_DEVNODE | Device node in /dev/ full path (v189+) Type: string Unit: N/A |
| _UDEV_SYSNAME | Device name in /sys/ (v189+) Type: string Unit: N/A |
| _UID | User ID, trusted cannot be spoofed (v188+) Type: int Unit: N/A |
| __CURSOR | Entry cursor, address field export only (v188+) Type: string Unit: N/A |
| __MONOTONIC_TIMESTAMP | Monotonic timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs |
| __REALTIME_TIMESTAMP | Reception timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs |
| __SEQNUM | Sequence number, address field export only (v254+) Type: int Unit: N/A |
| __SEQNUM_ID | Sequence ID, address field export only (v254+) Type: string Unit: N/A |
| journald_timestamp | Journal entry timestamp in nanoseconds (from _SOURCE_REALTIME_TIMESTAMP or __REALTIME_TIMESTAMP, v188+) Type: int Unit: time,ns |
| message | Log message content (from MESSAGE, v188+) Type: string Unit: N/A |
| pid | Process ID (from _PID or SYSLOG_PID, v188+) Type: int Unit: N/A |
| priority | Numeric priority level 0-7 (from PRIORITY, v188+) Type: int Unit: N/A |
| status | Log status level mapped from priority: error, warn, critical, notice, info, debug, unknown Type: string Unit: N/A |
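When reading collected entries, it can help to translate the numeric `priority` field back into the level names used by the `priorities` option. The sketch below uses the standard syslog numbering that the sample configuration lists (emerg = 0 through debug = 7); `priority_name` is a hypothetical helper, and it does not reproduce how DataKit derives the `status` field, which this document does not spell out level by level.

```shell
#!/bin/sh
# Map a numeric journal priority (0-7) to its syslog level name.
priority_name() {
    case "$1" in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warning ;;
        5) echo notice ;;
        6) echo info ;;
        7) echo debug ;;
        *) echo unknown ;;
    esac
}

# Print a few mappings as a quick reference.
for p in 0 3 6; do
    echo "$p -> $(priority_name "$p")"
done
```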
Common Use Cases
- Collect logs from specific services

```toml
[[inputs.journald]]
  units = ["nginx.service", "mysql.service", "docker.service"]
  priorities = ["err", "crit", "alert", "emerg"]
  tail_only = true
```

- Exclude verbose fields

```toml
[[inputs.journald]]
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
    "_AUDIT_SESSION",
    "_AUDIT_LOGINUID",
  ]
```

- Kubernetes node journal collection (auto mode)

  Notes:
  - The collector resolves candidate directories in configuration order and tries the first readable journal directory first.
  - In container or Kubernetes environments (`datakit.Docker || config.IsKubernetes()`), DataKit auto-enables journald rootfs mode.
  - In container/Kubernetes mode, absolute paths are automatically prefixed with `mount_dir` (default `"/rootfs"`).
  - If the configured path is a journal root such as `<mount_dir>/var/log/journal`, the collector automatically descends into the machine-id subdirectory before opening it.
  - In containerized node environments such as kind or k3d, validate `logger` and `journalctl` inside the node container rather than on the outer host.

- Kubernetes node journal collection with host-side systemd library prepare

```toml
[[inputs.journald]]
  mount_dir = "/rootfs"
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true
  copy_node_libs = true
  copy_node_libs_files = [
    "libsystemd.so*",
    "liblz4.so*",
    "libzstd.so*",
    "liblzma.so*",
    "libcap.so*",
    "libgcrypt.so*",
    "libgpg-error.so*",
    "libselinux.so*",
    "libmount.so*",
    "libblkid.so*",
    "libacl.so*",
    "libpcre2-8.so*",
    "libpcre.so*",
  ]
```

- Collect all logs (debugging)
Troubleshooting
Permission errors
Ensure DataKit has read access to journal files:

```shell
# Add datakit user to systemd-journal group
sudo usermod -aG systemd-journal datakit
# Restart DataKit
sudo systemctl restart datakit
```
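To confirm that the group change is in place, the group list reported by `id -nG` can be checked for an exact `systemd-journal` entry. The helper below is a minimal sketch; `has_journal_group` is a hypothetical name, and the `datakit` user is simply the account used in the commands above. Group membership only applies to processes started after the change, which is why the restart is needed.

```shell
#!/bin/sh
# Print "yes" if the space-separated group list contains systemd-journal
# as an exact name, otherwise print "no".
has_journal_group() {
    for g in $1; do
        if [ "$g" = "systemd-journal" ]; then
            echo yes
            return 0
        fi
    done
    echo no
}

# Check the account used in this section; an unknown user yields "no".
user=datakit
user_groups=$(id -nG "$user" 2>/dev/null || true)
echo "$user in systemd-journal: $(has_journal_group "$user_groups")"
```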
No logs collected
- Verify that journald is running.
- Check that journal files exist.
- If `journalctl` is available in the current environment, use it for extra validation; if the container does not ship `journalctl`, rely on the DataKit compatibility warning and probe result directly.
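The first two checks can be scripted roughly as follows. This is a sketch rather than an official procedure: `count_journal_files` and `journald_state` are hypothetical helper names, the directories are the default `paths` from the sample configuration, and inside a container they would need the `mount_dir` prefix.

```shell
#!/bin/sh
# Count *.journal files below a directory (prints 0 if the directory is missing).
count_journal_files() {
    dir=$1
    if [ -d "$dir" ]; then
        find "$dir" -type f -name '*.journal' 2>/dev/null | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Print the state of systemd-journald, or "unknown" when systemctl is absent.
journald_state() {
    if command -v systemctl >/dev/null 2>&1; then
        systemctl is-active systemd-journald 2>/dev/null || true
    else
        echo "unknown"
    fi
}

echo "journald state: $(journald_state)"
for d in /var/log/journal /run/log/journal; do
    echo "$d: $(count_journal_files "$d") journal file(s)"
done
```

Where `journalctl` is available, `journalctl -D <directory> -n 5 --no-pager` reads the last few entries directly from a given journal directory, which is a quick way to see whether the files are readable at all.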
If startup logs report reason=unsupported-format, the collector runtime is older than the target journal file format. In this case DataKit keeps the journald collector inactive and logs a warning instead of collecting partial or misleading results.
This can happen in Kubernetes whenever DataKit collects journal files from the node while the container image ships an older libsystemd than the host journal format requires. Typical symptoms are:
- If `journalctl` is installed inside the Pod, it may report `unsupported feature`
- DataKit starts, but the journald collector stays inactive after the compatibility warning

In container or Kubernetes environments (`datakit.Docker || config.IsKubernetes()`), DataKit already auto-enables host-side systemd library prepare. To get the same behavior on a non-container host, set `copy_node_libs = true` and list the libraries explicitly in `copy_node_libs_files`.
When enabled, DataKit copies dynamic libraries from candidate system library directories under mount_dir (default "/rootfs") into its own external-libs directory, then prepends that directory to LD_LIBRARY_PATH automatically.
Copy behavior details:
- If `copy_node_libs_files` is configured and non-empty, DataKit copies only that list.
- If `copy_node_libs_files` is empty in container/Kubernetes auto mode, DataKit first copies `libsystemd.so*`, then probes missing dependencies with `ldd libsystemd.so.0` under the copied library path, and copies the missing `.so` files automatically.
- If `copy_node_libs_files` is empty on non-container and non-Kubernetes hosts while `copy_node_libs=true`, DataKit reports a configuration error and keeps the collector inactive.
- If library prepare fails while `copy_node_libs` is enabled, the journald collector stays inactive (other DataKit collectors are not affected).
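The dependency probing step can be reproduced by hand to see which libraries are still unresolved. The sketch below only filters `ldd` output for entries marked "not found"; it is not DataKit's implementation, `missing_libs` is a hypothetical helper, and the sample text is made-up data in the usual `ldd` format.

```shell
#!/bin/sh
# Read ldd-style output on stdin and print the names of libraries
# that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Made-up sample in ldd's output format, used here instead of a real host.
sample='	linux-vdso.so.1 (0x00007ffd1a5f2000)
	liblz4.so.1 => not found
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f1c2a400000)
	libzstd.so.1 => not found'

printf '%s\n' "$sample" | missing_libs
```

Against a real library directory, the same filter could be fed from `ldd`, for example `LD_LIBRARY_PATH=<dst> ldd <dst>/libsystemd.so.0 | missing_libs`, where `<dst>` is the directory the libraries were copied into.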
After the collector opens the journal successfully, it also logs the effective libsystemd path in the external journald.log.
Constraints:
- The host `libsystemd` is not guaranteed to be compatible with the journald external binary currently shipped in DataKit
- If the host `libsystemd` is too old, the external binary may fail during dynamic linking because of missing symbols or version mismatches
- If the host `libsystemd` is newer, it may still fail later with `unsupported feature` when reading journal files
- Therefore, `copy_node_libs` is only a preparation mechanism, not a guarantee that the copied libraries are compatible; the final result still needs to be verified from startup logs and probe results
Do not point LD_LIBRARY_PATH at the entire host /usr/lib64 directory. That can also pull incompatible glibc components into the collector process and create a less predictable failure mode.
If startup logs show that the collector is using directory-based journal opening, it is on the recommended path for live journals. Avoid configuring individual .journal files as the primary input path.
Cursor file issues
If the cursor file becomes corrupted (e.g., after a host reboot), the collector automatically falls back to tail mode and creates a new cursor. To reset it manually, remove the cursor file (the path configured in cursor_file) and restart DataKit:

```shell
# Remove cursor file
rm /usr/local/datakit/cache/journald.pos
# Restart DataKit
sudo systemctl restart datakit
```
High memory usage
The default batch size is 1000 entries. If memory usage is a concern, lower max_entries_per_batch in journald.conf and restart DataKit.