Journald
The Journald collector is used to collect logs from the systemd journal (journald) on Linux systems. It uses an external binary wrapper to interface with libsystemd and efficiently collects structured log entries from the journal.
Prerequisites
- Linux only: Requires systemd and journald
- libsystemd: External binary requires libsystemd development libraries
- Permissions: DataKit needs read access to journal files (typically requires joining the systemd-journal group)
System Requirements Check
Before deploying the journald collector, verify your system meets the requirements:
Quick check with one-liner:
systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"
Comprehensive pre-flight check script:
journald-prereq-check.sh
#!/bin/bash
# journald-prereq-check.sh - Verify systemd requirements
echo "=== Journald Collector Prerequisites Check ==="
echo
# 1. Check if systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ Found - $VERSION"
else
    echo "❌ NOT FOUND - systemctl not installed"
    exit 1
fi
# 2. Check libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ Found - $LIBPATH"
else
    echo "❌ NOT FOUND - libsystemd.so.0 missing"
    exit 1
fi
# 3. Check journalctl access
echo -n "3. journalctl access: "
if journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - Can read journal"
else
    echo "⚠️ LIMITED - journalctl exists but no read access"
fi
# 4. Check journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n " $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ Exists and readable"
        else
            echo "⚠️ Exists but NOT readable"
        fi
    else
        echo "❌ NOT FOUND"
    fi
done
# 5. Check systemd version
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | grep -oP 'systemd \K\d+' || echo "0")
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets minimum v205)"
else
    echo "⚠️ v$SYSTEMD_VERSION (older than recommended v205)"
fi
echo
echo "=== Check Complete ==="
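Save the script as journald-prereq-check.sh, make it executable (chmod +x journald-prereq-check.sh), and run it (./journald-prereq-check.sh).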
Expected output:
=== Journald Collector Prerequisites Check ===
1. systemctl command: ✅ Found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ Found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - Can read journal
4. Journal directories:
/var/log/journal: ✅ Exists and readable
/run/log/journal: ✅ Exists and readable
5. systemd version: ✅ v257 (meets minimum v205)
=== Check Complete ===
Possible troubleshooting solutions:
| Issue | Solution |
|---|---|
| systemctl: command not found | Install systemd or use alternative log collection |
| libsystemd.so.0: cannot open | Install systemd-libs: apt install libsystemd0 or yum install systemd-libs |
| journalctl: no read access | Add user to systemd-journal group: usermod -aG systemd-journal $USER |
| /var/log/journal not found | Enable persistent journal: mkdir -p /var/log/journal && systemd-tmpfiles --create |
Configuration
Collector Configuration
After successfully installing and starting DataKit, enable the Journald collector by copying the configuration file:
Go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and name the copy journald.conf. An example follows:
# Collect systemd journal logs using external binary
[[inputs.journald]]
## Name of the collector
name = 'journald'
## Run as daemon (required for journald collection)
daemon = true
http_endpoint = "http://localhost:9529"
log_level = "info"
log_path = "/usr/local/datakit/externals/journald.log"
## Path to datakit-journald binary
## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
# cmd = "/usr/local/datakit/externals/datakit-journald"
## Interval to check external process (for non-daemon mode)
# interval = "10s"
## Journal directory paths
## Host installation: use default paths
## Kubernetes: use /rootfs prefixed paths (e.g., /rootfs/var/log/journal)
paths = [
"/var/log/journal", # Persistent storage
"/run/log/journal", # Runtime storage
]
## Filter by systemd unit names (supports glob patterns)
## Empty = all units
# units = ["*.service", "docker.service", "kubelet.service"]
## Filter by priority levels
## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
## Empty = all priorities
# priorities = ["err", "warning", "crit", "alert", "emerg"]
## Field selection - collect all by default, exclude specific fields
exclude_fields = [
"_BOOT_ID",
"_MACHINE_ID",
"__MONOTONIC_TIMESTAMP",
]
## Collection behavior
## tail_only=true: Only collect new entries (cursor not needed)
## tail_only=false: Read from last position (cursor required)
tail_only = true
max_entries_per_batch = 1000
## Cursor management (only used when tail_only=false)
# save_cursor = true
# cursor_file = "/usr/local/datakit/cache/journald.cursor"
## Environment variables for external binary
# envs = [
# "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
# ]
## Additional arguments for external binary
# args = []
[inputs.journald.tags]
# Add custom tags as needed
# environment = "production"
# cluster = "k8s-cluster-1"
After configuration, restart DataKit.
In Kubernetes, the collector can also be turned on by injecting the collector configuration through a ConfigMap or by setting ENV_DATAKIT_INPUTS.
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| paths | []string | ["/var/log/journal", "/run/log/journal"] | Journal directory paths |
| units | []string | [] | Filter by systemd unit names (supports glob patterns, e.g., *.service) |
| priorities | []string | [] | Filter by priority levels: emerg, alert, crit, err, warning, notice, info, debug |
| exclude_fields | []string | [] | Journal fields to exclude from collection (e.g., _BOOT_ID, _MACHINE_ID) |
| tail_only | bool | true | Only collect new entries (skip historical logs on startup) |
| max_entries_per_batch | int | 1000 | Maximum number of entries to collect per batch |
| save_cursor | bool | true | Persist read position to resume after restart |
| cursor_file | string | /usr/local/datakit/cache/journald.pos | Path to store cursor position |
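To get a feel for how a glob pattern in units selects unit names, the sketch below uses the shell's own pattern matching. It assumes the collector follows ordinary shell-style glob semantics, which this page does not spell out, and unit_matches is a hypothetical helper, not part of DataKit.

```shell
# Hypothetical helper: succeed if unit name $2 matches glob pattern $1.
# Assumes shell-style glob semantics; the collector's matcher may differ.
unit_matches() {
    case "$2" in
        $1) return 0 ;;   # unquoted $1 is treated as a pattern
        *)  return 1 ;;
    esac
}

for unit in docker.service kubelet.service systemd-journald.socket; do
    if unit_matches '*.service' "$unit"; then
        echo "match: $unit"
    else
        echo "skip:  $unit"
    fi
done
# prints:
# match: docker.service
# match: kubelet.service
# skip:  systemd-journald.socket
```

A pattern without wildcards, such as docker.service, only matches that exact name.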
Log Fields
journald
Systemd journal logs. Note: field availability varies by systemd version; refer to the version hints (e.g., v188+, v205+) in each field description.
| Tags & Fields | Description |
|---|---|
| host (tag) | Hostname (from _HOSTNAME, v188+) |
| service (tag) | Service identifier (from SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, or _COMM) |
| CODE_FILE | Source code filename for debugging (v188+) Type: string Unit: N/A |
| CODE_FUNC | Function name for debugging (v188+) Type: string Unit: N/A |
| CODE_LINE | Source code line number for debugging (v188+) Type: int Unit: N/A |
| COREDUMP_CMDLINE | Full command line at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_CWD | Current working directory at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_EXE | Executable path of crashed binary (v188+) Type: string Unit: N/A |
| COREDUMP_GID | Crashed process GID (v188+) Type: int Unit: N/A |
| COREDUMP_HOSTNAME | Hostname at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_PID | Crashed process PID (v188+) Type: int Unit: N/A |
| COREDUMP_ROOT | Root directory, usually / (v188+) Type: string Unit: N/A |
| COREDUMP_SIGNAL | Signal number that caused crash (v188+) Type: int Unit: N/A |
| COREDUMP_STACKTRACE | Full stack trace backtrace (v188+) Type: string Unit: N/A |
| COREDUMP_TIMESTAMP | Crash timestamp in microseconds (v188+) Type: int Unit: time,μs |
| COREDUMP_UID | Crashed process UID (v188+) Type: int Unit: N/A |
| COREDUMP_UNIT | System unit that crashed (v198+) Type: string Unit: N/A |
| COREDUMP_USER_UNIT | User unit that crashed (v198+) Type: string Unit: N/A |
| DOCUMENTATION | Documentation URL http/https/file/man/info (v246+) Type: string Unit: N/A |
| ERRNO | Unix error number associated with message (v188+) Type: int Unit: N/A |
| INVOCATION_ID | Invocation ID for systemd code messages (v245+) Type: string Unit: N/A |
| MESSAGE_ID | 128-bit message identifier (UUID format, v188+) Type: string Unit: N/A |
| OBJECT_AUDIT_LOGINUID | Target login UID (v205+) Type: int Unit: N/A |
| OBJECT_AUDIT_SESSION | Target audit session ID (v205+) Type: int Unit: N/A |
| OBJECT_CMDLINE | Target process full command line (v205+) Type: string Unit: N/A |
| OBJECT_COMM | Target process comm (v205+) Type: string Unit: N/A |
| OBJECT_EXE | Target process executable path (v205+) Type: string Unit: N/A |
| OBJECT_GID | Target process GID (v205+) Type: int Unit: N/A |
| OBJECT_PID | Target process PID, requires UID 0 to set (v205+) Type: int Unit: N/A |
| OBJECT_SYSTEMD_CGROUP | Target cgroup path (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_INVOCATION_ID | Target invocation ID (v235+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_OWNER_UID | Target session owner UID (v205+) Type: int Unit: N/A |
| OBJECT_SYSTEMD_SESSION | Target session ID (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_UNIT | Target unit name (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_USER_UNIT | Target user unit name (v205+) Type: string Unit: N/A |
| OBJECT_UID | Target process UID (v205+) Type: int Unit: N/A |
| SYSLOG_FACILITY | Syslog facility 0-23 (v188+) Type: int Unit: N/A |
| SYSLOG_PID | Client PID from syslog, may differ from _PID (v188+) Type: int Unit: N/A |
| SYSLOG_RAW | Original syslog line if MESSAGE modified or timestamp lost (v240+) Type: string Unit: N/A |
| SYSLOG_TIMESTAMP | Original syslog timestamp as received (v188+) Type: string Unit: N/A |
| TID | Thread ID numeric (v247+) Type: int Unit: N/A |
| UNIT | Unit name user-provided alternative to _SYSTEMD_UNIT (v251+) Type: string Unit: N/A |
| USER_INVOCATION_ID | User invocation ID for user manager messages (v245+) Type: string Unit: N/A |
| USER_UNIT | User unit user-provided alternative to _SYSTEMD_USER_UNIT (v251+) Type: string Unit: N/A |
| _AUDIT_LOGINUID | Login UID from kernel audit (v188+) Type: int Unit: N/A |
| _AUDIT_SESSION | Audit session ID from kernel (v188+) Type: int Unit: N/A |
| _BOOT_ID | Boot ID 128-bit hex UUID (v188+) Type: string Unit: N/A |
| _CAP_EFFECTIVE | Effective capabilities bitmask (v206+) Type: int Unit: N/A |
| _CMDLINE | Full command line, most complete process info (v188+) Type: string Unit: N/A |
| _COMM | Command name truncated to 15 chars (v188+) Type: string Unit: N/A |
| _CONTAINER_ID | Container ID for nspawn/containers (v205+) Type: string Unit: N/A |
| _CONTAINER_IMAGE | Container image for nspawn/containers (v205+) Type: string Unit: N/A |
| _CONTAINER_NAME | Container name for nspawn/containers (v205+) Type: string Unit: N/A |
| _EXE | Executable path, full path (v188+) Type: string Unit: N/A |
| _GID | Group ID, trusted (v188+) Type: int Unit: N/A |
| _KERNEL_DEVICE | Kernel device name format: bM:N, cM:N, nN, +subsys:name (v189+) Type: string Unit: N/A |
| _KERNEL_SUBSYSTEM | Kernel subsystem e.g. block, net (v189+) Type: string Unit: N/A |
| _LINE_BREAK | Line termination info: nul, line-max, eof, pid-change (v235+) Type: string Unit: N/A |
| _MACHINE_ID | Machine ID from /etc/machine-id (v188+) Type: string Unit: N/A |
| _NAMESPACE | Journal namespace ID (v245+) Type: string Unit: N/A |
| _RUNTIME_SCOPE | Runtime scope: initrd, system, or user (v252+) Type: string Unit: N/A |
| _SELINUX_CONTEXT | SELinux security context label (v188+) Type: string Unit: N/A |
| _SOURCE_BOOTTIME_TIMESTAMP | Boottime timestamp in microseconds CLOCK_BOOTTIME (v257+) Type: int Unit: time,μs |
| _SOURCE_REALTIME_TIMESTAMP | Source timestamp in microseconds CLOCK_REALTIME (v188+) Type: int Unit: time,μs |
| _STREAM_ID | Stream connection ID 128-bit UUID for stdout streams (v235+) Type: string Unit: N/A |
| _SYSTEMD_CGROUP | Control group path (v188+) Type: string Unit: N/A |
| _SYSTEMD_INVOCATION_ID | Unit invocation ID unique per unit start (v233+) Type: string Unit: N/A |
| _SYSTEMD_OWNER_UID | Session owner UID (v188+) Type: int Unit: N/A |
| _SYSTEMD_SESSION | Login session ID (v188+) Type: string Unit: N/A |
| _SYSTEMD_SLICE | Slice unit name e.g. system.slice (v188+) Type: string Unit: N/A |
| _SYSTEMD_UNIT | Unit name e.g. sshd.service (v188+) Type: string Unit: N/A |
| _SYSTEMD_USER_SLICE | User slice name e.g. user.slice (v188+) Type: string Unit: N/A |
| _SYSTEMD_USER_UNIT | User unit name for user sessions (v188+) Type: string Unit: N/A |
| _TRANSPORT | How entry was received: audit, driver, syslog, journal, stdout, kernel (v205+) Type: string Unit: N/A |
| _UDEV_DEVLINK | Symlinks to device, can appear multiple times (v189+) Type: string Unit: N/A |
| _UDEV_DEVNODE | Device node in /dev/ full path (v189+) Type: string Unit: N/A |
| _UDEV_SYSNAME | Device name in /sys/ (v189+) Type: string Unit: N/A |
| _UID | User ID, trusted cannot be spoofed (v188+) Type: int Unit: N/A |
| __CURSOR | Entry cursor, address field export only (v188+) Type: string Unit: N/A |
| __MONOTONIC_TIMESTAMP | Monotonic timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs |
| __REALTIME_TIMESTAMP | Reception timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs |
| __SEQNUM | Sequence number, address field export only (v254+) Type: int Unit: N/A |
| __SEQNUM_ID | Sequence ID, address field export only (v254+) Type: string Unit: N/A |
| journald_timestamp | Journal entry timestamp in nanoseconds (from _SOURCE_REALTIME_TIMESTAMP or __REALTIME_TIMESTAMP, v188+) Type: int Unit: time,ns |
| message | Log message content (from MESSAGE, v188+) Type: string Unit: N/A |
| pid | Process ID (from _PID or SYSLOG_PID, v188+) Type: int Unit: N/A |
| priority | Numeric priority level 0-7 (from PRIORITY, v188+) Type: int Unit: N/A |
| status | Log status level mapped from priority: error, warn, critical, notice, info, debug, unknown Type: string Unit: N/A |
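The derived fields status and journald_timestamp can be illustrated with a short sketch. The exact priority-to-status mapping used by the collector is not given on this page. The version below assumes that emerg, alert, and crit (0-2) collapse into critical and that any other value becomes unknown. Both helper names are hypothetical.

```shell
# Hypothetical helper: map a numeric journal priority (0-7) to a status string.
# ASSUMPTION: 0-2 map to "critical"; the collector's real mapping may differ.
priority_to_status() {
    case "$1" in
        0|1|2) echo "critical" ;;
        3)     echo "error" ;;
        4)     echo "warn" ;;
        5)     echo "notice" ;;
        6)     echo "info" ;;
        7)     echo "debug" ;;
        *)     echo "unknown" ;;
    esac
}

# Journal timestamps are in microseconds; journald_timestamp is in nanoseconds.
usec_to_nsec() {
    echo $(( $1 * 1000 ))
}

priority_to_status 3             # prints: error
usec_to_nsec 1700000000000000    # prints: 1700000000000000000
```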
Common Use Cases
- Collect logs from specific services
[[inputs.journald]]
units = ["nginx.service", "mysql.service", "docker.service"]
priorities = ["err", "crit", "alert", "emerg"]
tail_only = true
- Exclude verbose fields
[[inputs.journald]]
exclude_fields = [
"_BOOT_ID",
"_MACHINE_ID",
"__MONOTONIC_TIMESTAMP",
"_AUDIT_SESSION",
"_AUDIT_LOGINUID",
]
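- Kubernetes node journal collection: point paths at the /rootfs-prefixed journal directories (e.g., /rootfs/var/log/journal)
- Collect all logs (debugging): leave units and priorities empty so that no filter is applied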
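Before enabling a unit and priority filter such as the one in the first use case, it can help to preview roughly what it would select with journalctl. The sketch below only builds the command line. preview_cmd is a hypothetical helper, while -u, -p, -n, and --no-pager are standard journalctl options. Note that journalctl -p err selects err and everything more severe, which corresponds to the err, crit, alert, emerg list.

```shell
# Hypothetical helper: print a journalctl command for a priority and a list of units.
# Usage: preview_cmd <priority> <unit>...
preview_cmd() {
    local priority="$1"; shift
    local cmd="journalctl --no-pager -n 20 -p $priority"
    local unit
    for unit in "$@"; do
        cmd="$cmd -u $unit"   # repeated -u options match any of the listed units
    done
    echo "$cmd"
}

preview_cmd err nginx.service mysql.service docker.service
# prints: journalctl --no-pager -n 20 -p err -u nginx.service -u mysql.service -u docker.service
```

Running the printed command on the host shows the last 20 matching entries, which gives a rough idea of the volume the collector would pick up.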
Troubleshooting
Permission errors
Ensure DataKit has read access to journal files:
# Add datakit user to systemd-journal group
sudo usermod -aG systemd-journal datakit
# Restart DataKit
sudo systemctl restart datakit
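To confirm that the group change took effect, the membership can be checked with id. This is a minimal sketch that assumes DataKit runs as a user named datakit, as in the commands above. has_group is a hypothetical helper.

```shell
# Hypothetical helper: succeed if group $2 appears in the space-separated list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# ASSUMPTION: DataKit runs as the "datakit" user; adjust if it runs as another user.
datakit_groups=$(id -nG datakit 2>/dev/null)
if has_group "$datakit_groups" systemd-journal; then
    echo "datakit is a member of systemd-journal"
else
    echo "datakit is NOT a member of systemd-journal (or the user does not exist)"
fi
```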
No logs collected
- Verify that journald is running
- Check that journal files exist
- Test reading the journal with journalctl
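The three checks above can be run in one pass. The following is a sketch using standard systemd commands. check_journald is a hypothetical helper, and the directories are the default paths from the configuration.

```shell
# Hypothetical helper: run the three checks without aborting on the first failure.
check_journald() {
    # 1. Is the journald service active?
    if command -v systemctl >/dev/null 2>&1; then
        systemctl is-active systemd-journald || echo "systemd-journald is not active"
    else
        echo "systemctl not found"
    fi
    # 2. Do journal files exist in the default locations?
    for dir in /var/log/journal /run/log/journal; do
        if [ -d "$dir" ]; then
            ls -la "$dir"
        else
            echo "$dir: not found"
        fi
    done
    # 3. Can the current user read recent entries?
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -n 5 --no-pager || echo "journalctl could not read the journal"
    else
        echo "journalctl not found"
    fi
    return 0
}

check_journald
```

Run it as the same user DataKit runs as. Otherwise the journalctl result reflects the permissions of the current shell user rather than those of the collector.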
Cursor file issues
If the cursor file becomes corrupted (e.g., after host reboot), the collector automatically falls back to tail mode and creates a new cursor. To manually reset:
# Remove cursor file
rm /usr/local/datakit/cache/journald.pos
# Restart DataKit
sudo systemctl restart datakit
High memory usage
The default batch size is 1000 entries. If memory usage is a concern, reduce the batch size by setting max_entries_per_batch to a lower value (for example, 500) in journald.conf.