Various Other Tool Usages¶
DataKit has many small built-in tools for daily use. You can view the command-line help of DataKit through the following command:
Note: Due to differences between different platforms, the specific help content may vary.
If you want to see how a specific command is used (such as dql), you can use the following command:
$ datakit dql --help
DQL used to query data. If no option specified, query interactively.
Usage:
datakit dql [flags]
Flags:
--auto-json pretty output string if field/tag value is JSON
--csv string Specify the directory
-F, --force overwrite csv if file exists
-h, --help help for dql
-H, --host string specify datakit host to query
-J, --json output in JSON format
--log string log path (default "/dev/null")
-R, --run string run single DQL
-T, --token string run query for specific token(workspace)
-V, --verbose verbosity mode
Debugging Commands¶
Debugging the Blacklist¶
To debug whether a piece of data will be filtered by the centrally configured blacklist, you can use the following command:
$ datakit debug --filter=/usr/local/datakit/data/.pull --data=/path/to/lineproto.data
Dropped
ddtrace,http_url=/webproxy/api/online_status,service=web_front f1=1i 1691755988000000000
By 7th rule(cost 1.017708ms) from category "tracing":
{ service = 'web_front' and ( http_url in [ '/webproxy/api/online_status' ] )}
PS > datakit.exe debug --filter 'C:\Program Files\datakit\data\.pull' --data '\path\to\lineproto.data'
Dropped
ddtrace,http_url=/webproxy/api/online_status,service=web_front f1=1i 1691755988000000000
By 7th rule(cost 1.017708ms) from category "tracing":
{ service = 'web_front' and ( http_url in [ '/webproxy/api/online_status' ] )}
The above output indicates that the data in the file lineproto.data is matched by the 7th rule (counting from 1) in the tracing category in the .pull file. Once matched, this piece of data will be discarded.
Obtaining File Paths Using glob Rules¶
In log collection, log paths can be configured using glob rules.
You can debug the glob rules using DataKit. You need to provide a configuration file, and each line of the file is a glob statement.
An example of the configuration file is as follows:
$ cat glob-config
/tmp/log-test/*.log
/tmp/log-test/**/*.log
A complete command example is as follows:
$ datakit debug --glob-conf glob-config
============= glob paths ============
/tmp/log-test/*.log
/tmp/log-test/**/*.log
========== found the files ==========
/tmp/log-test/1.log
/tmp/log-test/logfwd.log
/tmp/log-test/123/1.log
/tmp/log-test/123/2.log
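To cross-check the result without DataKit, `find` gives a rough approximation of what the two patterns in this example select together. This is only a sketch: `list_logs` is a hypothetical helper, `find` is a different mechanism from DataKit's glob matching, and edge cases such as hidden files or symlinks may behave differently.

```shell
#!/bin/sh
# Sketch: list the files that *.log and **/*.log under one directory would
# plausibly match, using find instead of a glob library.
# list_logs is a hypothetical helper; DataKit's own matching may differ.
list_logs() {
    # -type f keeps regular files only; -name '*.log' applies at any depth.
    find "$1" -type f -name '*.log' | sort
}

# Only run the demo when the example directory exists.
if [ -d /tmp/log-test ]; then
    list_logs /tmp/log-test
fi
```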
Matching Text with Regular Expressions¶
In log collection, multiline log collection can be achieved by configuring regular expressions.
You can debug the regular expression rules using DataKit. Provide a configuration file whose first line is the regular expression and whose remaining lines are the text to be matched (which can span multiple lines).
An example of the configuration file is as follows:
$ cat regex-config
^\d{4}-\d{2}-\d{2}
2020-10-23 06:41:56,688 INFO demo.py 1.0
2020-10-23 06:54:20,164 ERROR /usr/local/lib/python3.6/dist-packages/flask/app.py Exception on /0 [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
ZeroDivisionError: division by zero
2020-10-23 06:41:56,688 INFO demo.py 5.0
A complete command example is as follows:
$ datakit debug --regex-conf regex-config
============= regex rule ============
^\d{4}-\d{2}-\d{2}
========== matching results ==========
Ok: 2020-10-23 06:41:56,688 INFO demo.py 1.0
Ok: 2020-10-23 06:54:20,164 ERROR /usr/local/lib/python3.6/dist-packages/flask/app.py Exception on /0 [GET]
Fail: Traceback (most recent call last):
Fail: File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2447, in wsgi_app
Fail: response = self.full_dispatch_request()
Fail: ZeroDivisionError: division by zero
Ok: 2020-10-23 06:41:56,688 INFO demo.py 5.0
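The Ok/Fail classification can be approximated outside DataKit with a short shell sketch. It assumes the pattern is rewritten as a POSIX extended regex (`[0-9]` instead of `\d`, which `grep -E` does not support). `classify_lines` is a hypothetical helper, not the way DataKit itself evaluates the rule.

```shell
#!/bin/sh
# Sketch: mark each input line as Ok (matches the pattern, so it would start
# a new log entry) or Fail (no match, so it would be treated as a continuation).
# classify_lines is a hypothetical helper, not part of DataKit.
# $1: POSIX extended regex; the text to check is read from stdin.
classify_lines() {
    pattern=$1
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if printf '%s\n' "$line" | grep -Eq -- "$pattern"; then
            printf 'Ok: %s\n' "$line"
        else
            printf 'Fail: %s\n' "$line"
        fi
    done
}

classify_lines '^[0-9]{4}-[0-9]{2}-[0-9]{2}' <<'EOF'
2020-10-23 06:41:56,688 INFO demo.py 1.0
Traceback (most recent call last):
ZeroDivisionError: division by zero
2020-10-23 06:41:56,688 INFO demo.py 5.0
EOF
```

Lines reported as Fail are the ones a multiline rule would append to the preceding Ok line.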
Viewing the Running Status of DataKit¶
For the usage of monitor, please refer to here.
Checking the Correctness of Collector Configuration¶
After editing the collector configuration file, there may be some configuration errors (such as incorrect configuration file format). You can check whether it is correct through the following command:
Viewing Workspace Information¶
DataKit provides the following command to view the server-side workspace information:
datakit tool --workspace-info
{
"token": {
"ws_uuid": "wksp_2dc431d6693711eb8ff97aeee04b54af",
"bill_state": "normal",
"ver_type": "pay",
"token": "tkn_2dc438b6693711eb8ff97aeee04b54af",
"db_uuid": "ifdb_c0fss9qc8kg4gj9bjjag",
"status": 0,
"creator": "",
"expire_at": -1,
"create_at": 0,
"update_at": 0,
"delete_at": 0
},
"data_usage": {
"data_metric": 96966,
"data_logging": 3253,
"data_tracing": 2868,
"data_rum": 0,
"is_over_usage": false
}
}
Debugging KV Files¶
When a collector configuration file uses a KV template, you can debug it with the following command:
datakit tool --parse-kv-file conf.d/host/cpu.conf --kv-file data/.kv
[[inputs.cpu]]
## Collect interval, default is 10 seconds. (optional)
interval = '10s'
## Collect CPU usage per core, default is false. (optional)
percpu = false
## Setting disable_temperature_collect to false will collect cpu temperature stats for linux. (deprecated)
# disable_temperature_collect = false
## Enable to collect core temperature data.
enable_temperature = true
## Enable gets average load information every five seconds.
enable_load5s = true
[inputs.cpu.tags]
kv = "cpu_kv_value3"
Viewing Cloud Attribute Data¶
If the machine where DataKit is installed is a cloud server (currently aliyun/tencent/aws/hwcloud/azure are supported), you can view some cloud attribute data through the following command. For example (a value of - means the field is not available):
datakit tool --show-cloud-info aws
cloud_provider: aws
description: -
instance_charge_type: -
instance_id: i-09b37dc1xxxxxxxxx
instance_name: -
instance_network_type: -
instance_status: -
instance_type: t2.nano
private_ip: 172.31.22.123
region: cn-northwest-1
security_group_id: launch-wizard-1
zone_id: cnnw1-az2
Parsing Line Protocol Data¶
You can parse line protocol data through the following command:
It can be output in JSON format:
datakit tool --parse-lp /path/to/file --json
{
"measurements": { # List of metric sets
"testing": {
"points": 7,
"time_series": 6
},
"testing_module": {
"points": 195,
"time_series": 195
}
},
"point": 202, # Total number of points
"time_serial": 201 # Total number of timelines
}
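As a rough illustration of what these counts mean, the following sketch tallies points and time series with `awk`. It assumes that a time series is identified by the measurement name plus its tag set, and that measurement names and tag values contain no escaped spaces or commas. `lp_stats` is a hypothetical helper, and DataKit's own counting may differ.

```shell
#!/bin/sh
# Sketch: count points per measurement, total points, and distinct
# measurement+tag-set combinations in line protocol text.
# lp_stats is a hypothetical helper, not part of DataKit.
# Reads the files given as arguments, or stdin when none are given.
lp_stats() {
    awk '
        NF > 0 && $0 !~ /^#/ {
            series[$1] = 1            # $1 is the measurement plus tag set
            split($1, head, ",")      # head[1] is the measurement name
            points[head[1]]++
            total++
        }
        END {
            for (m in points) printf "%s points=%d\n", m, points[m]
            n = 0
            for (s in series) n++
            printf "total points=%d time_series=%d\n", total, n
        }
    ' "$@"
}

lp_stats <<'EOF'
ddtrace,http_url=/webproxy/api/online_status,service=web_front f1=1i 1691755988000000000
EOF
```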
Data Recording and Replay¶
Data import replays data that has already been collected, so no additional collection is needed for demonstrations or testing.
Enabling Data Recording¶
In datakit.conf, you can enable the data recording function. After enabling, DataKit will record the data to the specified directory for subsequent import:
[recorder]
enabled = true
path = "/path/to/recorder" # Absolute path, by default in the <DataKit installation directory>/recorder directory
encoding = "v2" # Use protobuf-JSON format (xxx.pbjson), and you can also choose v1 (xxx.lp) in line protocol form (the former is more readable and supports more data types)
duration = "10m" # Recording duration, starting from the startup of DataKit
inputs = ["cpu", "mem"] # Record data of specified collectors (based on the names shown in the *Inputs Info* panel of monitor), and if empty, it means recording data of all collectors
categories = ["logging", "metric"] # Recording types, and if empty, it means recording all data types
After the recording starts, the directory structure is roughly as follows (showing the pbjson format of time-series data here):
[ 416] /usr/local/datakit/recorder/
├── [ 64] custom_object
├── [ 64] dynamic_dw
├── [ 64] keyevent
├── [ 64] logging
├── [ 64] network
├── [ 64] object
├── [ 64] profiling
├── [ 64] rum
├── [ 64] security
├── [ 64] tracing
└── [1.9K] metric
├── [1.2K] cpu.1698217783322857000.pbjson
├── [1.2K] cpu.1698217793321744000.pbjson
├── [1.2K] cpu.1698217803322683000.pbjson
├── [1.2K] cpu.1698217813322834000.pbjson
└── [1.2K] cpu.1698218363360258000.pbjson
12 directories, 59 files
Warning
- After the data recording is completed, remember to turn off this function (enabled = false). Otherwise, every time DataKit starts, recording will be launched, which may consume a large amount of disk space.
- The collector name is not exactly the same as the name in the collector configuration ([[inputs.some-name]]), but the name shown in the first column of the Inputs Info panel of monitor. Some collector names look like logging/<some-pod-name>. In that case, the data is stored in /usr/local/datakit/recorder/logging/logging-some-pod-name.1705636073033197000.pbjson, where the / in the collector name is replaced with - (to avoid an extra directory level).
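The naming rule for recorded files can be sketched as follows. `recorder_file_name` is a hypothetical helper written only to illustrate the `/` to `-` replacement; the numeric part is taken as given from the example path.

```shell
#!/bin/sh
# Sketch: derive a recorded file name from a collector name.
# recorder_file_name is a hypothetical helper, not part of DataKit.
# $1: collector name as shown in the Inputs Info panel
# $2: numeric suffix used in the file name
# $3: extension (pbjson or lp)
recorder_file_name() {
    # Replace every "/" with "-" so the name does not create subdirectories.
    safe_name=$(printf '%s' "$1" | tr '/' '-')
    printf '%s.%s.%s\n' "$safe_name" "$2" "$3"
}

recorder_file_name 'logging/some-pod-name' 1705636073033197000 pbjson
```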
Data Replay¶
After DataKit records the data, you can save the contents of this directory using Git or other methods (make sure to keep the existing directory structure). Then, you can import this data into TrueWatch through the following command:
$ datakit import -P /usr/local/datakit/recorder -D https://openway.truewatch.com?token=tkn_xxxxxxxxx
> Uploading "/usr/local/datakit/recorder/metric/cpu.1698217783322857000.pbjson"(1 points) on metric...
+1h53m6.137855s ~ 2023-10-25 15:09:43.321559 +0800 CST
> Uploading "/usr/local/datakit/recorder/metric/cpu.1698217793321744000.pbjson"(1 points) on metric...
+1h52m56.137881s ~ 2023-10-25 15:09:53.321533 +0800 CST
> Uploading "/usr/local/datakit/recorder/metric/cpu.1698217803322683000.pbjson"(1 points) on metric...
+1h52m46.137991s ~ 2023-10-25 15:10:03.321423 +0800 CST
...
Total upload 75 kB bytes ok
Although the recorded data contains absolute timestamps (in nanoseconds), when playing back, DataKit will automatically shift these data to the current time (retaining the relative time intervals between data points), making it look like newly collected data.
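The idea of shifting timestamps while keeping their spacing can be sketched in a few lines of shell. The sketch assumes that the newest recorded point is moved to the current time and every other point is moved by the same offset; the exact alignment rule DataKit uses is not described here, and `shift_timestamps` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: shift nanosecond timestamps so that the newest one lands on "now"
# while the intervals between points stay unchanged.
# shift_timestamps is a hypothetical helper, not part of DataKit.
# $1: target time in nanoseconds; remaining arguments: recorded timestamps.
shift_timestamps() {
    now=$1
    shift
    [ "$#" -gt 0 ] || return 0
    # Find the newest recorded timestamp.
    newest=$1
    for ts in "$@"; do
        if [ "$ts" -gt "$newest" ]; then newest=$ts; fi
    done
    # The same offset is applied to every point.
    offset=$((now - newest))
    for ts in "$@"; do
        echo "$((ts + offset))"
    done
}

# Demo with the timestamps from the recorded file names above.
now=$(( $(date +%s) * 1000000000 ))
shift_timestamps "$now" 1698217783322857000 1698217793321744000 1698217803322683000
```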
You can obtain more help information about data import through the following command:
$ datakit import --help
Import used to play recorded history data to TrueWatch.
Usage:
datakit import [flags]
Flags:
-D, --dataway strings dataway list
-h, --help help for import
--log string log path (default "/dev/null")
-P, --path string point data path (default "/usr/local/datakit/recorder")
Warning
For RUM data, if there is no corresponding APP ID in the target workspace for playback, the data cannot be written. You can create a new application in the target workspace, change the APP ID to be consistent with that in the recorded data, or replace the APP ID in the existing recorded data with the APP ID of the corresponding RUM application in the target workspace.
Others¶
Telegraf Integration¶
Note: Before using Telegraf, confirm whether DataKit itself can already collect the data you need. If DataKit supports it, using Telegraf for the same collection is not recommended, as it may cause data conflicts and make operation more difficult.
Install the Telegraf integration
Start Telegraf
For usage matters of Telegraf, refer to here.
Security Checker Integration¶
Install the Security Checker
After a successful installation, it will run automatically. For the specific usage of the Security Checker, refer to here.
eBPF Integration¶
Install the DataKit eBPF collector. Currently, it only supports the linux/amd64 | linux/arm64 platforms. For the usage instructions of the collector, see DataKit eBPF Collector
If the prompt open /usr/local/datakit/externals/datakit-ebpf: text file busy appears, execute this command after stopping the DataKit service.
Warning
This command has been removed in Version-1.5.6. The eBPF integration is built-in by default in the new version.
Update IP Database¶
- You can directly use the following command to install/update the IP geographic information database (to use the geolite2 IP address library instead, replace iploc with geolite2):
- After updating the IP geographic information database, modify the datakit.conf configuration:
- Restart DataKit to take effect
- Test whether the IP library takes effect
datakit tool --ipinfo 1.2.3.4
ip: 1.2.3.4
city: Brisbane
province: Queensland
country: AU
isp: unknown
If the installation fails, the output is as follows:
- Modify datakit.yaml and uncomment the content between the 4 places marked with ---iploc-start and ---iploc-end.
- Reinstall DataKit:
- Enter the container and test whether the IP library takes effect
datakit tool --ipinfo 1.2.3.4
ip: 1.2.3.4
city: Brisbane
province: Queensland
country: AU
isp: unknown
If the installation fails, the output is as follows:
- Add --set iploc.enable when deploying with Helm
helm install datakit datakit/datakit -n datakit \
--set datakit.dataway_url="https://openway.truewatch.com?token=<YOUR-TOKEN>" \
--set iploc.enable=true \
--create-namespace
For deployment matters of Helm, refer to here.
- Enter the container and test whether the IP library takes effect
datakit tool --ipinfo 1.2.3.4
ip: 1.2.3.4
city: Brisbane
province: Queensland
country: AU
isp: unknown
If the installation fails, the output is as follows:
Automatic Command Completion¶
The new completion flow is generated from the Cobra command tree and supports bash, zsh, fish, and powershell. Installing or upgrading DataKit does not enable shell completion automatically. To use completion, run datakit completion <shell> after installation.
Note: datakit completion applies to DataKit Version-2.1.0 and later. For earlier versions, use the command syntax documented with that release.
Because DataKit has many command-line options, it now provides automatic completion.
Typical usage:
- Force bash install: datakit completion bash --force
- Force zsh install: datakit completion zsh --force
- Force fish install: datakit completion fish --force
- Force powershell install: datakit completion powershell --force
- Auto-detect current shell and install: datakit completion --force
- Print script only: datakit completion bash --print
Specifying the shell explicitly is recommended, especially when running through sudo. datakit completion --force detects the shell from the SHELL environment variable. If sudo or another restricted environment does not preserve that variable, auto-detection will fail.
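The following sketch illustrates why auto-detection depends on the environment. It derives a shell name from the SHELL variable and reports a failure when the variable is empty or unrecognized. `detect_shell` is a hypothetical helper covering only bash, zsh, and fish; how DataKit performs the detection internally is not shown here.

```shell
#!/bin/sh
# Sketch: guess the current shell from $SHELL, as an illustration of why
# detection can fail when the variable is not preserved (e.g. under sudo).
# detect_shell is a hypothetical helper, not part of DataKit.
detect_shell() {
    shell_path=${SHELL:-}
    if [ -z "$shell_path" ]; then
        echo "unknown"
        return 1
    fi
    name=$(basename "$shell_path")
    case "$name" in
        bash|zsh|fish) echo "$name" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# When detection fails, fall back to naming the shell explicitly.
detect_shell || echo "pass the shell explicitly, e.g. datakit completion bash --force" >&2
```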
Most mainstream Linux environments support shell completion. For bash, if completion support is missing on the host or inside a container, you can install:
- Ubuntu: apt install bash-completion
- CentOS: yum install bash-completion bash-completion-extras
When a shell is specified, datakit completion <shell> will:
- install the generated completion script to a standard path
- print the actual install path and how to activate it immediately
For example:
$ datakit completion bash --force
completion for bash installed to /usr/share/bash-completion/completions/datakit
reload your shell or run: source /usr/share/bash-completion/completions/datakit
When DataKit is running inside a Docker container, completion is installed into the container filesystem, and the output will state that explicitly.
bash Setup¶
Run:
$ datakit completion bash --force
If the script is installed to a system completion directory, it usually takes effect after opening a new shell. To enable it in the current shell, run the source command printed by DataKit.
zsh Setup¶
Run:
$ datakit completion zsh --force
For zsh, DataKit installs the completion script to ~/.zfunc/_datakit by default. If your current zsh session has not loaded that directory, add it to fpath and run compinit again:
If you want zsh to load it automatically on startup, add the same configuration to ~/.zshrc, and make sure the fpath line appears before compinit. You can also copy the command printed by datakit completion zsh --force to write the configuration and load it.
fish Setup¶
Run:
$ datakit completion fish --force
Fish completion is installed to ~/.config/fish/completions/datakit.fish by default. It usually takes effect after opening a new fish session.
PowerShell Setup¶
Run:
datakit completion powershell --force
For PowerShell, DataKit generates a standalone completion script by default and does not modify or overwrite the user's Microsoft.PowerShell_profile.ps1. To enable it in the current session, run the dot-source command printed by DataKit. If you want PowerShell to load it automatically on startup, add that dot-source command to your profile manually.
Completion usage example:
$ datakit <tab> # Press Tab to list the available commands
check completion debug dql import install
monitor pipeline run service tool version
$ datakit dql <tab> # Press Tab to list the available options
--auto-json --csv -F,--force --host -J,--json --log -R,--run -T,--token -V,--verbose
All DataKit commands and their options can be completed in this way.
Print the Completion Script Only¶
If you want to review the script first or install it manually, use --print:
$ datakit completion bash --print
If you need a custom install path, use --path: