Dialtesting
This collector gathers the results of network dial tests. All data generated by dial testing is reported to TrueWatch.
Configuration
To deploy a private dial-testing node, first create the node on the TrueWatch page. Then fill in conf.d/samples/dialtesting.conf with the relevant information from that page:
Go to the conf.d/samples directory under the DataKit installation directory, copy dialtesting.conf.sample, and name the copy dialtesting.conf. An example follows:
[[inputs.dialtesting]]
# We can also configure a JSON path like "file:///your/dir/json-file-name"
server = "https://dflux-dial.truewatch.com"
# [require] node ID
region_id = "default"
# If server is dflux-dial.truewatch.com, ak/sk are required
ak = ""
sk = ""
# The interval to pull the tasks.
pull_interval = "1m"
# The timeout for the HTTP request.
time_out = "30s"
# The number of the workers.
workers = 6
# Collect related metric when job execution time error interval is larger than task_exec_time_interval
task_exec_time_interval = "5s"
# Stop the task when the task failed to send data to dataway over max_send_fail_count.
max_send_fail_count = 16
# The max sleep time when send data to dataway failed.
max_send_fail_sleep_time = "30m"
# The max number of jobs sending data to dataway in parallel. Default 10.
max_job_number = 10
# The max number of job chan. Default 1000.
max_job_chan_number = 1000
# The max number of icmp packets sent at one time. Default 0, no limit.
max_icmp_concurrency = 0
# The max number of points in cache for each type of task. Default 10000.
max_cache_points_number = 10000
# Disable internal network task.
disable_internal_network_task = true
# Disable internal network cidr list.
disabled_internal_network_cidr_list = []
# Set true to enable election
election = false
[inputs.dialtesting.browser]
# Enable browser dialtesting on Linux nodes. Enabled by default.
enabled = true
# Browser engine used for browser dialtesting.
# Supported engine: lightpanda.
engine = "lightpanda"
# Optional browser engine executable path.
# If empty, the embedded browser runner will use LIGHTPANDA_EXECUTABLE_PATH or PATH.
engine_path = ""
# Max browser dialtesting tasks running at the same time. 0 means no limit.
max_concurrency = 0
# Custom tags.
[inputs.dialtesting.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
Once configured, restart DataKit.
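Before restarting, it can help to sanity-check the values you filled in. The following is a minimal Python sketch and not part of DataKit: `check_dialtesting_conf` is a hypothetical helper that takes the `[[inputs.dialtesting]]` settings as a plain dict (for example, after parsing the file with a TOML library) and applies only the rules stated in the sample comments above. The duration pattern is a simplifying assumption that accepts single-unit values such as `30s` or `1m`.

```python
import re

# Host taken from the sample configuration above; ak/sk are required when it is used.
OFFICIAL_HOST = "dflux-dial.truewatch.com"

# Assumption: durations are single-unit strings such as "30s", "1m" or "5s".
_DURATION = re.compile(r"^\d+(ms|s|m|h)$")
_DURATION_KEYS = (
    "pull_interval",
    "time_out",
    "task_exec_time_interval",
    "max_send_fail_sleep_time",
)


def check_dialtesting_conf(conf):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if not conf.get("region_id"):
        problems.append("region_id is required")
    uses_official = OFFICIAL_HOST in conf.get("server", "")
    if uses_official and not (conf.get("ak") and conf.get("sk")):
        problems.append("ak and sk are required for " + OFFICIAL_HOST)
    for key in _DURATION_KEYS:
        value = conf.get(key)
        if value is not None and not _DURATION.match(str(value)):
            problems.append(f"{key} is not a simple duration: {value!r}")
    return problems


if __name__ == "__main__":
    sample = {
        "server": "https://dflux-dial.truewatch.com",
        "region_id": "default",
        "ak": "",
        "sk": "",
        "pull_interval": "1m",
        "time_out": "30s",
    }
    for problem in check_dialtesting_conf(sample):
        print(problem)
```

With the untouched sample values, the only reported problem is the empty ak/sk pair, which matches the comment in the sample file.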
The collector can be turned on through ConfigMap injection of the collector configuration or through ENV_DATAKIT_INPUTS.
It can also be configured through environment variables (the collector must be added as a default collector in ENV_DEFAULT_ENABLED_INPUTS):
- ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK
  Enable or disable internal IP/service testing
  Type: Boolean
  input.conf: disable_internal_network_task
  Example: true
  Default: true
- ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST
  Disable testing on specific internal CIDR IP ranges
  Type: List
  input.conf: disabled_internal_network_cidr_list
  Example: ["192.168.0.0/16"]
  Default: -
- ENV_INPUT_DIALTESTING_ENABLE_DEBUG_API
  Enable the debug API for dial testing (disabled by default)
  Type: Boolean
  input.conf: enable_debug_api
  Example: false
  Default: false
- ENV_INPUT_DIALTESTING_ELECTION
  Enable election (disabled by default)
  Type: Boolean
  input.conf: election
  Example: false
  Default: false
- ENV_INPUT_DIALTESTING_BROWSER_ENABLED
  Enable or disable browser dial testing
  Type: Boolean
  input.conf: browser.enabled
  Example: false
  Default: true
- ENV_INPUT_DIALTESTING_BROWSER_ENGINE
  Browser engine for browser dial testing. Supported value: lightpanda
  Type: String
  input.conf: browser.engine
  Example: lightpanda
  Default: lightpanda
- ENV_INPUT_DIALTESTING_BROWSER_ENGINE_PATH
  Browser engine executable path for browser dial testing
  Type: String
  input.conf: browser.engine_path
  Example: /usr/local/bin/lightpanda
  Default: -
- ENV_INPUT_DIALTESTING_BROWSER_MAX_CONCURRENCY
  Maximum number of browser dial testing tasks running at the same time. 0 means no limit
  Type: Int
  input.conf: browser.max_concurrency
  Example: 1
  Default: 0
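As a rough illustration of how these variables fit together, the Python sketch below assembles an environment mapping that could be copied into a container spec. `dialtesting_env` is a hypothetical helper. It assumes that ENV_DEFAULT_ENABLED_INPUTS is a comma-separated list of collector names and that list values are written as JSON arrays, as the example value above suggests. The collector names `cpu` and `mem` in the usage note are placeholders.

```python
import json


def dialtesting_env(default_inputs, disable_internal=True, cidr_list=(), election=False):
    """Build environment variables that enable and tune the dialtesting collector."""
    inputs = list(default_inputs)
    if "dialtesting" not in inputs:
        # The collector has to be listed among the default-enabled inputs.
        inputs.append("dialtesting")
    return {
        # Assumption: collector names are joined with commas.
        "ENV_DEFAULT_ENABLED_INPUTS": ",".join(inputs),
        "ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK": str(disable_internal).lower(),
        # Assumption: List values use JSON array syntax, like the documented example.
        "ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST": json.dumps(list(cidr_list)),
        "ENV_INPUT_DIALTESTING_ELECTION": str(election).lower(),
    }


if __name__ == "__main__":
    env = dialtesting_env(["cpu", "mem"], cidr_list=["192.168.0.0/16"])
    for name, value in env.items():
        print(f"{name}={value}")
```

Calling `dialtesting_env(["cpu", "mem"])` appends `dialtesting` to the default inputs and leaves the other values at the defaults listed above.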
Note
Currently, traceroute is supported only on Linux dial-testing nodes. The trace data is stored in the traceroute field of the relevant measurements.
Note
Browser dialtesting is supported starting from DataKit Version-2.1.0.
Browser dial testing tasks (BROWSER) run by default on Linux dialtesting nodes. To disable them, set [inputs.dialtesting.browser].enabled = false. DataKit must be able to access Lightpanda when running browser tasks. To control resource peaks, set [inputs.dialtesting.browser].max_concurrency.
In Kubernetes, use the datakit:<version> image with Lightpanda built in.
For deployment, task configuration, and troubleshooting details, see Browser Dialtesting.
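To check in advance whether the DataKit process would be able to reach Lightpanda, a lookup like the one below can be run as the same user. This is a minimal Python sketch, not DataKit's own resolution logic: `find_lightpanda` is a hypothetical helper, the lookup order (engine_path, then LIGHTPANDA_EXECUTABLE_PATH, then PATH) is an assumption based on the sample comments, and `lightpanda` as the binary name is inferred from the example path.

```python
import os
import shutil


def find_lightpanda(engine_path="", environ=None):
    """Return the path of a usable Lightpanda executable, or None if none is found."""
    environ = os.environ if environ is None else environ
    # Explicit locations first: the configured path, then the environment variable.
    candidates = [engine_path, environ.get("LIGHTPANDA_EXECUTABLE_PATH", "")]
    for candidate in candidates:
        if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to a PATH lookup; the binary name "lightpanda" is an assumption.
    return shutil.which("lightpanda", path=environ.get("PATH"))


if __name__ == "__main__":
    found = find_lightpanda()
    print(found if found else "Lightpanda not found; browser tasks would likely fail")
```

Passing the value of `engine_path` from dialtesting.conf as the first argument mirrors the configuration shown earlier.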
Dialtesting Node Deployment
The following is a network deployment topology for dialtesting nodes, which includes two deployment methods for dialtesting nodes:
- Public Network Nodes: Use the globally deployed nodes directly to check the health of public network services.
- Private Network Nodes: To check private network services, deploy private nodes. If the network allows, these private nodes can also check services deployed on the public network.
Note
When a node is deployed in an internal network and cannot access the external network, traffic can be forwarded through a proxy server. For the configuration steps, see Use DataKit Proxy.
Probe tasks for both public and private nodes are created through the web page.
If a dialtesting node needs to run browser dial testing tasks, make sure the node environment meets the following requirements:
- Lightpanda can be accessed by the DataKit process.
- The node can access the target site and the Dataway specified by the task's post_url, which is used to report dial testing results.
graph TD
%% node definitions
dt_web(Probe Web UI)
dt_db(Public Task Storage)
dt_pub(Public DataKit Node)
dt_pri(Private DataKit Node)
site_inner(Private Site)
site_pub(Public Site)
dw_inner(Private Dataway)
dw_pub(Public Dataway)
server(TrueWatch)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
dt_web -->|Create Task| dt_db;
dt_db -->|Pull Tasks| dt_pub -->|Results| dw_pub --> server;
dt_db -->|Pull Tasks| dt_pri;
dt_pub <-->|Checking...| site_pub;
dt_pri <-.->|Checking...| site_pub;
dw_inner --> server;
subgraph "User's Private Network"
dt_pri <-->|Checking...| site_inner;
dt_pri -->|Results| dw_inner;
end
Log
By default, all data collected below carries a global tag named host, whose value is the host name of the machine running DataKit. A different host value can be specified in the configuration under [inputs.dialtesting.tags].
http_dial_testing
| Tags & Fields | Description |
|---|---|
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| dest_ip (tag) | The IP address of the destination |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| method (tag) | HTTP method, such as GET |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| proto (tag) | The HTTP protocol version, such as 'HTTP/1.1' |
| province (tag) | The name of the province |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| status_code_class (tag) | The class of the status code, such as '2xx' |
| status_code_string (tag) | The status string, such as '200 OK' |
| url (tag) | The URL of the endpoint to be monitored |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| message | The message string, which includes the header and the body of the request or the response. Type: string (gauge). Unit: N/A |
| response_body_size | The length of the response body. Type: int (gauge). Unit: digital,B |
| response_connection | HTTP connection time. Type: float (gauge). Unit: time,μs |
| response_dns | HTTP DNS resolution time. Type: float (gauge). Unit: time,μs |
| response_download | HTTP download time. Type: float (gauge). Unit: time,μs |
| response_ssl | HTTP SSL handshake time. Type: float (gauge). Unit: time,μs |
| response_time | The response time. Type: int (gauge). Unit: time,μs |
| response_ttfb | HTTP response TTFB. Type: float (gauge). Unit: time,μs |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| ssl_cert_expires_in_days | Days until the SSL certificate expires. Type: int (gauge). Unit: time,d |
| ssl_cert_not_after | The not-after time of the SSL certificate. Type: int (gauge). Unit: timeStamp,usec |
| status_code | The response code. Type: int (gauge). Unit: N/A |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task | The raw task string. Type: string (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
tcp_dial_testing
| Tags & Fields | Description |
|---|---|
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| dest_host (tag) | The name of the host to be monitored |
| dest_ip (tag) | The IP address |
| dest_port (tag) | The port of the TCP connection |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| proto (tag) | The protocol of the task |
| province (tag) | The name of the province |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| message | The message string, which includes the response time or the failure reason. Type: string (gauge). Unit: N/A |
| response_time | The response time. Type: int (gauge). Unit: time,μs |
| response_time_with_dns | The response time, including DNS time. Type: int (gauge). Unit: time,μs |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task | The raw task string. Type: string (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
| traceroute | The JSON string of the traceroute result. Type: string (gauge). Unit: N/A |
icmp_dial_testing
| Tags & Fields | Description |
|---|---|
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| dest_host (tag) | The name of the host to be monitored |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| proto (tag) | The protocol of the task |
| province (tag) | The name of the province |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| average_round_trip_time | The average round-trip time (RTT). Type: float (gauge). Unit: time,μs |
| average_round_trip_time_in_millis | The average round-trip time (RTT), deprecated. Type: float (gauge). Unit: time,ms |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| max_round_trip_time | The maximum round-trip time (RTT). Type: float (gauge). Unit: time,μs |
| max_round_trip_time_in_millis | The maximum round-trip time (RTT), deprecated. Type: float (gauge). Unit: time,ms |
| message | The message string, which includes the average round-trip time or the failure reason. Type: string (gauge). Unit: N/A |
| min_round_trip_time | The minimum round-trip time (RTT). Type: float (gauge). Unit: time,μs |
| min_round_trip_time_in_millis | The minimum round-trip time (RTT), deprecated. Type: float (gauge). Unit: time,ms |
| packet_loss_percent | The percentage of packets lost. Type: float (gauge). Unit: percent,percent |
| packets_received | The number of packets received. Type: int (gauge). Unit: count |
| packets_sent | The number of packets sent. Type: int (gauge). Unit: count |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| std_round_trip_time | The standard deviation of the round-trip time. Type: float (gauge). Unit: time,μs |
| std_round_trip_time_in_millis | The standard deviation of the round-trip time, deprecated. Type: float (gauge). Unit: time,ms |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task | The raw task string. Type: string (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
| traceroute | The JSON string of the traceroute result. Type: string (gauge). Unit: N/A |
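Because each round-trip statistic appears twice in this table, once in microseconds and once in the deprecated milliseconds form, code that reads these records may want a single accessor. The Python sketch below is one way to do that. `round_trip_time_us` is a hypothetical helper, and it assumes the record's fields are available as a plain dict in which some records may carry only the deprecated field.

```python
def round_trip_time_us(fields, stat="average"):
    """Return an RTT statistic in microseconds, or None if neither field is present.

    stat is one of "average", "min", "max" or "std", matching the field prefixes.
    """
    key = f"{stat}_round_trip_time"
    if fields.get(key) is not None:
        return float(fields[key])  # preferred field, already in microseconds
    legacy = fields.get(key + "_in_millis")  # deprecated field, in milliseconds
    if legacy is not None:
        return float(legacy) * 1000.0
    return None


if __name__ == "__main__":
    record = {"max_round_trip_time_in_millis": 2.5}
    print(round_trip_time_us(record, "max"))
```

The microsecond field wins whenever both are present, so callers never mix units.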
websocket_dial_testing
| Tags & Fields | Description |
|---|---|
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| proto (tag) | The protocol of the task |
| province (tag) | The name of the province |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| url (tag) | The URL string, such as ws://www.abc.com |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| message | The message string, which includes the response time or the failure reason. Type: string (gauge). Unit: N/A |
| response_message | The message of the response. Type: string (gauge). Unit: N/A |
| response_time | The response time. Type: int (gauge). Unit: time,μs |
| response_time_with_dns | The response time, including DNS time. Type: int (gauge). Unit: time,μs |
| sent_message | The sent message. Type: string (gauge). Unit: N/A |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| ssl_cert_expires_in_days | Days until the SSL certificate expires. Type: int (gauge). Unit: time,d |
| ssl_cert_not_after | The not-after time of the SSL certificate. Type: int (gauge). Unit: timeStamp,usec |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task | The raw task string. Type: string (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
multi_dial_testing
| Tags & Fields | Description |
|---|---|
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| province (tag) | The name of the province |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| last_step | The number of the last step of the task that was executed. Type: int (gauge). Unit: count |
| message | The message string, which includes the header and the body of the request or the response. Type: string (gauge). Unit: N/A |
| response_time | The response time. Type: int (gauge). Unit: time,μs |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| steps | The result of each step. Type: string (gauge). Unit: N/A |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task | The raw task string. Type: string (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
grpc_dial_testing
| Tags & Fields | Description |
|---|---|
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| dest_host (tag) | The name of the host to be monitored |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| method (tag) | The gRPC method name |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| proto (tag) | The protocol of the task |
| province (tag) | The name of the province |
| server (tag) | The gRPC server address |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| message | The message string, which includes the response time or the failure reason. Type: string (gauge). Unit: N/A |
| response_time | The response time. Type: int (gauge). Unit: time,μs |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| ssl_cert_expires_in_days | Days until the SSL certificate expires. Type: int (gauge). Unit: time,d |
| ssl_cert_not_after | The not-after time of the SSL certificate. Type: int (gauge). Unit: timeStamp,usec |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task | The raw task string. Type: string (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
browser_dial_testing
| Tags & Fields | Description |
|---|---|
| browser_engine (tag) | The browser engine used to run the task |
| city (tag) | The name of the city |
| country (tag) | The name of the country |
| datakit_version (tag) | The DataKit version |
| df_label (tag) | The label of the task |
| internal (tag) | Boolean value: true for domestic, false for overseas |
| isp (tag) | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name (tag) | The name of the task |
| node_name (tag) | The name of the node |
| owner (tag) | The owner name |
| province (tag) | The name of the province |
| status (tag) | The status of the task, either 'OK' or 'FAIL' |
| url (tag) | The URL of the page to be monitored |
| viewport (tag) | The browser viewport size, such as 1920x1080 |
| browser_config_vars | The JSON string of variables defined in browser_config. Type: string (gauge). Unit: N/A |
| browser_run_id | The browser run ID. Type: string (gauge). Unit: N/A |
| config_vars | The configuration variables of the task. Type: string (gauge). Unit: N/A |
| fail_reason | The reason the task failed. Type: string (gauge). Unit: N/A |
| has_screenshot | Whether the browser run has uploaded screenshots. Type: bool (gauge). Unit: N/A |
| last_step | The sequence number of the last browser step. Type: int (gauge). Unit: count |
| message | The message string, which includes the success message or the failure reason. Type: string (gauge). Unit: N/A |
| response_time | The browser run duration. Type: int (gauge). Unit: time,μs |
| retry_count | The retry count of the browser run. Type: int (gauge). Unit: count |
| retry_records | The JSON string of browser retry attempt records. Type: string (gauge). Unit: N/A |
| screenshot_upload_error | The browser screenshot upload error. Type: string (gauge). Unit: N/A |
| seq_number | The sequence number of the test. Type: int (gauge). Unit: count |
| steps | The JSON string of browser step results. Type: string (gauge). Unit: N/A |
| success | Whether the test succeeded: 1 for success, -1 for failure. Type: int (gauge). Unit: N/A |
| task_id | The dialtesting task external ID. Type: string (gauge). Unit: N/A |
| trace_id | The first trace ID captured during the browser run. Type: string (gauge). Unit: N/A |
| viewport_height | The browser viewport height. Type: int (gauge). Unit: N/A |
| viewport_width | The browser viewport width. Type: int (gauge). Unit: N/A |
traceroute
traceroute is the JSON text of the route-trace data. The data is an array in which each element records one route probe, as shown in the following example:
[
  {
    "total": 2,
    "failed": 0,
    "loss": 0,
    "avg_cost": 12700395,
    "min_cost": 11902041,
    "max_cost": 13498750,
    "std_cost": 1129043,
    "items": [
      {
        "ip": "10.8.9.1",
        "response_time": 13498750
      },
      {
        "ip": "10.8.9.1",
        "response_time": 11902041
      }
    ]
  },
  {
    "total": 2,
    "failed": 0,
    "loss": 0,
    "avg_cost": 13775021,
    "min_cost": 13740084,
    "max_cost": 13809959,
    "std_cost": 49409,
    "items": [
      {
        "ip": "10.12.168.218",
        "response_time": 13740084
      },
      {
        "ip": "10.12.168.218",
        "response_time": 13809959
      }
    ]
  }
]
Field description:
| Field | Type | Description |
|---|---|---|
| total | number | Total number of probes |
| failed | number | Number of failures |
| loss | number | Percentage of failures |
| avg_cost | number | Average time spent (μs) |
| min_cost | number | Minimum time spent (μs) |
| max_cost | number | Maximum time spent (μs) |
| std_cost | number | Standard deviation of time spent (μs) |
| items | Array of items | Per-probe information (see items below) |
items
| Field | Type | Description |
|---|---|---|
| ip | string | IP address; the value is * if the probe fails |
| response_time | number | Response time (μs) |
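The structure described above can be consumed with a few lines of code. The following Python sketch parses the traceroute string and reduces each array element to a short summary. `summarize_traceroute` and the summary keys are hypothetical names chosen for illustration, and the sketch assumes that the array order follows the order in which the probes were made.

```python
import json


def summarize_traceroute(raw):
    """Turn the traceroute JSON string into one summary dict per array element."""
    summaries = []
    for index, probe in enumerate(json.loads(raw), start=1):
        items = probe.get("items", [])
        # Failed probes report "*" instead of an IP address, so they are skipped here.
        ips = sorted({item["ip"] for item in items if item.get("ip") != "*"})
        summaries.append({
            "index": index,
            "ips": ips,
            "failed": probe.get("failed", 0),
            "total": probe.get("total", 0),
            "avg_cost": probe.get("avg_cost"),
        })
    return summaries


if __name__ == "__main__":
    raw = '[{"total": 2, "failed": 0, "avg_cost": 12700395, "items": [{"ip": "10.8.9.1", "response_time": 13498750}, {"ip": "10.8.9.1", "response_time": 11902041}]}]'
    for row in summarize_traceroute(raw):
        print(row["index"], ",".join(row["ips"]) or "*", row["failed"], row["total"], row["avg_cost"])
```

Collecting the IPs into a set keeps the summary compact when every probe of an element answers from the same address, as in the example above.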
Metric
The dialtesting collector exposes Prometheus metrics. By default, the DataKit collector collects and uploads these datakit_dialtesting_* metrics to TrueWatch without additional configuration.
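If you have a dump of DataKit's metrics in Prometheus text exposition format and want to look only at the dialtesting series, a small filter is enough. The Python sketch below is illustrative: `dialtesting_metric_lines` is a hypothetical helper, it assumes the metrics text has already been obtained by whatever means you normally use, and it relies only on the `datakit_dialtesting_` prefix named above.

```python
def dialtesting_metric_lines(exposition_text, prefix="datakit_dialtesting_"):
    """Return the sample lines whose metric name starts with the given prefix."""
    lines = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            lines.append(line)
    return lines


if __name__ == "__main__":
    import sys

    # Read exposition text from standard input and print the matching samples.
    for sample in dialtesting_metric_lines(sys.stdin.read()):
        print(sample)
```

Comment lines are dropped on purpose, so the output contains only sample lines that can be compared or counted directly.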