
Dialtesting


This collector gathers the results of network dial tests and reports all data generated by the tests to TrueWatch.

Configuration

To deploy a private dial-test node, first create the node on the TrueWatch page. Then copy the node information shown on the page into conf.d/samples/dialtesting.conf:

Go to the conf.d/samples directory under the DataKit installation directory, copy dialtesting.conf.sample, and name the copy dialtesting.conf. An example follows:

[[inputs.dialtesting]]
  # We can also configure a JSON path like "file:///your/dir/json-file-name"
  server = "https://dflux-dial.truewatch.com"

  # [require] node ID
  region_id = "default"

  # If server is dflux-dial.truewatch.com, ak/sk are required
  ak = ""
  sk = ""

  # The interval to pull the tasks.
  pull_interval = "1m"

  # The timeout for the HTTP request.
  time_out = "30s"

  # The number of the workers.
  workers = 6

  # Collect related metric when job execution time error interval is larger than task_exec_time_interval
  task_exec_time_interval = "5s"

  # Stop the task when the task failed to send data to dataway over max_send_fail_count.
  max_send_fail_count = 16

  # The max sleep time when send data to dataway failed.
  max_send_fail_sleep_time = "30m"

  # The max number of jobs sending data to dataway in parallel. Default 10.
  max_job_number = 10

  # The max number of job chan. Default 1000.
  max_job_chan_number = 1000

  # The max number of icmp packets sent at one time. Default 0, no limit.
  max_icmp_concurrency = 0

  # The max number of points in cache for each type of task. Default 10000.
  max_cache_points_number = 10000

  # Disable internal network task.
  disable_internal_network_task = true

  # Disable internal network cidr list.
  disabled_internal_network_cidr_list = []

  # Set true to enable election
  election = false

  [inputs.dialtesting.browser]
    # Enable browser dialtesting on Linux nodes. Enabled by default.
    enabled = true

    # Browser engine used for browser dialtesting.
    # Supported engine: lightpanda.
    engine = "lightpanda"

    # Optional browser engine executable path.
    # If empty, the embedded browser runner will use LIGHTPANDA_EXECUTABLE_PATH or PATH.
    engine_path = ""

    # Max browser dialtesting tasks running at the same time. 0 means no limit.
    max_concurrency = 0

  # Custom tags.
  [inputs.dialtesting.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

Once configured, restart DataKit.
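
The effect of disable_internal_network_task and disabled_internal_network_cidr_list in the sample can be pictured with a short Python sketch. This is an illustration under assumptions, not DataKit's implementation: the function names are hypothetical, and the rule that an explicit CIDR list is checked first, with a fallback to private and loopback ranges, is an assumption that this page does not confirm.

```python
import ipaddress


def is_internal_target(ip, disabled_cidrs=()):
    """Return True if ip counts as internal under this sketch's assumed rules."""
    addr = ipaddress.ip_address(ip)
    # Assumption: any CIDR listed in disabled_internal_network_cidr_list
    # marks the target as internal.
    for cidr in disabled_cidrs:
        if addr in ipaddress.ip_network(cidr, strict=False):
            return True
    # Assumption: otherwise fall back to private and loopback ranges.
    return addr.is_private or addr.is_loopback


def should_skip_task(ip, disable_internal_network_task=True, disabled_cidrs=()):
    """Decide whether a task targeting ip would be skipped (hypothetical helper)."""
    if not disable_internal_network_task:
        return False
    return is_internal_target(ip, disabled_cidrs)


if __name__ == "__main__":
    for target in ("192.168.1.10", "8.8.8.8"):
        print(target, should_skip_task(target))
```

With the sample's value disable_internal_network_task = true, the sketch skips a task aimed at 192.168.1.10 and keeps one aimed at a public address.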

The collector can also be turned on through ConfigMap injection of the collector configuration or by configuring ENV_DATAKIT_INPUTS.

It can also be configured through environment variables (the collector must be added to the default collectors in ENV_DEFAULT_ENABLED_INPUTS):

  • ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK

    Enable or disable internal IP/service testing

    Type: Boolean

    input.conf: disable_internal_network_task

    Example: true

    Default: true

  • ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST

    Disable testing on specific internal CIDR IP ranges

    Type: List

    input.conf: disabled_internal_network_cidr_list

    Example: ["192.168.0.0/16"]

    Default: -

  • ENV_INPUT_DIALTESTING_ENABLE_DEBUG_API

    Enable the debug API for dial testing (disabled by default)

    Type: Boolean

    input.conf: enable_debug_api

    Example: false

    Default: false

  • ENV_INPUT_DIALTESTING_ELECTION

    Enable election (disabled by default)

    Type: Boolean

    input.conf: election

    Example: false

    Default: false

  • ENV_INPUT_DIALTESTING_BROWSER_ENABLED

    Enable or disable browser dial testing

    Type: Boolean

    input.conf: browser.enabled

    Example: false

    Default: true

  • ENV_INPUT_DIALTESTING_BROWSER_ENGINE

    Browser engine for browser dial testing. Supported value: lightpanda

    Type: String

    input.conf: browser.engine

    Example: lightpanda

    Default: lightpanda

  • ENV_INPUT_DIALTESTING_BROWSER_ENGINE_PATH

    Browser engine executable path for browser dial testing

    Type: String

    input.conf: browser.engine_path

    Example: /usr/local/bin/lightpanda

    Default: -

  • ENV_INPUT_DIALTESTING_BROWSER_MAX_CONCURRENCY

    Maximum number of browser dial testing tasks running at the same time. 0 means no limit

    Type: Int

    input.conf: browser.max_concurrency

    Example: 1

    Default: 0
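
The mapping between the input.conf keys and the environment variable names listed above is regular, so it can be sketched in Python. The helper names below are hypothetical, and the value formats (lowercase true/false, a JSON array for lists) are inferred from the Example entries in the list rather than from a formal specification.

```python
import json

ENV_PREFIX = "ENV_INPUT_DIALTESTING_"


def to_env_value(value):
    """Render a Python value in the string form shown in the examples above."""
    if isinstance(value, bool):  # bool must be checked before other types
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return json.dumps(list(value))  # e.g. ["192.168.0.0/16"]
    return str(value)


def build_env(settings):
    """Map input.conf keys (dots for nested tables) to environment variables."""
    return {
        ENV_PREFIX + key.replace(".", "_").upper(): to_env_value(value)
        for key, value in settings.items()
    }


if __name__ == "__main__":
    env = build_env({
        "disable_internal_network_task": True,
        "disabled_internal_network_cidr_list": ["192.168.0.0/16"],
        "browser.max_concurrency": 1,
    })
    for name, value in sorted(env.items()):
        print(f"{name}={value}")
```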


Note

Currently, only Linux dialtesting nodes support traceroute, and the trace data is stored in the traceroute field of the relevant metrics.

Note

Browser dialtesting is supported starting from DataKit Version-2.1.0.

Browser dial testing tasks (BROWSER) run by default on Linux dialtesting nodes. To disable them, set [inputs.dialtesting.browser].enabled = false. DataKit must be able to access Lightpanda when running browser tasks. To control resource peaks, set [inputs.dialtesting.browser].max_concurrency.

In Kubernetes, use the datakit:<version> image with Lightpanda built in.

For deployment, task configuration, and troubleshooting details, see Browser Dialtesting.

Dialtesting Node Deployment

The following is a network deployment topology for dialtesting nodes, which includes two deployment methods for dialtesting nodes:

  • Public Network Nodes: Directly use the nodes deployed globally to check the health of public network services.
  • Private Network Nodes: To check private network services, deploy private nodes. If the network allows, these private nodes can also check services deployed on the public network.

Note

When a node is deployed in an internal network and cannot access the external network, traffic can be forwarded through a proxy server. For the configuration steps, see Use DataKit Proxy.

Probe tasks for both public and private nodes are created through the Web page.

If a dialtesting node needs to run browser dial testing tasks, make sure the node environment meets the following requirements:

  • Lightpanda can be accessed by the DataKit process.
  • The node can access the target site and the Dataway specified by task post_url, which is used to report dial testing results.
graph TD
  %% node definitions
  dt_web(Probe Web UI)
  dt_db(Public Task Storage)
  dt_pub(Public DataKit Node)
  dt_pri(Private DataKit Node)
  site_inner(Private Site)
  site_pub(Public Site)
  dw_inner(Private Dataway)
  dw_pub(Public Dataway)
  server(TrueWatch)

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  dt_web -->|Create Task| dt_db;
  dt_db -->|Pull Tasks| dt_pub -->|Results| dw_pub --> server;
  dt_db -->|Pull Tasks| dt_pri;
  dt_pub <-->|Checking...| site_pub;

  dt_pri <-.->|Checking...| site_pub;
  dw_inner --> server;
  subgraph "User's Private Network"
  dt_pri <-->|Checking...| site_inner;
  dt_pri -->|Results| dw_inner;
  end

Log

All of the data below carries a global tag named host by default (its value is the host name of the DataKit). A different host value can be specified in the configuration through [inputs.dialtesting.tags].

http_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_ip
(tag)
The IP address of the destination
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
method
(tag)
HTTP method, such as GET
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the HTTP, such as 'HTTP/1.1'
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
status_code_class
(tag)
The class of the status code, such as '2xx'
status_code_string
(tag)
The status string, such as '200 OK'
url
(tag)
The URL of the endpoint to be monitored
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string which includes the header and the body of the request or the response
Type: string | (gauge)
Unit: N/A
response_body_size The length of the body of the response
Type: int | (gauge)
Unit: digital,B
response_connection HTTP connection time
Type: float | (gauge)
Unit: time,μs
response_dns HTTP DNS resolution time
Type: float | (gauge)
Unit: time,μs
response_download HTTP downloading time
Type: float | (gauge)
Unit: time,μs
response_ssl HTTP SSL handshake time
Type: float | (gauge)
Unit: time,μs
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
response_ttfb HTTP time to first byte (TTFB)
Type: float | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
ssl_cert_expires_in_days The number of days until the SSL certificate expires
Type: int | (gauge)
Unit: time,d
ssl_cert_not_after The not-after (expiry) time of the SSL certificate
Type: int | (gauge)
Unit: timeStamp,usec
status_code The response code
Type: int | (gauge)
Unit: N/A
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
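
The timing fields above are all reported in microseconds. As a rough sketch of how a consumer might read them, the following Python converts the per-phase fields to milliseconds and picks the slowest phase. The function names and the sample values are hypothetical; only the field names come from the table above.

```python
# Per-phase timing fields of http_dial_testing, all in microseconds.
PHASE_FIELDS = (
    "response_dns",
    "response_connection",
    "response_ssl",
    "response_ttfb",
    "response_download",
)


def phases_in_ms(fields):
    """Convert the phase timings present in fields from microseconds to milliseconds."""
    return {name: fields[name] / 1000.0 for name in PHASE_FIELDS if name in fields}


def slowest_phase(fields):
    """Return the name of the phase with the largest timing, or None if absent."""
    phases = phases_in_ms(fields)
    return max(phases, key=phases.get) if phases else None


if __name__ == "__main__":
    # Hypothetical sample point, for illustration only.
    point = {"response_dns": 1500.0, "response_connection": 20000.0, "response_ttfb": 80000.0}
    print(phases_in_ms(point))
    print(slowest_phase(point))
```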

tcp_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_host
(tag)
The name of the host to be monitored
dest_ip
(tag)
The IP address
dest_port
(tag)
The port of the TCP connection
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string includes the response time or fail reason
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
response_time_with_dns The time of the response, which contains DNS time
Type: int | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
traceroute The JSON string of the traceroute result
Type: string | (gauge)
Unit: N/A

icmp_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_host
(tag)
The name of the host to be monitored
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
average_round_trip_time The average time of the round trip(RTT)
Type: float | (gauge)
Unit: time,μs
average_round_trip_time_in_millis The average time of the round trip(RTT), deprecated
Type: float | (gauge)
Unit: time,ms
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
max_round_trip_time The maximum time of the round trip(RTT)
Type: float | (gauge)
Unit: time,μs
max_round_trip_time_in_millis The maximum time of the round trip(RTT), deprecated
Type: float | (gauge)
Unit: time,ms
message The message string includes the average time of the round trip or the failure reason
Type: string | (gauge)
Unit: N/A
min_round_trip_time The minimum time of the round trip(RTT)
Type: float | (gauge)
Unit: time,μs
min_round_trip_time_in_millis The minimum time of the round trip(RTT), deprecated
Type: float | (gauge)
Unit: time,ms
packet_loss_percent The loss percent of the packets
Type: float | (gauge)
Unit: percent,percent
packets_received The number of the packets received
Type: int | (gauge)
Unit: count
packets_sent The number of the packets sent
Type: int | (gauge)
Unit: count
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
std_round_trip_time The standard deviation of the round trip
Type: float | (gauge)
Unit: time,μs
std_round_trip_time_in_millis The standard deviation of the round trip, deprecated
Type: float | (gauge)
Unit: time,ms
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
traceroute The JSON string of the traceroute result
Type: string | (gauge)
Unit: N/A
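
Because the round-trip fields exist both in microseconds and in deprecated millisecond variants, a reader of icmp_dial_testing data may want one accessor that prefers the current field. The Python sketch below shows one way to do that. The helper name is hypothetical, and falling back to the deprecated field is a design choice of this example, not documented behavior.

```python
def rtt_ms(fields, stat="average"):
    """Return a round-trip statistic in milliseconds.

    stat is one of "average", "max", "min" or "std". The microsecond field is
    preferred; the deprecated *_in_millis field is used only as a fallback.
    """
    micros = fields.get(f"{stat}_round_trip_time")
    if micros is not None:
        return micros / 1000.0
    return fields.get(f"{stat}_round_trip_time_in_millis")


if __name__ == "__main__":
    # Hypothetical sample point, for illustration only.
    point = {"average_round_trip_time": 12500.0, "max_round_trip_time_in_millis": 20.0}
    print(rtt_ms(point, "average"))
    print(rtt_ms(point, "max"))
```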

websocket_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
url
(tag)
The URL string, such as ws://www.abc.com
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string includes the response time or the failure reason
Type: string | (gauge)
Unit: N/A
response_message The message of the response
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
response_time_with_dns The time of the response, including DNS time
Type: int | (gauge)
Unit: time,μs
sent_message The sent message
Type: string | (gauge)
Unit: N/A
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
ssl_cert_expires_in_days The number of days until the SSL certificate expires
Type: int | (gauge)
Unit: time,d
ssl_cert_not_after The not-after (expiry) time of the SSL certificate
Type: int | (gauge)
Unit: timeStamp,usec
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A

multi_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
last_step The number of the last step executed in the task
Type: int | (gauge)
Unit: count
message The message string which includes the header and the body of the request or the response
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
steps The result of each step
Type: string | (gauge)
Unit: N/A
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A

grpc_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_host
(tag)
The name of the host to be monitored
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
method
(tag)
The gRPC method name
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
server
(tag)
The gRPC server address
status
(tag)
The status of the task, either 'OK' or 'FAIL'
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string includes the response time or the failure reason
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
ssl_cert_expires_in_days The number of days until the SSL certificate expires
Type: int | (gauge)
Unit: time,d
ssl_cert_not_after The not-after (expiry) time of the SSL certificate
Type: int | (gauge)
Unit: timeStamp,usec
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A

browser_dial_testing

Tags & Fields Description
browser_engine
(tag)
The browser engine used to run the task
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
url
(tag)
The URL of the page to be monitored
viewport
(tag)
The browser viewport size, such as 1920x1080
browser_config_vars The JSON string of variables defined in browser_config
Type: string | (gauge)
Unit: N/A
browser_run_id The browser run ID
Type: string | (gauge)
Unit: N/A
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
has_screenshot Whether the browser run has uploaded screenshots
Type: bool | (gauge)
Unit: N/A
last_step The last browser step sequence number
Type: int | (gauge)
Unit: count
message The message string includes success message or failure reason
Type: string | (gauge)
Unit: N/A
response_time The browser run duration
Type: int | (gauge)
Unit: time,μs
retry_count The retry count of the browser run
Type: int | (gauge)
Unit: count
retry_records The JSON string of browser retry attempt records
Type: string | (gauge)
Unit: N/A
screenshot_upload_error The browser screenshot upload error
Type: string | (gauge)
Unit: N/A
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
steps The JSON string of browser step results
Type: string | (gauge)
Unit: N/A
success Whether the test succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
trace_id The first trace ID captured during the browser run
Type: string | (gauge)
Unit: N/A
viewport_height The browser viewport height
Type: int | (gauge)
Unit: N/A
viewport_width The browser viewport width
Type: int | (gauge)
Unit: N/A

traceroute

traceroute holds the "route trace" data as JSON text. The data is an array in which each element records one route probe, as shown in the following example:

[
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 12700395,
        "min_cost": 11902041,
        "max_cost": 13498750,
        "std_cost": 1129043,
        "items": [
            {
                "ip": "10.8.9.1",
                "response_time": 13498750
            },
            {
                "ip": "10.8.9.1",
                "response_time": 11902041
            }
        ]
    },
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 13775021,
        "min_cost": 13740084,
        "max_cost": 13809959,
        "std_cost": 49409,
        "items": [
            {
                "ip": "10.12.168.218",
                "response_time": 13740084
            },
            {
                "ip": "10.12.168.218",
                "response_time": 13809959
            }
        ]
    }
]

Field description:

Field Type Description
total number Total number of probes
failed number Number of failures
loss number Percentage of failures
avg_cost number Average time spent (μs)
min_cost number Minimum time spent (μs)
max_cost number Maximum time spent (μs)
std_cost number Standard deviation of time spent (μs)
items Array of items Per-probe information (see items below)

items

Field Type Description
ip string IP address; if the probe fails, the value is *
response_time number Response time (μs)
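
As a minimal sketch of how the traceroute field might be consumed, the Python below parses the JSON text and produces one summary row per array element. The function name and the shape of the summary are hypothetical. The millisecond conversion assumes the μs unit given in the field description above.

```python
import json


def summarize_traceroute(raw):
    """Parse the traceroute JSON text and summarize each array element."""
    summary = []
    for index, entry in enumerate(json.loads(raw), start=1):
        # A failed probe reports "*" instead of an IP address, so skip it.
        ips = sorted({
            item["ip"]
            for item in entry.get("items", [])
            if item.get("ip") != "*"
        })
        summary.append({
            "index": index,
            "ips": ips,
            "total": entry["total"],
            "failed": entry["failed"],
            "loss": entry["loss"],
            "avg_ms": entry["avg_cost"] / 1000.0,  # μs to ms, per the table above
        })
    return summary


if __name__ == "__main__":
    raw = '[{"total": 2, "failed": 0, "loss": 0, "avg_cost": 12700395, "items": [{"ip": "10.8.9.1", "response_time": 13498750}, {"ip": "10.8.9.1", "response_time": 11902041}]}]'
    for row in summarize_traceroute(raw):
        print(row)
```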

Metric

The dialtesting collector exposes Prometheus metrics. By default, the DataKit collector collects and uploads these datakit_dialtesting_* metrics to TrueWatch without additional configuration.
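
To inspect these metrics by hand, one option is to filter a Prometheus text exposition by the datakit_dialtesting_ prefix. The Python sketch below does this on a string. The function name is hypothetical, and how the exposition text is obtained is left out because this page does not describe it.

```python
def dialtesting_metric_lines(exposition_text, prefix="datakit_dialtesting_"):
    """Return the sample lines whose metric name starts with prefix."""
    selected = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            selected.append(line)
    return selected


if __name__ == "__main__":
    # The metric names below are made-up placeholders, not real DataKit metrics.
    sample = "\n".join([
        "# TYPE datakit_dialtesting_example_total counter",
        'datakit_dialtesting_example_total{protocol="HTTP"} 3',
        "datakit_other_example 1",
    ])
    for line in dialtesting_metric_lines(sample):
        print(line)
```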