dstore-dist-top-reporter

Root Node Config

dstore-dist-top-reporter configuration is built around three concepts: streams, which describe the characteristics of the incoming events; reports, which are generated at regular intervals from a specified stream; and storage, which specifies where the generated reports are stored:

  • Streams: Input sources for data distributed via dstore-dist
  • Reports: The reports which should be generated based on a stream
  • Storage: Locations where reports should be stored

The relationship between these is as follows:

  • A report is based on the data from a single stream; multiple reports can be generated from the same stream
  • Every report is stored in every configured storage, unless the configuration of a storage restricts it to specific reports

A simple configuration file for dstore-dist-top-reporter might look like:

http:
  address: ":8701"

streams:
  - name: all-queries
    title: "All traffic (sampled)"
    address: ":4801"
    # This needs to match the sample value configured in dstore-dist
    upstream_sampling: 1000 

# Reports are generated from streams.
reports:
- name: all-tldplusone-domains
  # This uses the public domain suffix list to remove internal subdomains
  # e.g. www.example.com and mail.example.com will both be truncated to example.com
  field: qname/suffix+1
  # We always want to oversample, otherwise the summary data will be skewed
  n: 5000
  stream: all-queries
  interval: 60s

storage:
  - name: elasticsearch
    # This is currently the only supported backend
    backend: elastic
    skip_empty: true
    url: http://elasticsearch:9200/
    # Ensure the index contains the report name and today's date
    elastic_index_template: "{{.ReportName}}-{{.TimestampDate}}"

Note that at least one stream, one report and one storage must be configured.

The following YAML key-values are supported for configuration at the root node:

| Parameter | Type | Default | Description |
|---|---|---|---|
| http.address | <ip:port> | | The address to listen on for Prometheus metrics and for the status page. The value is an address:port string, in either v4 or v6 format. IPv6 addresses must be placed in square brackets, like this: [::1]. You can omit the address to listen on all local addresses. |
| streams | List of Stream | | Configuration of the incoming event streams |
| reports | List of Report | | Configuration of the reports to generate |
| storage | List of Storage | | Configuration of storage for reports |
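
To illustrate the address forms described for http.address, the fragment below listens on the IPv6 loopback address and lists the other forms as comments. The port 8701 is taken from the sample configuration above; the specific addresses are illustrative choices, not recommendations.

```yaml
http:
  # An IPv6 literal must be wrapped in square brackets
  address: "[::1]:8701"
  # Other forms matching the description above:
  #   address: "127.0.0.1:8701"   # IPv4 loopback only
  #   address: ":8701"            # all local addresses
```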

Stream

Parameters which can be used to configure a TopN stream:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | | Name of the stream |
| title | string | | | Display friendly name of the stream |
| upstream_sampling | integer | | 1 | Sampling value used in the dstore-dist destination which populates this stream |
| address | ip:port | | | The address (optional) and port to listen on for this stream |
| tlsconfig | TLS Config | | | TLS configuration options for this stream |
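
As a sketch of how these parameters combine, the fragment below defines two streams. The second stream's name, title and port (local-queries, 4802) are hypothetical, and the comment about what a sampling value of 1000 means is an assumption about dstore-dist rather than something stated here.

```yaml
streams:
  # Sampled stream: must match the sample value configured in dstore-dist
  # (assumed here to mean that roughly 1 in 1000 events is forwarded)
  - name: all-queries
    title: "All traffic (sampled)"
    address: ":4801"
    upstream_sampling: 1000
  # Stream bound to the loopback address only;
  # upstream_sampling is omitted, so the default of 1 applies
  - name: local-queries
    title: "Local traffic"
    address: "127.0.0.1:4802"
```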

Report

Parameters which can be used to configure a TopN report:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| entries | integer | | 1000 | Maximum number of entries to include in the report |
| field | string | | "qname" | Field to use as the key for the report (see below for possible values) |
| interval | go:DurationString | | "300s" | How often to generate the report (longer interval means longer in-memory storage of data = higher memory usage) |
| name | string | yes | | Name of the report |
| stream | string | yes | | Name of an input stream defined on this TopN instance |

"field" can take the following values:

| field | Description |
|---|---|
| qname | The lowercase DNS question name |
| qname/raw | The raw qname, not converted to lowercase |
| qname/suffix | The public suffix of the qname (e.g. .com, .co.uk, etc.) |
| qname/suffix+1 | The public suffix plus one label (e.g. example.com, example.co.uk, etc.) |
| qname/tld | The TLD (e.g. com, uk, etc.) |
| requestorid | The subscriber’s username |
| ip/prefix32/prefix64 | The IP address of the client, with the IP address aggregated to the v4/v6 prefix specified. For example ip/32/128 would perform no aggregation of v4 or v6 IPs. |
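
The fragment below is a minimal sketch of two reports that use different field values. The report names and the prefix lengths (24 and 56) are hypothetical; the stream name all-queries comes from the sample configuration above.

```yaml
reports:
  # Top client networks: IPv4 clients aggregated to /24, IPv6 clients to /56
  - name: top-client-networks
    field: ip/24/56
    stream: all-queries
    interval: 600s
  # Top TLDs, with a lower entry limit than the default of 1000;
  # interval is omitted, so the default of 300s applies
  - name: top-tlds
    field: qname/tld
    entries: 100
    stream: all-queries
```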

Storage

Parameters which can be used to configure a TopN storage backend:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| backend | string | yes | | Type of storage backend. Available options: "elastic". See below for more configuration options specific to each backend. |
| name | string | yes | | Name of the storage backend |
| reports | []string | | | List of Report names (all reports will be stored if this list is empty) |
| retry_max | integer | | 0 | Maximum number of retries in case of connection errors or HTTP-500 |
| skip_empty | boolean | | false | Skip generating a report if there are no entries |
| tlsconfig | TLS Config | | {} | TLS configuration options for the storage backend |
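
As a sketch of how the reports list restricts what a storage receives, the fragment below configures two storage entries. The storage names and the second URL are hypothetical; the report name all-tldplusone-domains is the one from the sample configuration above.

```yaml
storage:
  # No "reports" list, so every report is stored here
  - name: es-all
    backend: elastic
    url: http://elasticsearch:9200/
  # Only the listed report is stored here
  - name: es-domains
    backend: elastic
    url: http://elasticsearch-archive:9200/
    reports:
      - all-tldplusone-domains
    # Retry up to 3 times on connection errors or HTTP-500
    retry_max: 3
    skip_empty: true
```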

Backend: Elastic

Additional parameters are available on storage entries with backend: elastic. They are set as attributes of the storage item itself, alongside name and backend:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| elastic_id_template | string | | | Template used to render the Elastic IDs. Randomly generated if not configured |
| elastic_index_template | string | | {{.ReportName}} | Template used to render the name of the index to use in Elastic to store the reports |
| elastic_single_doc | boolean | | false | Store a report in a single document in Elastic |
| username | string | | | Username to use for authentication with Elastic |
| password | string | | | Password to use for authentication with Elastic |
| url | string | yes | | Base URL of Elastic instance |
| retry_max | integer | | | Maximum number of retries. By default no retries are attempted. |
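
The fragment below is a sketch of an Elastic storage entry that sets the backend-specific parameters directly on the storage item. The credentials are placeholders, and the template variables are limited to the two that appear in the sample configuration above.

```yaml
storage:
  - name: elasticsearch
    backend: elastic
    url: http://elasticsearch:9200/
    # Placeholder credentials; replace with real values
    username: topn-writer
    password: "change-me"
    # One index per report per day
    elastic_index_template: "{{.ReportName}}-{{.TimestampDate}}"
    # Store each report as a single document
    elastic_single_doc: true
    retry_max: 5
```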