Monitoring

Learn how to monitor your Calyptia Core Agent data pipelines

Calyptia Core Agent comes with built-in features that let you monitor the internals of your pipeline, connect to Prometheus and Grafana, run health checks, and use connectors to external services for the same purpose.

HTTP Server

Calyptia Core Agent comes with a built-in HTTP Server that can be used to query internal information and monitor metrics of each running plugin.

The monitoring interface integrates easily with Prometheus because it supports the native Prometheus format.

Getting Started

To get started, the first step is to enable the HTTP Server from the configuration file:

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name cpu

[OUTPUT]
    Name  stdout
    Match *

The above configuration snippet instructs Calyptia Core Agent to start its HTTP Server on TCP port 2020, listening on all network interfaces:

$ bin/calyptia-fluent-bit -c calyptia-fluent-bit.conf
Calyptia Fluent Bit 20.10.03

[2020/03/10 19:08:24] [ info] [engine] started
[2020/03/10 19:08:24] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020

Now a simple curl command is enough to gather some information:

$ curl -s http://127.0.0.1:2020 | jq
{
  "calyptia-fluent-bit": {
    "version": "22.10.03",
    "edition": "lts",
    "flags": [
      "FLB_HAVE_TLS",
      "FLB_HAVE_METRICS",
      "FLB_HAVE_SQLDB",
      "FLB_HAVE_TRACE",
      "FLB_HAVE_HTTP_SERVER",
      "FLB_HAVE_FLUSH_LIBCO",
      "FLB_HAVE_SYSTEMD",
      "FLB_HAVE_VALGRIND",
      "FLB_HAVE_FORK",
      "FLB_HAVE_PROXY_GO",
      "FLB_HAVE_REGEX",
      "FLB_HAVE_C_TLS",
      "FLB_HAVE_SETJMP",
      "FLB_HAVE_ACCEPT4",
      "FLB_HAVE_INOTIFY"
    ]
  }
}

Note that the curl output is piped to the jq program, which makes the JSON data easier to read in the terminal. Calyptia Core Agent does not aim to do JSON pretty-printing.
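If jq is not available, the same response can be inspected from a script. The following is a minimal Python sketch, assuming the response body has the shape shown above; the helper name build_info and the shortened SAMPLE body are illustrative and not part of the product.

```python
import json

# Response body shaped like the example above (flags list shortened for brevity).
SAMPLE = """
{
  "calyptia-fluent-bit": {
    "version": "22.10.03",
    "edition": "lts",
    "flags": ["FLB_HAVE_TLS", "FLB_HAVE_METRICS", "FLB_HAVE_HTTP_SERVER"]
  }
}
"""


def build_info(body: str) -> dict:
    """Extract version, edition and build flags from the root endpoint's JSON body."""
    info = json.loads(body)["calyptia-fluent-bit"]
    return {
        "version": info["version"],
        "edition": info["edition"],
        "flags": set(info["flags"]),
    }


info = build_info(SAMPLE)

# Pretty-print the whole document, as jq would do in the terminal.
print(json.dumps(json.loads(SAMPLE), indent=2))

# Check whether the binary was built with metrics support.
print("FLB_HAVE_METRICS" in info["flags"])
```

In practice the body would come from an HTTP request to the address configured in the [SERVICE] section rather than from a constant.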

REST API Interface

Calyptia Core Agent aims to expose useful interfaces for monitoring. As of Calyptia Core Agent v22.10.03, the following endpoints are available:

Metric descriptions

The following are detailed descriptions for the metrics output in Prometheus format by /api/v1/metrics/prometheus.

The following definitions are key to understand:

  • record: a single message collected from a source, such as a single long line in a file.

  • chunk: Calyptia Core Agent input plugin instances ingest log records and store them in chunks. The batch of records in a chunk is tracked together as a single unit. The Calyptia Core Agent engine attempts to fit records into chunks of at most 2 MB, but the size can vary at runtime. Chunks are then sent to an output. An output plugin instance can successfully send the full chunk to the destination and mark it as successful, fail the chunk entirely if an unrecoverable error is encountered, or ask for the chunk to be retried.
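To make the record and chunk relationship concrete, here is a small Python sketch that greedily groups records into size-limited batches. It is an illustration of the definition only, not the engine's actual algorithm. The function name group_into_chunks and the interpretation of 2 MB as 2 * 1024 * 1024 bytes are assumptions.

```python
# Assumption: "2 MB" is read as 2 * 1024 * 1024 bytes for this illustration.
MAX_CHUNK_BYTES = 2 * 1024 * 1024


def group_into_chunks(records, max_bytes=MAX_CHUNK_BYTES):
    """Greedily pack records (bytes objects) into lists whose total size
    does not exceed max_bytes. A single record larger than max_bytes
    still gets a chunk of its own, so chunk size can vary."""
    chunks, current, size = [], [], 0
    for rec in records:
        # Close the current chunk if adding this record would exceed the limit.
        if current and size + len(rec) > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        chunks.append(current)
    return chunks


# Three 4-byte records with an 8-byte limit: two fit together, the third starts a new chunk.
demo = group_into_chunks([b"aaaa", b"bbbb", b"cccc"], max_bytes=8)
print([len(chunk) for chunk in demo])
```

Each resulting list stands for one unit that an output would mark as successful, failed, or to be retried as a whole.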

The following are detailed descriptions for the metrics output in JSON format by /api/v1/storage.

Uptime example

Query the service uptime with the following command:

$ curl -s http://127.0.0.1:2020/api/v1/uptime | jq

It should print output similar to this:

{
  "uptime_sec": 8950000,
  "uptime_hr": "Calyptia Fluent Bit has been running:  103 days, 14 hours, 6 minutes and 40 seconds"
}
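The two fields carry the same information: uptime_hr is a human-readable rendering of uptime_sec. A short Python sketch of the conversion, using the values from the example above (the function name split_uptime is hypothetical):

```python
def split_uptime(uptime_sec: int):
    """Break a number of seconds into (days, hours, minutes, seconds)."""
    days, rest = divmod(uptime_sec, 86400)   # 86400 seconds per day
    hours, rest = divmod(rest, 3600)         # 3600 seconds per hour
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


print(split_uptime(8950000))  # (103, 14, 6, 40)
```

The result matches the 103 days, 14 hours, 6 minutes and 40 seconds reported in uptime_hr.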

Metrics examples

Query internal metrics in JSON format with the following command:

$ curl -s http://127.0.0.1:2020/api/v1/metrics | jq

It should print output similar to this:

{
  "input": {
    "cpu.0": {
      "records": 8,
      "bytes": 2536
    }
  },
  "output": {
    "stdout.0": {
      "proc_records": 5,
      "proc_bytes": 1585,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0
    }
  }
}
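A response of this shape is easy to post-process. The sketch below, in Python, lists the output instances that report errors or failed retries. It assumes only the field names visible in the example above, and failing_outputs is an illustrative helper name.

```python
import json

# Body shaped like the example response above.
SAMPLE = """
{
  "input": {"cpu.0": {"records": 8, "bytes": 2536}},
  "output": {"stdout.0": {"proc_records": 5, "proc_bytes": 1585,
                          "errors": 0, "retries": 0, "retries_failed": 0}}
}
"""


def failing_outputs(metrics: dict) -> list:
    """Return the sorted names of outputs with any errors or failed retries."""
    return sorted(
        name
        for name, counters in metrics.get("output", {}).items()
        if counters.get("errors", 0) > 0 or counters.get("retries_failed", 0) > 0
    )


metrics = json.loads(SAMPLE)
print(failing_outputs(metrics))  # [] because stdout.0 reports no errors
```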

Metrics in Prometheus format

Query internal metrics in Prometheus Text 0.0.4 format:

$ curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus

This time the same metrics are returned in Prometheus format instead of JSON:

fluentbit_input_records_total{name="cpu.0"} 57 1509150350542
fluentbit_input_bytes_total{name="cpu.0"} 18069 1509150350542
fluentbit_output_proc_records_total{name="stdout.0"} 54 1509150350542
fluentbit_output_proc_bytes_total{name="stdout.0"} 17118 1509150350542
fluentbit_output_errors_total{name="stdout.0"} 0 1509150350542
fluentbit_output_retries_total{name="stdout.0"} 0 1509150350542
fluentbit_output_retries_failed_total{name="stdout.0"} 0 1509150350542
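Each line consists of a metric name, a name label, a value, and a timestamp in milliseconds. The following Python sketch parses lines of exactly that shape. It is a deliberately narrow illustration that assumes a single name label per line, not a general parser for the Prometheus text format; parse_metrics and LINE_RE are hypothetical names.

```python
import re

# Matches lines such as: metric{name="cpu.0"} 57 1509150350542
# Assumption: exactly one label called "name", as in the sample output above.
LINE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\{name="(?P<name>[^"]*)"\}'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>\d+))?$'
)

SAMPLE = """\
fluentbit_input_records_total{name="cpu.0"} 57 1509150350542
fluentbit_output_errors_total{name="stdout.0"} 0 1509150350542
"""


def parse_metrics(text: str) -> dict:
    """Map (metric, plugin name) to the sample value as a float."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this narrow sketch does not understand
        samples[(match.group("metric"), match.group("name"))] = float(match.group("value"))
    return samples


samples = parse_metrics(SAMPLE)
print(samples[("fluentbit_input_records_total", "cpu.0")])  # 57.0
```

For production scraping, a Prometheus server or an existing client library is the more robust choice.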

Configuring aliases

By default, each configured plugin gets an internal name at runtime in the format plugin_name.ID. For monitoring purposes, this can be confusing when many plugins of the same type are configured. To tell them apart, each configured input or output section can be given an alias, which is then used as the parent name for the metric.

The following example sets an alias on the INPUT section, which uses the CPU input plugin, and on the OUTPUT section:

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name  cpu
    Alias server1_cpu

[OUTPUT]
    Name  stdout
    Alias raw_output
    Match *

Now when querying the metrics we get the aliases in place instead of the plugin name:

{
  "input": {
    "server1_cpu": {
      "records": 8,
      "bytes": 2536
    }
  },
  "output": {
    "raw_output": {
      "proc_records": 5,
      "proc_bytes": 1585,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0
    }
  }
}
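The naming rule described in this section can be summarized in a few lines of Python. This is only a sketch of the rule as stated above (alias if set, otherwise plugin_name.ID); the function metric_key is hypothetical and not part of the agent.

```python
def metric_key(plugin_name: str, instance_id: int, alias: str = None) -> str:
    """Return the name a plugin instance is reported under in the metrics:
    its alias when one is configured, otherwise plugin_name.ID."""
    return alias if alias else f"{plugin_name}.{instance_id}"


print(metric_key("cpu", 0))                        # cpu.0
print(metric_key("cpu", 0, alias="server1_cpu"))   # server1_cpu
```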

Grafana Dashboard and Alerts

The exposed Prometheus-style metrics for Calyptia Core Agent can be leveraged to create dashboards and alerts.

The provided example dashboard is heavily inspired by the Banzai Cloud logging operator dashboard but with a few key differences, such as the use of the instance label (see why here), stacked graphs, and a focus on Calyptia Core Agent metrics.

Alerts

Sample alerts are available here.

Health Check for Calyptia Core Agent

Calyptia Core Agent supports four configuration options to set up the health check:

Note: Not every error log entry is counted as an error. Errors and retry failures are counted only for specific error conditions, such as the examples given in the configuration table descriptions.

The feature works as follows: within each HC_Period interval, if the number of errors exceeds HC_Errors_Count or the number of retry failures exceeds HC_Retry_Failure_Count, Calyptia Core Agent is considered unhealthy, and the health endpoint returns HTTP status 500 and the string error. Otherwise it is considered healthy, and the endpoint returns HTTP status 200 and the string ok.

The equation is:

health status = (HC_Errors_Count > HC_Errors_Count config value) OR (HC_Retry_Failure_Count > HC_Retry_Failure_Count config value) IN the HC_Period interval

Note: HC_Errors_Count and HC_Retry_Failure_Count apply only to output plugins, and the counts are the sum of errors and retry failures across all running output plugins.

See the config example:

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020
    Health_Check On
    HC_Errors_Count 5
    HC_Retry_Failure_Count 5
    HC_Period 5

[INPUT]
    Name  cpu

[OUTPUT]
    Name  stdout
    Match *

Call the health endpoint with the following command:

$ curl -s http://127.0.0.1:2020/api/v1/health

Based on the Calyptia Core Agent status, the result will be:

  • HTTP status 200 and "ok" for a healthy status

  • HTTP status 500 and "error" for an unhealthy status
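A script or probe can rely on the status code alone. The following Python sketch uses only the standard library and assumes the endpoint and port from the example configuration. The function name is_healthy and the injectable opener parameter (used here so the logic can be exercised without a running agent) are illustrative choices.

```python
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:2020/api/v1/health"  # from the example configuration


def is_healthy(url=HEALTH_URL, opener=urllib.request.urlopen, timeout=2.0) -> bool:
    """Return True on HTTP 200 and False on HTTP 500.
    Other failures (for example connection refused) are left to propagate,
    since they mean the endpoint could not be reached at all."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        if err.code == 500:
            return False
        raise
```

Calling is_healthy() with no arguments queries the local agent; a liveness probe could exit non-zero when it returns False.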

With the example config, the health status is determined by the following equation:

Health status = (HC_Errors_Count > 5) OR (HC_Retry_Failure_Count > 5) IN 5 seconds

If (HC_Errors_Count > 5) OR (HC_Retry_Failure_Count > 5) IN 5 seconds is TRUE, then it's unhealthy.

If (HC_Errors_Count > 5) OR (HC_Retry_Failure_Count > 5) IN 5 seconds is FALSE, then it's healthy.
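The same rule can be written as a small predicate. This Python sketch mirrors the equation above with the thresholds from the example config as defaults. The function and parameter names are hypothetical, and the counts are assumed to be the totals observed within one HC_Period interval.

```python
def is_unhealthy(errors: int, retry_failures: int,
                 hc_errors_count: int = 5,
                 hc_retry_failure_count: int = 5) -> bool:
    """Apply the health rule to the counts observed in one HC_Period interval.
    Note the strict comparison: reaching the threshold exactly is still healthy."""
    return errors > hc_errors_count or retry_failures > hc_retry_failure_count


print(is_unhealthy(errors=5, retry_failures=5))  # False: neither count exceeds 5
print(is_unhealthy(errors=6, retry_failures=0))  # True: errors exceed 5
```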

Calyptia Cloud

Calyptia Cloud is a hosted service that allows you to monitor your Calyptia Core Agent instances, including data flow, metrics and configurations.

Get Started with Calyptia Cloud

Registering your Calyptia Core Agent instances takes less than one minute. Follow these steps:

In your Calyptia Core Agent configuration file, append the following configuration section:

[CUSTOM]
    name     calyptia
    api_key  <YOUR_API_KEY>

Make sure to replace <YOUR_API_KEY> with your own API key.

A few seconds after you restart Calyptia Core Agent, the Calyptia Cloud Dashboard will list your agent. Metrics take around 30 seconds to show up.

Contact Calyptia

To get in touch with the Calyptia team, send an email to hello@calyptia.com
