Backpressure

In certain environments, it is common for logs or data to be ingested faster than they can be flushed to their destinations. A typical case is reading from large log files and dispatching the logs over the network to a backend that takes some time to respond. This generates backpressure, which leads to high memory consumption in the service.

To avoid backpressure, Calyptia Core Agent implements a mechanism in the engine that restricts the amount of data that an input plugin can ingest. This is done through the configuration parameter Mem_Buf_Limit.

As described in the Buffering concepts section, Calyptia Core Agent offers a hybrid mode for data handling: in-memory and, optionally, filesystem.

In-memory buffering is always available and can be restricted with Mem_Buf_Limit. If your plugin gets restricted because of this configuration and you are in a backpressure scenario, you won't be able to ingest more data until the data chunks that are in memory can be flushed.

Depending on the input plugin type in use, this might lead to incoming data being discarded (e.g. with the TCP input plugin), but you can rely on the secondary filesystem buffering to be safe.

If, in addition to Mem_Buf_Limit, the input plugin defines a storage.type of filesystem (as described in Buffering & Storage), then when the limit is reached all new data will be stored safely in the file system.
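As a rough illustration, an input that combines both settings might be configured as follows. This is a minimal sketch that assumes the agent accepts Fluent Bit-style classic configuration syntax and the storage.path service key. The directory and log path are hypothetical placeholders, not values taken from this document.

```text
[SERVICE]
    # Hypothetical directory where filesystem chunks are kept (assumed key: storage.path)
    storage.path   /var/lib/calyptia/storage

[INPUT]
    Name           tail
    # Hypothetical log location
    Path           /var/log/app/*.log
    # Cap the memory used by this input's in-memory chunks
    Mem_Buf_Limit  1MB
    # Optional: also buffer new data in the file system
    storage.type   filesystem
```

Leaving out storage.type keeps the input in memory-only mode, in which case reaching the limit pauses the input as described in the rest of this section.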

Mem_Buf_Limit

This option is disabled by default and can be applied to all input plugins. Let's explain its behavior using the following scenario:

  • Mem_Buf_Limit is set to 1MB (one megabyte)

  • input plugin tries to append 700KB

  • engine routes the data to an output plugin

  • output plugin backend (HTTP Server) is down

  • engine scheduler will retry the flush after 10 seconds

  • input plugin tries to append 500KB

At this point, the engine allows those 500 KB of data to be appended: in total we have 1.2 MB. The option works in a permissive mode before the limit is reached, but once the limit is exceeded the following actions are taken:

  • block local buffers for the input plugin (cannot append more data)

  • notify the input plugin invoking a pause callback

The engine protects itself and will not append more data coming from the input plugin in question. It is the responsibility of the plugin to keep its state and decide what to do while it is paused.

After some seconds, if the scheduler was able to flush the initial 700 KB of data, or it gave up after retrying, that amount of memory is released and internally the following actions happen:

  • Upon data buffer release (700KB), the internal counters get updated

  • Counters are now set at 500KB

  • Since 500 KB is less than 1 MB, the engine checks the input plugin state

  • If the plugin is paused, the engine invokes a resume callback

  • input plugin can continue appending more data
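The accounting in this scenario can be sketched as a small simulation. This is an illustrative toy model, not the agent's actual implementation: the InputBuffer class, its method names, and the decimal KB/MB units are assumptions made for the example. The extra 100 KB appends are added only to show a rejected write and a write after resume.

```python
from dataclasses import dataclass, field

KB = 1000        # decimal units, so 700KB + 500KB reads as 1.2 MB
MB = 1000 * KB


@dataclass
class InputBuffer:
    """Hypothetical model of one input plugin's in-memory accounting."""
    mem_buf_limit: int
    used: int = 0
    paused: bool = False
    events: list = field(default_factory=list)

    def append(self, size: int) -> bool:
        # While paused, the engine refuses new data from this input.
        if self.paused:
            self.events.append(f"rejected {size // KB}KB")
            return False
        # Permissive mode: the append that crosses the limit is still accepted.
        self.used += size
        self.events.append(f"appended {size // KB}KB")
        # Assumption: the input is paused once usage is no longer below the limit.
        if self.used >= self.mem_buf_limit:
            self.paused = True
            self.events.append("pause")
        return True

    def release(self, size: int) -> None:
        # Called when a chunk was flushed, or dropped after the retries ran out.
        self.used -= size
        self.events.append(f"released {size // KB}KB")
        if self.paused and self.used < self.mem_buf_limit:
            self.paused = False
            self.events.append("resume")


def run_scenario() -> InputBuffer:
    buf = InputBuffer(mem_buf_limit=1 * MB)
    buf.append(700 * KB)   # accepted, 700KB in memory
    buf.append(500 * KB)   # accepted, 1.2 MB in memory, input gets paused
    buf.append(100 * KB)   # rejected while paused
    buf.release(700 * KB)  # counters drop to 500KB, input gets resumed
    buf.append(100 * KB)   # accepted again
    return buf


if __name__ == "__main__":
    for event in run_scenario().events:
        print(event)
```

The sketch keeps a single counter per input, which mirrors the idea that the limit applies to each input plugin individually rather than to the engine as a whole.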

About pause and resume callbacks

Each plugin is independent, and not all of them implement the pause and resume callbacks. As noted, these callbacks are just a notification mechanism for the plugin.

A plugin that implements these callbacks and keeps good state is the Tail input plugin. When the pause callback is triggered, it stops its collectors and stops appending data. Upon resume, it re-enables the collectors.
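To make the idea of keeping state across a pause concrete, here is a minimal sketch of a file-style input. It is not the Tail plugin's real code: the TailLikeInput class and its methods are hypothetical, and a list of strings stands in for the file on disk.

```python
class TailLikeInput:
    """Hypothetical file-style input that keeps its read offset across pauses."""

    def __init__(self, lines):
        self.lines = lines        # stands in for the file contents on disk
        self.offset = 0           # state that survives pause and resume
        self.collecting = True

    def pause(self):
        # Pause callback: stop the collector, keep the offset untouched.
        self.collecting = False

    def resume(self):
        # Resume callback: re-enable the collector.
        self.collecting = True

    def collect(self, max_lines):
        # A stopped collector reads nothing and does not advance the offset.
        if not self.collecting:
            return []
        batch = self.lines[self.offset:self.offset + max_lines]
        self.offset += len(batch)
        return batch
```

Because the unread data stays in the file and the offset is preserved, reading continues from where it stopped once the input is resumed. A network-based input has no such replayable source, which is why a pause there can mean discarded data.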
