Amazon S3

Send logs, data, metrics to Amazon S3

The Amazon S3 output plugin allows you to ingest your records into the S3 cloud object store.

The plugin can upload data to S3 using the multipart upload API or using S3 PutObject. Multipart is the default and is recommended; Fluent Bit will stream data in a series of 'parts'. This limits the amount of data it has to buffer on disk at any point in time. By default, every time 5 MiB of data have been received, a new 'part' will be uploaded. The plugin can create files up to gigabytes in size from many small chunks/parts using the multipart API. All aspects of the upload process are configurable using the configuration options.

The plugin allows you to specify a maximum file size, and a timeout for uploads. A file will be created in S3 when the max size is reached, or the timeout is reached- whichever comes first.

Records are stored in files in S3 as newline delimited JSON.

See here for details on how AWS credentials are fetched.

Configuration Parameters

Key

Description

Default

TLS / SSL

To skip TLS verification, set tls.verify as false. For more details about the properties available and general configuration, please refer to the TLS/SSL section.

Permissions

The plugin requires the following AWS IAM permissions:

{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Action": [
			"s3:PutObject"
		],
		"Resource": "*"
	}]
}

S3 Key Format and Tag Delimiters

In Fluent Bit, all logs have an associated tag. The s3_key_format option lets you inject the tag into the s3 key using the following syntax:

$TAG => the full tag
$TAG[n] => the nth part of the tag (index starting at zero). This syntax is copied from the rewrite tag filter. By default, “parts” of the tag are separated with dots, but you can change this with s3_key_format_tag_delimiters.

In the example below, assume the date is January 1st, 2020 00:00:00 and the tag associated with the logs in question is my_app_name-logs.prod.

[OUTPUT]
    Name                         s3
    Match                        *
    bucket                       my-bucket
    region                       us-west-2
    total_file_size              250M
    s3_key_format                /$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
    s3_key_format_tag_delimiters .-

With the delimiters as . and -, the tag will be split into parts as follows:

$TAG[0] = my_app_name
$TAG[1] = logs
$TAG[2] = prod

So the key in S3 will be /prod/my_app_name/2020/01/01/00/00/00/bgdHN1NM.gz.

Reliability

The store_dir is used to temporarily store data before it is uploaded. If Fluent Bit is stopped suddenly it will try to send all data and complete all uploads before it shuts down. If it can not send some data, on restart it will look in the store_dir for existing data and will try to send it.

Multipart uploads are ideal for most use cases because they allow the plugin to upload data in small chunks over time. For example, 1 GB file can be created from 200 5MB chunks. While the file size in S3 will be 1 GB, only 5 MB will be buffered on disk at any one point in time.

There is one minor drawback to multipart uploads- the file and data will not be visible in S3 until the upload is completed with a CompleteMultipartUpload call. The plugin will attempt to make this call whenever Fluent Bit is shut down to ensure your data is available in s3. It will also store metadata about each upload in the store_dir, ensuring that uploads can be completed when Fluent Bit restarts (assuming it has access to persistent disk and the store_dir files will still be present on restart).

Using S3 without persisted disk

If you run Fluent Bit in an environment without persistent disk, or without the ability to restart Fluent Bit and give it access to the data stored in the store_dir from previous executions- some considerations apply. This might occur if you run Fluent Bit on AWS Fargate.

In these situations, we recommend using the PutObject API, and sending data frequently, to avoid local buffering as much as possible. This will limit data loss in the event Fluent Bit is killed unexpectedly.

The following settings are recommended for this use case:

[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     total_file_size 1M
     upload_timeout 1m
     use_put_object On

Worker support

Fluent Bit 1.7 adds a new feature called workers which enables outputs to have dedicated threads. This s3 plugin has partial support for workers. The plugin can only support a single worker; enabling multiple workers will lead to errors/indeterminate behavior.

Example:

[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     total_file_size 1M
     upload_timeout 1m
     use_put_object On
     workers 1

If you enable a single worker, you are enabling a dedicated thread for your S3 output. We recommend starting without workers, evaluating the performance, and then enabling a worker if needed. For most users, the plugin can provide sufficient throughput without workers.

Usage with MinIO

MinIO is a high-performance, S3 compatible object storage and you can build your app with S3 functionality without S3.

Assume you run a MinIO server at localhost:9000, and create a bucket of your-bucket by referring the client docs.

Example:

[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     endpoint http://localhost:9000

Then, the records will be stored into the MinIO server.

Getting Started

In order to send records into Amazon S3, you can run the plugin from the command line or through the configuration file.

Command Line

The s3 plugin, can read the parameters from the command line through the -p argument (property), e.g:

$ fluent-bit -i cpu -o s3 -p bucket=my-bucket -p region=us-west-2 -p -m '*' -f 1

Configuration File

In your main configuration file append the following Output section:

[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     store_dir /home/ec2-user/buffer
     total_file_size 50M
     upload_timeout 10m

An example that using PutObject instead of multipart:

[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     store_dir /home/ec2-user/buffer
     use_put_object On
     total_file_size 10M
     upload_timeout 10m

AWS for Fluent Bit

Amazon distributes a container image with Fluent Bit and this plugins.

GitHub

github.com/aws/aws-for-fluent-bit

Amazon ECR Public Gallery

aws-for-fluent-bit

Our images are available in Amazon ECR Public Gallery. You can download images with different tags by following command:

docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:<tag>

For example, you can pull the image with latest version by:

docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:latest

If you see errors for image pull limits, try log into public ECR with your AWS credentials:

aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

You can check the Amazon ECR Public official doc for more details.

Docker Hub

amazon/aws-for-fluent-bit

Amazon ECR

You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:

aws ssm get-parameters-by-path --path /aws/service/aws-for-fluent-bit/

For more see the AWS for Fluent Bit github repo.

Advanced usage

Use Apache Arrow for in-memory data processing

Starting from Fluent Bit v1.8, the Amazon S3 plugin includes the support for Apache Arrow. The support is currently not enabled by default, as it depends on a shared version of libarrow as the prerequisite.

To use this feature, FLB_ARROW must be turned on at compile time:

$ cd build/
$ cmake -DFLB_ARROW=On ..
$ cmake --build .

Once compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow format. For example:

[INPUT]
  Name cpu

[OUTPUT]
  Name s3
  Bucket your-bucket-name
  total_file_size 1M
  use_put_object On
  upload_timeout 60s
  Compression arrow

As shown in this example, setting Compression to arrow makes Fluent Bit to convert payload into Apache Arrow format.

The stored data is very easy to load, analyze and process using popular data processing tools (such as Python pandas, Apache Spark and Tensorflow). The following code uses pyarrow to analyze the uploaded data:

>>> import pyarrow.feather as feather
>>> import pyarrow.fs as fs
>>>
>>> s3 = fs.S3FileSystem()
>>> file = s3.open_input_file("my-bucket/fluent-bit-logs/cpu.0/2021/04/27/09/36/15-object969o67ZF")
>>> df = feather.read_feather(file)
>>> print(df.head())
                          date  cpu_p  user_p  system_p  cpu0.p_cpu  cpu0.p_user  cpu0.p_system
0  2021-04-27T09:33:53.539346Z    1.0     1.0       0.0         1.0          1.0            0.0
1  2021-04-27T09:33:54.539330Z    0.0     0.0       0.0         0.0          0.0            0.0
2  2021-04-27T09:33:55.539305Z    1.0     0.0       1.0         1.0          0.0            1.0
3  2021-04-27T09:33:56.539430Z    0.0     0.0       0.0         0.0          0.0            0.0
4  2021-04-27T09:33:57.539803Z    0.0     0.0       0.0         0.0          0.0            0.0

PreviousAmazon Kinesis Data Streams NextAzure Blob

Last updated 2 years ago