Amazon S3
Send logs, data, metrics to Amazon S3
Last updated
Send logs, data, metrics to Amazon S3
Last updated
The Amazon S3 output plugin allows you to ingest your records into the S3 cloud object store.
The plugin can upload data to S3 using the multipart upload API or using S3 PutObject. Multipart is the default and is recommended; Fluent Bit will stream data in a series of 'parts'. This limits the amount of data it has to buffer on disk at any point in time. By default, every time 5 MiB of data have been received, a new 'part' will be uploaded. The plugin can create files up to gigabytes in size from many small chunks/parts using the multipart API. All aspects of the upload process are configurable using the configuration options.
The plugin allows you to specify a maximum file size, and a timeout for uploads. A file will be created in S3 when the max size is reached, or the timeout is reached- whichever comes first.
Records are stored in files in S3 as newline delimited JSON.
See here for details on how AWS credentials are fetched.
Key | Description | Default |
---|---|---|
To skip TLS verification, set tls.verify
as false
. For more details about the properties available and general configuration, please refer to the TLS/SSL section.
The plugin requires the following AWS IAM permissions:
In Fluent Bit, all logs have an associated tag. The s3_key_format
option lets you inject the tag into the s3 key using the following syntax:
$TAG
=> the full tag
$TAG[n]
=> the nth part of the tag (index starting at zero). This syntax is copied from the rewrite tag filter. By default, “parts” of the tag are separated with dots, but you can change this with s3_key_format_tag_delimiters
.
In the example below, assume the date is January 1st, 2020 00:00:00 and the tag associated with the logs in question is my_app_name-logs.prod
.
With the delimiters as . and -, the tag will be split into parts as follows:
$TAG[0]
= my_app_name
$TAG[1]
= logs
$TAG[2]
= prod
So the key in S3 will be /prod/my_app_name/2020/01/01/00/00/00/bgdHN1NM.gz
.
The store_dir
is used to temporarily store data before it is uploaded. If Fluent Bit is stopped suddenly it will try to send all data and complete all uploads before it shuts down. If it can not send some data, on restart it will look in the store_dir
for existing data and will try to send it.
Multipart uploads are ideal for most use cases because they allow the plugin to upload data in small chunks over time. For example, 1 GB file can be created from 200 5MB chunks. While the file size in S3 will be 1 GB, only 5 MB will be buffered on disk at any one point in time.
There is one minor drawback to multipart uploads- the file and data will not be visible in S3 until the upload is completed with a CompleteMultipartUpload call. The plugin will attempt to make this call whenever Fluent Bit is shut down to ensure your data is available in s3. It will also store metadata about each upload in the store_dir
, ensuring that uploads can be completed when Fluent Bit restarts (assuming it has access to persistent disk and the store_dir
files will still be present on restart).
If you run Fluent Bit in an environment without persistent disk, or without the ability to restart Fluent Bit and give it access to the data stored in the store_dir
from previous executions- some considerations apply. This might occur if you run Fluent Bit on AWS Fargate.
In these situations, we recommend using the PutObject API, and sending data frequently, to avoid local buffering as much as possible. This will limit data loss in the event Fluent Bit is killed unexpectedly.
The following settings are recommended for this use case:
Fluent Bit 1.7 adds a new feature called workers
which enables outputs to have dedicated threads. This s3
plugin has partial support for workers. The plugin can only support a single worker; enabling multiple workers will lead to errors/indeterminate behavior.
Example:
If you enable a single worker, you are enabling a dedicated thread for your S3 output. We recommend starting without workers, evaluating the performance, and then enabling a worker if needed. For most users, the plugin can provide sufficient throughput without workers.
MinIO is a high-performance, S3 compatible object storage and you can build your app with S3 functionality without S3.
Assume you run a MinIO server at localhost:9000, and create a bucket of your-bucket
by referring the client docs.
Example:
Then, the records will be stored into the MinIO server.
In order to send records into Amazon S3, you can run the plugin from the command line or through the configuration file.
The s3 plugin, can read the parameters from the command line through the -p argument (property), e.g:
In your main configuration file append the following Output section:
An example that using PutObject instead of multipart:
Amazon distributes a container image with Fluent Bit and this plugins.
github.com/aws/aws-for-fluent-bit
Our images are available in Amazon ECR Public Gallery. You can download images with different tags by following command:
For example, you can pull the image with latest version by:
If you see errors for image pull limits, try log into public ECR with your AWS credentials:
You can check the Amazon ECR Public official doc for more details.
You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:
For more see the AWS for Fluent Bit github repo.
Starting from Fluent Bit v1.8, the Amazon S3 plugin includes the support for Apache Arrow. The support is currently not enabled by default, as it depends on a shared version of libarrow
as the prerequisite.
To use this feature, FLB_ARROW
must be turned on at compile time:
Once compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow format. For example:
As shown in this example, setting Compression
to arrow
makes Fluent Bit to convert payload into Apache Arrow format.
The stored data is very easy to load, analyze and process using popular data processing tools (such as Python pandas, Apache Spark and Tensorflow). The following code uses pyarrow
to analyze the uploaded data:
region
The AWS region of your S3 bucket
us-east-1
bucket
S3 Bucket name
None
json_date_key
Specify the name of the time key in the output record. To disable the time key just set the value to false
.
date
json_date_format
Specify the format of the date. Supported formats are double, epoch, iso8601 (eg: 2018-05-30T09:39:52.000681Z) and java_sql_timestamp (eg: 2018-05-30 09:39:52.000681)
iso8601
total_file_size
Specifies the size of files in S3. Maximum size is 50G, minimim is 1M.
100M
upload_chunk_size
The size of each 'part' for multipart uploads. Max: 50M
5,242,880 bytes
upload_timeout
Whenever this amount of time has elapsed, Fluent Bit will complete an upload and create a new file in S3. For example, set this value to 60m and you will get a new file every hour.
10m
store_dir
Directory to locally buffer data before sending. When multipart uploads are used, data will only be buffered until the upload_chunk_size
is reached.
/tmp/fluent-bit/s3
s3_key_format
Format string for keys in S3. This option supports a UUID, strftime time formatters, a syntax for selecting parts of the Fluent log tag using a syntax inspired by the rewrite_tag filter. Add $UUID in the format string to insert a random string. Add $INDEX in the format string to insert an integer that increments each upload. Add $TAG in the format string to insert the full log tag; add $TAG[0] to insert the first part of the tag in the s3 key. The tag is split into “parts” using the characters specified with the s3_key_format_tag_delimiters
option. Add extension directly after the last piece of the format string to insert a key suffix. If you want to specify a key suffix and you are in use_put_object
mode, you must specify $UUID as well. More explanations can be found in use_put_object
option. See the in depth examples and tutorial in the documentation.
/fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S
s3_key_format_tag_delimiters
A series of characters which will be used to split the tag into 'parts' for use with the s3_key_format option. See the in depth examples and tutorial in the documentation.
.
static_file_path
Disables behavior where UUID string is automatically appended to end of S3 key name when $UUID is not provided in s3_key_format. $UUID, time formatters, $TAG, and other dynamic key formatters all work as expected while this feature is set to true.
false
use_put_object
Use the S3 PutObject API, instead of the multipart upload API. When this option is on, key extension is only available when $UUID is specified in s3_key_format
. If $UUID is not included, a random string will be appended at the end of the format string and the key extension cannot be customized in this case.
false
role_arn
ARN of an IAM role to assume (ex. for cross account access).
None
endpoint
Custom endpoint for the S3 API. An endpoint can contain scheme and port.
None
sts_endpoint
Custom endpoint for the STS API.
None
canned_acl
Predefined Canned ACL policy for S3 objects.
None
compression
Compression type for S3 objects. 'gzip' is currently the only supported value. The Content-Encoding HTTP Header will be set to 'gzip'. Compression can be enabled when use_put_object
is on. If Apache Arrow support was enabled at compile time, you can set 'arrow' to this option.
None
content_type
A standard MIME type for the S3 object; this will be set as the Content-Type HTTP header.
None
send_content_md5
Send the Content-MD5 header with PutObject and UploadPart requests, as is required when Object Lock is enabled.
false
auto_retry_requests
Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues.
true
log_key
By default, the whole log record will be sent to S3. If you specify a key name with this option, then only the value of that key will be sent to S3. For example, if you are using Docker, you can specify log_key log and only the log message will be sent to S3.
None
preserve_data_ordering
Normally, when an upload request fails, there is a high chance for the last received chunk to be swapped with a later chunk, resulting in data shuffling. This feature prevents this shuffling by using a queue logic for uploads.
false
storage_class
Specify the storage class for S3 objects. If this option is not specified, objects will be stored with the default 'STANDARD' storage class.
None