FetchS3Object 2.0.0

Bundle
org.apache.nifi | nifi-aws-nar
Description
Retrieves the contents of an S3 object and writes them to the content of a FlowFile.
Tags
AWS, Amazon, Fetch, Get, S3
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Properties
Relationships
Name Description
failure If the Processor is unable to process a given FlowFile, it will be routed to this Relationship.
success FlowFiles are routed to this Relationship after they have been successfully processed.
Writes Attributes
Name Description
s3.url The URL that can be used to access the S3 object
s3.bucket The name of the S3 bucket
path The path of the file
absolute.path The absolute path of the file
filename The name of the file
hash.value The MD5 sum of the file
hash.algorithm MD5
mime.type If S3 provides the content type/MIME type, this attribute will hold that value
s3.etag The ETag that can be used to see if the file has changed
s3.exception The class name of the exception thrown during processor execution
s3.additionalDetails The S3 supplied detail from the failed operation
s3.statusCode The HTTP error code (if available) from the failed operation
s3.errorCode The S3 moniker of the failed operation
s3.errorMessage The S3 exception message from the failed operation
s3.expirationTime If the file has an expiration date, this attribute will be set, containing the expiration time in milliseconds since the epoch (UTC)
s3.expirationTimeRuleId The ID of the rule that dictates this object's expiration time
s3.sseAlgorithm The server-side encryption algorithm of the object
s3.version The version of the S3 object
s3.encryptionStrategy The name of the encryption strategy that was used to store the S3 object (if it is encrypted)
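`s3.expirationTime` is a raw count of milliseconds, so a consumer that needs a calendar date has to convert it. The Python sketch below assumes that the FlowFile attributes have already been collected into a plain dictionary of strings. The `expiration_datetime` helper and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def expiration_datetime(attributes):
    """Return the S3 object's expiration as a UTC datetime, or None.

    `attributes` is assumed to be a plain dict of FlowFile attribute strings.
    """
    raw = attributes.get("s3.expirationTime")
    if raw is None:
        # The attribute is only written when the object has an expiration date.
        return None
    # Adding a timedelta keeps millisecond precision without float rounding.
    return EPOCH + timedelta(milliseconds=int(raw))

# Hypothetical attribute values for illustration.
attrs = {"s3.bucket": "example-bucket", "s3.expirationTime": "1700000000000"}
print(expiration_datetime(attrs).isoformat())  # 2023-11-14T22:13:20+00:00
```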
Use Cases
  • Fetch a specific file from S3
    Description
    Fetch a specific file from S3
    Configuration
    The "Bucket" property should be set to the name of the S3 bucket that contains the file. Typically this is defined as an attribute on an incoming FlowFile, so this property is set to `${s3.bucket}`.
    The "Object Key" property denotes the fully qualified filename of the file to fetch. Typically, the FlowFile's `filename` attribute is used, so this property is set to `${filename}`.
    The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_REGION}`.
    
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the file.
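The following Python sketch models how the property values above could be resolved against one incoming FlowFile. It is a deliberately simplified stand-in for NiFi's Expression Language and Parameter references, not NiFi's implementation. Only bare `${attribute}` and `#{PARAMETER}` references are handled. The `resolve` function, the attribute values and the parameter values are all hypothetical.

```python
import re

# Simplified stand-ins for NiFi references. Only bare ${attribute} and
# #{PARAMETER} forms are matched; anything with a function call, such as
# ${filename:endsWith('.zip')}, is left unchanged by this sketch.
_ATTR_REF = re.compile(r"\$\{([^}:]+)\}")
_PARAM_REF = re.compile(r"#\{([^}]+)\}")

def resolve(value, attributes, parameters):
    """Substitute parameter references, then attribute references.

    A missing parameter raises KeyError; a missing attribute becomes an
    empty string (both are assumptions made for this sketch).
    """
    value = _PARAM_REF.sub(lambda m: parameters[m.group(1)], value)
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), value)

# Property values from the use case above.
properties = {
    "Bucket": "${s3.bucket}",
    "Object Key": "${filename}",
    "Region": "#{S3_REGION}",
}
# Hypothetical incoming FlowFile attributes and Parameter Context values.
attributes = {"s3.bucket": "example-bucket", "filename": "reports/2024/summary.csv"}
parameters = {"S3_REGION": "us-east-1"}

resolved = {name: resolve(v, attributes, parameters) for name, v in properties.items()}
print(resolved)
```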
    
Use Cases Involving Other Components
  • Retrieve all files in an S3 bucket
    Description
    Retrieve all files in an S3 bucket
    Keywords
    s3, state, retrieve, fetch, all, stream
    Processor Configurations
    org.apache.nifi.processors.aws.s3.ListS3
    The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_BUCKET}`.
    The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_REGION}`.
    
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
    
    The 'success' Relationship of this Processor is then connected to FetchS3Object.
    
    org.apache.nifi.processors.aws.s3.FetchS3Object
    "Bucket" = "${s3.bucket}"
    "Object Key" = "${filename}"
    
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
    
    The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.
    
  • Retrieve only files from S3 that meet some specified criteria
    Description
    Retrieve only files from S3 that meet some specified criteria
    Keywords
    s3, state, retrieve, filter, select, fetch, criteria
    Processor Configurations
    org.apache.nifi.processors.aws.s3.ListS3
    The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_BUCKET}`.
    The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_REGION}`.
    
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
    
    The 'success' Relationship of this Processor is then connected to RouteOnAttribute.
    
    org.apache.nifi.processors.standard.RouteOnAttribute
    If you would like to "OR" together all of the conditions (i.e., the file should be retrieved if any of the conditions are met), set "Routing Strategy" to "Route to 'matched' if any matches".
    If you would like to "AND" together all of the conditions (i.e., the file should only be retrieved if all of the conditions are met), set "Routing Strategy" to "Route to 'matched' if all match".
    
    For each condition that you would like to filter on, add a new property. The name of the property should describe the condition. The value of the property should be an Expression Language expression that returns `true` if the file meets the condition or `false` if the file does not meet the condition.
    
    Some attributes that you may consider filtering on are:
    - `filename` (the name of the file)
    - `s3.length` (the number of bytes in the file)
    - `s3.tag.<tag name>` (the value of the S3 tag with the name `tag name`)
    - `s3.user.metadata.<key name>` (the value of the user metadata with the key named `key name`)
    
    For example, to fetch only files that are at least 1 MB and have a filename ending in `.zip`, we would set the following properties:
    - "Routing Strategy" = "Route to 'matched' if all match"
    - "At least 1 MB" = "${s3.length:ge(1000000)}"
    - "Ends in .zip" = "${filename:endsWith('.zip')}"
    
    Auto-terminate the `unmatched` Relationship.
    Connect the `matched` Relationship to the FetchS3Object processor.
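    The Python sketch below models the routing decision configured above. It covers the logic only and is not RouteOnAttribute's code. Each Expression Language condition has been rewritten by hand as a Python predicate. The `route` function and the sample attributes are hypothetical. FlowFile attributes are strings, so the size check parses `s3.length` as an integer first.

```python
# Each entry mirrors one dynamic property on RouteOnAttribute.
CONDITIONS = {
    "At least 1 MB": lambda a: int(a["s3.length"]) >= 1_000_000,
    "Ends in .zip": lambda a: a["filename"].endswith(".zip"),
}

def route(attributes, strategy):
    """Return 'matched' or 'unmatched' for one FlowFile's attributes.

    strategy is 'all' (Route to 'matched' if all match) or
    'any' (Route to 'matched' if any matches).
    """
    if strategy not in ("all", "any"):
        raise ValueError(f"unknown strategy: {strategy}")
    results = [check(attributes) for check in CONDITIONS.values()]
    combine = all if strategy == "all" else any
    return "matched" if combine(results) else "unmatched"

big_zip = {"filename": "archive.zip", "s3.length": "2500000"}
small_zip = {"filename": "notes.zip", "s3.length": "4096"}
print(route(big_zip, "all"))    # matched
print(route(small_zip, "all"))  # unmatched
print(route(small_zip, "any"))  # matched
```

    With "Route to 'matched' if all match", only `big_zip` would reach FetchS3Object. `small_zip` would go to the auto-terminated `unmatched` Relationship.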
    
    org.apache.nifi.processors.aws.s3.FetchS3Object
    "Bucket" = "${s3.bucket}"
    "Object Key" = "${filename}"
    
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
    
    The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.
    
  • Retrieve new files as they arrive in an S3 bucket
    Description
    Retrieve new files as they arrive in an S3 bucket
    Notes
    This method of retrieving files from S3 is more efficient and more cost-effective than using ListS3, and it is the pattern recommended by AWS. However, it requires that the S3 bucket be configured to send notifications to an SQS queue when new files arrive. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html
    Processor Configurations
    org.apache.nifi.processors.aws.sqs.GetSQS
    The "Queue URL" property must be set to the appropriate URL for the SQS queue. It is recommended that this property be parameterized, using a value such as `#{SQS_QUEUE_URL}`.
    The "Region" property must be set to denote the SQS region that the queue resides in. It's a good idea to parameterize this property by setting it to something like `#{SQS_REGION}`.
    
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the queue.
    
    The 'success' relationship is connected to EvaluateJsonPath.
    
    org.apache.nifi.processors.standard.EvaluateJsonPath
    "Destination" = "flowfile-attribute"
    "s3.bucket" = "$.Records[0].s3.bucket.name"
    "filename" = "$.Records[0].s3.object.key"
    
    The 'success' relationship is connected to FetchS3Object.
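    The two JsonPath expressions above walk the S3 event notification carried in the SQS message body. The Python sketch below performs the same lookups with the standard `json` module. The message body is a hypothetical, heavily trimmed notification that keeps only the fields the expressions read, and `extract_attributes` is an illustrative name.

```python
import json

# Hypothetical SQS message body, trimmed to the fields that the
# EvaluateJsonPath expressions read.
message_body = """
{
  "Records": [
    {
      "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "incoming/data.csv"}
      }
    }
  ]
}
"""

def extract_attributes(body):
    """Mimic the two EvaluateJsonPath properties with plain dict lookups.

    Like $.Records[0]..., only the first record in the message is read.
    """
    record = json.loads(body)["Records"][0]
    return {
        "s3.bucket": record["s3"]["bucket"]["name"],  # $.Records[0].s3.bucket.name
        "filename": record["s3"]["object"]["key"],    # $.Records[0].s3.object.key
    }

print(extract_attributes(message_body))
```

    The expressions index `Records[0]`, so under this configuration any further records in the same message would not produce attributes.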
    
    org.apache.nifi.processors.aws.s3.FetchS3Object
    "Bucket" = "${s3.bucket}"
    "Object Key" = "${filename}"
    
    The "Region" property must be set to the same value as the "Region" property of the GetSQS Processor.
    The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
    
See Also