MergeContent 2.0.0

Bundle
org.apache.nifi | nifi-standard-nar
Description
Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, as a Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate. NOTE: this processor should NOT be configured with Cron Driven for the Scheduling Strategy.
Tags
archive, concatenation, content, correlation, flowfile-stream, flowfile-stream-v3, merge, stream, tar, zip
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Properties
System Resource Considerations
Resource Description
MEMORY While content is not stored in memory, the FlowFiles' attributes are. The configuration of MergeContent (maximum bin size, maximum group size, maximum bin age, max number of entries) will influence how much memory is used. If merging together many small FlowFiles, a two-stage approach may be necessary in order to avoid excessive use of memory.
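To make the two-stage approach more concrete, the following sketch works through the arithmetic of how many FlowFiles a single bin has to track at once when a large merge is split into two chained MergeContent processors. The function name and the example figures (1,000,000 input FlowFiles, 1,000 entries per first-stage bin) are illustrative assumptions, not NiFi defaults.

```python
import math

def two_stage_bin_sizes(total_flowfiles, stage_one_max_entries):
    """Return (entries per stage-one bin, entries in the stage-two bin).

    Stage one merges the inputs in groups of at most `stage_one_max_entries`;
    stage two then merges the stage-one outputs into a single FlowFile.
    """
    stage_two_entries = math.ceil(total_flowfiles / stage_one_max_entries)
    return stage_one_max_entries, stage_two_entries

# A single-stage merge would need one bin tracking all 1,000,000 FlowFiles.
# Split into two stages, no bin tracks more than 1,000 FlowFiles at a time.
print(two_stage_bin_sizes(1_000_000, 1_000))
```

Because only attributes are held in memory, the benefit comes from bounding the number of entries per bin rather than the size of the content.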
Relationships
Name Description
merged The FlowFile containing the merged content
original The FlowFiles that were used to create the bundle
failure If the bundle cannot be created, all FlowFiles that would have been used to create the bundle will be transferred to failure.
Reads Attributes
Name Description
fragment.identifier Applicable only if the <Merge Strategy> property is set to Defragment. All FlowFiles with the same value for this attribute will be bundled together.
fragment.index Applicable only if the <Merge Strategy> property is set to Defragment. This attribute indicates the order in which the fragments should be assembled. This attribute must be present on all FlowFiles when using the Defragment Merge Strategy and must be a unique (i.e., unique across all FlowFiles that have the same value for the "fragment.identifier" attribute) integer between 0 and the value of the fragment.count attribute. If two or more FlowFiles have the same value for the "fragment.identifier" attribute and the same value for the "fragment.index" attribute, the first FlowFile processed will be accepted and subsequent FlowFiles will not be accepted into the Bin.
fragment.count Applicable only if the <Merge Strategy> property is set to Defragment. This attribute indicates how many FlowFiles should be expected in the given bundle. At least one FlowFile must have this attribute in the bundle. If multiple FlowFiles contain the "fragment.count" attribute in a given bundle, all must have the same value.
segment.original.filename Applicable only if the <Merge Strategy> property is set to Defragment. This attribute must be present on all FlowFiles with the same value for the fragment.identifier attribute. All FlowFiles in the same bundle must have the same value for this attribute. The value of this attribute will be used for the filename of the completed merged FlowFile.
tar.permissions Applicable only if the <Merge Format> property is set to TAR. The value of this attribute must be 3 characters; each character must be in the range 0 to 7 (inclusive) and indicates the file permissions that should be used for the FlowFile's TAR entry. If this attribute is missing or has an invalid value, the default value of 644 will be used.
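The tar.permissions rule can be sketched as a small validation function. This is only an illustration of the rule as stated in the table, not the processor's code; the function name tar_mode and the constant names are hypothetical.

```python
import re

# Fallback described above for a missing or invalid value.
DEFAULT_TAR_MODE = 0o644
# Exactly three characters, each in the range 0 to 7.
_PERMISSIONS = re.compile(r"[0-7]{3}")

def tar_mode(attributes):
    """Return the TAR entry mode for a FlowFile's attribute map (str -> str)."""
    value = attributes.get("tar.permissions")
    if value is not None and _PERMISSIONS.fullmatch(value):
        return int(value, 8)  # interpret the three digits as octal
    return DEFAULT_TAR_MODE

print(oct(tar_mode({"tar.permissions": "755"})))
print(oct(tar_mode({"tar.permissions": "788"})))  # invalid, falls back
```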
Writes Attributes
Name Description
filename When more than 1 file is merged, the filename comes from the segment.original.filename attribute. If that attribute does not exist in the source FlowFiles, the filename is set to the current system time in nanoseconds. A filename extension may then be applied: if Merge Format is TAR, .tar is appended; if Merge Format is ZIP, .zip is appended; if Merge Format is FlowFileStream, .pkg is appended.
merge.count The number of FlowFiles that were merged into this bundle
merge.bin.age The age of the bin, in milliseconds, when it was merged and output. Effectively this is the greatest amount of time that any FlowFile in this bundle remained waiting in this processor before it was output
merge.uuid UUID of the merged FlowFile, which is added to the attributes of the original FlowFiles.
merge.reason This processor allows several thresholds to be configured for merging FlowFiles. This attribute indicates which of the thresholds resulted in the FlowFiles being merged. For an explanation of each possible value and its meaning, see the 'Additional Details' page of the Processor's usage documentation.
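As a rough illustration of how the configured thresholds might interact, the sketch below models the decision to merge a bin. It is a simplified model, not the processor's actual logic. The Thresholds class, the should_merge function, and the reason labels are hypothetical names; the labels are not the real merge.reason values (see 'Additional Details' for those). The defaults mirror the example values suggested under Use Cases, assuming 1 MB = 1024 * 1024 bytes.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes

@dataclass
class Thresholds:
    min_entries: int = 1
    max_entries: int = 500_000_000
    min_bytes: int = 128 * MB
    max_bytes: int = 256 * MB
    max_bin_age_s: float = 300.0  # "5 mins"

def should_merge(entries, size_bytes, age_s, t):
    """Return a hypothetical reason label if the bin should merge, else None."""
    if entries >= t.max_entries or size_bytes >= t.max_bytes:
        return "maximum reached"
    if entries >= t.min_entries and size_bytes >= t.min_bytes:
        return "minimums reached"
    if age_s >= t.max_bin_age_s:
        return "bin age exceeded"
    return None

t = Thresholds()
print(should_merge(10, 1024, 10.0, t))      # too small and too young
print(should_merge(10, 200 * MB, 10.0, t))  # both minimums met
print(should_merge(10, 1024, 301.0, t))     # timed out while still small
```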
Use Cases
  • Concatenate FlowFiles with textual content together in order to create fewer, larger FlowFiles.
    Description
    Concatenate FlowFiles with textual content together in order to create fewer, larger FlowFiles.
    Keywords
    concatenate, bundle, aggregate, bin, merge, combine, smash
    Configuration
    "Merge Strategy" = "Bin Packing Algorithm"
    "Merge Format" = "Binary Concatenation"
    "Delimiter Strategy" = "Text"
    "Demarcator" = "\n" (a newline can be inserted by pressing Shift + Enter)
    "Minimum Number of Entries" = "1"
    "Maximum Number of Entries" = "500000000"
    "Minimum Group Size" = the minimum amount of data to write to an output FlowFile. A reasonable value might be "128 MB"
    "Maximum Group Size" = the maximum amount of data to write to an output FlowFile. A reasonable value might be "256 MB"
    "Max Bin Age" = the maximum amount of time to wait for incoming data before timing out and transferring the FlowFile along even though it is smaller than the Minimum Group Size. A reasonable value might be "5 mins"
    
  • Concatenate FlowFiles with binary content together in order to create fewer, larger FlowFiles.
    Description
    Concatenate FlowFiles with binary content together in order to create fewer, larger FlowFiles.
    Notes
    Not all binary data can be concatenated together. Whether or not this configuration is valid depends on the type of your data.
    Keywords
    concatenate, bundle, aggregate, bin, merge, combine, smash
    Configuration
    "Merge Strategy" = "Bin Packing Algorithm"
    "Merge Format" = "Binary Concatenation"
    "Delimiter Strategy" = "Text"
    "Minimum Number of Entries" = "1"
    "Maximum Number of Entries" = "500000000"
    "Minimum Group Size" = the minimum amount of data to write to an output FlowFile. A reasonable value might be "128 MB"
    "Maximum Group Size" = the maximum amount of data to write to an output FlowFile. A reasonable value might be "256 MB"
    "Max Bin Age" = the maximum amount of time to wait for incoming data before timing out and transferring the FlowFile along even though it is smaller than the Minimum Group Size. A reasonable value might be "5 mins"
    
  • Reassemble a FlowFile that was previously split apart into smaller FlowFiles by a processor such as SplitText, UnpackContent, SplitRecord, etc.
    Description
    Reassemble a FlowFile that was previously split apart into smaller FlowFiles by a processor such as SplitText, UnpackContent, SplitRecord, etc.
    Keywords
    reassemble, repack, merge, recombine
    Configuration
    "Merge Strategy" = "Defragment"
    "Merge Format" = the value of Merge Format depends on the desired output format. If the file was previously zipped together and was split apart by UnpackContent,
        a Merge Format of "ZIP" makes sense. If it was previously a .tar file, a Merge Format of "TAR" makes sense. If the data is textual, "Binary Concatenation" can be
        used to combine the text into a single document.
    "Delimiter Strategy" = "Text"
    "Max Bin Age" = the maximum amount of time to wait for incoming data before timing out and transferring the fragments to 'failure'. A reasonable value might be "5 mins"
    
    For textual data, "Demarcator" should be set to a newline (\n), set by pressing Shift+Enter in the UI. For binary data, "Demarcator" should be left blank.
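The Defragment behavior described by the fragment.* attributes can be sketched as follows. This is a minimal model under stated assumptions, not NiFi's implementation: FlowFiles are represented as hypothetical (attributes, content) pairs, the function name defragment is illustrative, and bin age is not modeled, so incomplete bundles are simply left out of the result rather than routed to 'failure'.

```python
def defragment(flowfiles, demarcator=b""):
    """Reassemble fragments; flowfiles is an iterable of (attributes, content bytes)."""
    bins = {}
    for attributes, content in flowfiles:
        # Fragments sharing a fragment.identifier belong to the same bundle.
        bin_ = bins.setdefault(attributes["fragment.identifier"],
                               {"count": None, "parts": {}})
        index = int(attributes["fragment.index"])
        if index in bin_["parts"]:
            continue  # duplicate index: the first FlowFile processed wins
        bin_["parts"][index] = content
        # Only one FlowFile in the bundle needs to carry fragment.count.
        if "fragment.count" in attributes:
            bin_["count"] = int(attributes["fragment.count"])

    merged = {}
    for identifier, bin_ in bins.items():
        if bin_["count"] is not None and len(bin_["parts"]) == bin_["count"]:
            ordered = (bin_["parts"][i] for i in sorted(bin_["parts"]))
            merged[identifier] = demarcator.join(ordered)
    return merged

fragments = [
    ({"fragment.identifier": "a", "fragment.index": "1", "fragment.count": "2"}, b"world"),
    ({"fragment.identifier": "a", "fragment.index": "0"}, b"hello"),
    ({"fragment.identifier": "a", "fragment.index": "0"}, b"duplicate"),
    ({"fragment.identifier": "b", "fragment.index": "0", "fragment.count": "2"}, b"x"),
]
print(defragment(fragments, demarcator=b"\n"))
```

Passing a newline as the demarcator corresponds to the textual case above; leaving it empty corresponds to the binary case.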
    
See Also