DeduplicateRecord 2.0.0

Bundle
org.apache.nifi | nifi-standard-nar
Description
This processor de-duplicates individual records within a record set. It can operate on a per-FlowFile basis using an in-memory hash set or Bloom filter. When configured with a distributed map cache, it can de-duplicate records across multiple FlowFiles.
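The per-FlowFile behavior can be pictured as a loop that reduces each record to a hash and tracks which hashes have already been seen. The following is a minimal standalone sketch of that idea, assuming SHA-256 hashing of an already-serialized record; it is illustrative only and is not the processor's actual implementation.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Sketch: de-duplicate records within a single record set by hashing
    // each record and remembering which hashes have been seen before.
    public class DedupSketch {

        static String hash(String serializedRecord) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(serializedRecord.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            List<String> records = List.of("a,1", "b,2", "a,1");

            Set<String> seen = new HashSet<>();   // grows with the number of unique records
            for (String record : records) {
                String key = hash(record);
                if (seen.add(key)) {
                    System.out.println("non-duplicate: " + record);
                } else {
                    System.out.println("duplicate: " + record);
                }
            }
        }
    }

In the cross-FlowFile case described above, the locally held set of seen hashes is conceptually replaced by lookups against the configured distributed map cache, so duplicates can be detected across files.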
Tags
change, dedupe, distinct, dupe, duplicate, filter, hash, modify, record, replace, text, unique, update
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Properties
Dynamic Properties
System Resource Considerations
Resource Description
MEMORY The HashSet filter type consumes memory proportional to the number of unique records processed. The BloomFilter type uses a constant amount of memory regardless of the number of records processed (see the sketch after this table).
CPU If a more computationally expensive hash algorithm is chosen, the time required to hash each record can increase substantially.
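The memory and CPU trade-offs above can be illustrated with a small standalone sketch. It uses Guava's BloomFilter and java.security.MessageDigest purely as stand-ins, with assumed sizing values; it is not the processor's own code.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HashSet;
    import java.util.Set;

    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    // Sketch of the resource trade-offs: exact set vs. Bloom filter, and
    // the cost of the chosen hash algorithm per record.
    public class FilterTradeoffSketch {

        public static void main(String[] args) throws Exception {
            // HashSet: memory grows with every unique record key added.
            Set<String> exact = new HashSet<>();

            // Bloom filter: memory is fixed up front by the expected number of
            // insertions and the acceptable false-positive probability,
            // no matter how many records are actually processed.
            BloomFilter<CharSequence> approximate = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8),
                    1_000_000,   // expected insertions (assumed for illustration)
                    0.01);       // ~1% false-positive rate

            String key = "record-key";
            exact.add(key);
            approximate.put(key);
            System.out.println("bloom might contain key: " + approximate.mightContain(key));

            // CPU: the chosen digest algorithm determines how long hashing each
            // record takes; stronger algorithms generally cost more per record.
            for (String algorithm : new String[] {"MD5", "SHA-256", "SHA-512"}) {
                long start = System.nanoTime();
                MessageDigest md = MessageDigest.getInstance(algorithm);
                md.digest("some serialized record".getBytes(StandardCharsets.UTF_8));
                System.out.printf("%s took %d ns%n", algorithm, System.nanoTime() - start);
            }
        }
    }

The trade-off is accuracy versus footprint: the exact set never misclassifies a record but grows without bound, while the Bloom filter stays constant in size but may occasionally flag a never-seen record as a duplicate.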
Relationships
Name Description
failure If the processor is unable to communicate with the cache, the FlowFile will be penalized and routed to this relationship.
duplicate Records detected as duplicates are routed to this relationship.
non-duplicate Records not found in the cache are routed to this relationship.
original The original input FlowFile is sent to this relationship unless a fatal error occurs.
Writes Attributes
Name Description
record.count Number of records written to the destination FlowFile.
See Also