QueryDatabaseTableRecord 2.1.0

Bundle
org.apache.nifi | nifi-standard-nar
Description
Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to the format specified by the record writer. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected.
Tags
database, jdbc, query, record, select, sql
Input Requirement
FORBIDDEN
Supports Sensitive Dynamic Properties
false
Properties
Dynamic Properties
State Management
Scopes Description
CLUSTER After performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records whose values in those column(s) are greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation.
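
The retained-maximum behavior described above can be illustrated with a minimal sketch. It is not the processor's implementation: it assumes Python's built-in sqlite3 module, a hypothetical `orders` table, a plain dict standing in for the cluster-scoped state, and a hypothetical helper named `fetch_new_rows`.

```python
import sqlite3


def fetch_new_rows(conn, table, max_column, state):
    """Return rows whose max_column exceeds the retained maximum, then update state.

    `state` is a plain dict standing in for processor state (an assumption).
    """
    last_max = state.get(max_column)
    query = f"SELECT * FROM {table}"
    params = ()
    if last_max is not None:
        # Only rows beyond the previously seen maximum are selected.
        query += f" WHERE {max_column} > ?"
        params = (last_max,)
    cursor = conn.execute(query, params)
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor]
    if rows:
        # Retain the new maximum for the next execution.
        state[max_column] = max(r[max_column] for r in rows)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", [(1, "a"), (2, "b")])

state = {}
first = fetch_new_rows(conn, "orders", "id", state)    # no retained maximum yet
conn.execute("INSERT INTO orders (id, item) VALUES (3, 'c')")
second = fetch_new_rows(conn, "orders", "id", state)   # only the newly added row
third = fetch_new_rows(conn, "orders", "id", state)    # nothing new since last run

state.clear()                                          # analogous to clearing processor state
fourth = fetch_new_rows(conn, "orders", "id", state)   # every row is fetched again
```

Clearing the dict plays the role of clearing the processor state: the next run no longer has a retained maximum, so it selects every row again.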
Relationships
Name Description
success Successfully created FlowFile from SQL query result set.
Writes Attributes
Name Description
tablename Name of the table being queried
querydbtable.row.count The number of rows selected by the query
fragment.identifier If 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results.
fragment.count If 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet. If Output Batch Size is set, then this attribute will not be populated.
fragment.index If 'Max Rows Per Flow File' is set, then this is the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order the FlowFiles were produced.
maxvalue.* Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column. If Output Batch Size is set, then this attribute will not be populated.
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer.
record.count The number of records output by the Record Writer.
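
As an illustration of how the fragment.* attributes can be used to correlate results, the following sketch splits a result set into FlowFile-like dicts and reassembles them downstream. It is only a model under stated assumptions: the helpers `split_into_fragments` and `reassemble` are hypothetical, FlowFiles are represented as plain dicts, and a zero-based fragment.index is assumed.

```python
import uuid


def split_into_fragments(rows, max_rows_per_flow_file, table_name):
    """Model one result set being split into several FlowFile-like dicts."""
    fragment_id = str(uuid.uuid4())  # shared by every fragment of this result set
    chunks = [rows[i:i + max_rows_per_flow_file]
              for i in range(0, len(rows), max_rows_per_flow_file)]
    return [
        {
            "attributes": {
                "tablename": table_name,
                "fragment.identifier": fragment_id,
                "fragment.index": str(index),       # zero-based here (assumption)
                "fragment.count": str(len(chunks)),
            },
            "rows": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]


def reassemble(flow_files):
    """Group by fragment.identifier; return rows only for complete groups, in order."""
    groups = {}
    for ff in flow_files:
        groups.setdefault(ff["attributes"]["fragment.identifier"], []).append(ff)
    complete = {}
    for fragment_id, members in groups.items():
        expected = int(members[0]["attributes"]["fragment.count"])
        if len(members) != expected:
            continue  # still waiting for the remaining fragments
        members.sort(key=lambda ff: int(ff["attributes"]["fragment.index"]))
        complete[fragment_id] = [row for ff in members for row in ff["rows"]]
    return complete


rows = list(range(25))
flow_files = split_into_fragments(rows, 10, "orders")
restored = reassemble(list(reversed(flow_files)))   # arrival order does not matter
partial = reassemble(flow_files[:2])                # one fragment is missing
```

Because fragment.count is not populated when Output Batch Size is set, a completeness check like the one in `reassemble` only applies when that property is left unset.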
Use Cases
  • Retrieve all rows from a database table.
    Description
    Retrieve all rows from a database table.
    Keywords
    jdbc, rdbms, cdc, database, table, stream
    Configuration
    Configure the "Database Connection Pooling Service" to specify a Connection Pooling Service so that the Processor knows how to connect to the database.
    Set the "Database Type" property to the type of database to query, or "Generic" if the database vendor is not listed.
    Set the "Table Name" property to the name of the table to retrieve records from.
    Configure the "Record Writer" to specify a Record Writer that is appropriate for the desired output format.
    Set the "Maximum-value Columns" property to a comma-separated list of columns whose values can be used to determine which rows are new. For example, this might be set to
        an `id` column that is a monotonically increasing number, or a `last_modified` column that holds the timestamp of when the row was last modified.
    Set the "Initial Load Strategy" property to "Start at Beginning".
    Set the "Fetch Size" to a number that avoids loading too much data into memory on the NiFi side. For example, a value of `1000` will load up to 1,000 rows of data at a time.
    Set the "Max Rows Per Flow File" to a value that allows efficient processing, such as `1000` or `10000`.
    Set the "Output Batch Size" property to a value greater than `0`. A smaller value, such as `1` or even `20`, will result in lower latency but also slightly lower throughput.
        A larger value, such as `1000`, will result in higher throughput but also higher latency. Setting the value larger than `1000` is not recommended, as it can cause significant
        memory utilization.
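
To make the interplay of "Fetch Size", "Max Rows Per Flow File", and "Output Batch Size" more concrete, here is a rough sketch of how the three settings could shape the output. It assumes Python's built-in sqlite3 module and a hypothetical `readings` table. The generator `stream_batches` is an illustrative model, not the processor's code: rows are pulled from the cursor `fetch_size` at a time, cut into FlowFile-sized lists, and released downstream in batches.

```python
import sqlite3


def stream_batches(cursor, fetch_size, max_rows_per_flow_file, output_batch_size):
    """Yield lists of FlowFile-like row lists, one list per released batch."""
    flow_file, batch = [], []
    while True:
        rows = cursor.fetchmany(fetch_size)  # bounds how many rows are held per fetch
        if not rows:
            break
        for row in rows:
            flow_file.append(row)
            if len(flow_file) == max_rows_per_flow_file:
                batch.append(flow_file)
                flow_file = []
                if len(batch) == output_batch_size:
                    yield batch  # released early: lower latency, more hand-offs
                    batch = []
    if flow_file:
        batch.append(flow_file)  # final, possibly smaller FlowFile
    if batch:
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO readings (id) VALUES (?)", [(i,) for i in range(1, 26)])

cursor = conn.execute("SELECT id FROM readings ORDER BY id")
batches = list(stream_batches(cursor, fetch_size=4,
                              max_rows_per_flow_file=10, output_batch_size=2))
batch_shapes = [[len(ff) for ff in batch] for batch in batches]
```

With 25 rows, the sketch releases one batch of two 10-row FlowFiles and then a final batch holding the remaining 5 rows. A smaller batch size releases data sooner at the cost of more hand-offs.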
    
  • Perform an incremental load of a single database table, fetching only new rows as they are added to the table.
    Description
    Perform an incremental load of a single database table, fetching only new rows as they are added to the table.
    Keywords
    incremental load, rdbms, jdbc, cdc, database, table, stream
    Configuration
    Configure the "Database Connection Pooling Service" to specify a Connection Pooling Service so that the Processor knows how to connect to the database.
    Set the "Database Type" property to the type of database to query, or "Generic" if the database vendor is not listed.
    Set the "Table Name" property to the name of the table to retrieve records from.
    Configure the "Record Writer" to specify a Record Writer that is appropriate for the desired output format.
    Set the "Maximum-value Columns" property to a comma-separated list of columns whose values can be used to determine which rows are new. For example, this might be set to
        an `id` column that is a monotonically increasing number, or a `last_modified` column that holds the timestamp of when the row was last modified.
    Set the "Initial Load Strategy" property to "Start at Current Maximum Values".
    Set the "Fetch Size" to a number that avoids loading too much data into memory on the NiFi side. For example, a value of `1000` will load up to 1,000 rows of data at a time.
    Set the "Max Rows Per Flow File" to a value that allows efficient processing, such as `1000` or `10000`.
    Set the "Output Batch Size" property to a value greater than `0`. A smaller value, such as `1` or even `20`, will result in lower latency but also slightly lower throughput.
        A larger value, such as `1000`, will result in higher throughput but also higher latency. Setting the value larger than `1000` is not recommended, as it can cause significant
        memory utilization.
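
The difference between the two "Initial Load Strategy" values used in these use cases can be sketched as follows. This is a simplified model, assuming Python's built-in sqlite3 module, a hypothetical `events` table, and hypothetical helpers `initial_maximum` and `select_ids`. It assumes that "Start at Current Maximum Values" takes the table's current maximum as the baseline on the first run.

```python
import sqlite3

START_AT_BEGINNING = "Start at Beginning"
START_AT_CURRENT_MAX = "Start at Current Maximum Values"


def initial_maximum(conn, table, max_column, strategy):
    """Return the baseline maximum to use when no state has been stored yet."""
    if strategy == START_AT_BEGINNING:
        return None  # no baseline: existing rows are loaded too
    if strategy == START_AT_CURRENT_MAX:
        # Existing rows are skipped by starting from the current maximum.
        row = conn.execute(f"SELECT MAX({max_column}) FROM {table}").fetchone()
        return row[0]
    raise ValueError(f"unknown strategy: {strategy}")


def select_ids(conn, table, max_column, last_max):
    """Select the max_column values that lie beyond the given baseline."""
    if last_max is None:
        cur = conn.execute(f"SELECT {max_column} FROM {table} ORDER BY {max_column}")
    else:
        cur = conn.execute(
            f"SELECT {max_column} FROM {table} WHERE {max_column} > ? ORDER BY {max_column}",
            (last_max,),
        )
    return [r[0] for r in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(1,), (2,), (3,)])

from_beginning = select_ids(
    conn, "events", "id", initial_maximum(conn, "events", "id", START_AT_BEGINNING))

baseline = initial_maximum(conn, "events", "id", START_AT_CURRENT_MAX)
first_incremental = select_ids(conn, "events", "id", baseline)  # existing rows skipped
conn.execute("INSERT INTO events (id) VALUES (4)")
later_incremental = select_ids(conn, "events", "id", baseline)  # only the new row
```

In this model the first strategy returns all three existing rows. The second returns nothing until a row with a larger `id` is added.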
    
Use Cases Involving Other Components
  • Perform an incremental load of multiple database tables, fetching only new rows as they are added to the tables.
    Description
    Perform an incremental load of multiple database tables, fetching only new rows as they are added to the tables.
    Keywords
    incremental load, rdbms, jdbc, cdc, database, table, stream
    Processor Configurations
    org.apache.nifi.processors.standard.ListDatabaseTables
    Configure the "Database Connection Pooling Service" property to specify a Connection Pool that is applicable for interacting with your database.
    
    Set the "Catalog" property to the name of the database Catalog;
    set the "Schema Pattern" property to a Java Regular Expression that matches all database Schemas that should be included; and
    set the "Table Name Pattern" property to a Java Regular Expression that matches the names of all tables that should be included.
    In order to perform an incremental load of all tables, leave the Catalog, Schema Pattern, and Table Name Pattern unset.
    
    Leave the RecordWriter property unset.
    
    Connect the 'success' relationship to QueryDatabaseTableRecord.
    
    org.apache.nifi.processors.standard.QueryDatabaseTableRecord
    Configure the "Database Connection Pooling Service" to the same Connection Pool that was used in ListDatabaseTables.
    Set the "Database Type" property to the type of database to query, or "Generic" if the database vendor is not listed.
    Set the "Table Name" property to "${db.table.fullname}".
    Configure the "Record Writer" to specify a Record Writer that is appropriate for the desired output format.
    Set the "Maximum-value Columns" property to a comma-separated list of columns whose values can be used to determine which rows are new. For example, this might be set to
        an `id` column that is a monotonically increasing number, or a `last_modified` column that holds the timestamp of when the row was last modified.
    Set the "Initial Load Strategy" property to "Start at Current Maximum Values".
    Set the "Fetch Size" to a number that avoids loading too much data into memory on the NiFi side. For example, a value of `1000` will load up to 1,000 rows of data at a time.
    Set the "Max Rows Per Flow File" to a value that allows efficient processing, such as `1000` or `10000`.
    Set the "Output Batch Size" property to a value greater than `0`. A smaller value, such as `1` or even `20`, will result in lower latency but also slightly lower throughput.
        A larger value, such as `1000`, will result in higher throughput but also higher latency. Setting the value larger than `1000` is not recommended, as it can cause significant
        memory utilization.
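
When one configuration serves several tables, the retained maxima have to be tracked separately for each table. The sketch below illustrates that idea only. The key format produced by `state_key`, the helper `update_maxima`, and the table names are hypothetical and are not taken from the processor's actual state layout.

```python
def state_key(table_fullname, column):
    """Build a per-table, per-column key (hypothetical format, for illustration)."""
    return f"{table_fullname.lower()}::{column.lower()}"


def update_maxima(state, table_fullname, column, observed_values):
    """Raise the retained maximum for one table/column pair if a larger value is seen."""
    key = state_key(table_fullname, column)
    for value in observed_values:
        if key not in state or value > state[key]:
            state[key] = value
    return state


state = {}
update_maxima(state, "public.orders", "id", [1, 5, 3])
update_maxima(state, "public.users", "id", [2])
update_maxima(state, "public.orders", "id", [4])  # smaller than the retained 5, so ignored
```

Keying by both table and column keeps an `id` maximum observed in one table from masking new rows in another table that uses the same column name.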
    
See Also