How to Leverage Partition Pruning Functionality in Hive When Incrementally Building Aggregates

AtScale updates system aggregates in one of two ways. The first is a full rebuild, where the aggregate is built by running a query against the base tables, that is, the underlying fact and dimension tables. The second is an incremental refresh, which appends to an existing aggregate and uses an incremental indicator column to identify newly created records. Partition pruning in Hive can make incremental refreshes faster, but only if the underlying table declares the incremental indicator column as a partition key.

As an example, assume the fact table is named sales_transactions and its incremental indicator column is incr_ind. For the Hive engine to prune partitions, the table must be defined with incr_ind as its partition key.
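
A minimal HiveQL sketch of such a table definition follows. Only the table name sales_transactions and the column incr_ind come from the example above. The other columns, the STRING type for incr_ind, and the ORC storage format are assumptions for illustration, so adjust them to match the actual schema.

```sql
-- Hypothetical fact table; only sales_transactions and incr_ind are from the article.
-- The remaining columns and the ORC format are illustrative assumptions.
CREATE TABLE sales_transactions (
  transaction_id BIGINT,
  customer_id    BIGINT,
  amount         DECIMAL(18,2)
)
-- The partition column is declared here and not in the column list above.
-- Assumed to hold a sortable load date such as '2024-01-15'.
PARTITIONED BY (incr_ind STRING)
STORED AS ORC;

-- List the partitions Hive currently knows about for this table.
SHOW PARTITIONS sales_transactions;
```

In Hive, a partition column is declared only in the PARTITIONED BY clause. Each distinct value of incr_ind is then stored in its own directory, which is what lets the engine skip data it does not need.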

Note: Hadoop provides limited query optimizations, but one of them is partition pruning, which applies when the SQL query filters on the partition key. In that case, Hive reads only the matching partitions instead of scanning the full table, which shortens the query's runtime.
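
As a rough illustration, the HiveQL below shows the kind of filter that allows pruning and one way to check which partitions a query would read. It assumes sales_transactions is partitioned on incr_ind and that incr_ind holds sortable date strings. The columns customer_id and amount and the literal '2024-01-15' are hypothetical, and this is not the exact query AtScale generates.

```sql
-- Filtering on the partition key lets Hive read only the matching partitions.
-- customer_id, amount, and the date literal are illustrative assumptions.
SELECT customer_id,
       SUM(amount) AS total_amount
FROM sales_transactions
WHERE incr_ind > '2024-01-15'
GROUP BY customer_id;

-- EXPLAIN DEPENDENCY reports the input tables and partitions for a query,
-- which helps confirm that only the expected partitions are scanned.
EXPLAIN DEPENDENCY
SELECT customer_id,
       SUM(amount) AS total_amount
FROM sales_transactions
WHERE incr_ind > '2024-01-15'
GROUP BY customer_id;
```

If the dependency output lists every partition of the table, the filter is probably not being applied to the partition key directly, for example because the column is wrapped in a function or compared against a different type.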
