Tips to Reduce Aggregate Rebuild Time to Meet SLAs

 

Information to Collect

  • Check the customer's current aggregate lifecycle settings in Summary Zip --> engine --> aggregates --> config.json.
  • If config.json is empty, the default settings are in effect. The defaults can be found in Summary Zip --> engine --> config.yaml.
  • Ask the customer to execute the SELECT statement from the 'Aggregates Usage' section and get a count of active aggregates and their last accessed times.
  • Grab a batchId for the latest batch rebuild of the cube in question from batches.json, then grep for the count of that batchId in instances.json. This indicates the number of aggregates being built for that cube.
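The first two bullets can be sketched as a small shell check (the extraction path SUMMARY_DIR is an assumption; adjust it to wherever you unpacked the Summary Zip):

```shell
# Sketch: determine which aggregate lifecycle settings apply, given an
# extracted Summary Zip. SUMMARY_DIR is a hypothetical extraction directory.
SUMMARY_DIR="./summary"
CFG="$SUMMARY_DIR/engine/aggregates/config.json"

if [ -s "$CFG" ]; then
    # Non-empty config.json: customer overrides are in effect.
    echo "Customer overrides:"
    cat "$CFG"
else
    # Empty or missing config.json: the defaults in engine/config.yaml apply.
    echo "No overrides; defaults from engine/config.yaml apply."
fi
```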

 

Example

 grep <batchId> instances.json | awk -F':' '{print $2}' | awk -F',' '{system("grep " $1 " statuses.json")}' | grep -v replaced | grep -c active

  OR

 grep -c <batchId> instances.json
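For repeated use, the first pipeline above can be wrapped in a small shell function. This is a sketch: it assumes instances.json and statuses.json sit in the current directory and use the colon- and comma-delimited layout the pipeline already expects.

```shell
# Sketch: count active (not yet replaced) aggregate instances for a batch.
# Usage: count_active_aggregates <batchId>
count_active_aggregates() {
    batch_id="$1"
    # Pull the instance ids recorded for this batch, look up each id's
    # status in statuses.json, drop instances that were replaced, and
    # count the ones still marked active.
    grep "$batch_id" instances.json |
        awk -F':' '{print $2}' |
        awk -F',' '{system("grep " $1 " statuses.json")}' |
        grep -v replaced |
        grep -c active
}
```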

 

Root Cause

A long aggregate rebuild time is usually caused by one of the following:

  1. A high active aggregate count: maximum_system_generated_target has been set higher than the default of 100, while the number of active aggregates with a recent last accessed time is lower than that value.
  2. maximum_concurrent_materializations is set to a very low value, limiting rebuild parallelism.

 

Resolution

One of the following suggestions might help:

 

1) If, per the uploaded CSV file, there are a large number of aggregates whose utilization count is low and whose last accessed time is old, then:

  • Check whether the total count of active aggregates is less than or equal to maximum_system_generated_target.
  • Check the value set for maximum_system_generated_allowance (default is 10).
  • Lower maximum_system_generated_target so that the sum (maximum_system_generated_target + maximum_system_generated_allowance) still covers the number of active aggregates whose utilization count is high and whose last accessed time is recent.
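The budget check above can be sketched as simple shell arithmetic. The target and allowance values below are the documented defaults; the hot_active count is a placeholder to be replaced with the figure from the customer's CSV export.

```shell
# Sketch: check whether target + allowance covers the "hot" active
# aggregates (high utilization, recent last accessed time).
target=100        # maximum_system_generated_target (default 100)
allowance=10      # maximum_system_generated_allowance (default 10)
hot_active=85     # hypothetical count of aggregates worth keeping, per the CSV

if [ $((target + allowance)) -ge "$hot_active" ]; then
    echo "Budget covers hot aggregates; the target could be lowered toward $hot_active."
else
    echo "Budget too small; raise maximum_system_generated_target."
fi
```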

2) If the Hadoop cluster has plenty of spare capacity, increase the maximum_concurrent_materializations setting.

 

Note:  The parameter "Maximum Concurrent Aggregates Builds" doesn't necessarily speed up your aggregation process. Hadoop already tries to parallelize to a certain degree. Adding more concurrency on top of it might not help with aggregate rebuild time. 

Also, if the aggregate tables are large and the Hadoop cluster is small, setting maximum_concurrent_materializations to a lower value will help avoid aggregate rebuild failures and gives a better chance of finishing the rebuild quickly, since each aggregate gets more resources to process its data.

Follow the steps below to edit the maximum_system_generated_target or maximum_concurrent_materializations parameters:

  • Log in to the Design Center UI as an admin user.
  • Navigate to the Aggregates screen --> Settings dropdown --> 'Aggregate Settings' --> scroll down to the bottom.
  • Edit the desired setting and click Save.
