How to Collect Aggregate Usage Details (# of hits, last-accessed-time, etc.) When the AtScale Engine Considers Aggregate as Unused

There is no simple rule to delete the aggregates based on utilization, last accessed time, etc.  Instead, unused aggregates are deleted based on usage patterns across the cube.  And the engine marks the unused aggregates for deletion if they exceed a threshold (maximum_system_generated_target by default is set to 100 aggregates).

You can either set a lower value for the maximum_system_generated_target setting or manually delete unused aggregates.   And when the number of active aggregates exceeds the Desired Number of System Aggregates property, the engine, based on an internal algorithm, will delete the aggregates which it considers the least used.
 
Also, the last accessed time may not suffice in deciding aggregate usage. For example, when comparing an aggregate used every Monday morning and an aggregate used only once this most recent Tuesday, the one used every Monday morning is vastly better.  
 
The below select statement generates utilization count (only reflects query hits), the last access time of the aggregate, and is specific to a PROD environment.
 

SELECT
   max(q.received) lastaccessedtime, ad.id aggdefId , count(*) utilization , (CASE WHEN ad.user_defined_id is null  THEN 'SystemGeneratedAgg' ELSE  'UDA->'||ad.user_defined_name END) aggtype
FROM
   atscale.aggregate_definitions ad
JOIN
   atscale.query_datasets qd
ON
   CAST(ad.id AS text) = qd.dataset_id
JOIN
   atscale.queries q
ON
   qd.query_id = q.query_id and q.service = 'user-query'
WHERE
   ad.id in (SELECT ai.definition_id From atscale.aggregate_instances ai, atscale.aggregate_instance_statuses ast where  ai.id = ast.owner_id and ast.status = 'active' and ast.replaced_at is null)
   and ad.environment_id = 'prod'
Group By
aggdefId , aggtype
ORDER BY
   lastaccessedtime;


If you want to find out the aggregates usage for a particular project or a cube, then the below query can be used:
 

SELECT
   max(q.received) lastaccessedtime, ad.id aggdefId , count(*) utilization , (CASE WHEN ad.user_defined_id is null  THEN 'SystemGeneratedAgg' ELSE  'UDA->'||ad.user_defined_name END) aggtype,
   ad.project_id ProjectId, ad.plan_json->'cube'->>'id' as cubeid, ad.environment_id EnvId
FROM
   atscale.aggregate_definitions ad
JOIN
   atscale.query_datasets qd
ON
   CAST(ad.id AS text) = qd.dataset_id
JOIN
   atscale.queries q
ON
   qd.query_id = q.query_id and q.service = 'user-query'
WHERE
   ad.id in (SELECT ai.definition_id From atscale.aggregate_instances ai, atscale.aggregate_instance_statuses ast where  ai.id = ast.owner_id and ast.status = 'active' and ast.replaced_at is null)
     and ad.project_id = '<ProjectID>' and ad.plan_json->'cube'->>'id' = '<CubeID>'
Group By
aggdefId
ORDER BY
   Lastaccessedtime;


Note:- Substitute <ProjectID> and <CubeID> with appropriate values. 
The SQL can be executed from a SQL editor or  save the SQL in a file and execute the select statement on an AtScale node using the command below, which will redirect the output to a file:

POSTGRES_PATH="/opt/atscale/versions/<version>/pkg/postgresql"
$POSTGRES_PATH/bin/psql -h localhost -p 10520 -d atscale -U atscale -f /tmp/pg.sql >./aggutltime.txt

For example,

[root@atscale-ha-node-01 /]# su atscaler
[atscaler@atscale-ha-node-01 /]$ POSTGRES_PATH="/opt/atscale/versions/2020.3.1.2003/pkg/postgresql"
[atscaler@atscale-ha-node-01 /]$ export LD_LIBRARY_PATH=$POSTGRES_PATH/lib
$POSTGRES_PATH/bin/psql -h localhost -p 10520 -d atscale -U atscale -f /tmp/pg.sql >./aggutltime.txt


Also, last accessed time can be searched through AtScale UI.
How to Collect Aggregate Usage Details (# of hits, last-accessed-time, etc.) When the AtScale Engine Considers Aggregate as Unused.jpeg

And the support bundle contains the usage details as well.

<bundle>/engine/aggregates/aggregateUtilizationStats.json

Was this article helpful?

0 out of 0 found this helpful