There is no simple rule to delete the aggregates based on utilization, last accessed time, etc. Instead, unused aggregates are deleted based on usage patterns across the cube. And the engine marks the unused aggregates for deletion if they exceed a threshold (maximum_system_generated_target by default is set to 100 aggregates).
You can either set a lower value for the maximum_system_generated_target setting or manually delete unused aggregates. And when the number of active aggregates exceeds the Desired Number of System Aggregates property, the engine, based on an internal algorithm, will delete the aggregates which it considers the least used.
Also, the last accessed time may not suffice in deciding aggregate usage. For example, when comparing an aggregate used every Monday morning and an aggregate used only once this most recent Tuesday, the one used every Monday morning is vastly better.
The below select statement generates utilization count (only reflects query hits), the last access time of the aggregate, and is specific to a PROD environment.
SELECT max(q.received) lastaccessedtime, ad.id aggdefId , count(*) utilization , (CASE WHEN ad.user_defined_id is null THEN 'SystemGeneratedAgg' ELSE 'UDA->'||ad.user_defined_name END) aggtype FROM atscale.aggregate_definitions ad JOIN atscale.query_datasets qd ON CAST(ad.id AS text) = qd.dataset_id JOIN atscale.queries q ON qd.query_id = q.query_id and q.service = 'user-query' WHERE ad.id in (SELECT ai.definition_id From atscale.aggregate_instances ai, atscale.aggregate_instance_statuses ast where ai.id = ast.owner_id and ast.status = 'active' and ast.replaced_at is null) and ad.environment_id = 'prod' Group By aggdefId , aggtype ORDER BY lastaccessedtime;
If you want to find out the aggregates usage for a particular project or a cube, then the below query can be used:
SELECT max(q.received) lastaccessedtime, ad.id aggdefId , count(*) utilization , (CASE WHEN ad.user_defined_id is null THEN 'SystemGeneratedAgg' ELSE 'UDA->'||ad.user_defined_name END) aggtype, ad.project_id ProjectId, ad.plan_json->'cube'->>'id' as cubeid, ad.environment_id EnvId FROM atscale.aggregate_definitions ad JOIN atscale.query_datasets qd ON CAST(ad.id AS text) = qd.dataset_id JOIN atscale.queries q ON qd.query_id = q.query_id and q.service = 'user-query' WHERE ad.id in (SELECT ai.definition_id From atscale.aggregate_instances ai, atscale.aggregate_instance_statuses ast where ai.id = ast.owner_id and ast.status = 'active' and ast.replaced_at is null) and ad.project_id = '<ProjectID>' and ad.plan_json->'cube'->>'id' = '<CubeID>' Group By aggdefId ORDER BY Lastaccessedtime;
Note:- Substitute <ProjectID> and <CubeID> with appropriate values.
The SQL can be executed from a SQL editor or save the SQL in a file and execute the select statement on an AtScale node using the command below, which will redirect the output to a file:
POSTGRES_PATH="/opt/atscale/versions/<version>/pkg/postgresql" $POSTGRES_PATH/bin/psql -h localhost -p 10520 -d atscale -U atscale -f /tmp/pg.sql >./aggutltime.txt For example, [root@atscale-ha-node-01 /]# su atscaler [atscaler@atscale-ha-node-01 /]$ POSTGRES_PATH="/opt/atscale/versions/2020.3.1.2003/pkg/postgresql" [atscaler@atscale-ha-node-01 /]$ export LD_LIBRARY_PATH=$POSTGRES_PATH/lib $POSTGRES_PATH/bin/psql -h localhost -p 10520 -d atscale -U atscale -f /tmp/pg.sql >./aggutltime.txt
Also, last accessed time can be searched through AtScale UI.
And the support bundle contains the usage details as well.
<bundle>/engine/aggregates/aggregateUtilizationStats.json