Introduction
This document presents an overall first look at properly creating an AtScale Cube, adding best practice recommendations to common modeling routines.
Naming Conventions
- Display Name is what will show in the BI tool.
- Query Name is the unique technical name.
- Always try to keep them similar for easier debugging.
- When using the same field on different Hierarchies, it is recommended to keep the same query name on both Hierarchies levels by duplicating it.
- Avoid spaces (on first and last position) and special characters on dimension and secondary attribute names.
- Always add “QDS” at the start of a fact or dimension from a SQL Data Source (e.g., “QDS Sales Fact”) to quickly identify them.
Measure Aggregation Types
When creating a measure, you must select an aggregation type. Available types are shown below:
- If more than one type is needed for a measure, there must be one measure for each aggregation type.
- Always add to the measure text the type used (e.g., “Sales (SUM)”) so that the BI user understands how the measure is being calculated.
Please note that AtScale will add which aggregate has been used before the measure name, as shown in the picture above, but it will not show in the BI tool.
- Distinct Count type limits AtScale’s aggregation engine
- Distinct Count Estimate (DCE) enables AtScale’s aggregation engine but is an approximation. As a best practice, always try to use the DCE unless the business needs the exact count value for analysis.
- Don’t change the aggregation type in the BI Tool, as AtScale will have already calculated the measure up to the lowest granularity level of the BI query request, and changing it will create partial aggregations using different types.
Attributes on BI Tool
Fact and Dimension attributes will only be available in the BI tool if created as a dimension hierarchy, a dimension secondary attribute, or a fact degenerate dimension.
To create a degenerate dimension in the fact table, drag and drop the field into the Dimension tab but please be aware that as a best practice, the fact table should have only key fields and measures.
Keep the secondary attribute list at the minimum business requirements for a better user experience. AtScale’s Perspective is a great feature to create customized views for each business need, displaying only relevant measures, hierarchies, dimensions, and secondary attributes for a given user group.
Query Data Source (QDS)
When adding a new Fact or Dimension in AtScale, the Data Source can be a table or a query, as shown below.
- It is recommended to avoid using QDS whenever possible, as it will limit the cube's performance. QDS calculations (example above) should be handled as calculated columns (same use case example below) instead of QDS as best practice.
-
QDS can be a great temporary solution in more complex modeling scenarios like table union and filtering and should be materialized into a table in the Data Warehouse as best practice. The example below shows a fact table with the filter “CURRENT_IND = Y” that cannot be modeled in the AtScale semantic layer. There are two best practice options for this scenario:
- If fact rows with “CURRENT_IND” different than “Y” are needed for business requirements, the fact should be changed to source type ‘Table’ on the ‘f’ table, and the ‘d’ table should become a dimension with the “current_ind” secondary attribute available for filtering in the BI Tool according to business requirements.
- If fact rows with “CURRENT_IND” different than “Y” are NOT needed for business requirements, the best practice would be to materialize the SQL statement below in the data warehouse and use the new table as a fact source.
Join Types
AtScale relationships are always created as inner joins by default, but unmatched fact or dimension values can be added as needed.
- To create a left outer join from the fact to the dimension, edit the hierarchy level and check the “Custom empty member” option, adding a key value and a description for the dimension values that have no match with records from the fact table (using the according data type) as shown below:
- To create a right outer join from the fact to the dimension, edit the hierarchy and turn off the “Exclude values with no fact data” switch as shown below. This option will always be on by default.
- To create a full outer join: set up the “Empty custom member” and turn in off “Exclude values with no fact data” options.
Modeling a Union
To create a fact table from a table union, add the SQL statement as the ‘SQL Data Source’ (QDS), but please be aware that as a best practice, the statement should be materialized into a table in the Data Warehouse.
Alternatively, the union could be created as a multi-fact model if both facts relate equally to common dimension tables. This is the better approach as no additional ETL tables will be needed, but it is limited to the dimension’s relationship with both facts (more detail in the multi-fact topic).
Common Dimension / Bridge Object
Information from separate facts can be combined by having Common/Bridge dimensions creating common key fields that relate to all facts.
Common Dimensions can be created as a Degenerate Dimension (drag and drop the key field from a fact table into the dimension tab) if no dimension table is available with the common key field and additional needed attributes.
The example below shows the common and degenerate common dimensions bridging both facts. The main difference between them is that the common dimension will add secondary attributes from a dimension table on the relationship key field granularity level, and the degenerate common dimension will make both fact measures relate at the key field level.
Multi-Fact
When creating a model with more than one fact table, the best practice is to ensure all dimensions are common to all facts to avoid errors from BI users unaware of how the cube was modeled and what information can relate.
The multi-fact model example below has two major issues:
- There are two date dimensions (one for each fact), meaning there is no common date that measures from both facts can use. The date dimension should always be a standard common dimension relating to every fact, and if additional secondary attributes are needed, they should be embedded in the common Date dimension.
- Dimensions ‘dim Resource’ and ‘dim Service Client’ will return an error if crossed with any attributes from the ‘fact 2’ table, as they don't have a direct relationship.
- The best practice is to have all dimensions common to all facts and to bring a calendar dimension joined with both facts and have specific attributes from separate tables embedded to it as shown below:
Date/Time Dimensions
When creating or reviewing a Date/Time Dimension, be aware of different settings. First, make sure the dimension type is set to “Time Dimension,” as shown below:
Another important detail is to confirm that the time unit is correctly set on each hierarchy level:
If different date hierarchies are needed, the best practice is to create separate hierarchies for each in a single Date Dimension and create one relationship to each hierarchy leaf level. The hierarchy and secondary attributes will not show in the BI Tool if the relationship isn't created.
Snapshots
Keep snapshots of working versions of the Project to maintain unsaved work. This will create the ability to roll back to a previous project version on a given date.
Training Mode
When publishing and testing a new project, it is a best practice to navigate the BI Dashboard to hint AtScale what are the important information used by that given use case, training the AtScale engine on.
How to efficiently create aggregates to improve the performance of a given cube. To turn on training mode, go to the dropdown user profile menu (top right AtScale page), select ‘PROFILE’, and enable it for as long as necessary (the training mode option will be available just under the profile menu as shown below).
Reviewing Query Performance
An important part of testing a cube is reviewing the query performance. To understand how to navigate the ‘QUERIES’ tab, let's go through the details of the example below:
Anytime the BI tool requests information from AtScale, it will create a new line at the ‘QUERIES’ tab, identifying the Cube Name and Project Name and the following detailed information:
Status: whether or not inbound and outbound queries were successful. Failed queries will display the error message just under the query header, as shown below:
Start Time, full Duration of the query, and the User logged in to the BI Tool.
And the last two columns will show the Optimization, which shows whether or not the query used an aggregate (‘aggs” if it did, “N/A” if not) and the Attributes and Measures requested by the BI Tool (‘Month1’, ‘Month2’ attributes and ‘Customer service level %_2’ measure in the example below)
Opening the detail of the query (arrow just left of the cube icon), four new lines will show up:
Inbound: the query received by AtScale from the BI Tool and the time this step took
Query Planning: how long AtScale took to plan the outbound query based on the inbound request
Outbound Query: show the breakdown of what has been sent back to the BI tool by AtScale. This is a good resource to debug why information is not being correctly displayed in the BI Tool, as it will provide insight into how AtScale sent all detail.
It can also be observed which parts of the query are using aggregates (blue highlight)
And finally the time for Result Processing completing the query duration breakdown
All those details should be observed to understand if the cube is taking full advantage of the aggregation engine and might suggest needing a modeling review or specific aggregation types needs.
AtScale Aggregate Types
AtScale accelerates analytics query performance using an engine that runs a compression coefficient on every BI query to manage aggregated views materialization in the data warehouse according to increased read times scores estimates. There are four types of Aggregates:
- Demand-defined Aggregates (Autonomous): These are standard aggregates automatically generated by the AtScale engine whenever users query data from a BI tool.
- Prediction-defined Aggregates (Autonomous): A statistical estimate of the aggregates most likely to improve performance. This can be used as a starting point for accelerating performance but is usually not recommended as best practice.
- Hinted Aggregates (Human Assisted): QDS Facts can use hinted aggregates to improve performance by creating a generic aggregate based on the key fields. Please note that demand defined aggregates on table data source facts will improve performance over a QDS-hinted aggregate fact. Additionally, only select hinted aggregation once the project is finished to avoid costly resource consumption.
- User-Defined Aggregates (Human Assisted): This option lets the user manually choose the aggregation parameters. Not recommended, it should only be used when working with distinct counts or very specific high usage queries just short of the engine compression coefficient threshold.