This document describes the ongoing maintenance required to keep AtScale running with high uptime, organized as a repeatable schedule for a stable deployment.
This document is not definitive and must be adapted to each environment, the customer's policies, and the deployment strategy.
Initial Deployment
Topic | Item | Description | Done? |
Infrastructure | Create AtScale VPC | Determine the infrastructure layout for the AtScale instance. Create a VPC if necessary and provision for VMs. | |
Infrastructure | Provision hardware/Virtual Machine | Determine the infrastructure layout for the AtScale instance. Create and allocate VMs. Upgrade and patch OS. | |
Infrastructure | Network connectivity | Validate connectivity from proper LANs to the VPC. | |
Infrastructure | Load balancer | Install and configure VIPs for any clustering or SQE that is required. Multiple VIPs will likely be required for multiple inbound sources. Internal communication may be required to go through load balancers. | |
Infrastructure | Firewall | Extend the firewall to include proper port forwarding for AtScale services. | |
AtScale | Install AtScale | Install AtScale binaries. | |
AtScale | Configure TLS | If necessary, configure TLS/SSL certificates on the AtScale servers. Whether this is required is determined by the organization. | |
AtScale | LDAP and/or SAML | Connect AtScale to a directory management service, typically LDAP. | |
AtScale | Configure Kerberos (Optional) | If Kerberos is used in-house, issue a principal for AtScale, then configure the system. | |
AtScale | Establish database connectivity | | |
AtScale | Initial configuration of cubes or projects | Develop an AtScale cube using the designer. | |
AtScale | Set Up Telemetry | Submit a support bundle for telemetry purposes. This also creates a record of a properly working system to reference. Install a scheduled (cron) weekly task to repeat the submission. | |
Infrastructure | InfoSec sign off | Obtain sign-off on the AtScale system from the security group. |
Daily Tasks
Topic | Item | Description | Done? |
SysAdmin | Hot Backup | If AtScale is installed in clustered mode, this happens automatically. If not, an online backup mechanism for metadata and on-disk configuration must be provided. | |
SysAdmin | Replication monitoring | Validate that replication is happening properly. | |
SysAdmin | Service monitoring | Validate that all services are up and running properly. If this is not performed as a daily task, use an alerting mechanism to monitor the services. | |
SysAdmin | Log monitoring | Monitor logs for exceptions. Each exception should be examined and either validated as OK or investigated further. | |
SysAdmin | Log rotation and purging | On a set schedule, validate that the logs are rotating and being removed from the system. | |
SysAdmin | Life of a query | Validate that the logs inside the metadata are being purged as needed. If using elastic logging, ensure those logs are being rotated and/or purged. | |
AtScale | Standard aggregate build | On a set schedule, rebuild your aggregates after loading new data. | |
AtScale | Monitoring incremental aggregates | If using incremental aggregates, validate that they are complete and perform within specific boundaries. | |
SysAdmin | Metadata performance management | Validate that the PostgreSQL metadata database has not grown too large and is operating efficiently. | |
SysAdmin | Infrastructure monitoring | Validate that there is adequate disk space, memory, and CPU. Apply SysAdmin monitoring to AtScale nodes and services. |
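The infrastructure and service monitoring items above lend themselves to a small scripted check. The following is a minimal sketch, not an AtScale tool: the function names, the 20% free-space threshold, and the host, port, and path in the usage section are all hypothetical and should be replaced with values from your own deployment.

```python
import shutil
import socket


def disk_has_headroom(path: str, min_free_fraction: float = 0.2) -> bool:
    """Return True if the filesystem holding `path` has at least the given free fraction."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat that as a failed check.
        return False
    return usage.free / usage.total >= min_free_fraction


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical targets: substitute the paths, hosts, and ports of your environment.
    checks = {
        "disk /": disk_has_headroom("/", 0.2),
        "tcp localhost:5432": port_is_open("localhost", 5432),
    }
    for name, ok in checks.items():
        print(f"{name}: {'OK' if ok else 'ALERT'}")
```

A TCP connect only shows that something is listening on the port; it does not prove the service behind it is healthy, so it complements rather than replaces the alerting mechanism mentioned in the table.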
Weekly Tasks
Topic | Item | Description | Done? |
SysAdmin | Cold Backup | Take a complete (offline) backup of the metadata database and a complete backup of the atscale directory. | |
AtScale | Review and Document Errors | Review any outstanding issues and document any errors that were intentionally ignored. If possible, add nuisance errors to a running list. | |
AtScale | Support Bundles | Submit a support bundle for telemetry purposes. In addition, this creates a record of a properly working system to reference. | |
AtScale | Query performance management | Examine any query that exceeds an acceptable threshold. Validate that the impact of the query does not cause other health issues. Investigate whether the problem is limited to a single user or is widespread. Begin modifying the cube or parameters to prevent future issues. |
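The running list of nuisance errors can double as a filter during log review, so that only exceptions nobody has looked at yet are surfaced. This is a minimal sketch under stated assumptions: the log line format, the `ERROR|Exception` match, and the sample pattern in `NUISANCE_PATTERNS` are illustrative and do not reflect the actual AtScale log layout.

```python
import re
from typing import Iterable, List

# Hypothetical running list: regex patterns for errors that were reviewed
# and intentionally ignored.
NUISANCE_PATTERNS = [
    r"Connection reset by peer",
]


def unreviewed_exceptions(
    lines: Iterable[str],
    nuisance_patterns: Iterable[str] = NUISANCE_PATTERNS,
) -> List[str]:
    """Return log lines that mention an error or exception and match no nuisance pattern."""
    flagged = re.compile(r"ERROR|Exception")
    nuisance = [re.compile(p) for p in nuisance_patterns]
    result = []
    for line in lines:
        if flagged.search(line) and not any(n.search(line) for n in nuisance):
            result.append(line.rstrip("\n"))
    return result


if __name__ == "__main__":
    sample = [
        "INFO started\n",
        "ERROR Connection reset by peer\n",
        "WARN java.util.concurrent.TimeoutException: query 42\n",
    ]
    for line in unreviewed_exceptions(sample):
        print(line)
```

Keeping the patterns in one list means each weekly review only has to decide whether a new line should be investigated or appended to the list.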
Monthly Tasks
Topic | Item | Description | Done? |
AtScale | Validate Telemetry | At least once a month, push telemetry to AtScale. This allows the account teams to keep monitoring overall growth and health for capacity planning. | |
AtScale | Full aggregate refresh | Certain data stores require rebalancing as you add data to the aggregates. It is good practice to rebuild the aggregates in full on a set schedule. | |
AtScale | License Validation | Verify that the license will not expire in the next 60 days. If it will, start a conversation with the AtScale team. |
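The 60-day license check is simple date arithmetic and can be added to whatever script runs the monthly tasks. The sketch below assumes the expiry date is recorded by hand; it does not read anything from AtScale, and the function name and dates are hypothetical.

```python
from datetime import date, timedelta


def license_needs_attention(expiry: date, today: date, window_days: int = 60) -> bool:
    """Return True if the license expires on or before today + window_days."""
    return expiry <= today + timedelta(days=window_days)


if __name__ == "__main__":
    # Hypothetical expiry date, entered manually from the license record.
    expiry = date(2025, 3, 1)
    if license_needs_attention(expiry, date.today()):
        print("License expires within 60 days: contact the AtScale team.")
    else:
        print("License is valid beyond the 60-day window.")
```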
Quarterly Tasks
Topic | Item | Description | Done? |
Management | Review Maintenance Plan | Review this plan and make adjustments as necessary. Validate that instructions for each task are clear and that policies or procedures have not changed. Update accordingly. | |
Management | Review Customer Roadmap | Review overall project goals. Add new goals as needed and document completed goals. | |
Management | Validate Staffing Plan | Validate coverage for all staffing. Each role should have a primary and a secondary individual who are familiar with its duties. Some roles may require 24-hour and on-call positions. | |
SysAdmin | Scheduled Fail Over | Simulate a failure event once per quarter. | |
SysAdmin | Scheduled Fail Back | The failback event is typically what trips customers up. Exercising it regularly, including a full restore, builds the confidence needed to perform it in a live scenario. | |
SysAdmin | Capacity planning | Gather all the measurements taken throughout the quarter, including CPU, memory, users, disks, performance, etc. Project the usage over the next several quarters and ensure that it will not exceed hardware capacity. If it will, begin provisioning. | |
Management | Scheduled upgrades | Examine new AtScale releases and determine whether an upgrade is appropriate. Examine the software lifecycle policy and make certain your versions stay in support. | |
Management | Quantifying cost savings or functional value | Keep a running record of why you are leveraging AtScale. Value can come from functional agility, performance, and cost savings. Keeping this up to date will help determine the validity of onboarding new projects within your organization. |
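For the capacity planning task, one simple way to project usage over the next several quarters is a straight-line trend fitted to the measurements already gathered. The sketch below is one possible approach, not a prescribed method: it assumes evenly spaced samples (for example, one per month or quarter) and linear growth, and the function names and sample figures are hypothetical.

```python
from typing import List, Optional, Sequence


def project_linear(samples: Sequence[float], periods_ahead: int) -> List[float]:
    """Fit a least-squares line to evenly spaced samples and extrapolate it."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    intercept = mean_y - slope * mean_x
    # The last observed sample sits at x = n - 1; project the periods after it.
    return [intercept + slope * (n - 1 + k) for k in range(1, periods_ahead + 1)]


def periods_until_exceeded(
    samples: Sequence[float], capacity: float, horizon: int = 8
) -> Optional[int]:
    """Return how many periods ahead the projection first exceeds capacity, or None."""
    for k, value in enumerate(project_linear(samples, horizon), start=1):
        if value > capacity:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical disk usage in percent, one sample per quarter.
    disk_used = [40, 50, 60, 70]
    print(project_linear(disk_used, 3))
    print(periods_until_exceeded(disk_used, capacity=85))
```

A linear fit is deliberately crude. If usage grows in steps, for instance when new projects are onboarded, the projection should be adjusted by hand before it drives provisioning decisions.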