Maintenance Plan

This document describes the ongoing maintenance required to keep AtScale running with high uptime, organized as a repeatable schedule for a stable deployment.

This document is not definitive and must be adapted to each environment's specifics, the customer's policies, and the deployment strategy.

Initial Deployment

Topic | Item | Description | Done?
Infrastructure | Create AtScale VPC | Determine the infrastructure layout for the AtScale instance. Create a VPC if necessary to host the VMs. |
Infrastructure | Provision hardware/Virtual Machine | Create and allocate the VMs according to the infrastructure layout. Upgrade and patch the operating system. |
Infrastructure | Network connectivity | Validate connectivity from the appropriate LANs to the VPC. |
Infrastructure | Load balancer | Install and configure virtual IPs (VIPs) for any required clustering or SQE. Multiple inbound sources will likely require multiple VIPs. Internal traffic may also need to be routed through the load balancers. |
Infrastructure | Firewall | Update the firewall rules to open and forward the ports used by AtScale services. |
AtScale | Install AtScale | Install the AtScale binaries. |
AtScale | Configure TLS | If required, configure SSL/TLS certificates on the AtScale servers. This is determined by organizational policy. |
AtScale | LDAP and/or SAML | Connect AtScale to a directory service (typically LDAP) and/or a SAML identity provider. |
AtScale | Configure Kerberos (optional) | If Kerberos is used in-house, issue a principal for AtScale, then configure AtScale to use it. |
AtScale | Establish database connectivity | Configure AtScale's connections to the data warehouse(s) it will query. |
AtScale | Initial configuration of cubes or projects | Develop an initial AtScale cube using the designer. |
AtScale | Set Up Telemetry | Submit a support bundle for telemetry purposes. This also creates a reference record of a properly working system. Install a scheduled (cron) task to repeat this weekly. |
Infrastructure | InfoSec sign off | Obtain sign-off on the AtScale system from the security group. |
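The network connectivity and firewall items above can be spot-checked by opening a TCP connection to each AtScale endpoint from every LAN that needs access. The sketch below assumes you supply your own host names and ports; the functions `port_is_reachable` and `check_endpoints` are hypothetical helpers, not part of AtScale.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all subclass OSError.
        return False


def check_endpoints(endpoints, timeout: float = 3.0):
    """Map each (host, port) pair to its reachability (True/False)."""
    return {
        (host, port): port_is_reachable(host, port, timeout)
        for host, port in endpoints
    }
```

For example, `check_endpoints([("atscale.example.internal", 443)])` (a placeholder host and port; substitute the ports your AtScale version and load balancer VIPs actually use) returns a dictionary you can print or feed into an alert. A successful TCP connect only proves the port is open; it does not validate TLS or authentication.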

Daily Tasks

Topic | Item | Description | Done?
SysAdmin | Hot Backup | If AtScale is installed in clustered mode, this happens automatically. Otherwise, provide an online backup mechanism for the metadata and the on-disk configuration. |
SysAdmin | Replication monitoring | Validate that replication is working properly. |
SysAdmin | Service monitoring | Validate that all services are up and running properly. If this is not checked daily, use an alerting mechanism to monitor the services. |
SysAdmin | Log monitoring | Monitor logs for exceptions. Examine each exception and either confirm it is benign or investigate further. |
SysAdmin | Log rotation and purging | On a set schedule, validate that logs are being rotated and old logs are being removed from the system. |
SysAdmin | Life of a query | Validate that the logs stored in the metadata database are being purged as needed. If using elastic logging, ensure those logs are also being rotated and/or purged. |
AtScale | Standard aggregate build | On a set schedule, rebuild your aggregates after new data is loaded. |
AtScale | Monitoring incremental aggregates | If using incremental aggregates, validate that each build completes and that build times stay within acceptable limits. |
SysAdmin | Metadata performance management | Validate that the PostgreSQL metadata database has not grown too large and is operating efficiently. |
SysAdmin | Infrastructure monitoring | Validate that there is adequate disk space, memory, and CPU. Apply standard system monitoring to AtScale nodes and services. |
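The log monitoring and infrastructure monitoring items can be partly automated with a small daily script. This is a minimal sketch assuming that exceptions appear in the logs as Java-style class names ending in `Exception` or `Error` (for example `java.sql.SQLException`); the log path, the 85% disk threshold, and all function names are hypothetical and should be adjusted to your deployment.

```python
import re
import shutil
from collections import Counter
from pathlib import Path

# Assumption: exceptions are logged as class names ending in "Exception" or "Error".
EXCEPTION_RE = re.compile(r"\b([\w.$]*(?:Exception|Error))\b")

# Hypothetical alert threshold; tune to your environment.
DISK_ALERT_PERCENT = 85.0


def summarize_exceptions(lines):
    """Count each exception class name found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(EXCEPTION_RE.findall(line))
    return counts


def scan_log_file(path):
    """Summarize exceptions in one log file, tolerating undecodable bytes."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return summarize_exceptions(handle)


def disk_usage_percent(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def daily_check(log_path, data_path="/"):
    """Return human-readable findings for the daily review, most frequent exceptions first."""
    findings = [
        f"{count} x {name} in {log_path}"
        for name, count in scan_log_file(log_path).most_common()
    ]
    pct = disk_usage_percent(data_path)
    if pct >= DISK_ALERT_PERCENT:
        findings.append(f"disk usage at {pct:.1f}% on {data_path}")
    return findings
```

The script only surfaces candidates; each exception still needs to be confirmed as benign or investigated, and known nuisance errors can be filtered out before the summary is reviewed.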

Weekly Tasks

Topic | Item | Description | Done?
SysAdmin | Cold Backup | Take a complete (offline) backup of the metadata database and a complete backup of the atscale directory. |
AtScale | Review and Document Errors | Review any outstanding issues and document any errors that were intentionally ignored. If possible, add nuisance errors to a running list. |
AtScale | Support Bundles | Submit a support bundle for telemetry purposes. This also creates a reference record of a properly working system. |
AtScale | Query performance management | Examine any query that exceeds an acceptable threshold. Validate that the query does not cause other health issues. Investigate whether the problem affects a single user or is widespread. Where appropriate, modify the cube or its parameters to prevent future issues. |
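The triage step in query performance management (is the slowness limited to one user or widespread?) can be expressed as a short filter over exported query records. This sketch assumes you have already extracted query ID, user, and duration from your query logs; the `QueryRecord` fields and the threshold are hypothetical, not an AtScale schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One query-log entry; the field names are hypothetical."""
    query_id: str
    user: str
    duration_s: float


def slow_queries(records, threshold_s):
    """Queries whose duration exceeds the acceptable threshold, slowest first."""
    return sorted(
        (r for r in records if r.duration_s > threshold_s),
        key=lambda r: r.duration_s,
        reverse=True,
    )


def classify_slowness(records, threshold_s):
    """Return ("none" | "single-user" | "widespread", per-user slow-query counts)."""
    by_user = Counter(r.user for r in slow_queries(records, threshold_s))
    if not by_user:
        return "none", by_user
    scope = "single-user" if len(by_user) == 1 else "widespread"
    return scope, by_user
```

A "single-user" result often points at one report or client tool, while "widespread" suggests looking at the cube design, aggregates, or the underlying data platform.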

Monthly Tasks

Topic | Item | Description | Done?
AtScale | Validate Telemetry | At least once a month, submit telemetry to AtScale. This allows the account team to continue monitoring overall growth and health for capacity planning. |
AtScale | Full aggregate refresh | Certain data stores require rebalancing as data is added to the aggregates. It is good practice to fully rebuild the aggregates on a set schedule. |
AtScale | License Validation | Verify that the license will not expire in the next 60 days. If it will, contact the AtScale team about renewal. |
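The 60-day license check is simple date arithmetic and can be scripted so it is never skipped. This sketch assumes you record the license expiration date yourself (for example, copied from the AtScale UI); reading it programmatically is not covered here, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Warning window taken from the License Validation task.
LICENSE_WARNING_DAYS = 60


def days_until_expiry(expiry: date, today: date) -> int:
    """Whole days from `today` until `expiry` (negative if already expired)."""
    return (expiry - today).days


def license_needs_attention(
    expiry: date,
    today: Optional[date] = None,
    window_days: int = LICENSE_WARNING_DAYS,
) -> bool:
    """True if the license expires within `window_days` days or has already expired."""
    today = today or date.today()
    return days_until_expiry(expiry, today) <= window_days
```

For example, a license expiring on 2024-03-01 checked on 2024-01-01 is exactly 60 days out, so `license_needs_attention` returns `True` and a renewal conversation should start.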

Quarterly Tasks

Topic | Item | Description | Done?
Management | Review Maintenance Plan | Review this plan and make adjustments as necessary. Validate that the instructions for each task are clear and that policies or procedures have not changed. Update accordingly. |
Management | Review Customer Roadmap | Review overall project goals. Add new goals as needed and document completed goals. |
Management | Validate Staffing Plan | Validate staffing coverage for all roles. Each role should have a primary and a secondary individual familiar with its duties. Some roles may require 24-hour or on-call coverage. |
SysAdmin | Scheduled Fail Over | Simulate a failure event once per quarter to verify that failover works. |
SysAdmin | Scheduled Fail Back | Failback is typically where customers run into trouble. Practicing failback each quarter, including a full restore, builds the confidence to perform it in a live scenario. |
SysAdmin | Capacity planning | Gather all measurements taken during the quarter, including CPU, memory, users, disk, and performance. Project usage over the next several quarters and ensure it will not exceed hardware capacity. If it will, begin provisioning additional capacity. |
Management | Scheduled upgrades | Review new AtScale releases and determine whether upgrading is appropriate. Check the software lifecycle policy to make certain your versions stay in support. |
Management | Quantifying cost savings or functional value | Maintain a running record of why you are using AtScale. Value can come from functional agility, performance, and cost savings. Keeping this record up to date helps evaluate whether to onboard new projects within your organization. |
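For the capacity planning item, one simple way to "project the usage over the next several quarters" is a least-squares straight-line trend over the quarterly measurements. This is a minimal sketch assuming roughly linear growth and one measurement per quarter in consistent units (for example, peak disk usage in GB); seasonal or step-change growth needs a different model, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def linear_fit(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two quarters of history")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_usage(history: Sequence[float], quarters_ahead: int) -> List[float]:
    """Projected usage for each of the next `quarters_ahead` quarters."""
    slope, intercept = linear_fit(history)
    n = len(history)
    return [intercept + slope * (n + k) for k in range(quarters_ahead)]


def first_quarter_exceeding(
    history: Sequence[float], capacity: float, quarters_ahead: int = 4
) -> Optional[int]:
    """1-based index of the first future quarter projected to exceed capacity, or None."""
    for i, value in enumerate(project_usage(history, quarters_ahead), start=1):
        if value > capacity:
            return i
    return None
```

With quarterly usage of 100, 120, 140, and 160 GB against 190 GB of capacity, the trend projects 180 GB and then 200 GB, so `first_quarter_exceeding` returns 2: provisioning should begin within the next quarter.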