This document describes how to scale out a single-instance AtScale installation by moving the AtScale metadata database onto its own host.
The reason for doing this is to minimize query performance degradation caused by heavy query history in the metadata database (Postgres).
Prerequisites
- AtScale 2021.x or later
- Two cluster nodes (EC2 instances or VMs in the same subnet)
- Identical hardware preferred (16 CPUs & 64 GB RAM)
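A quick way to confirm each node meets the hardware guideline, using standard Linux tools (nothing AtScale-specific):
nproc     # expect 16 CPUs
free -g   # expect roughly 64 GB of total memory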
Here is the atscale.yaml file that splits the components across the two hosts.
atscale.yaml
engine:
  memory: 31G
installation_location: "/opt/atscale"
service_account: "atscale"
loadbalancer_dns_name: "atscaleengine.us-west1-b.c.atscale-sales-demo.internal"
tls:
  enabled: false
  certificate: "/opt/atscale/conf/server.cert"
  key: "/opt/atscale/conf/server.key"
kerberos:
  enabled: false
  keytab: "/opt/atscale/conf/atscale.keytab"
  principal: "atscale/atscaleengine.us-west1-b.c.atscale-sales-demo.internal@REALM"
hosts:
  - dnsname: database.us-west1-b.c.atscale-sales-demo.internal
    services:
      - database
      - coordinator
    override:
      coordinator:
        id: 10
  - dnsname: atscaleengine.us-west1-b.c.atscale-sales-demo.internal
    services:
      - engine
      - modeler
      - directory
      - egress
      - agent
      - ingres
      - service_registry
      - gov_enforce
      - gov_rules
      - servicecontrol
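Before copying atscale.yaml to both hosts, it is worth confirming the file parses as valid YAML. A minimal check, assuming Python 3 with PyYAML is available on the host (an assumption, not an AtScale requirement); run it from the directory containing atscale.yaml:
# Fails loudly on indentation or syntax errors in the file.
python3 -c "import yaml; yaml.safe_load(open('atscale.yaml')); print('atscale.yaml parses cleanly')"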
Postgres can be further tuned by modifying patroni.yml.tpl before you run configuration.sh:
/opt/atscale/version/<version>/bin/configurator/roles/database/templates/patroni.yml.tpl
shared_buffers: 16GB
effective_cache_size: 48GB
maintenance_work_mem: 2GB
checkpoint_completion_target: 0.9
wal_buffers: 16MB
default_statistics_target: 100
random_page_cost: 4
effective_io_concurrency: 2
work_mem: 13981kB
min_wal_size: 2GB
max_wal_size: 8GB
max_worker_processes: 16
max_parallel_workers_per_gather: 4
hot_standby: "on"
checkpoint_timeout: 30
Note: these values assume the database host has 16 CPUs and 64 GB of RAM.
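After configuration, you can confirm the settings took effect by querying Postgres directly. A minimal sketch; the port, user, and database name below are placeholders and will vary by installation:
# Substitute the connection details from your installation.
psql -h localhost -p <postgres_port> -U atscale -d atscale -c \
  "SELECT name, setting, unit FROM pg_settings WHERE name IN ('shared_buffers', 'work_mem', 'max_wal_size');"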
Configuration
Make sure atscale.yaml is identical on both hosts.
The database instance must be configured and running first: at the database host, run configuration.sh --first-time.
Once configuration completes on the database host, run the second configuration at the engine host with configuration.sh --activate.
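The sequence is sketched below; it assumes configuration.sh lives in the same bin directory referenced earlier and is run as the atscale service account (adjust paths and users to match your installation):
# On the database host (run this first):
cd /opt/atscale/version/<version>/bin
sudo -u atscale ./configuration.sh --first-time

# On the engine host (run only after the database host finishes):
cd /opt/atscale/version/<version>/bin
sudo -u atscale ./configuration.sh --activate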
You can verify that the hosts have started communicating with each other by opening the Consul UI at http://<host>:10555.
Note: You can ignore other warnings, such as those for egress and auth, because you are running a single engine core component and Consul expects more than one host for authentication and messaging.
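If you prefer the command line, the Consul HTTP API can be queried directly; this assumes the API is served on the same port as the UI, which may not hold in every installation:
# Both hosts should appear in the node catalog once they are communicating.
curl -s http://<host>:10555/v1/catalog/nodes

# Review registered health checks (warnings for single-host services can be ignored, per the note above).
curl -s http://<host>:10555/v1/health/state/any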
Reason
We split the deployment into two hosts so that Postgres can serve metadata to AtScale without competing for CPU and disk I/O during peak hours. This works well for customers with heavy workloads and high query volumes.
You will immediately notice the performance gain in UI operations such as reviewing queries, aggregates, and large models. You will also see improved performance under high concurrency at peak hours and reduced query latency.
Why not split the engine and modeler onto separate hosts as well? You can split the components across three hosts, but doing so requires a load balancer (LB) or a fourth host to act as a proxy.
Scenario 1 (diagram)
Scenario 2 (diagram)
In theory, you could add this complexity to a single-node environment and split it as shown in the diagrams above, but you won't get any extra benefit from Scenario 2 over Scenario 1, and it is not recommended.