The following is the shutdown and restart sequence for a Clustered AtScale Multi-Node Environment containing Clustered AtScale Engine nodes and Query Engine nodes (if present).
Note: If there are no Query Engine nodes in the environment, skip the steps that apply to them.
Shut Down Order - AtScale Nodes should be shut down in the following sequence.
- Query Engine node(s) (if present)
- Engine Node running Postgres in Replica / Standby mode
- Engine Node running Postgres in Leader role
- Coordinator node
Restart Order - AtScale Nodes should be restarted in the following sequence (the reverse of the shut down order).
- Coordinator node
- Engine Node running Postgres in Leader role
- Engine Node running Postgres in Replica / Standby role
- Query Engine node(s) (if present)
Procedure to Shut Down and Restart all Services - Clustered AtScale
S1. Log into an ssh session on one of the Engine nodes of the clustered AtScale environment and change to the atscale user (if needed). Execute the command /opt/atscale/current/bin/database/postgres_nodes
The output will look similar to the following:
$ /opt/atscale/current/bin/database/postgres_nodes
2023-01-13 14:59:59,465 - WARNING - Using atscale-postgres14-cluster as consul service name instead of scope name atscale_postgres14_cluster
+---------------------------------------------+---------------------------------------------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: atscale_postgres14_cluster (7121824910042921340) --------------------------------------+---------+---------+----+-----------+
| atscale-ha-node-01.docker.infra.atscale.com | atscale-ha-node-01.docker.infra.atscale.com:10520 | Leader | running | 3 | |
| atscale-ha-node-02.docker.infra.atscale.com | atscale-ha-node-02.docker.infra.atscale.com:10520 | Replica | running | 3 | 0 |
+---------------------------------------------+---------------------------------------------------+---------+---------+----+-----------+
S2. From the output of the /opt/atscale/current/bin/database/postgres_nodes command, determine which AtScale host is the “Engine Node running Postgres in Leader role” and which node is the “Engine Node running Postgres in Replica / Standby role”. From the above output:
- Engine Node running Postgres in Leader role = atscale-ha-node-01.docker.infra.atscale.com
- Engine Node running Postgres in Replica / Standby role = atscale-ha-node-02.docker.infra.atscale.com
Record this information for later use in this procedure.
The following steps in this procedure shut down AtScale services. As a result, all queries executed against AtScale will fail and the AtScale Design Center User Interface will be unavailable. Before proceeding, schedule an appropriate downtime window if needed and notify users that the service will be unavailable.
S3. If Query Engine nodes are present in the environment, shut down services on the Query Engine nodes only. If there are no Query Engine nodes in the environment, skip this step. Execute the following command to stop all AtScale services on the Query Engine node:
/opt/atscale/bin/atscale_stop
This command shuts down all services on the Query Engine node. Verify all services have stopped before proceeding to the next Query Engine node or the next step.
If there are multiple Query Engine nodes, repeat Step S3 for each Query Engine node.
S4. Shut down all AtScale Services on the Engine Node running Postgres in Replica / Standby role. Run the following command on the Replica / Standby postgres node:
/opt/atscale/bin/atscale_stop
Verify all services have stopped before proceeding to the next step.
S5. Shut down all AtScale Services on the Engine Node running Postgres in Leader role. Run the following command on the Leader postgres node:
/opt/atscale/bin/atscale_stop
Verify all services have stopped before proceeding to the next step.
S6. Shut down all AtScale Services on the Coordinator node. Run the following command on the Coordinator node:
/opt/atscale/bin/atscale_stop
Verify all services have stopped.
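Steps S3 through S6 run the same command on each node in a fixed order, so they can be sketched as a loop. This is an illustration only, under assumptions that are not part of the procedure: key-based ssh access to every node as the atscale user, and host names supplied by the operator in shut down order. The function name stop_in_order and the DRY_RUN variable are hypothetical. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: stop AtScale services node by node, in the documented order.
# Assumption: ssh to each host as the atscale user works without a password.

STOP_CMD=/opt/atscale/bin/atscale_stop
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, anything else = run them

# stop_in_order HOST...
# Hosts must already be listed in shut down order:
# Query Engine node(s), Replica / Standby, Leader, Coordinator.
stop_in_order() {
  local host
  for host in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "ssh atscale@$host $STOP_CMD"
    else
      # Abort on the first failure so later nodes are not stopped out of order.
      ssh -n "atscale@$host" "$STOP_CMD" || return 1
      # The procedure asks for verification before moving on; pause for it.
      printf 'Verify services have stopped on %s, then press Enter: ' "$host"
      read -r _
    fi
  done
}

# Example with placeholder host names (dry run):
#   stop_in_order qe-01 engine-replica engine-leader coordinator
```

The loop does not replace the "verify all services have stopped" check in each step. It only pauses so that the check can be done by hand before the next node is stopped.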
At this point, all AtScale services have been stopped. For additional verification that AtScale services have stopped, see the procedure “How to verify all AtScale processes have stopped”. That procedure is typically only necessary in environments where a failure has occurred (for example, running out of disk space) or when additional verification that all processes have stopped is desired.
To restart all services, proceed to the next step.
S7. Restart the Coordinator node. Run the following command on the Coordinator node:
/opt/atscale/bin/atscale_start
Wait 1 to 2 minutes, then run the following command to verify that all services are operational before proceeding:
/opt/atscale/bin/atscale_service_control status
$ /opt/atscale/bin/atscale_service_control status
agent RUNNING pid 650197, uptime 0:01:38
coordinator RUNNING pid 650198, uptime 0:01:38
egress RUNNING pid 650200, uptime 0:01:38
ingress RUNNING pid 650202, uptime 0:01:38
service_registry RUNNING pid 650196, uptime 0:01:38
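The status check can be automated by polling until every listed service reports RUNNING. The following is a rough sketch that assumes the status output keeps the layout shown above (service name in the first column, state in the second). The function names all_running and wait_until_running are hypothetical, and the default of 12 attempts at 10-second intervals is simply chosen to match the 1 to 2 minute wait.

```shell
#!/usr/bin/env bash
# Sketch: wait until atscale_service_control reports every service RUNNING.
# Assumes one service per line with the state in the second column.

STATUS_CMD=${STATUS_CMD:-/opt/atscale/bin/atscale_service_control status}

# all_running
# Reads status output on stdin. Succeeds only if at least one service is
# listed and every listed service is in the RUNNING state.
all_running() {
  awk '
    NF >= 2 { seen++; if ($2 != "RUNNING") bad++ }
    END { exit ((seen > 0 && bad == 0) ? 0 : 1) }'
}

# wait_until_running [TRIES] [DELAY_SECONDS]
# Polls the status command. Returns 0 as soon as all services are RUNNING,
# or 1 if that has not happened after TRIES attempts.
wait_until_running() {
  local tries=${1:-12} delay=${2:-10} i=0
  while [ "$i" -lt "$tries" ]; do
    if $STATUS_CMD 2>/dev/null | all_running; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example, run on the node that was just started:
#   wait_until_running && echo "all services RUNNING"
```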