This article describes the steps to take when a cluster is unable to discover its nodes. The steps assume Mongo is hosted as an add-on on the Nex!™ PaaS, but they can in essence be applied to any Mongo hosting.
1 - Context
When a Mongo add-on is restarted, and especially after several restarts in a row, the replica set may be lost: none of the hosts recorded in the replica set are valid anymore (because of an IP/host change). In that very specific case all nodes recognise that there is a replica set, but each node holds an outdated list of members. The cluster then becomes stale, with no node actually taking leadership.
In that kind of situation you will likely observe the following error when querying the replica set through the mongo shell:
Code Block
> rs.status()
{
    "startupStatus" : 1,
    "ok" : 0,
    "errmsg" : "loading local.system.replset config (LOADINGCONFIG)"
}
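The check above can be captured in a small helper, which is handy when several containers have to be inspected. This is a minimal sketch, not part of the Nex!™ tooling: the name `isStaleReplicaSet` is hypothetical, and it assumes the stale state always reports `ok: 0` together with `LOADINGCONFIG` in `errmsg`, as in the output shown above.

```javascript
// Hypothetical helper: returns true when an rs.status() result looks like
// the stale state shown above.
// Assumption: the stale state reports ok: 0 and mentions LOADINGCONFIG in errmsg.
function isStaleReplicaSet(status) {
  if (!status || Number(status.ok) !== 0) {
    return false;
  }
  var message = String(status.errmsg || "");
  return message.indexOf("LOADINGCONFIG") !== -1;
}
```

Written in plain ES5, it can be pasted into the mongo shell and called as `isStaleReplicaSet(rs.status())`.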
2 - Preferred steps
If this happens you can follow the instructions below to recover the replica set:
First, choose a container in your Mongo add-on, preferably the last master if you can identify it
Code Block nex-cli cubes --addon name_of_mongo_addon
Stop all other mongo containers
Code Block nex-cli cubes:stop other_container_id
SSH to the rack hosting the container that is still alive
Code Block nex-cli racks:ssh 10.1.1.1
Connect to the container
Code Block sudo nex-attach my_container_id
Connect to Mongo using the Shell
Code Block mongo -u $MONGO_USER -p $MONGO_PASSWORD admin
Delete the local replication information
Code Block
use local
db.dropDatabase()
Exit the Mongo shell and the container
On the rack, restart the container. Upon restart the Mongo instance will initiate a new replica set (the previous one having been deleted)
Code Block
# Restart
nex-restart --name my_container_id
# Monitor logs
docker logs -f my_container_id
Exit the rack and bring up the other nodes using the nex-cli
Code Block nex-cli cubes:start other_container_id
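Once the other nodes are back, it is worth confirming from the mongo shell that a node has taken leadership. The helper below is a sketch under stated assumptions: the name `summarizeReplicaSet` is hypothetical, and it assumes `rs.status()` returns `ok: 1` with a `members` array whose entries carry `name` and `stateStr` fields (for example "PRIMARY" or "SECONDARY").

```javascript
// Hypothetical helper: summarises an rs.status() result after recovery.
// Assumption: a healthy result has ok: 1 and a members array whose entries
// expose name and stateStr.
function summarizeReplicaSet(status) {
  var summary = { ok: false, primary: null, secondaries: [], other: [] };
  if (!status || Number(status.ok) !== 1 || !status.members ||
      typeof status.members.length !== "number") {
    return summary; // still stale, or not a replica set status document
  }
  for (var i = 0; i < status.members.length; i++) {
    var member = status.members[i];
    if (member.stateStr === "PRIMARY") {
      summary.primary = member.name;
    } else if (member.stateStr === "SECONDARY") {
      summary.secondaries.push(member.name);
    } else {
      // Any other state (starting up, recovering, unreachable...) is listed as-is.
      summary.other.push(member.name + " (" + member.stateStr + ")");
    }
  }
  // The set is considered recovered only once a primary has been elected.
  summary.ok = summary.primary !== null;
  return summary;
}
```

In the mongo shell, `summarizeReplicaSet(rs.status())` should eventually report a non-null `primary`; members listed under `other` may simply still be starting up or syncing.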