Mongodb Session Service Crash
Incident Date: 2023/05/24
- Issue with session-service reported at 3.30PM
- We identified that the mongo session service was down and had used up all disk space (20Gi) shortly. That same day we created a new session service with 100Gi storage but weren't able to recover all old sessions
- On 2023/05/26 we were able to restore old sessions. We did lose two days of saved sessions (5/24-5/25)
Remediation
Bring mongodb back
- Take snapshot of existing EBS volume using AWS (no auto snapshots were set up)
- Set up new mongo database with helm:
helm install cbioportal-session-service-mongo-20230524 --version 7.3.1 --set image.tag=4.2,persistence.size=100Gi bitnami/mongodb - Connect the session service to use that one (see commit)
Bring session data back
The mongo data was stored in an AWS snapshot in mongo's binary format, so not immediately accessible for re-import into another database. First we had to bring that back.
What didn't work:
- Tried various approach of expanding existing k8s volume, but was tricky b/c volume expansion wasn't enabled for existing PersistenceVolumeClaims
What did work:
- Instead, started a new AWS EC2 instance with that volume attached. For whatever reason i couldn’t see the attached volume within ubuntu at first (had to use lsblk and mount). Might be something you always have to do
- Once the volume was accessible, we ran
docker run bitnami/mongodbwith the correct mount location specified to load the data - From a separate shell used mongodump as described (cmds are described in cbioportal/README)
- Now that we got the dump, we set up a new mongo database in the k8s cluster to load the data:
helm install cbioportal-session-service-mongo-4dot2-20230525 --set image.tag=4.2,persistence.size=100Gi bitnami/mongodb Then we had to copy over the data into that k8s volume. Unf
kubectl cpdidn't work (some TCP timeout error). Instead we figured we could create a 2nd container int he mongodb pod with rsync to copy over the data into a container in the existing k8s deployment:kubectl edit deployment cbioportal-session-service-mongo-4dot2-20230525-mongodb # add an ubuntu container - args: - infinity command: - sleep image: ubuntu imagePullPolicy: Always name: ubuntu resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /bitnami/mongodb name: datadir # wait for it to be deployed kubectl exec -it cbioportal-session-service-mongo-4dot2-20230525-mongodb-f6wp9vx -c ubuntu -- /bin/sh # now you can apt-get install rsync, ssh, whatev into that container and rsync the mongo dump # then re-import the mongo database dump using instructions in cbioportal/README