GitLab.com, a site similar to GitHub that provides source code version control repositories for software developers, has come a cropper after an employee accidentally deleted a directory on the wrong server.
GitLab.com is currently offline due to issues with our production database.
We’re working hard to resolve the problem quickly. Follow @gitlabstatus for the latest updates.
The Register describes what went wrong:
Behind the scenes, a tired sysadmin, working late at night in the Netherlands, had accidentally deleted a directory on the wrong server during a frustrating database replication process: he wiped a folder containing 300GB of live production data that was due to be replicated.
Just 4.5GB remained by the time he canceled the rm -rf command. The last potentially viable backup was taken six hours beforehand.
To be honest, that’s the kind of mistake that anyone could make all too easily if there aren’t the right safeguards in place. But it’s bad news for anyone who was relying on GitLab to look after their code and merge requests properly, and wasn’t keeping their own backup.
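For what it’s worth, a safeguard doesn’t have to be elaborate. Below is a minimal, purely illustrative sketch in Python (the host name and directory are hypothetical placeholders, and this is not GitLab’s tooling) of the kind of guard that makes “wrong server” mistakes harder: refuse to run a destructive clean-up unless the machine’s hostname matches the one you expected, and demand a typed confirmation before anything is removed.

```python
#!/usr/bin/env python3
"""Illustrative sketch: a hostname guard for destructive maintenance scripts."""
import shutil
import socket
import sys

# Hypothetical values: the only host this clean-up should ever run on,
# and the directory it is allowed to remove.
EXPECTED_HOST = "db2.staging.example.com"
TARGET_DIR = "/var/opt/data/pg_replica"


def main() -> None:
    actual_host = socket.gethostname()

    # Refuse to do anything destructive if we are not where we think we are.
    if actual_host != EXPECTED_HOST:
        sys.exit(
            f"Refusing to delete {TARGET_DIR}: running on {actual_host!r}, "
            f"expected {EXPECTED_HOST!r}."
        )

    # Require an explicit, typed confirmation instead of a silent rm -rf.
    answer = input(f"Delete {TARGET_DIR} on {actual_host}? Type the host name to confirm: ")
    if answer.strip() != actual_host:
        sys.exit("Confirmation did not match the host name; nothing deleted.")

    shutil.rmtree(TARGET_DIR)
    print(f"Deleted {TARGET_DIR} on {actual_host}.")


if __name__ == "__main__":
    main()
```

The specific check matters less than the principle: anything destructive should have a machine-verified hurdle between intention and execution.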
Unfortunately, as the site explains, GitLab’s own backups have just added to the headaches:
Out of 5 backup/replication techniques deployed none are working reliably or set up in the first place.
Although it would be easy to bash GitLab, it is trying to make the best of a bad situation by showing remarkable transparency with its dev-head users.
We accidentally deleted production data and might have to restore from backup. Google Doc with live notes https://t.co/EVRbHzYlk8
— GitLab.com Status (@gitlabstatus) February 1, 2017
The incident affected the database (including issues and merge requests) but not the git repo's (repositories and wikis).
— GitLab.com Status (@gitlabstatus) February 1, 2017
Not only has the site set up a regularly updated Google Doc, containing details of the steps it is taking to try to recover data and resume regular site access, but it is even currently live streaming on YouTube so you can watch its employees slowly restore data where possible.
Remember this – don’t feel too smug just because you make backups. Yes, it’s great that you make backups, but a backup can’t be relied upon unless you have *tested* that it restores properly.
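If you want something concrete, a restore test can be as simple as unpacking your most recent backup somewhere disposable and checking it against a manifest written at backup time. The Python sketch below shows the idea – the archive format, paths and manifest are all hypothetical assumptions, not a description of GitLab’s setup.

```python
#!/usr/bin/env python3
"""Illustrative sketch: prove a backup archive actually restores."""
import hashlib
import json
import tarfile
import tempfile
from pathlib import Path

# Hypothetical: a .tar.gz backup plus a JSON manifest of {"relative/path": "sha256 hex"}.
BACKUP_ARCHIVE = Path("backups/latest.tar.gz")
MANIFEST = Path("backups/latest.manifest.json")


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main() -> None:
    expected = json.loads(MANIFEST.read_text())

    # Restore into a scratch directory, exactly as you would in an emergency.
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(BACKUP_ARCHIVE) as archive:
            archive.extractall(scratch)

        missing, corrupt = [], []
        for rel_path, want in expected.items():
            restored = Path(scratch, rel_path)
            if not restored.is_file():
                missing.append(rel_path)
            elif sha256_of(restored) != want:
                corrupt.append(rel_path)

    if missing or corrupt:
        raise SystemExit(
            f"Restore test FAILED: {len(missing)} missing, {len(corrupt)} corrupt file(s)."
        )
    print(f"Restore test passed: {len(expected)} file(s) verified.")


if __name__ == "__main__":
    main()
```

Schedule something like this and alert when it fails – a backup you have never restored is really just a hope.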
More details about the incident can be found on GitLab’s blog.
First paragraph and last paragraph are remarkably similar. I'm guessing that it wasn't removed after editing.
Thanks Bob. Now fixed. :) There's a story about how that error happened, which I might tell sometime soon… maybe I'll discuss it on the next Smashing Security podcast (we're recording one in a few hours)
when things like this happen here, we just repurpose that admin to watch youtube videos of Barney & Friends for 6 months. hasn't failed yet.
¯\_(ツ)_/¯