
Back online after a server failure - and significant loss of data

microstudio.dev is back online today after a complete server failure on Saturday, July 8th. However, data was lost in the process, and the site is back to the state it was in 2 months ago.

What happened

The server went down on Saturday evening (CEST), July 8th. In the hosting provider's administration dashboard, the server was showing as running, but disconnected from the internet. We contacted them immediately, and after about 1 hour they got back to us with an answer: the server was completely dead and all they could do was provide a replacement. As soon as we received their answer, the server's status changed to "unrecoverable" and every option we had on the server (like restarting in rescue mode) was gone. So far, no big deal though; this is something we are prepared for. All we need to do is install the new server and reload the data from the latest backup. The only problem was that I was far away from home / office and needed the (always benevolent) help of my pal @mattamore. We agreed on the phone that we would do the reinstall the next morning (Sunday).

Data is missing

On Sunday morning, we were preparing the new server instance and had a quick look at the data backups, only to notice that something was off with the dates of the files. We correctly had one "snapshot" folder created every day, but in the last few, the latest file we could find was from May 9th. After some digging, we understood what had happened: every day, our backup script synced files from the production server to a local server; then a snapshot of the folder was created with the timestamp of the day; then a notification was sent to us to confirm that the daily backup was completed. The only problem was that, for some reason, the syncing (using rsync commands) had not actually been working since May 9th. We had been making a snapshot of the same old folder contents every day. This went undetected, and the script kept incorrectly sending a backup confirmation every day despite the syncing not working. We still have to find out why the syncing stopped working (we will look into this soon).
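To illustrate the failure mode described above, here is a minimal Python sketch of a backup step that only snapshots and confirms when the sync command reports success. This is not the actual microStudio script: `run_backup`, `make_snapshot` and `notify` are hypothetical names, and the sketch assumes a failed sync shows up as a non-zero exit code, which may not have been the case here since the cause is still unknown.

```python
import subprocess
from typing import Callable, Sequence


def run_backup(sync_cmd: Sequence[str],
               make_snapshot: Callable[[], None],
               notify: Callable[[str], None]) -> bool:
    """Run the sync command; snapshot and confirm only if it succeeded.

    All three parameters are hypothetical hooks: sync_cmd would be the
    rsync command line, make_snapshot would create the dated snapshot
    folder, and notify would send the daily message.
    """
    result = subprocess.run(sync_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Report the failure instead of snapshotting stale data.
        notify(f"BACKUP FAILED: sync exited with code {result.returncode}: "
               f"{result.stderr.strip()}")
        return False
    make_snapshot()
    notify("Backup completed")
    return True
```

With this shape, a confirmation can only be sent after the sync command itself has returned successfully, so a failing sync produces a failure alert rather than a daily "completed" message.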

What's next

I want to say how mortified I am that I let all this happen. I should have tested the backups more thoroughly and more often. I feel terrible, knowing that many of you have lost hours or days of your work. I sincerely, deeply apologize to you all.

(We are currently trying to get some help from our hosting provider; this is our last hope. If they somehow manage to give us access to the server, its SSD or its files, we should be able to recover the missing 2 months of data. We are currently waiting for their answer, fingers crossed.)

Update: we exhausted all the options with our hosting provider; the server remains unreachable; we offered to buy back the server or its SSD, but they declined.

I will spend the next few days rethinking the backup system, with additional redundancy, reliable automated verifications and alerts, and manual verification processes. I also want to maintain a spare server ready to kick in whenever the main server fails, so that none of this can ever happen again.
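One possible form of automated verification is a freshness check on the snapshot itself, mirroring how the problem was spotted by hand (the newest file was from May 9th). The Python sketch below is only an illustration under assumptions: `newest_mtime`, `snapshot_is_fresh` and the 48-hour threshold are hypothetical, and it assumes the sync preserves modification times and that at least one file changes on the production server within that window.

```python
import os
import time
from typing import Optional


def newest_mtime(root: str) -> Optional[float]:
    """Return the most recent file modification time under root, or None."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mtime = os.path.getmtime(os.path.join(dirpath, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def snapshot_is_fresh(root: str,
                      max_age_hours: float = 48.0,
                      now: Optional[float] = None) -> bool:
    """True if the newest file in the snapshot is recent enough.

    max_age_hours is an assumed threshold, not a value from the post.
    An empty snapshot is treated as not fresh.
    """
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    if newest is None:
        return False
    return (now - newest) <= max_age_hours * 3600
```

A check like this is independent of the sync command's own status, so it can raise an alert even when the sync appears to run but transfers nothing new.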

Glad you guys are back, hopefully you can retrieve the missing data

You are doing a wonderful job Gilles. You dedicate so much of your life to this project and I am so grateful. When several things fail at the same time, there is not much you can do to save the situation. Thank you for your care and concern for everyone, and your tireless work to get things fixed.

@gilles - I have your projects - screen Capture (dated 2023.07.04) and confetti (dated 2023.06.16). I can send them in a zip file.

@Loginus Oh that's nice! Yes please do send them back to me :-)

Update:

  1. We got a chance to try rebooting and reconnecting to the server, in rescue mode and in normal mode, but it remained unreachable. We were completely unable to connect to the server.

  2. I offered to buy back the server or its SSD, but they declined the offer.

So we are pretty much stuck with no solution. I am afraid we will have to live with that 2-month hole in our microStudio life :-(

Ok, so that's what happened. I just wanted to know what happened, and it's completely fine. :)

@gilles - contact@microstudio.dev - Will this address be ok??

@Loginus yes sure! Thanks

@gilles Thank you for the titanic work

Hope you can fix it! Keep up the hard work.

I'm still learning, so coding my stuff again == more practice and that's a good thing. Don't worry about me, but I hope everyone else's projects out there are able to be fully recovered. Good luck to everyone on that. Also, thanks so much gilles for your tireless dedication to microStudio and its users!

Gilles, it's OK, this can happen to anyone.

The server failure was not your fault, but the provider's mistake.

I think most people here would agree that you do marvellous work around here, make a great studio and look after the data backups. We all want to say: "Thanks, Gilles! We appreciate your hard work and wish you nice days ;) Keep it up and stay strong ;)"
