Bungie has released an in-depth technical blog explaining Destiny 2's first-ever server rollbacks and how it will prevent similar issues in the future.
On January 28 and again on February 11, Destiny 2 was abruptly taken offline for emergency maintenance due to a bug which deleted currencies and materials from players' inventories. In both cases, the game was offline for eight to ten hours, with the outage ultimately ending in a total server rollback.
In the blog, Bungie said, "We wanted to give you a picture of what went wrong, how we fixed it, and how we’re planning on making sure this doesn’t happen again in the future. First, let’s look at what caused this problem in the first place: a game bug involving inventory management and a series of server configurations that re-introduced the bug after it was fixed."
There's a lot of technical mumbo-jumbo to work through here, so we'll bring you the short version. A few months ago, Bungie changed the way Destiny 2 tracks quests as inventory items. The game was getting hung up on auto-cleanup procedures which were breaking chronological sorting, so Bungie disabled some of those procedures to make things simpler. However, this change had the unintended effect of altering the way the game tracks other inventory items, namely stackable currencies and materials, and could cause it to misread them. Bungie noticed this bug before the first incident, but "incorrectly concluded that it was caused by a tooling failure with debug workflows we use for testing, and not an actual bug within the game." As a result, the bug made it to the live game and forced the January 28 rollback.
The origins of the bug are one thing, but its reappearance is a real doozy that goes back to October and the launch of Destiny 2: Shadowkeep.
To prepare for Shadowkeep, Bungie spun up more servers in October. As a result of the increased server load, "less than 1%" of these servers would occasionally crash, but these crashes could be fixed with a simple restart. These servers have quietly been crashing in the background ever since, but that never really mattered until the update released on February 11.
"After launch [on February 11], some of the WorldServers once again crashed on startup because of a high volume of servers starting simultaneously," Bungie explained. "Once again we manually restarted those servers and thought everything was fine. We were wrong. Unbeknownst to us, this crash resulted in those WorldServers not applying the previous character data corruption fix. This meant that a small percentage of WorldServers were running the old code and the bug that was corrupting character data."
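To make that failure mode concrete, here is a minimal sketch of the kind of fleet check that would flag a restarted server running without the fix. Everything in it is hypothetical and for illustration only: the WorldServer class, the applied_fixes field, the server names and the fix identifier are assumptions, not Bungie's actual tooling.

```python
from dataclasses import dataclass, field

# Hypothetical identifier for the character data corruption fix.
REQUIRED_FIX = "character-data-corruption-fix"


@dataclass
class WorldServer:
    """Illustrative stand-in for a game server and the fixes it has loaded."""
    name: str
    applied_fixes: set = field(default_factory=set)


def servers_missing_fix(fleet, required_fix=REQUIRED_FIX):
    """Return the names of servers that are not running the required fix."""
    return [s.name for s in fleet if required_fix not in s.applied_fixes]


fleet = [
    WorldServer("ws-001", {REQUIRED_FIX}),
    WorldServer("ws-002", {REQUIRED_FIX}),
    # Crashed on startup and was restarted by hand without the fix.
    WorldServer("ws-003"),
]

print(servers_missing_fix(fleet))  # ['ws-003']
```

A check like this only helps if it runs after every restart, since the servers in Bungie's account looked healthy once they came back up.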
Due to the nature of this issue, Bungie's internal testing didn't catch it, because the test sessions had the misfortune of connecting only to good servers that hadn't crashed. It was only after hundreds of thousands of players tried logging on that the currency bug's reappearance was spotted. Voilà, time for another server rollback.
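Some quick arithmetic shows why a small test team could miss what a full player base found almost immediately. The sketch below assumes, purely for illustration, that 1% of servers were bad (Bungie only says "a small percentage") and that each connection lands on a server chosen at random; the function name is our own.

```python
def chance_of_hitting_bad_server(bad_fraction, connections):
    """Probability that at least one of `connections` independent,
    randomly assigned connections lands on a bad server."""
    return 1 - (1 - bad_fraction) ** connections


# Assumed 1% bad servers; compare a few testers with a launch-day crowd.
for n in (10, 100, 100_000):
    print(n, round(chance_of_hitting_bad_server(0.01, n), 4))
```

Under those assumptions, 10 test connections have only about a 10% chance of touching a bad server and 100 connections about 63%, while 100,000 player logins make it a near certainty.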
The good news is that Bungie has now identified and fixed the root cause of the currency bug. This will be applied as a permanent update (which won't be missed in the event of a server crash) in the game's next hotfix. Additionally, Bungie says it's "investigating ways to speed up our rollback and recovery mechanisms," so if there is another rollback, hopefully it won't take all day. In the same vein, the studio says it's "updating our development methodologies to catch issues like this earlier in the release pipeline," so if we're lucky, future issues like this one will be spotted before they make it to the live game.
Destiny 2's string of rollbacks is another reminder that, sometimes, issues with big online games simply can't be prevented.