Minecraft Earth and Azure Cosmos DB part 2
This post is part 2 of a two-part series about how organizations are using Azure Cosmos DB to meet real-world needs and the difference it's making to them. In part 1, we explored the challenges that led service developers for Minecraft Earth to choose Azure Cosmos DB and how they're using it to capture almost every action taken by every player around the globe with ultra-low latency. In part 2, we examine the solution's workload and how Minecraft Earth service developers have benefited from building it on Azure Cosmos DB.
Geographic distribution and multi-region writes
Minecraft Earth service developers used the turnkey geographic distribution feature in Azure Cosmos DB to achieve three goals: fault tolerance, disaster recovery, and minimal latency—the latter achieved by also using the multi-master capabilities of Azure Cosmos DB to enable multi-region writes. Each supported geography has at least two service instances. For example, in North America, the Minecraft Earth service runs in the West US and East US Azure regions, with other components of Azure used to determine which is closer to the user and route traffic accordingly.
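The post does not say which Azure components perform this routing or how they measure closeness, so the following is only a minimal sketch of the idea: pick the nearer of the service instances in a geography. It uses geographic distance as a stand-in for network latency. The region coordinates and the function names are assumptions made for illustration, not details of the Minecraft Earth service.

```python
import math

# Approximate region coordinates, assumed for illustration only.
REGIONS = {
    "West US": (37.78, -122.42),
    "East US": (37.37, -79.82),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_region(player_location, regions=REGIONS):
    """Return the name of the region nearest to the player's (lat, lon)."""
    return min(regions, key=lambda name: haversine_km(player_location, regions[name]))


# A player near Seattle is routed west, a player near New York is routed east.
print(closest_region((47.61, -122.33)))
print(closest_region((40.71, -74.01)))
```

A production router would more likely compare measured latency than map distance, but the selection step has the same shape.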
Nathan Sosnovske, a Senior Software Engineer on the Minecraft Earth services development team, explains:
"With Azure available in so many global regions, we were able to easily establish a worldwide footprint that ensures a low-latency gaming experience on a global scale. That said, people mostly travel within one geography, which is why we have multi-master writes set up between all of the service instances in each geography. That's not to say that a player who lives in San Francisco can't travel to Europe and still play Minecraft Earth; it's just that we're using a different mechanism to minimize round-trip latency in such cases."
Request units per second (RU/s) consumption
In Azure Cosmos DB, request units per second (RU/s) is the "currency" used to reserve guaranteed database throughput. For Minecraft Earth, a typical write request consumes about 10 RU/s, with an additional 2-3 RU/s used for background processing of the append-only event log, which is driven by Azure Service Bus.
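To make these figures concrete, here is a rough capacity estimate. It treats the numbers quoted above as a per-write charge (10 for the write plus the upper figure of 3 for background processing). The 20 percent headroom and the rounding up to a multiple of 100 RU/s are assumptions for illustration, not values from the Minecraft Earth team.

```python
import math


def required_ru_per_second(writes_per_second, ru_per_write=10, background_ru_per_write=3):
    """Throughput consumed by a steady stream of writes, in RU/s."""
    return writes_per_second * (ru_per_write + background_ru_per_write)


def provisioned_ru_per_second(writes_per_second, headroom_percent=20, increment=100):
    """Add headroom (assumed) and round up to the next provisioning increment (assumed)."""
    needed = required_ru_per_second(writes_per_second) * (100 + headroom_percent) / 100
    return math.ceil(needed / increment) * increment


print(required_ru_per_second(1000))      # 13000
print(provisioned_ru_per_second(1000))   # 15600
```

Because the estimate is a simple product, doubling the write rate doubles the required throughput.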
"We've found that our RU/s usage scales quite linearly; we only need to increase capacity when we have a commensurate increase in write requests per second. At first, we thought we would need more throughput, but it turned out there was a lot of optimization to be done," says Sosnovske. "Our original design handled request volumes and complexity relatively well, but it didn't handle the case where the system would shard—that is, physically repartition itself internally—because of overall data volumes."
This happened because allocated RU/s are distributed equally across physical partitions, and the physical partition holding the most current data was running much hotter than the rest.
"Fortunately, because our system is modeled as an append-only log that gets materialized into views for the client, we very rarely read old data directly from Azure Cosmos DB," explains Sosnovske. "Our data model was flexible enough to allow us to archive events to cold storage after they were processed into views, and then delete them from Azure Cosmos DB using its Time to Live feature."
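The effect of that even split can be illustrated with a small calculation. This is a sketch with made-up load figures, not measurements from the service: the total load fits comfortably within the provisioned throughput, yet one partition still exceeds its share.

```python
def per_partition_budget(provisioned_ru_per_second, physical_partitions):
    """Each physical partition receives an equal share of the provisioned RU/s."""
    return provisioned_ru_per_second / physical_partitions


def hot_partitions(provisioned_ru_per_second, load_by_partition):
    """Return the indexes of partitions whose load exceeds their equal share."""
    budget = per_partition_budget(provisioned_ru_per_second, len(load_by_partition))
    return [i for i, load in enumerate(load_by_partition) if load > budget]


# Hypothetical load in RU/s; the last partition holds the most recent data.
load = [500, 500, 500, 6000]
print(sum(load))                    # 7500, below the 10000 RU/s provisioned
print(hot_partitions(10000, load))  # [3], because each partition only gets 2500 RU/s
```

Raising the provisioned total would fix this only at the cost of paying for unused capacity on the cooler partitions, which is why reducing the amount of old data was the more attractive option.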
Today, with the service's current architecture, Sosnovske isn't worried about scalability at all.
"During development, we tested the scalability of Azure Cosmos DB up to one million RU/s, and it delivered that throughput without a problem," Sosnovske says.
Initial launch of Minecraft Earth
Minecraft Earth was formally released in one geography in October 2019, with its global rollout across all other geographies completed over the following weeks. For Minecraft fans, Minecraft Earth provides a means of experiencing the game they know and love at an entirely new level, in the world of augmented reality.
And for Sosnovske and all the other developers who helped bring Minecraft Earth to life, the opportunity to extend one of the most popular games of all time into the realm of augmented reality has been equally rewarding.
"A lot of us are gamers ourselves and jumped on the opportunity to be a part of it all," Sosnovske recalls. "Looking back, everything went pretty well—and we're all quite satisfied with the results."
Benefits of using Azure Cosmos DB
Although Azure Cosmos DB is just one of several Azure services that support Minecraft Earth, it plays a pivotal role.
"I can't think of another way we could have delivered what we did without building something incredibly complex completely from scratch," says Sosnovske. "Azure Cosmos DB provided all the functionality we needed, including low latency, global distribution, multi-master writes, and more. All we had to do was properly put it to use."
Specific benefits of using Azure Cosmos DB to build the Minecraft Earth service included the following:
Easy adoption and implementation. According to Sosnovske, Azure Cosmos DB was easy to adopt.
"Getting started with Azure Cosmos DB was incredibly easy, especially within the context of the .NET ecosystem," Sosnovske says. "We simply had to install the NuGet package and point it at the proper endpoint. Documentation for the service is very thorough; we haven't had any major issues due to misunderstanding how the SDK works."
Zero maintenance. As part of Microsoft Azure, Azure Cosmos DB is a fully managed service, which means that nobody on the Minecraft Earth services team needs to worry about patching servers, maintaining backups, data center failures, and so on.
"Not having to deal with day-to-day operations is a huge bonus," says Sosnovske. "However, this is really a benefit of building on Azure in general."
Guaranteed low latency. A big reason developers chose Azure Cosmos DB was that it provides a guaranteed single-digit (under 10 ms) latency SLA for reads and writes at the 99th percentile, at any scale, anywhere in the world. In comparison, Table storage latency would have been higher, with no guaranteed upper bound.
"Azure Cosmos DB is delivering as promised, in that we're seeing an average latency of 7 milliseconds for reads," says Sosnovske.
Elastic scalability. Thanks to the elastic scalability provided by Azure Cosmos DB, the game enjoyed a frictionless launch.
"At no point was Azure Cosmos DB the bottleneck in scaling our service," says Sosnovske. "We've done a lot of work to optimize performance since initial release, and knowing that we wouldn't hit any scalability limits as we did that work was a huge benefit. We may have paid a bit more for throughput than we had to at first, but that's a lot better than having a service that can't keep up with growth in user demand."
Turnkey geographic distribution. With Azure Cosmos DB, geographic distribution was a trivial task for Minecraft Earth service developers. Adjustments to provisioned throughput (in RU/s) are just as easy because Azure Cosmos DB transparently performs the necessary internal operations across all the regions, continuing to provide a single system image.
"Turnkey geo-distribution was a huge benefit," says Sosnovske. "We did have to think a bit more carefully about how to model our system when turning on multi-master support, but it was orders of magnitude less work than solving the problem ourselves."
Compliance. Through their use of Time-to-Live within Azure Cosmos DB, developers can safely store location-based gameplay data for short periods of time without having to worry about violating compliance mandates like Europe's General Data Protection Regulation (GDPR).
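One way short-lived location data can support a gameplay rule is a redeem-once check: a record is written when a player redeems a location, and the player can redeem it again only after the record has expired. The sketch below is an in-memory stand-in, not the Azure Cosmos DB SDK. The class name, method name, and one-hour TTL are hypothetical; in the real service the database's Time to Live setting would remove expired items.

```python
class RedemptionStore:
    """In-memory model of records that expire a fixed time after being written."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._written_at = {}  # (player_id, location_id) -> write time in seconds

    def _purge_expired(self, now):
        """Drop records whose TTL has elapsed, as the database would do on its own."""
        expired = [key for key, written in self._written_at.items()
                   if now >= written + self.ttl_seconds]
        for key in expired:
            del self._written_at[key]

    def try_redeem(self, player_id, location_id, now):
        """Return True and record the redemption unless a live record already exists."""
        self._purge_expired(now)
        key = (player_id, location_id)
        if key in self._written_at:
            return False
        self._written_at[key] = now
        return True


store = RedemptionStore(ttl_seconds=3600)
print(store.try_redeem("player-1", "park-42", now=0))      # True
print(store.try_redeem("player-1", "park-42", now=600))    # False, record still live
print(store.try_redeem("player-1", "park-42", now=3600))   # True, record has expired
```

Passing the current time in as an argument keeps the example deterministic; a real implementation would rely on the database's own clock and cleanup.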
"It lets us drive workflows like 'This player should only be able to redeem this location once in a given period of time,' after which Azure Cosmos DB automatically cleans up the data within our set TTL," explains Sosnovske.
In summarizing his experience with Azure Cosmos DB, Sosnovske says it was quite positive.
"Azure Cosmos DB is highly reliable, easy to use after you take the time to understand the basic concepts, and, best of all, it stays out of the way when you're writing code. When junior developers on my team are working on features, they don't need to think about the database or how data is stored; they can simply write code for a domain and have it just work."