Google Stadia: The Cloud and the Client
My earliest network gaming experience was playing Netrek in a campus computer lab. The game was played on UNIX workstations with up to sixteen simultaneous players. It was incredibly fun. It was also terrible for the local network. Players were banned from playing during ‘business’ hours, and even then, the truly hardcore preferred to play late at night, when as few people as possible were using the systems. Students with access to more powerful servers restricted to CS majors and grad students were envied. The problem, then as now, was lag. Lag slowed down the game between the client and the server, and if you were unlucky enough to be playing on an X terminal, it even slowed down the response between clicking a mouse and having the player’s client register that mouse click. Enough lag and the game became unplayable.
Computing is cyclical, and it seems strange to me that Google’s latest, much heralded Stadia gaming platform reminds me of nothing so much as playing Netrek on an X terminal hooked up to a Sun workstation.
Reviews of Google’s recently released Stadia gaming platform have been mixed. Some reviewers have been pleased by, even amazed at, how well it works for what it does. Others have found certain games difficult to play because of client-perceived lag. Given pre-release concerns about lag, and that gaming is one of the most lag-sensitive use cases for a user interface, the question becomes: why does Google think it has a business case for Stadia?
Google Stadia works on the strength of Google’s cloud services. It effectively virtualizes the gaming user interface onto a ‘thin gaming client’ which runs on any device the user has to hand: a computer, a television with a gaming controller, even a phone. The rest of the service then becomes a cloud application, playing to Google’s undoubted strengths in creating, managing, and running cloud services. The value proposition: the target customer wants their gaming library available ‘anywhere’, across multiple devices, assuming a fast network connection. The library is to include resource-heavy games which show best at high resolution and high frame rates, can have very large assets (over 100 GB/game), and often have a multiplayer component, which Google claims its service improves by making the connection between the ‘real’ game client and the game server faster and more reliable than a home connection. The games themselves will be kept fully up to date, without downloads and patches eating into gaming time.
The rest of the service (trophies, family sharing of games, etc.) is close enough to other, more established gaming services such as Xbox Live that Stadia has to compete on the strength of that thin gaming client, and that client has to be good enough that customers would buy a game on Stadia rather than on a competing service with a ‘thicker’ client such as a console, PC, or even mobile device. If the thin client isn’t good enough, Stadia fails in the market.
The History of the Thin Client
The tension between ‘thin’ clients and ‘thick’ clients goes far back in computing. Even in mainframe environments, the stated benefits of the IBM 3270 terminal over its predecessors included its ability to transfer large blocks of data at a time, rather than simply echo individual characters, minimizing I/O interrupts further up the chain and pushing certain operations to the display terminal itself. This allowed a very large number of terminals per mainframe computer, reducing centralized cost.
The X Window System allowed the use of cheap keyboard/mouse/display terminals, called X terminals, which were popular during a period when fully-functional UNIX workstations were prohibitively expensive for many use cases. While relatively inexpensive, these X terminals had many of the same problems other ‘dumb terminal’ applications tend to have: they were very sensitive to network latency, and they had very limited local processing power. The actual server tended to get overwhelmed when too many X terminals were in use simultaneously, and lag became a major issue.
Various technical solutions were created to combat this, such as Sun’s Network extensible Window System (NeWS). Like the 3270, they pushed more processing work to the local client, optimized client-to-server communications, and reduced perceived user lag. Unlike the 3270, they were not commercially successful. The X terminal problem eventually disappeared as the cost of UNIX workstations dropped, or as the terminals were replaced on the user end with PC-type hardware.
This push to a ‘thin client’ has since happened multiple times, with various levels of success: thin Java clients, Citrix virtual desktops, and even early web browsers could be considered ‘thin clients’ compared to downloading and displaying fully pre-rendered documents such as PostScript files. The benefits of centralized administration and control clearly have their place, but each generation has also faced the push to make each client more and more fully functional. Browsers, in particular, have gone from relatively simple rendering and display applications to having some of the most sophisticated computing environments ever created in order to handle complex and possibly hostile web applications.
Gaming and Lag
By contrast, network gaming has historically tended to lean toward a ‘thick client’: PCs, gaming consoles, or mobile devices running a local client application. This derives partially from computer games starting off primarily as non-networked applications, often on dedicated hardware. (Networked games go back at least as far as 1973, with Empire and Maze War.) Gaming has also tended to be locally compute-intensive, with each generation of PCs and gaming consoles becoming faster and more powerful, and the games built for those platforms using more and more of those resources. One of the limiting factors for each generation has been lag: how quickly inputs by the user show up in the game and take effect.
Leaving aside networking entirely, it’s clear that humans are extremely sensitive to input lag. The difference between 100 ms, 10 ms, and 1 ms lag is apparent even to an average user. For certain games, lag is less important than for others, but for many of the games Google Stadia is targeting in the fighting, action/adventure, and first person shooter categories, lag is critical. Possibly the most pathological case is fighting games: Street Fighter, Mortal Kombat, Super Smash Bros., etc. In these games, certain actions and responses come down to a window of a few frames, where each frame lasts only roughly 16.67 ms (at 60 Hz). In professional esports, fighting game tournaments at the top level are conducted face-to-face, and there are stories of players carrying CRTs to tournaments for older games in order to avoid the additional latency introduced by converting an older game’s output to digital input.
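To make those numbers concrete, here is a back-of-the-envelope sketch. The 50 ms round trip is an assumed figure for illustration, not a Stadia measurement:

```python
# Back-of-the-envelope only: frame duration at a given refresh rate,
# and how many frames a given amount of latency swallows.

def frame_ms(refresh_hz):
    # Duration of one frame in milliseconds.
    return 1000.0 / refresh_hz

def frames_consumed(latency_ms, refresh_hz):
    # How many frames of reaction window the latency eats up.
    return latency_ms / frame_ms(refresh_hz)

print(round(frame_ms(60), 2))               # one 60 Hz frame: 16.67 ms
print(round(frames_consumed(50, 60), 1))    # assumed 50 ms round trip: 3.0 frames
```

Three frames is the entire window for many fighting-game interactions, which is why even a modest network round trip is so damaging in this genre.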
Fighting games also have some of the most sophisticated network code around for dealing with network latency, and some approaches work better than others. The best results appear to come from architectures which use ‘rollback’ netcode. In short: user inputs appear to happen immediately, but the result can, if necessary, be ‘fixed’ after it appears to happen and replaced with a new result. This keeps perceived user latency acceptable, while allowing for delay and other problems at the network level. This sort of speculative display isn’t limited to gaming. Any systems administrator who has had to log in to a server far away, or over a busy or unreliable network, knows the pain of trying to interact with that server, be it through ssh or RDP. One terminal client, Mosh, maintains a local emulation of what it believes is happening with a user’s input on the remote server; it can instantly update that local copy, then later true it up against the actual output from the server.
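The rollback idea can be sketched in a few lines. This is a toy illustration, not any shipping game’s netcode: it assumes a deterministic step function, a two-player session, and the simplest possible prediction (repeat the remote player’s last known input):

```python
import copy

class RollbackSession:
    """Toy rollback netcode: apply local input immediately with a predicted
    remote input; rewind and re-simulate when a late remote input arrives."""

    def __init__(self, initial_state, step_fn):
        self.step_fn = step_fn            # (state, local, remote) -> new state
        self.state = initial_state
        self.frame = 0
        self.snapshots = {0: copy.deepcopy(initial_state)}
        self.local_inputs = {}            # frame -> local input
        self.remote_inputs = {}           # frame -> confirmed remote input
        self.last_remote = 0              # prediction: repeat last known input

    def predict_remote(self, frame):
        # Use the confirmed input if we have it, else predict.
        return self.remote_inputs.get(frame, self.last_remote)

    def advance(self, local_input):
        # The local input takes effect immediately -- no waiting on the network.
        self.local_inputs[self.frame] = local_input
        self.state = self.step_fn(self.state, local_input,
                                  self.predict_remote(self.frame))
        self.frame += 1
        self.snapshots[self.frame] = copy.deepcopy(self.state)

    def receive_remote(self, frame, remote_input):
        # A confirmed remote input arrived, possibly for a past frame.
        self.remote_inputs[frame] = remote_input
        self.last_remote = remote_input
        if frame < self.frame:            # it's in the past: rewind and replay
            self.state = copy.deepcopy(self.snapshots[frame])
            for f in range(frame, self.frame):
                self.state = self.step_fn(self.state,
                                          self.local_inputs[f],
                                          self.predict_remote(f))
                self.snapshots[f + 1] = copy.deepcopy(self.state)

# Demo: a trivial deterministic 'game' where each input moves a player.
def step(state, local, remote):
    return {"p1": state["p1"] + local, "p2": state["p2"] + remote}

session = RollbackSession({"p1": 0, "p2": 0}, step)
session.advance(1)            # frame 0: local moves; remote predicted as 0
session.advance(1)            # frame 1: same prediction
session.receive_remote(0, 1)  # remote actually moved on frame 0: rollback
print(session.state)          # frames 0-1 re-simulated; p2 catches up to 2
```

The point of the structure is that `advance` never blocks: the local player’s inputs are displayed at once, and mispredictions are quietly repaired by rewinding to a snapshot and replaying. A thin client like Stadia’s has no equivalent of `advance`, because the authoritative state lives on the far side of the network.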
However, for these types of clients, the local client has to be able to accept and immediately display the result of user input. Stadia’s mechanism cannot do this: user input must go from the thin client, through the network, to the ‘actual’ client in Google’s data center, the result rendered, and that rendered result sent back to the client for display and feedback. On a fast enough, reliable enough network, with the datacenter physically close to the user, this can be acceptable. But it puts user input lag, the most sensitive kind of lag, at the mercy of the network between the user and Google, the thing Google least controls. The ‘obvious’ fix, a more powerful client which can deal with local input and display the results, is already available: PCs, gaming consoles, and mobile devices. The cost of these devices is sufficiently low that a gamer choosing between them and Google Stadia, and looking at the additional cost of a low latency, high bandwidth Internet connection to play on, would be hard pressed to see the benefit of Stadia. And a more casual gamer, one who doesn’t already have a console or PC and would play on a TV or mobile device, is unlikely to be the sort of customer who wants to buy the kind of high resolution, multiplayer online game that Google Stadia is featuring.
What, then, is a good ‘cloud gaming’ architecture? Gaming services such as Xbox Live predate the ‘cloud’, as do massively multiplayer online games like World of Warcraft or EVE Online. A very large part of the cost of newer games, at least at the top of the market, is asset creation: much of it is hand-made, and client updates have grown so large because of the increasing size of those assets. Similarly, digital storefronts and digital libraries of games have become the norm. Here we see cloud-oriented services such as CDNs enter, along with gaming services migrating to cloud infrastructure or being built natively in the cloud. What would be different is scalable, real-time creation of procedurally generated content specifically for a given client. And we may finally be about to see that sort of cloud gaming service arrive with Microsoft Flight Simulator 2020. The developer claims that it will use satellite data, processed by Azure, to generate content which will allow a player to fly a plane ‘anywhere in the world’, and see what they would see if they actually flew over the real location, including local daylight and weather conditions.
Here the full potential dataset (the entire world) is too large to download to even a very large client, and real-time weather conditions necessitate real-time client updates. It still requires a powerful client, and a fast internet connection, but the cloud-specific strengths being leveraged are collecting and processing large amounts of data, then distributing that data to many different clients, each with individual needs, with the number of clients fluctuating based on demand. Microsoft is fortunate enough to ‘own’ both sides of the problem, having both its own cloud infrastructure and its own client base, but by owning both they’re able to build something that properly uses both, rather than trying to force a business strength into an area where it’s less able to compete.
I don’t know if Microsoft Flight Simulator 2020 will be a good game, or a commercial success. But it’s the first game I’ve seen that shouts ‘cloud gaming’ at me, and it has me seriously considering putting together a PC with a yoke and rudder pedals, and seeing how it feels compared to the real thing.