I’d like to share a story about the challenges of implementing multiplayer in an RTS using a lockstep model. In this model, the peers only exchange player actions, and every device has to run exactly the same simulation to stay in sync. If anything differs, the simulation goes out of sync, which means each device ends up showing a totally different game.
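To make the idea concrete, here is a minimal sketch of a lockstep turn, not the game's actual code. The `GameState` fields, the `move`/`warp` action kinds and the `(player_id, kind, amount)` tuple layout are all made up for illustration. The point is that each peer receives the same actions, applies them in the same order, and ends up with the same state without ever sending the state itself.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    """Hypothetical game state: the turn counter and one unit's x position."""
    turn: int = 0
    x: int = 0


def step(state, actions):
    """Advance one lockstep turn by applying every peer's actions."""
    # Sort so that all peers apply the actions in the same order,
    # no matter in which order they arrived over the network.
    for player_id, kind, amount in sorted(actions):
        if kind == "move":
            state.x += amount
        elif kind == "warp":
            state.x = amount
    state.turn += 1


def run_peer(turns):
    """Simulate a whole match from the list of per-turn action lists."""
    state = GameState()
    for actions in turns:
        step(state, actions)
    return state


# Both peers get the same actions, but in a different arrival order.
peer_a = run_peer([[(1, "move", 3), (2, "warp", 10)], [(2, "move", -1)]])
peer_b = run_peer([[(2, "warp", 10), (1, "move", 3)], [(2, "move", -1)]])
# A peer that misses a single action ends up somewhere else.
peer_c = run_peer([[(1, "move", 3)], [(2, "move", -1)]])
```

Here `peer_a` and `peer_b` end in the same state, while `peer_c` has silently drifted away. Nothing in the protocol notices that unless the peers also compare their states somehow.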
It is extremely easy to get out of sync, and I had to hunt down a lot of desync bugs during development. This ain’t exactly easy, since you can’t really debug the code during a multiplayer match: the other device would automatically disconnect.
I don’t want to bore you with all the desyncs I had to fix, but one that kept me busy for two nights in the first week of 2013 is particularly interesting. Basically, I ran a test game between an iPad 4 and an iPad 2 and they kept getting out of sync. I could not reproduce the problem in matches between other devices, so something specific to this pairing was going on.
During previous desync bugs, I often wondered whether the Turing machine was still a valid concept and whether software was deterministic at all, and I often blamed floating point arithmetic when the actual cause was something else.
However, in this case, it really was a hardware problem. Basically, the iPad 4 (A6X) and the iPhone 5 (A6) have a different CPU, and thus a different FPU, than the older devices. It seems like the single precision floating point trigonometric functions did not work correctly on the new chips. Most of the time the results were identical, but for some angles the new CPU came up with values that differ from the old ones and are actually less precise compared to the “real” result.
|Devices|Expression|Result|
|---|---|---|
|Pre-A6, e.g. iPad 2, iPad 3, iPhone 4|sin(2.42680406570f)|0.65545779467|
|A6/A6X, e.g. iPad 4, iPhone 5|sin(2.42680406570f)|0.65545773506|
The difference is small, but it quickly adds up until the whole game is totally out of sync.
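To get a feeling for how small the difference is, the sketch below rounds both reported results to IEEE 754 single precision and compares their bit patterns. It turns out they are neighbouring floats, exactly one unit in the last place (ULP) apart. The `drift_after` function is a made-up movement loop, not code from the game. It only shows that adding such a value every tick makes the gap grow with the number of ticks.

```python
import struct


def float32_bits(value):
    """Round value to IEEE 754 single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


OLD_FPU = 0.65545779467  # result reported for the older devices
NEW_FPU = 0.65545773506  # result reported for the iPad 4 / iPhone 5

# Distance between the two results, counted in representable float32 steps.
ulp_distance = float32_bits(OLD_FPU) - float32_bits(NEW_FPU)


def drift_after(ticks, speed=1.0):
    """Hypothetical loop: a unit moves by speed * sin(angle) every tick."""
    pos_old = pos_new = 0.0
    for _ in range(ticks):
        pos_old += speed * OLD_FPU
        pos_new += speed * NEW_FPU
    return pos_old - pos_new


print(ulp_distance)       # 1
print(drift_after(1000))  # roughly 6e-05 after 1000 ticks
```

In a real game it is worse than this linear drift suggests, because positions feed back into decisions like pathfinding or target selection. One unit choosing a different target on one device is enough to send the two simulations in completely different directions.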
I had to go through a lot of huge trace files to finally figure that one out and fix it.
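Digging through traces like that can be partly automated. The following is a minimal sketch, assuming each device writes one line per logged value and both devices log in the same order. The trace format shown here is invented for the example and is not the one used in the game. The function reports the first entry where the two traces disagree, which is usually where the desync starts.

```python
from itertools import zip_longest


def first_divergence(trace_a, trace_b):
    """Return (index, line_a, line_b) for the first mismatch, or None."""
    # zip_longest pads the shorter trace with None, so a trace that
    # simply ends early is reported as a divergence as well.
    for index, (line_a, line_b) in enumerate(zip_longest(trace_a, trace_b)):
        if line_a != line_b:
            return index, line_a, line_b
    return None


# Hypothetical trace excerpts from the two devices.
ipad2_trace = [
    "turn=41 unit=7 angle=2.42680406570",
    "turn=41 unit=7 sin=0.65545779467",
    "turn=41 unit=7 moved",
]
ipad4_trace = [
    "turn=41 unit=7 angle=2.42680406570",
    "turn=41 unit=7 sin=0.65545773506",
    "turn=41 unit=7 moved",
]

print(first_divergence(ipad2_trace, ipad4_trace))
```

For the two excerpts above this prints index 1 together with the two differing `sin=` lines. Everything before that index is known to be identical on both devices, which narrows down the search a lot.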