ffmpeg was a PITA to cross-compile correctly. I suppose I should post my binary, which has almost everything enabled. I had to leave ALSA out, but you can pipe the output to aplay for sound playback.
Anyway, the gmplay video has a 130 msec average frame time, with a variable latency of up to 1 second. Synchronizing sound to that will be difficult, and syncing video to sound will cause a lot more dropped frames. I needed to allow up to 1 second of lag before frame dropping kicks in, to prevent too many dropped frames. The eink drivers speed up and slow down depending on scene complexity. 7.8 FPS is smooth enough on eink, but dropping frames whenever it cannot quite keep up cuts that in HALF, which looks terrible. I chose to allow 1 second of latency instead, which looks nice. But sound sync will be difficult.
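To show why the lag allowance matters, here is a rough sketch of the frame-drop policy described above. It is not the gmplay code: the function name `simulate`, its parameters, and the 0.2 second render times are all made up for illustration. Only the 130 msec frame time and the 1 second allowance come from the numbers above.

```python
def simulate(render_times, frame_period=0.130, max_lag=1.0):
    """Return the indices of frames that get dropped.

    render_times[i] is the (hypothetical) time in seconds the display
    needs to draw frame i. A frame is dropped only when playback is
    already more than max_lag seconds behind schedule.
    """
    clock = 0.0                      # simulated wall-clock time
    dropped = []
    for i, cost in enumerate(render_times):
        due = i * frame_period       # when this frame should appear
        if clock < due:
            clock = due              # ahead of schedule: wait until due
        lag = clock - due
        if lag > max_lag:
            dropped.append(i)        # too far behind: skip drawing to catch up
            continue
        clock += cost                # draw the frame
    return dropped


# A display that is slightly too slow for six frames in a row.
slow = [0.2] * 6
print(simulate(slow, max_lag=0.0))   # [1, 3, 5]
print(simulate(slow, max_lag=1.0))   # []
```

With no lag allowed, every other frame is thrown away as soon as the display falls slightly behind, which is the "cut in half" effect. With 1 second allowed, the same slow stretch is absorbed as latency and nothing is dropped.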
A choice of nice sound with jerky video, or smooth video with glitchy sound, is a difficult decision. My thoughts are to implement variable audio playback speed with a 3-phase ring buffer with phase fadeout across boundary transitions (similar to changing TTS playback speed). That way we can vary sound playback speed while maintaining pitch accuracy, to sync sound to the laggy video. So the choices are complex "DSP" code, or simple code with either glitchy audio or glitchy video. I like KISS, but complexity wins in this case... Some day RSN.
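For a feel of what pitch-preserving speed change looks like, here is a minimal sketch using plain overlap-add with a crossfade between chunks. It is a simpler stand-in, not the 3-phase ring buffer described above, and the name `stretch` and its `window` parameter are my own assumptions. Plain overlap-add does not align waveform phase between chunks, so a real version would probably need something like WSOLA to avoid warbling.

```python
import math


def stretch(samples, speed, window=512):
    """Change playback speed by overlap-add, without resampling.

    speed > 1 plays faster (shorter output), speed < 1 plays slower.
    Chunks are copied at their original sample rate, so pitch is kept;
    only the spacing at which chunks are read from the input changes.
    """
    hop_out = window // 2                     # output advances half a window
    hop_in = max(1, round(hop_out * speed))   # input advances faster or slower
    # Hann window: two half-overlapped copies sum to 1, giving a smooth
    # fade-out of one chunk while the next one fades in.
    fade = [0.5 - 0.5 * math.cos(2 * math.pi * n / window)
            for n in range(window)]
    n_frames = max(0, (len(samples) - window) // hop_in + 1)
    if n_frames == 0:
        return []
    out = [0.0] * (hop_out * (n_frames - 1) + window)
    for k in range(n_frames):
        src = k * hop_in                      # where this chunk is read
        dst = k * hop_out                     # where this chunk is written
        for n in range(window):
            out[dst + n] += samples[src + n] * fade[n]
    return out


# One second of a 440 Hz tone at 8 kHz, played at double and half speed.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
fast = stretch(tone, 2.0)
slow = stretch(tone, 0.5)
print(len(tone), len(fast), len(slow))
```

The sync logic would then nudge `speed` up or down a little depending on how far the video is lagging, instead of dropping frames or letting the audio glitch.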