If that turns out to be impossible, I can think of two other approaches:
1- putting the image and the caption (above or below it) together in a frame, but this would be code-heavy and probably unstable
2- maybe letting the image float on its own (this is easy) and attaching the caption through a hyperlink. Clicking on the image should be no problem, even on a standard Kobo...
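
To make the two ideas concrete, here is a rough sketch in Python that only builds the markup for each option. Everything in it is an assumption on my side: the class names `framed`, `caption` and `floated`, the helper names, and the idea that the caption lives in a separate paragraph with an `id`. I have not tested how a Kobo renders any of it.

```python
from html import escape
from typing import Tuple


def framed_figure(src: str, caption: str, caption_on_top: bool = False) -> str:
    """Option 1: wrap image and caption together in one block (the "frame")."""
    img = f'<img src="{escape(src, quote=True)}" alt="{escape(caption, quote=True)}"/>'
    cap = f'<p class="caption">{escape(caption)}</p>'
    # Caption goes above or below the image depending on the flag.
    parts = [cap, img] if caption_on_top else [img, cap]
    return '<div class="framed">' + "".join(parts) + "</div>"


def linked_image(src: str, caption: str, caption_id: str) -> Tuple[str, str]:
    """Option 2: a floated image that links to a caption placed elsewhere."""
    target = escape(caption_id, quote=True)
    image = (
        f'<a href="#{target}">'
        f'<img class="floated" src="{escape(src, quote=True)}" alt=""/></a>'
    )
    note = f'<p id="{target}" class="caption">{escape(caption)}</p>'
    return image, note


if __name__ == "__main__":
    print(framed_figure("images/fig1.png", "Figure 1"))
    image, note = linked_image("images/fig1.png", "Figure 1", "cap-fig1")
    print(image)
    print(note)
```

The float itself and the frame border would still have to come from the stylesheet (hypothetical rules for `.floated` and `.framed`), which is where I expect option 1 to get fragile.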