05-13-2014, 11:30 AM   #40
roger64
Hi

Forgive me, I tried something rather unorthodox with your latest 46 1.7.GNU commit.

I took a big odt file (a whole book, a little under 300k) that I had previously used as the source for an EPUB. I ran your odt2html on it and got a sizable output.html file. I guessed I could open it after about 30 seconds, but nothing told me when the processing had actually finished.

Since I did not know how to proceed, I created a new EPUB from this file with the Calibre editor. Then I imported the two stylesheets from my original EPUB and, within the new EPUB, linked output.html to them.

I checked it with Calibre. It only complained that the text file was too big. I split it in two and had a working EPUB that I could certainly read anywhere.

Of course, I noticed some defects in the display.
- The paragraph styles were carried over almost correctly, though some style names contained _20_ (Text_20_body instead of Textbody, or Ital_20_droite instead of Italdroite). This was easily corrected. Other paragraph style names were transcribed properly (Quotation, Centrage, ...).
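
For anyone who wants to automate that correction, here is a minimal sketch in Python. It assumes the class names appear in double-quoted class attributes and that simply dropping _20_ gives the name used in the stylesheet (as with Textbody above); the function name strip_space_escapes is my own and is not part of odt2html.

```python
import re

# Matches a double-quoted class attribute and captures its value.
CLASS_ATTR = re.compile(r'(class=")([^"]*)(")')

def strip_space_escapes(markup: str) -> str:
    """Remove the _20_ sequence from class attribute values only.

    Assumption: the stylesheet uses the name without any separator
    (Text_20_body -> Textbody). Text outside class attributes is untouched.
    """
    def fix(match: re.Match) -> str:
        return match.group(1) + match.group(2).replace("_20_", "") + match.group(3)
    return CLASS_ATTR.sub(fix, markup)

sample = '<p class="Text_20_body">Hello</p><p class="Ital_20_droite">x</p>'
cleaned = strip_space_escapes(sample)
print(cleaned)
```

If the stylesheet selectors themselves contain _20_, the same replacement would have to be applied there as well, or the class names left alone.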

The main missing points are these:
- The titles were treated as plain paragraphs (p class="Heading"). The intermediate h2 tags (chapters) disappeared, so I could not produce a usable toc.ncx.
- The small fry (I mean the i, sup, br, ... tags) were all rendered as plain spans without any attributes, so there is of course still some transcription work to do in this area.
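
As a stopgap for the heading problem, the paragraphs could be promoted to real heading tags after the conversion. The sketch below is only an illustration in Python: it assumes every chapter title is exactly p class="Heading" with no other attributes and that h2 is the right level, which may not hold for other documents. The name promote_headings is hypothetical.

```python
import re

# Matches <p class="Heading">...</p>, non-greedy so each paragraph is handled separately.
HEADING_P = re.compile(r'<p\s+class="Heading"\s*>(.*?)</p>', re.DOTALL)

def promote_headings(markup: str, level: int = 2) -> str:
    """Rewrite heading-styled paragraphs as <hN> elements (h2 by default).

    Assumption: the class name is exactly "Heading"; other paragraphs are left alone.
    """
    tag = f"h{level}"
    return HEADING_P.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", markup)

sample = '<p class="Heading">Chapter 1</p><p class="Textbody">Some text.</p>'
promoted = promote_headings(sample)
print(promoted)
```

With real h2 tags in place, an EPUB editor's table-of-contents tool should have something to build the toc.ncx from. The plain spans are a different matter, since without any attributes there is nothing left to tell an italic from a superscript.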

All in all, I did not expect to get such a quick and workable result with an odt file of this size.

Congratulations!

Last edited by roger64; 05-13-2014 at 12:06 PM.