MobileRead Forums - View Single Post - Automated Processing Workflows as and with Free Software

roger64 · 05-13-2014, 11:30 AM

Hi

Forgive me, I made a rather unorthodox attempt using your latest 46 1.7.GNU commit.

I took a big odt file (a whole book, a little under 300k) which I had previously used as the source for an EPUB. I ran your odt2html on it and got a sizable output.html file. I guessed it was safe to open after about 30 seconds, but nothing told me when the processing had actually finished.

Since I did not know how to proceed from there, I created a new EPUB from this file with the Calibre editor. Then I imported the two stylesheets from my original EPUB and, within the new EPUB, linked output.html to them.

I checked it with Calibre. It only complained that the text file was too big. I split it in two and had a working EPUB that I could certainly read anywhere.

I noticed of course some defects in the display.
- the paragraph styles were carried over almost correctly, though some style names had _20_ inserted into them, like Text_20_body instead of Textbody, or Ital_20_droite instead of Italdroite. This was easily corrected. Other paragraph style names were transcribed properly (Quotation, Centrage, ...).
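
For anyone who wants to script that correction rather than do it by hand, here is a minimal sketch in Python. It is not part of odt2html; the function name clean_class_names is made up for illustration, and it assumes the class attributes are written with double quotes. The _20_ looks like the way ODF escapes a space (hex 20) in style names, so the sketch simply drops it to match stylesheet names such as Textbody.

```python
import re

def clean_class_names(html: str) -> str:
    """Drop the _20_ escape from class attribute values only (hypothetical helper)."""
    def fix(match: re.Match) -> str:
        # Only the attribute value is rewritten; body text is left untouched.
        return 'class="' + match.group(1).replace("_20_", "") + '"'
    return re.sub(r'class="([^"]*)"', fix, html)

sample = '<p class="Text_20_body">Hello</p><p class="Ital_20_droite">Bye</p>'
print(clean_class_names(sample))
```

Restricting the replacement to class attributes avoids touching any literal _20_ that might occur in the text itself.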

The main missing points are these:
- the titles were treated as plain paragraphs (p class="Heading"). The intermediate h2 tags (chapters) disappeared, so I could not produce a usable toc.ncx.
- the small fry (I mean the i, sup, br, ... tags) were all rendered as a plain span without any attributes, so there is clearly still some conversion work to do in this area.
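
As a stopgap for the heading problem, the paragraphs could be promoted back to headings after the conversion. This is only a rough sketch under assumptions: that every title comes out exactly as p class="Heading" with no other attributes, and that all of them are chapter-level (h2). The name promote_headings is hypothetical, and a real fix would belong in the converter itself.

```python
import re

# Matches the heading paragraphs exactly as they appear in output.html (assumed form).
HEADING_P = re.compile(r'<p class="Heading">(.*?)</p>', re.DOTALL)

def promote_headings(html: str, level: int = 2) -> str:
    """Turn <p class="Heading"> paragraphs into <hN> elements (hypothetical helper)."""
    tag = f"h{level}"
    return HEADING_P.sub(
        lambda m: f'<{tag} class="Heading">{m.group(1)}</{tag}>', html
    )

sample = '<p class="Heading">Chapter 1</p><p class="Textbody">Some text.</p>'
print(promote_headings(sample))
```

With real h2 elements in place, a table of contents can then be generated from the headings. The bare spans cannot be repaired this way, since the information about which one was i or sup is already lost in the output.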

All in all, I did not expect to get such a quick and workable result with an odt file of this size.

Congratulations!

Post #40 · Last edited by roger64; 05-13-2014 at 12:06 PM.