03-11-2010, 09:07 AM   #1
paulpeer
Importing OpenOffice HTML into Sigil

In another thread a user asked how to make an ePub out of an OpenOffice file. Valloric responded "just export to HTML and import in Sigil". It's a bit more complicated than that.

- First, OpenOffice Writer exports to HTML, not to XHTML. Sigil converts many of the upper-case element names (P, DIV, H1...) to their lower-case equivalents (p, div, h1), but a lot of the markup is left untouched.

- The first group that is not fully converted is the A elements. If your OpenOffice document has footnotes, they are exported like this:

Code:
<A CLASS="sdfootnoteanc" NAME="sdfootnote1anc" HREF="#sdfootnote1sym"><SUP>1</SUP></A>
Sigil transforms this to:

Code:
<a CLASS="sdfootnoteanc" HREF="#sdfootnote1sym" NAME="sdfootnote1anc"><span><sup>1</sup></span></a>
So CLASS and HREF are still in upper case, and NAME has not been changed to "id". As a result, most readers do not recognize this as a footnote link. The first job is therefore to search and replace all occurrences of "NAME" with "id", "CLASS" with "class" and "HREF" with "href". After doing that, you'll see that the notes suddenly are blue links and are working.
Is this a job Sigil could do automatically?
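Until then, the search and replace described above could be scripted. What follows is only a minimal sketch, assuming the markup is simple enough for regular expressions (no ">" inside attribute values) and that no anchor already carries its own id. The function name normalize_attributes is made up for this example and is not part of Sigil.

```python
import re

# An opening tag: element name, then optional attributes up to '>'.
# Assumption: attribute values never contain '<' or '>'.
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(\s[^<>]*)?>")

# One attribute: leading whitespace, name, optional value
# (double-quoted, single-quoted or unquoted).
ATTR_RE = re.compile(
    r"""(\s+)([A-Za-z][A-Za-z0-9_:-]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?"""
)


def normalize_attributes(html):
    """Lower-case attribute names; rename NAME to id on A elements only."""

    def fix_tag(tag_match):
        tag = tag_match.group(1)
        attrs = tag_match.group(2) or ""

        def fix_attr(attr_match):
            space, name, value = attr_match.groups()
            name = name.lower()
            if tag.lower() == "a" and name == "name":
                name = "id"
            return space + name + (value or "")

        # The element name itself is left exactly as it was found.
        return "<" + tag + ATTR_RE.sub(fix_attr, attrs) + ">"

    return TAG_RE.sub(fix_tag, html)


sample = ('<a CLASS="sdfootnoteanc" HREF="#sdfootnote1sym" '
          'NAME="sdfootnote1anc"><span><sup>1</sup></span></a>')
print(normalize_attributes(sample))
```

Working tag by tag rather than doing a blind text replace means that the words "NAME" or "CLASS" in the body text of the book are left alone, and that NAME on other elements (IMG, for instance) is only lower-cased, not renamed.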

- The second problem is about images. If the original OpenOffice document has images, they are exported as separate files that are linked from within the HTML document, e.g.

Code:
<IMG SRC="../Provizore/Grafo_html_m26feaff4.jpg" NAME="Afbeeldingen4" ALIGN=LEFT WIDTH=310 HEIGHT=281 BORDER=0>
After import into Sigil, the only thing changed is "IMG" which is now "img". But even if you change "SRC" to "src", Sigil does not find the images. I haven't found an easy way to deal with this problem so far.
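Why Sigil does not find the images is not clear from the above, so the sketch below does not fix anything. It only lists the image references in an exported file and reports which of them do not resolve to an existing file relative to a given folder, which may help to rule out a wrong relative path. The names image_sources and missing_images are hypothetical, and the folder in the usage lines is a placeholder.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# The SRC value of an IMG tag, whatever the case of tag and attribute,
# and whether the value is double-quoted, single-quoted or unquoted.
IMG_SRC_RE = re.compile(
    r"""<img\b[^<>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)


def image_sources(html):
    """Return the SRC value of every IMG tag, in document order."""
    return [
        next(group for group in match.groups() if group is not None)
        for match in IMG_SRC_RE.finditer(html)
    ]


def missing_images(html, base_dir):
    """Return the SRC values that do not point to a file under base_dir."""
    base_dir = Path(base_dir)
    return [
        src for src in image_sources(html)
        # unquote() turns escapes such as %20 back into plain characters.
        if not (base_dir / unquote(src)).is_file()
    ]


sample = ('<IMG SRC="../Provizore/Grafo_html_m26feaff4.jpg" '
          'NAME="Afbeeldingen4" ALIGN=LEFT WIDTH=310 HEIGHT=281 BORDER=0>')
print(image_sources(sample))
# Placeholder folder: use the folder that holds the exported HTML file.
print(missing_images(sample, "folder-of-the-html-file"))
```

If every reference resolves correctly on disk and Sigil still shows no images, the path is probably not the cause.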

- Last, there is a big group of attributes that remains in the Sigil file, such as DIR, LANG, ALIGN, CLASS, CONTENT, HTTP-EQUIV etc. Many of them you can just remove; for others, such as STYLE, you may want to adapt the CSS file.
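Removing attributes by hand gets tedious in a long book, so here is a rough sketch of how it might be automated. Which attributes are safe to drop depends on your document: the default list (DIR, LANG, ALIGN) is only an assumption for the example, and strip_attributes is a made-up name. As with any regex approach, it assumes attribute values do not contain ">".

```python
import re

# An opening tag: element name, then optional attributes up to '>'.
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(\s[^<>]*)?>")

# One attribute including its leading whitespace and optional value.
ATTR_RE = re.compile(
    r"""\s+([A-Za-z][A-Za-z0-9_:-]*)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?"""
)


def strip_attributes(html, unwanted=("dir", "lang", "align")):
    """Remove the named attributes (case-insensitive) from all opening tags."""
    unwanted = {name.lower() for name in unwanted}

    def clean_tag(tag_match):
        attrs = tag_match.group(2) or ""
        kept = ATTR_RE.sub(
            # Drop the whole attribute if its name is on the list,
            # otherwise put it back unchanged.
            lambda a: "" if a.group(1).lower() in unwanted else a.group(0),
            attrs,
        )
        return "<" + tag_match.group(1) + kept + ">"

    return TAG_RE.sub(clean_tag, html)


sample = '<P LANG="nl-BE" ALIGN=LEFT CLASS="western">Tekst</P>'
print(strip_attributes(sample))
```

CONTENT and HTTP-EQUIV are deliberately not in the default list, because they belong to the META tags in the header and deleting them blindly would leave those tags empty.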

I'm not complaining about Sigil. It does a great job. But it leaves a lot of work for us!