Old 03-11-2010, 09:07 AM   #1
paulpeer
Zealot
Posts: 147
Karma: 56
Join Date: Dec 2009
Location: Antwerpen
Device: iPhone, Sony PRS-505, EPUBreader
Importing Open Office HTML in Sigil

In another thread a user asked how to make an ePub out of an OpenOffice file. Valloric responded "just export to HTML and import in Sigil". It's a bit more complicated than that.

- A first remark is that OpenOffice Writer exports to HTML and not to XHTML. Sigil transforms a lot of HTML element names (P, DIV, H1...) into their lower case equivalents (p, div, h1), but many attribute names are left untouched.

- The first group left untouched are the attributes of the A elements. If your OpenOffice document has notes, they are exported in this way:

Code:
<A CLASS="sdfootnoteanc" NAME="sdfootnote1anc" HREF="#sdfootnote1sym"><SUP>1</SUP></A>
Sigil transforms this to:

Code:
<a CLASS="sdfootnoteanc" HREF="#sdfootnote1sym" NAME="sdfootnote1anc"><span><sup>1</sup></span></a>
So CLASS and HREF are still in their upper case form, and NAME is not changed to "id". Hence most readers do not recognize this as a footnote link, and a first job is to search and replace all occurrences of "NAME" with "id", "CLASS" with "class" and "HREF" with "href". After doing that, you'll see that the notes suddenly become blue links and work.
Is this a job Sigil could do automatically?
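
A rough sketch of how this search-and-replace could be scripted, in Python. It is only an illustration, not what Sigil or Tidy actually does. It lowercases attribute names that are entirely upper case, leaves mixed-case names (such as SVG's case-sensitive viewBox) alone, and renames "name" to "id" on A elements. The function names are made up for this example, and the regular expressions assume simple exported markup: they would misbehave on attribute values containing "<" or ">", and they do not check whether an anchor already has an id.

```python
import re

# An opening tag: "<", a tag name, then everything up to the closing ">".
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*)>")

# One attribute with its value (double-quoted, single-quoted, or unquoted).
# Matching the value too keeps us from rewriting text inside quoted values.
ATTR_RE = re.compile(
    r"""(\s+)([A-Za-z][A-Za-z0-9-]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))"""
)


def _normalize_tag(match):
    tag, rest = match.group(1), match.group(2)

    def fix_attr(m):
        name = m.group(2)
        if name.isupper():
            # Fold all-uppercase names only; mixed case is left as is.
            name = name.lower()
        if tag.lower() == "a" and name == "name":
            # Anchor targets: rename "name" to "id".
            name = "id"
        return m.group(1) + name + m.group(3)

    return "<" + tag + ATTR_RE.sub(fix_attr, rest) + ">"


def normalize_attributes(html):
    """Return html with uppercase attribute names folded to lowercase."""
    return TAG_RE.sub(_normalize_tag, html)
```

Applied to the Sigil output shown above, this turns CLASS, HREF and NAME into class, href and id, and leaves the attribute values alone.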

- The second problem is about images. If the original OpenOffice document has images, they are exported as different files with links from within the HTML document, e.g.

Code:
<IMG SRC="../Provizore/Grafo_html_m26feaff4.jpg" NAME="Afbeeldingen4" ALIGN=LEFT WIDTH=310 HEIGHT=281 BORDER=0>
After import into Sigil, the only thing changed is "IMG" which is now "img". But even if you change "SRC" to "src", Sigil does not find the images. I haven't found an easy way to deal with this problem so far.
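
Before importing, it may help to check that the image paths in the exported HTML point at files that really exist. Below is a small Python sketch for that, under some assumptions: the helper names are invented for this example, the HTML is read as UTF-8, and the SRC values are plain relative paths (no URL escaping). Python's html.parser reports tag and attribute names in lower case, so the upper case IMG and SRC in the export are not a problem here.

```python
from html.parser import HTMLParser
from pathlib import Path


class ImageSrcCollector(HTMLParser):
    """Collects the src value of every img tag, whatever its case."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased tag and attribute names.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html_text):
    """Return the list of image paths referenced in html_text."""
    collector = ImageSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


def missing_images(html_path):
    """Return referenced images not found relative to the HTML file."""
    html_path = Path(html_path)
    text = html_path.read_text(encoding="utf-8", errors="replace")
    return [
        src for src in image_sources(text)
        if not (html_path.parent / src).exists()
    ]
```

If missing_images returns an empty list, the files are where the HTML says they are, and the problem lies elsewhere.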

- Last, there is a big group of attributes that remain in upper case in the Sigil file, such as DIR, LANG, ALIGN, CLASS, CONTENT, HTTP-EQUIV etc. Many of them you can just remove; for others, such as STYLE, you may want to adapt the CSS file instead.

I'm not complaining about Sigil. It does a great job. But it leaves a lot of work for us!
Old 03-11-2010, 09:42 AM   #2
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Most of your problems stem from attributes being in all uppercase. I used to let Tidy convert them all to their lowercase equivalents (it does this by default), but this ended up wreaking havoc on SVG attributes. SVG has lovely case-sensitive attributes like "viewBox", and "viewbox" or "VIEWBOX" don't work. So I had to hack Tidy into leaving attributes in whatever case they came in.

I made this change months ago, and no one complained thus far. I plan on taking a look into making Tidy convert uppercase attributes to lowercase, but leaving mixed-case attributes alone. Sounds simple, but if you've ever looked into the Tidy source code, you'll quickly realize it's not, mostly because the Tidy source is a horrible mess of unreadable spaghetti C code.
Old 03-11-2010, 10:03 AM   #3
paulpeer
Quote:
Originally Posted by Valloric
I made this change months ago, and no one complained thus far. I plan on taking a look into making Tidy convert uppercase attributes to lowercase, but leaving mixed-case attributes alone.
Maybe it's easier to persuade the OpenOffice guys to make their program export to plain vanilla XHTML instead of the old HTML ...
Old 03-11-2010, 11:25 AM   #4
Valloric
I've just fixed this in Tidy. Future versions of Sigil will fold uppercase attributes to lowercase, and mixed-case attributes will be left as is.
Old 03-11-2010, 11:33 AM   #5
paulpeer
Quote:
Originally Posted by Valloric
I've just fixed this in Tidy. Future versions of Sigil will fold uppercase attributes to lowercase, and mixed-case attributes will be left as is.
Uauuu! You're amazing. Thank you!
And have you seen the part about images in my post? This isn't just an uppercase/lowercase question, is it?

Last edited by paulpeer; 03-11-2010 at 11:33 AM. Reason: typo
Old 03-11-2010, 11:41 AM   #6
Valloric
Quote:
Originally Posted by paulpeer
Uauuu! You're amazing. Thank you!
And have you seen the part about images in my post? This isn't just an uppercase/lowercase question, is it?
It's caused by the same issue.
Old 03-16-2010, 09:40 AM   #7
roger64
Wizard
Posts: 1,436
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
I just tried exporting directly from OpenOffice as HTML and got very unsatisfactory results: the text looks scrambled in Sigil, and so on. I gave up.

I then used an OpenOffice extension called "writer2xhtml", which gives more control to the user and exports the ODT file as so-called "strict html". The results look far better.
http://extensions.services.openoffic...t/writer2xhtml

Sigil opened the file without complaint and without any visible defect. I could process it easily (checking the TOC, filling in the metadata, ...) and save it as an ePub file. But once the ePub was on my PRS-505, I got an "Error Page!" and the file wouldn't load.

I tried many small changes to no avail.

Last edited by roger64; 03-16-2010 at 09:43 AM. Reason: adding writer2xhtml url
Old 03-16-2010, 10:31 AM   #8
KevinH
Guru
Posts: 895
Karma: 410248
Join Date: Nov 2009
Device: many
Hi roger64,

It seems there are many reasons for an "Error Page", but one possible reason is that you have not split the file into sections that are smaller than 260K bytes. That seems to be the upper limit on chapter size for the Sony family of e-readers and many others.

So make sure you have broken the single large HTML file into sections, one file for each chapter.

If you have already done that, then look for a particularly large or long chapter and split that one as well at some appropriate point.

KevinH
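
To find the chapters that are too big without opening each one, the sizes can be read straight from the ePub, which is a zip archive. Here is a minimal Python sketch. The function name is invented for this example, and reading "260K bytes" as 260 * 1024 is an assumption; the real limit may differ per device. The check uses the uncompressed size of each HTML/XHTML file.

```python
import zipfile

# Rough limit taken from the advice above; treating "260K" as 260 * 1024
# bytes is an assumption, not a documented device value.
LIMIT_BYTES = 260 * 1024


def oversized_chapters(epub_path, limit=LIMIT_BYTES):
    """Return (name, uncompressed size) for HTML files larger than limit."""
    with zipfile.ZipFile(epub_path) as epub:
        return [
            (info.filename, info.file_size)
            for info in epub.infolist()
            if info.filename.lower().endswith((".xhtml", ".html", ".htm"))
            and info.file_size > limit
        ]
```

Any file it reports is a candidate for a further chapter break in Sigil.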
Old 03-16-2010, 10:36 AM   #9
paulpeer
Have you tried checking your ePub with Validator? http://threepress.org/document/epub-validate/
This often gives a good hint.
Old 03-16-2010, 10:44 AM   #10
roger64
OK, solved. I split my file with five "chapter breaks", including one for one image.
I also removed the second image, which was in .gif format.

One of these two things did the trick. BTW the end result is perfect. So I would recommend using this extension, which works well with Sigil.
I use OpenOffice 3.2 on Linux.

So I did not have to try the Validator. I'll keep it for next time.

Thanks very much for your help.

Last edited by roger64; 03-16-2010 at 11:15 AM. Reason: Report
Old 03-17-2010, 11:17 AM   #11
roger64
About charsets

Another little hitch, with another file. I am learning slowly.

Starting from a file in another format, I made an ODT out of it some time ago, then exported it yesterday to XHTML from OpenOffice, then made an ePub out of it with Sigil.

Validator gives me many "errors", mostly about an unneeded "lang" attribute, but I can open the ePub on my Sony. I have one hitch though.

What I see is not exactly what I get. While in Sigil, it looks perfect, on my PRS-505, I have a lot of characters replaced by question marks.

I think there is probably a wrong charset somewhere.

Looking at the ePub metadata, I read:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
I am not sure, though, that the file really uses the UTF-8 charset.

In Linux, I know it's possible to determine the charset of a text file from the command line with the "file" command.

I do not know how to proceed with html, or epub files and how to set them on the right track.

Any hint?
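
One simple way to test whether a file really is UTF-8 is to try decoding its bytes strictly and see whether that fails. A small Python sketch follows; the function names are invented for this example. It only tells you whether the bytes are valid UTF-8, not which other charset they might be in.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict by default: raises on invalid bytes
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path) -> bool:
    """Check the raw bytes of the file at path."""
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())
```

For an ePub, the HTML files first have to be extracted from the archive (it is a zip file) before checking them this way.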
Old 03-17-2010, 11:27 AM   #12
paulpeer
Quote:
Originally Posted by roger64
I think there is probably a wrong charset somewhere.
The Sony reader has only a very limited character set. It cannot show languages like Polish, Greek or Romanian. In what language is your book written?

Options are embedding a good font into the ePub, or installing a good system font on your reader. Neither is very easy, but it can be done.
Old 03-17-2010, 12:01 PM   #13
roger64
answer: Text is English, font is Times New Roman...All very basic.

I just read in the Sigil FAQ that many of my question marks come from "soft hyphens" in the HTML text. I will take care to avoid them in the future.

But not all of them.
So, could you point me to some documentation on how to embed a font into an ePub? (Customizing the reader would make me nervous...)
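
For the soft hyphens, a text that already contains them can be cleaned up with a short script instead of by hand. A Python sketch, with an invented function name; it removes the soft hyphen character itself (U+00AD) and the common entity forms, and assumes none of them are wanted in the book.

```python
import re

# The raw soft hyphen character plus its named and numeric entity forms.
SOFT_HYPHEN_RE = re.compile("\u00ad|&shy;|&#173;|&#x0*ad;", re.IGNORECASE)


def strip_soft_hyphens(text):
    """Return text with all soft hyphens removed."""
    return SOFT_HYPHEN_RE.sub("", text)
```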
Old 03-17-2010, 12:10 PM   #14
paulpeer
Quote:
Originally Posted by roger64
So, could you point me to some documentation on how to embed a font into an ePub? (Customizing the reader would make me nervous...)
That's how I learned it: http://blog.threepress.org/2009/09/1...in-epub-files/

I usually do all those things in a text editor (one capable of UTF-8), but I think part of the job can be done in Sigil now (since version 0.2.0).
Old 03-17-2010, 12:57 PM   #15
Valloric
Quote:
Originally Posted by roger64
While in Sigil, it looks perfect, on my PRS-505, I have a lot of characters replaced by question marks.
The PRS-505 doesn't have the required fonts to display those characters, while your computer does.

You need to embed the required fonts and reference them using the @font-face CSS rules.
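
As an illustration of what such a rule looks like, here is a small Python helper that builds the CSS text. Everything specific in it is an assumption made for this example: the function name, the family name "MyBookFont", the path "../Fonts/MyBookFont.ttf" and the choice of "body" as the selector. The font file itself also has to be included in the ePub and listed in its manifest.

```python
def font_face_css(family, font_href, selector="body"):
    """Build a minimal @font-face rule plus a rule that applies the font."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{font_href}");\n'
        "}\n"
        f"{selector} {{\n"
        f'  font-family: "{family}", serif;\n'
        "}\n"
    )
```

Calling font_face_css("MyBookFont", "../Fonts/MyBookFont.ttf") gives CSS that can be pasted into the book's stylesheet; the path must match where the font file really sits relative to that stylesheet.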
Tags
export, html2epub, import, sigil
