MobileRead Forums > E-Book Software > Sigil

Old 04-24-2010, 12:14 AM   #1
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
Question about using Sigil to convert HTML

I have 200 HTML files (extracted from secure eReader) that I want to convert---once and for all. I have experimented with manually cleaning up the HTML. I have experimented with converting it to RTF. I wind up with issues every time, and I am tired of re-doing them every time I switch devices. I want to make one more push through them, do it all now and wind up with a master file I can convert to mobi or to anything else, indefinitely and forever, without having further glitches and needing to do more work.

If I have a file which looks good in Sigil and I save it as epub, is it a clean file? Can I replace my buggy HTML master with it and use it forever like a commercial epub book? (All of my commercial epub books have converted just fine to mobi.) Is the epub code it generates cleaner than plain old HTML, and can I be assured that things like em-dashes or curly quotes I might have forgotten to manually edit out will appear and be fine forever after?

I am just really frustrated with this seemingly never-ending editing I keep having to do, and it's a lot of files. I want a solution where I can slog through this one more time and have files I can keep forever and re-use on any device I have, error-free. I posted earlier about RTF being perhaps better than HTML but I did some experimenting since then and am still having issues.
Old 04-24-2010, 07:41 AM   #2
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
Quote:
Originally Posted by ficbot View Post
If I have a file which looks good in Sigil and I save it as epub, is it a clean file? Can I replace my buggy HTML master with it and use it forever like a commercial epub book? (All of my commercial epub books have converted just fine to mobi.) Is the epub code it generates cleaner than plain old HTML, and can I be assured that things like em-dashes or curly quotes I might have forgotten to manually edit out will appear and be fine forever after?
Sigil will preserve your HTML to the best of its abilities. Usually it's pretty good.

It will convert it all to Unicode; if the file has em-dashes, curly quotes or other special characters, they will be preserved. But if there were problems with them in the original file, the problems will remain after it's processed by Sigil. There's no free lunch.
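
To illustrate why conversion to Unicode cannot repair characters that were already damaged, here is a minimal Python sketch. The sample text and the cp1252 mix-up are assumptions chosen for illustration; nothing here is Sigil's actual code.

```python
# Illustrative only: the sample text and the cp1252 mix-up are assumptions.
clean = "It’s a café"

# The damage happens upstream: UTF-8 bytes get read as if they were cp1252.
damaged = clean.encode("utf-8").decode("cp1252")
# damaged is now "Itâ€™s a cafÃ©"

# Saving as UTF-8 and reading it back is lossless, so the damaged
# characters survive the round trip unchanged.
round_tripped = damaged.encode("utf-8").decode("utf-8")

# The damage can only be undone by reversing the original mistake.
repaired = damaged.encode("cp1252").decode("utf-8")

print(round_tripped == damaged)  # the problem is preserved
print(repaired == clean)         # the fix needs knowledge of what went wrong
```

The point of the sketch is that a faithful conversion keeps whatever characters it is given, good or bad.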

But if it all worked before, it should all work in Sigil too. Sigil being WYSIWYG, if there are any problems, you'll see them in the Book View. If it looks good in Sigil, you're probably quite safe.
Old 04-24-2010, 05:13 PM   #3
FizzyWater
You kids get off my lawn!
Posts: 4,220
Karma: 73492664
Join Date: Aug 2007
Location: Columbus, Ohio
Device: Oasis 2 and Libra H2O and half a dozen older models I can't let go of
Valloric, I have epubs that look fine in Calibre, but when I open them in Sigil (1.9), all the special characters are displayed as gobbledygook.
Old 04-24-2010, 09:07 PM   #4
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by FizzyWater View Post
Valloric, I have epubs that look fine in Calibre, but when I open them in Sigil (1.9), all the special characters are displayed as gobbledygook.
Create a new issue on the tracker and attach one such file.

As a guess, your file probably specifies two encodings for the XHTML files: one in the XML declaration, and one in the meta tag. Sigil picks the meta tag encoding first (Firefox-like behavior) and Calibre picks the XML declaration encoding first.

It goes without saying that a file that specifies two encodings is faulty (it's impossible to have two encodings for one byte stream). The odds that a Reading System will pick the correct one (since only one encoding can be correct) in such a situation are 50-50.
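
A rough way to check a file for this kind of conflict is sketched below in Python. The regular expressions, the helper names `declared_encodings` and `has_conflict`, and the sample bytes are all assumptions made for illustration; this is not how Sigil or Calibre actually detect encodings.

```python
import codecs
import re

# Hypothetical patterns: they look only for the two declarations discussed above.
XML_DECL_RE = re.compile(
    rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']', re.I)
META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._\-]+)', re.I)


def _normalize(name: bytes) -> str:
    """Map aliases such as 'utf8' and 'UTF-8' to one canonical codec name."""
    text = name.decode("ascii").lower()
    try:
        return codecs.lookup(text).name
    except LookupError:
        return text  # unknown label: keep it as written


def declared_encodings(raw: bytes) -> dict:
    """Return the encodings named in the XML declaration and the meta tag."""
    head = raw[:2048]  # both declarations normally sit near the top
    xml = XML_DECL_RE.search(head)
    meta = META_RE.search(head)
    return {
        "xml": _normalize(xml.group(1)) if xml else None,
        "meta": _normalize(meta.group(1)) if meta else None,
    }


def has_conflict(raw: bytes) -> bool:
    """True only when both declarations exist and name different encodings."""
    found = declared_encodings(raw)
    return None not in found.values() and found["xml"] != found["meta"]


sample = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head>\n'
    b'<meta http-equiv="Content-Type" '
    b'content="text/html; charset=windows-1252" />\n'
    b'</head><body><p>text</p></body></html>'
)
print(declared_encodings(sample))
print(has_conflict(sample))
```

A file flagged this way still needs a human to decide which of the two declarations matches the actual bytes.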
Old 03-02-2011, 06:13 AM   #5
BMaloney
Connoisseur
Posts: 52
Karma: 126
Join Date: Jan 2011
Device: PRS-650
Hi all. I'm just a newbie here, so it's all a bit over my head, but this conversation is very interesting.

I'd like some more info, so I can follow up on what you guys are saying, maybe start reducing all the "gobbledy-gook" I, like ficbot, really HATE to deal with every time I transfer devices/applications.

How do I see/edit "XML declarations" and "meta tags" for html files? Is there a template/help section that might tell me what normal encoding looks like?

And, if I can take a step back, what about getting files into html format in the first place? I know that OpenOffice saves as html, but what about converting other file types, especially pdf? Adobe Acrobat X Pro will save pdf as html but then, of course, you have all that manual editing and coding to do afterwards. Any better ideas?

Thanks very much for sharing your time and experience, everyone!
Old 03-02-2011, 09:45 AM   #6
theducks
Well trained by Cats
Posts: 29,778
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
2 widely different questions.

Sigil is for EPUB OUTPUT.

Sigil is not a converter. Look at Calibre for that function (along with book management); EPUB is a good input source for it.

Sigil has a Metadata Editor (press F8). Job partly done: you have to do the difficult part and supply the data, spelled correctly, too.
Old 03-02-2011, 10:03 AM   #7
BMaloney
Connoisseur
Thanks for your quick reply, theducks, but I'm not sure that's what I was looking for...

I realize that Sigil does not convert anything. But it does open up html files, which can be edited and saved as epub. That's how I've been doing it so far. It allows me to use the same method, more or less, if I write something in OOo, if I download something to read from a web site, or if I convert a pdf to carry with me on my reader.

Is there something wrong with that, or is there a better way?

I also knew that Sigil has a metadata editor, but that data includes basic tags like author, publication date, etc. (and the option to add many more). None of it seems to address the issue of the encoding that affects special characters, which you guys were talking about earlier, and which I have to deal with often... Or did you mean the language option I see there?

My main goal is to streamline the whole process. If I spend two hours editing what takes me two hours to read, I'm not being efficient with my time, and the special characters issue can be an obstacle at times.
Old 03-02-2011, 10:31 AM   #8
theducks
Well trained by Cats
Calibre does the "encoding" (user-optional) thing as part of the convert.
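
As a hedged sketch of what that can look like from a script: calibre ships a command-line tool, `ebook-convert`, which accepts an `--input-encoding` option. The Python wrapper below, its function names, and the file names in the usage note are assumptions for illustration, and it presumes `ebook-convert` is installed and on the PATH.

```python
import subprocess


def build_convert_command(src, dst, input_encoding=None):
    """Build an ebook-convert command line; the encoding override is optional."""
    cmd = ["ebook-convert", src, dst]
    if input_encoding:
        # Tells calibre how to decode the input instead of letting it guess.
        cmd += ["--input-encoding", input_encoding]
    return cmd


def convert(src, dst, input_encoding=None):
    """Run the conversion; raises CalledProcessError if calibre reports failure."""
    subprocess.run(build_convert_command(src, dst, input_encoding), check=True)
```

For example, `convert("book.html", "book.epub", "cp1252")` would ask calibre to treat a hypothetical `book.html` as Windows-1252 text. Leaving the encoding out lets calibre decide for itself.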

I am not sure what else the "Language" metadata may do. It may just be a required (to validate) metadata entry in the OPF file.
Old 03-04-2011, 03:06 AM   #9
BMaloney
Connoisseur
Sorry for taking so long to reply, theducks, but life can be complicated!

So, too, is all this programming/encoding stuff. To put it briefly, let's just say that I now see how confused I was, using the term "convert" wrong and everything.

I'm going to have to look into this a bit further... Like looking into the "user-optional" encoding you mentioned.

And if I have any more specific questions, I'll just start another thread.

Thanks again for your time!
Old 03-06-2011, 08:12 AM   #10
DMSmillie
Enquiring Mind
Posts: 562
Karma: 42350
Join Date: Aug 2010
Location: London, UK
Device: Kindle 3 (WiFi)
Quote:
Originally Posted by BMaloney View Post
How do I see/edit "XML declarations" and "meta tags" for html files? Is there a template/help section that might tell me what normal encoding looks like?
Hi BMaloney - there are no quick or easy answers to this question, because understanding it kind of covers several different topics - character encodings, (X)HTML meta tags and XML syntax.

However, here are some links to pages in the W3Schools website, which might help, as a starting point:
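
As a rough illustration of what those two declarations usually look like, here is a small Python sketch. The sample XHTML and the helper name `encoding_lines` are assumptions made for illustration. In practice the same lines can be seen by opening the file in any plain-text editor, or in Sigil's Code View, and looking at the top of the file.

```python
# A hypothetical but typical XHTML file: the XML declaration is the very
# first line, and the meta tag sits inside <head>.
SAMPLE_XHTML = '''<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Chapter 1</title>
</head>
<body><p>Text goes here.</p></body>
</html>
'''


def encoding_lines(text):
    """Return the lines that carry an encoding declaration."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.lstrip().startswith("<?xml") or "charset=" in line.lower()
    ]


for line in encoding_lines(SAMPLE_XHTML):
    print(line)
```

In this sample both declarations say utf-8, which is the consistent case; trouble starts when the two disagree or when neither matches the bytes actually stored in the file.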
All times are GMT -4.