Old 04-23-2010, 11:25 PM   #1
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
RTF vs HTML---best way to convert my files?

I have been using HTML for my converted secure eReader files and this has lately been problematic. The HTML is very messy and has required numerous conversions---what was fine on the Sony was not fine on the Kindle, which was not fine on the Libre etc. etc. etc. I just want one basic file I can re-convert to any future format and read on all devices now. Presently, I want to convert to mobi and have an error-free file.

After going through a dozen HTML files, I found issues with line breaks, straight vs curly quotes and numerous inconsistencies. Every time I think all is fixed, I find some other error. I am running it through Kompozer, copying and pasting the result from Firefox into a clean file, and I guess I just don't know enough about which problems to catch. I am wondering if it might be better to just copy the HTML into an RTF file and convert THAT in the future?

So what should I do? Copy and paste from Firefox into Word and make them all RTF files, or develop some sort of HTML checklist I can use to verify---once and for all---the perfection of my files and then make HTML my archival format? I no longer buy secure eReader but I have about 200 files already and just don't have the heart to keep going through them all again every time I want to use a different reader (I review for Teleread and often test new ones). I just want one base file which is fine that I can re-convert forever and ever.
Old 04-24-2010, 12:38 AM   #2
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
Just replying to say that I have done some experimenting and both are imperfect. I think I am going to stick with HTML unless anyone has any better ideas. I have tried Sigil too and it didn't really help. Can someone point me toward a checklist of things I need to run a find and replace on? So far, I have figured out I need to replace double line breaks with proper <p> and </p> paragraphing, and also check apostrophes, em-dashes and curly quotes and do find and replaces for those. What else? My goal is to go through all these files---thoroughly---one more time and make sure they are as perfect as possible. I will be VERY dismayed to do all that checking on 267 files and then find something else I need to fix! I am at my wit's end here; I am not prepared to buy all these books again just to get a proper epub. I regret ever getting involved in the secure eReader racket. I just want files that will display on all my devices with proper paragraphs and without funny symbols where the quote marks should be. I am prepared to spend some time on the fixing, but not if I get a new reader down the road and will have to do it all again!
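
For illustration only, the find-and-replace checklist described in this post (double line breaks to paragraphs, then apostrophes, curly quotes and dashes) could be scripted roughly as follows. This is a minimal sketch, not a complete cleanup tool: it assumes the input is a body-text fragment rather than a full HTML document, and the function names and the choice of numeric character references are assumptions made for the example.

```python
import re

# Typographic characters that often show up as "funny symbols" when a file's
# encoding is misread. Each maps to a plain-ASCII numeric character reference.
TYPOGRAPHIC_ENTITIES = {
    "\u2018": "&#8216;",  # left single quote
    "\u2019": "&#8217;",  # right single quote / apostrophe
    "\u201c": "&#8220;",  # left double quote
    "\u201d": "&#8221;",  # right double quote
    "\u2013": "&#8211;",  # en dash
    "\u2014": "&#8212;",  # em dash
    "\u2026": "&#8230;",  # ellipsis
}


def escape_typography(text: str) -> str:
    """Replace curly quotes, dashes and ellipses with numeric references."""
    for char, entity in TYPOGRAPHIC_ENTITIES.items():
        text = text.replace(char, entity)
    return text


def wrap_paragraphs(text: str) -> str:
    """Turn blocks separated by blank lines (or doubled <br>) into <p>...</p>."""
    # Treat two or more consecutive <br> tags as a paragraph boundary.
    text = re.sub(r"(?:<br\s*/?>\s*){2,}", "\n\n", text, flags=re.IGNORECASE)
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse line breaks inside a block into single spaces.
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            paragraphs.append(f"<p>{joined}</p>")
    return "\n".join(paragraphs)


def clean_fragment(text: str) -> str:
    """Apply both checklist steps to a body-text fragment."""
    return wrap_paragraphs(escape_typography(text))
```

Writing the special characters as numeric references keeps the file pure ASCII, so a wrong encoding guess by a converter or device should no longer affect them. Anything beyond these two steps (headers, stray styling, nested tags) is outside what this sketch handles.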
Old 04-24-2010, 05:00 AM   #3
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by ficbot View Post
I am wondering if it might be better to just copy the HTML into an RTF file and convert THAT in the future?
Certainly not. RTF is, if anything, even harder to get into a "clean state" than HTML. What I would recommend is to stick to XHTML (because its extra restrictions over basic HTML allow for easier automatic conversions), and to a very limited subset of it - the fewer tags you use, the less likely you are to encounter problems. If you want inspiration, download this:
http://www.pepak.net/files/e-books/u...ble_people.zip
It produces a file which is reasonably simple to convert to any format I tried.
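
As a rough way to check a file against the "XHTML with very few tags" advice above, the following Python sketch parses a file as XML (which fails if the markup is not well-formed) and counts any tags outside a small allow-list. The allow-list and function names are hypothetical, chosen only for this example; they are not taken from the linked demo file. Note that a plain XML parser does not know named entities such as `&nbsp;`, so the sketch assumes special characters are written literally or as numeric references.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical allow-list; adjust to whatever small tag set you settle on.
ALLOWED_TAGS = {
    "html", "head", "title", "meta", "link", "body",
    "h1", "h2", "h3", "p", "em", "strong", "blockquote", "br", "hr", "img",
}


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{http://www.w3.org/1999/xhtml}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def audit_xhtml(source: str) -> Counter:
    """Parse source as XML and count tags that are not on the allow-list.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(source)
    unexpected = Counter()
    for element in root.iter():
        name = local_name(element.tag)
        if name not in ALLOWED_TAGS:
            unexpected[name] += 1
    return unexpected
```

An empty result means the file is well-formed and uses only the allowed tags; a `ParseError` usually points at an unclosed or mismatched tag.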

Quote:
So what should I do? Copy and paste from Firefox into Word and make them all RTF files, or develop some sort of HTML checklist I can use to verify---once and for all---the perfection of my files and then make HTML my archival format?
I would recommend HTML. Unfortunately, you will need to descend to the roots and do all coding yourself - if you use some graphical editors, the result will likely be poor.

Quote:
I just want one base file which is fine that I can re-convert forever and ever.
HTML+CSS is a good solution, as is, with certain limitations, XML+XSLT. You may also want to look at my H2LRF, which I use for precisely the task you want - one source format which I can easily (read: changing one parameter of the command, or changing one source file for all books) convert to any format.
Old 04-24-2010, 05:06 AM   #4
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Also, this article of mine might be of interest:
http://www.pepak.net/e-books/vycisteni-html-knihy/
It deals with cleaning up HTML source (from FineReader) to the state you see in that Unspeakable People demo using regular expressions. Unfortunately, it is written in Czech, but you may be OK with Google Translate. A quick look reveals gems such as "Cutting off heads" (="Remove headers"), but it will give you an idea (you MUST combine it with the Czech version, though, because Google Translate destroys all CODE blocks) and besides, regular expressions and HTML are the same in all languages. Also, I provide ZIPped source files before and after each cleanup step, which will guide you a bit more.

If there is enough interest, I may be willing to translate the article to English eventually.
Old 04-24-2010, 05:59 AM   #5
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
I read Czech a little but so poorly that I gave up on your interesting article some time ago.

A translation in English would indeed be very much appreciated.
Old 04-24-2010, 08:49 AM   #6
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
So what would be your checklist of eventual things to fix? So far I have found issues with curly quotes and apostrophes, so I went through and fixed those and then had trouble with em-dashes. I tried saving as plain text and they didn't convert to regular ones. So if I have to do a manual find and replace, I need a comprehensive list of what to look for so I only have to go through this once.
Old 04-24-2010, 09:30 AM   #7
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
I agree that HTML is the best source file format these days for easy conversion to others.

Have you tried just saving the HTML file directly from inside Firefox?

I wouldn't go through KompoZer or any other WYSIWYG editor if I could possibly help it. In fact, the constant trouble I had with KompoZer screwing up my HTML files was the reason I finally decided to ditch WYSIWYG editors altogether.

And the terrible quality of Word-generated HTML files is legendary.

If you really must use a word processor, I've found that AbiWord tends to generate somewhat-decent HTML output for converting.

The issues with quotation marks and en/em-dashes are probably a matter of saving the file in the wrong character encoding. I would think that saving the HTML file through Firefox itself would keep it in its original encoding. I guess you could do it manually by looking at what encoding Firefox is using (under View>Character encoding while viewing the page), and then copy and paste the source code into a sophisticated text editor (NOT something like Notepad!... but maybe, e.g., Notepad++), and then make sure it saves it in the same encoding. (I don't really know what the good editors are for Windows or Mac, since I use Linux.) But I would hope Firefox would take care of that for you if you just File > Save Page As...

But if you're really interested in this stuff, I'd suggest learning HTML/CSS yourself. The tutorials at w3schools.com are quick, free, and probably thorough enough for your purposes.
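
To make the encoding point concrete, here is a small Python sketch. It assumes, purely for the example, that the source files were saved as Windows-1252 (`cp1252`); the real encoding has to be checked per file, for instance in the browser as described above. The function names are made up for this illustration.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes using their real encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def reencode_file(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src to dst (src itself is left untouched)."""
    dst.write_bytes(to_utf8(src.read_bytes(), source_encoding))


def show_mojibake(text: str) -> str:
    """Show what UTF-8 text looks like when it is misread as Windows-1252.

    errors="replace" is needed because a few byte values have no
    Windows-1252 character assigned.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")
```

For example, `show_mojibake("\u2019")` (a curly apostrophe) yields a three-character sequence starting with "â", which is the kind of "funny symbols" a reader shows when it guesses the encoding wrong. After re-encoding, any charset declaration inside the HTML file should be updated to match; the sketch does not do that.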

Last edited by frabjous; 04-24-2010 at 09:33 AM.
Old 04-24-2010, 09:53 AM   #8
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
Tried saving from Firefox and got no line breaks on the Libre and the same issue with things like em-dashes. I think what I need is a checklist like replace smart quotes, replace apostrophe, replace em-dash etc. but I don't know what else to add. It's frustrating because some of these looked fine on the Kindle and I read them there so I don't want to spend precious reading time re-reading them line by line on another device just to check them all when I have so much else to read. I just want to know my source files are in order for future conversions and want to get them in order once and for all.
Old 04-24-2010, 10:37 AM   #9
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
Update: someone suggested I save to mobi from epub instead of HTML and it looks like that solved all the problems. But I don't know what's going on behind the scenes. Is the resulting epub and/or mobi file 'clean' now and can it be my master file? I am just so sick of dealing with all of this. I don't buy this format anymore but am not prepared to throw away the books I have already. Can converting to epub and then using the mobi from that really solve all my problems? If so---

1) Do I still need to keep the original HTML?
2) If not, can I ditch the epub too and convert from mobi in the future?
3) Or should I save the epub (converted from HTML) for some other reason?
4) Will the epub or mobi master be better than the original HTML for future use?
5) Anything going on behind the scenes with these files which might be a problem later?
Old 04-24-2010, 12:58 PM   #10
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
I'd suggest keeping the ePub as the "master" file. ePub is easily edited, and easily converted to other formats. Additionally, compared to HTML, it packages everything together into a single file - text, images, metadata, etc.
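
The "single file" point can be seen directly, because an ePub is a ZIP archive with a fixed layout. The sketch below lists the entries of such an archive and checks its `mimetype` entry. The demo archive it builds is only an illustration with made-up content file names (`OEBPS/chapter1.xhtml`), not a complete, valid ePub.

```python
import io
import zipfile


def list_epub_contents(epub) -> list[str]:
    """Return the names of all entries packaged inside the archive."""
    with zipfile.ZipFile(epub) as archive:
        return archive.namelist()


def has_epub_mimetype(epub) -> bool:
    """Check that the archive carries the ePub 'mimetype' entry."""
    with zipfile.ZipFile(epub) as archive:
        try:
            return archive.read("mimetype").strip() == b"application/epub+zip"
        except KeyError:
            # No entry called "mimetype" in the archive.
            return False


def build_demo_archive() -> io.BytesIO:
    """Build a tiny in-memory ZIP laid out like an ePub (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    buffer.seek(0)
    return buffer
```

Both functions accept either a file path or a file-like object, so `list_epub_contents("book.epub")` works the same way on a real file (the file name here is hypothetical).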
Old 04-24-2010, 01:32 PM   #11
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Unfortunately, the current EPUB-generating tools leave a LOT to be desired. For example, Calibre-generated EPUB files are OK for display but almost useless for conversion as they contain too much junk.
Old 04-24-2010, 03:18 PM   #12
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Sounds to me like there's something wrong with the Libre's ePub rendering. It's hard to understand why you'd get such bad results with it. Do the same ePubs look OK in Adobe Digital Editions?
Old 04-24-2010, 05:37 PM   #13
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
The epubs look fine but I prefer not to use epubs since the page turning button on the right side does not work with the epub files, only with mobi. It is the mobi files I am having trouble with. For example:

- Standard HTML converted to mobi (fine on the Kindle) had no page line breaks
- RTF converted to mobi (terrible on both) lots of errors for em-dashes and such
- RTF saved to HTML and then converted to mobi (fine on Kindle) had line breaks but also had formatting glitches

Best so far has been HTML converted to epub and then the epub converted to mobi (i.e. not converting the HTML to mobi but using the epub file as the source). This will take a while though since Calibre is slow in doing conversions for me. So before I go ahead and do them all, I want to make sure nothing is going on under the hood that will force me to re-do all of this later for some reason.
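
If the two-step route (HTML to epub, then epub to mobi) is run over a few hundred files, it can be batched with Calibre's command-line converter instead of the GUI. The sketch below assumes that `ebook-convert` (shipped with Calibre) is on the PATH and that the HTML files sit in one folder; the folder layout and function names are hypothetical, and no conversion options are passed.

```python
import subprocess
from pathlib import Path


def conversion_commands(html_file: Path, out_dir: Path) -> list[list[str]]:
    """Build the two commands: HTML -> epub, then that epub -> mobi."""
    epub = out_dir / (html_file.stem + ".epub")
    mobi = out_dir / (html_file.stem + ".mobi")
    return [
        ["ebook-convert", str(html_file), str(epub)],
        ["ebook-convert", str(epub), str(mobi)],
    ]


def convert_folder(src_dir: Path, out_dir: Path) -> None:
    """Run both conversion steps for every .html file in src_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(src_dir.glob("*.html")):
        for command in conversion_commands(html_file, out_dir):
            # check=True stops the batch at the first failed conversion.
            subprocess.run(command, check=True)
```

The original HTML files are never modified, so they remain available as a fallback source whichever format ends up being the master.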
Old 04-25-2010, 12:19 AM   #14
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
It seems to me you are looking for some magical conversion tool that will take a crappy input and produce a clean and easy-to-convert output. Unfortunately, there is no such tool. All converters try to handle such a situation, but the results are mixed and usually much worse than you would get by hand-editing. Sure, hand-editing is a lot of work and needs a lot of knowledge, but it can be done and once you try it a few times, the process is quite easy and straightforward. Unfortunately, it is not something that could be summarized into a "replace A with B" list - a lot of the necessary steps are done in an "I look at it and see the solution right away" way.
Old 04-25-2010, 12:33 AM   #15
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Try playing around with HTML Tidy - it does a lot of these things, but it may have a steep learning curve... not really sure; haven't played around with it enough myself.
All times are GMT -4. The time now is 08:52 AM.

