09-16-2010, 02:47 PM | #1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
converting htm(l) format books for kindle
I found a collection of old sci-fi in .htm and .html formats. Calibre can convert them, but where they have hard carriage returns embedded, i.e. they are set to look good only at a particular line width, they still look bad on the Kindle because they only partially reflow.
Sending them to Amazon to convert does not work: you end up with HTML source code on your reader. I thought I'd found a workaround: open in MS Word, apply AutoFormat, save as .rtf, then have calibre reconvert them. That's better, but Word's AutoFormat picks out odd bits of text and enlarges them into headers, somewhat arbitrarily, and it's not practical to manually check all of a 1000-page book in Word, especially if you don't want to see any plot spoilers.
So I am wondering if there's a better way to get such books to fully reflow, or if a future calibre release could have a more intelligent conversion routine that strips out single carriage returns and leaves in only double or triple ones, which are likely to be true paragraph ends?
The other annoyances sometimes present in old conversions are repetitive headers/footers with, e.g., a full file path name on every original "page", or in one case a spam "converted by program x" message and URL link. Any way to automate removing such things? |
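The rule asked for above (drop single line breaks, keep double or triple ones as paragraph ends) can be sketched in a few lines of Python. This is only an illustration on plain text, not how calibre does it internally, and the function name `unwrap_hard_breaks` is made up for the example. It assumes the book has already been exported to plain text and that real paragraph ends are marked by at least one blank line.

```python
import re


def unwrap_hard_breaks(text):
    """Join hard-wrapped lines; keep blank-line gaps as paragraph breaks."""
    # Normalise Windows (CR LF) and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more line breaks in a row (blank lines may hold stray spaces)
    # are treated as a true paragraph end.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, collapse every single line break and any run of
    # whitespace into one space.
    joined = [" ".join(p.split()) for p in paragraphs]
    # Drop empty chunks and separate paragraphs with exactly one blank line.
    return "\n\n".join(p for p in joined if p)


wrapped = "The ship\r\nlanded at\r\ndawn.\r\n\r\nNobody\r\nsaw it."
print(unwrap_hard_breaks(wrapped))
```

Books that indent new paragraphs instead of leaving a blank line would need a different test for the paragraph end, so check a few pages of the output before converting the whole file.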
09-16-2010, 05:21 PM | #2 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Try "preprocess input to improve..." under structure detection. You might need to convert to txt first.
|
09-16-2010, 06:31 PM | #3 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
For the second annoyance you can also use the header/footer removal options under structure detection.
|
09-17-2010, 02:15 AM | #4 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Thanks guys. With "convert to txt first", would that mean using calibre to convert from htm to txt, then using calibre again for txt to mobi?
The exact spam message that I want to remove (which may not be a header/footer, but is throughout some books) is this, in extra-annoying bold text: "generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html". Could I create a rule somehow that would nuke all instances of that phrase? |
09-17-2010, 09:11 AM | #5 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
You shouldn't need to convert to text first to use the preprocess option. That said, if it doesn't work then you may need to open a bug with the file.
That message is a header/footer, but it doesn't need to be one for you to create a rule to get rid of it. I can't tell you exactly what the rule will be; you need to look at the source code of your file. There is a little magic wand next to the header or footer regex text box. Just click it, then write your pattern. It's probably something like: Code:
<p[^>]*>\s*(<span[^>]*>)?\s*(<b>)?\s*generated\sby\sABC.*?</p> |
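To see what a pattern like that would catch before pasting it into calibre, it can be tried on a snippet with Python's `re` module. This is a rough sketch: the sample markup is a guess at what the converter left behind, not taken from a real file, the `IGNORECASE` and `DOTALL` flags are assumptions added here, and `strip_spam` is just a name made up for the example.

```python
import re

# The pattern suggested above, compiled with two assumed flags:
# IGNORECASE in case the tags are upper case, DOTALL in case the
# message is wrapped over more than one line.
SPAM_PATTERN = re.compile(
    r"<p[^>]*>\s*(<span[^>]*>)?\s*(<b>)?\s*generated\sby\sABC.*?</p>",
    re.IGNORECASE | re.DOTALL,
)


def strip_spam(html):
    """Remove every paragraph that starts with the converter's message."""
    return SPAM_PATTERN.sub("", html)


# Hypothetical sample; check the real source of your own file.
sample = (
    "<p>Chapter one text.</p>\n"
    "<p><b>generated by ABC Amber LIT Converter, "
    "http://www.processtext.com/abclit.html</b></p>\n"
    "<p>More story.</p>"
)
cleaned = strip_spam(sample)
print(cleaned)
```

If the message in your file is wrapped in different tags (a `<div>` or `<font>`, say), the pattern will not match and needs adjusting to whatever the source actually shows.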
09-17-2010, 09:18 AM | #6 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
09-17-2010, 09:59 AM | #7 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
I tried converting formats with all the header/footer/preprocess boxes ticked and they don't shift it. |
|
09-29-2010, 08:01 AM | #8 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
When the header removal doesn't work for me and my neck starts to hurt, I convert to rtf and search and replace.
If you have several files with the same string, you can open them all, switch between them, and use the same search box. Helen
|
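For a whole folder of files, the same search and replace can be scripted instead of done by hand. A minimal sketch, assuming the unwanted string is identical in every file and that you are working on copies: the function name, the `*.htm*` file pattern and the UTF-8 default are all choices made for this example. It works on raw bytes so the rest of each file, including its line endings, is left untouched.

```python
from pathlib import Path


def remove_phrase_from_files(folder, phrase, pattern="*.htm*", encoding="utf-8"):
    """Delete every literal occurrence of phrase from the matching files.

    Returns the names of the files that were changed.
    """
    needle = phrase.encode(encoding)
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        data = path.read_bytes()
        if needle in data:
            # Plain literal replacement, no regex involved.
            path.write_bytes(data.replace(needle, b""))
            changed.append(path.name)
    return changed
```

Calling `remove_phrase_from_files("books", "generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html")` on a hypothetical `books` folder would strip the phrase but leave the surrounding tags in place, so an empty bold paragraph may remain where the message was.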