Old 09-16-2010, 02:47 PM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
converting htm(l) format books for kindle

I found a collection of old sci-fi in .htm and .html formats. Calibre can convert them, but where they have hard carriage returns embedded (i.e. they are set to look good only at a particular line width), they still look bad on the Kindle, as they only partially reflow.

Sending them to Amazon to convert does not work: you end up with HTML source code on your reader.

I thought I'd found a workaround: open in MS Word, apply AutoFormat, save as .rtf, then have calibre reconvert them. That's better, but Word's AutoFormat picks out odd bits of text and enlarges them into headers, somewhat arbitrarily, and it's not practical to manually check all of a 1000-page book in Word, especially if you don't want to see any plot spoilers.

So I am wondering if there's a better way to get such books to fully reflow, or if a future calibre release could have a more intelligent conversion routine that strips out single carriage returns and leaves in only double or triple ones, which are likely to be true paragraph ends?

The other annoyances that are sometimes present in old conversions are repetitive headers/footers with e.g. a full file path name on every original "page", or in one case a spam "converted by program x" message and URL link. Any way to automate removing such things?
Old 09-16-2010, 05:21 PM   #2
speakingtohe
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
Try "Preprocess input to improve..." under Structure Detection. You might need to convert to txt first.
Old 09-16-2010, 06:31 PM   #3
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
For the second annoyance you can also use the header/footer removal options under structure detection.
Old 09-17-2010, 02:15 AM   #4
cybmole
Thanks guys. With "convert to txt first", would that mean using calibre to convert from htm to txt, then using calibre again for txt to mobi?

The exact spam message that I want to remove (which may not be a header/footer, but is throughout some books) is this (in extra-annoying bold text!):
"generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"

Could I create a rule somehow that would nuke all instances of that phrase?
Old 09-17-2010, 09:11 AM   #5
ldolse
You shouldn't need to convert to text first to use the preprocess option. That said, if it doesn't work then you may need to open a bug with the file.

That message is a header/footer, but it wouldn't need to be one for you to create a rule to get rid of it. I can't tell you exactly what the rule will be; you need to look at the source code of your file. There is a little magic wand next to the header or footer regex text-box. Just click it, then write your pattern. It's probably something like:
Code:
<p[^>]*>\s*(<span[^>]*>)?\s*(<b>)?\s*generated\sby\sABC.*?</p>
but you'll need to make it work for whatever is actually in your file.
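To try a pattern like this outside calibre before committing to a full conversion, a minimal Python sketch follows. It reuses the regex above unchanged; the `re.IGNORECASE` and `re.DOTALL` flags, the `strip_spam` helper, and the sample markup are assumptions made for illustration, not something taken from a real ABC Amber file or from calibre's own code.

```python
import re

# The pattern from the post above, unchanged.
# Assumed extras: IGNORECASE so "Generated" and "generated" both match,
# DOTALL so ".*?" can cross a line break inside the paragraph.
SPAM = re.compile(
    r"<p[^>]*>\s*(<span[^>]*>)?\s*(<b>)?\s*generated\sby\sABC.*?</p>",
    re.IGNORECASE | re.DOTALL,
)

def strip_spam(html: str) -> str:
    """Return html with every paragraph matching SPAM removed."""
    return SPAM.sub("", html)

# Hypothetical markup; check your own file's source for the real structure.
sample = (
    "<p>Chapter one text.</p>\n"
    '<p class="x"><b>Generated by ABC Amber LIT Converter, '
    "http://www.processtext.com/abclit.html</b></p>\n"
    "<p>More story.</p>"
)

print(strip_spam(sample))
```

If the message survives, the tags wrapped around it in your file probably differ from the ones the pattern expects, so compare against the file's source and adjust.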
Old 09-17-2010, 09:18 AM   #6
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by cybmole View Post
Thanks guys. With "convert to txt first", would that mean using calibre to convert from htm to txt, then using calibre again for txt to mobi?
I don't use mobi, but I'd say the answer is yes. Having fixed width html is very unusual, but relatively common for txt. Calibre has a special switch to handle that case for txt - "Use print formatting." I believe the idea is to convert the html to txt, which will bring in all the "bad" CR/LF paragraphs, then strip them in the conversion with that option.
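As a rough illustration of what stripping those hard line breaks involves, here is a minimal Python sketch. It is not calibre's implementation of any option; the function name `unwrap_hard_breaks` and the sample text are made up, and the rule it applies is simply the one asked for in the first post: a single line break is treated as a wrap, and two or more in a row as a paragraph end.

```python
import re

def unwrap_hard_breaks(text: str) -> str:
    """Join hard-wrapped lines; keep blank-line gaps as paragraph breaks."""
    # Normalise Windows (CR/LF) and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more line breaks in a row (spaces allowed on the blank line)
    # are treated as a real paragraph end.
    paragraphs = re.split(r"\n[ \t]*\n+", text)
    # Any line break left inside a paragraph is a hard wrap:
    # collapse it and the whitespace around it into one space.
    joined = [re.sub(r"\s*\n\s*", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in joined if p)

sample = (
    "It was a dark and\r\nstormy night; the rain\r\nfell in torrents.\r\n"
    "\r\n"
    "A new paragraph\r\nstarts here."
)
print(unwrap_hard_breaks(sample))
```

A rule this simple will also merge lines that were meant to stay separate (verse, tables, chapter headings followed by a single break), so treat it as a starting point only.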

Quote:
The exact spam message that I want to remove (which may not be a header/footer, but is throughout some books) is this (in extra-annoying bold text!):
"generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"

Could I create a rule somehow that would nuke all instances of that phrase?
Yes. It's been asked about many times and I've seen several posts on it. I'd suggest you search here for the best nuke string.
Old 09-17-2010, 09:59 AM   #7
cybmole
Quote:
Originally Posted by Starson17 View Post

Yes. It's been asked about many times and I've seen several posts on it. I'd suggest you search here for the best nuke string.
Search for "Generated by ABC Amber LIT ...."? Can I search for an exact phrase? A search on "generated by" finds hundreds of posts.

I tried converting formats with all the header/footer/preprocess boxes ticked, and they don't shift it.
Old 09-29-2010, 08:01 AM   #8
speakingtohe
When the header removal doesn't work for me and my neck starts to hurt, I convert to rtf and search and replace.

If you have several files with the same string, you can open them all, switch between them, and use the same search box.
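The same search-and-replace can also be scripted when many files share the string. The sketch below is a swapped-in alternative to doing it by hand in a word processor, working directly on the .htm/.html files rather than on rtf. The helper name `replace_in_files`, the `*.htm*` file pattern, and the demo file names are hypothetical, and it works on raw bytes so it does not have to guess each file's text encoding. Run it on copies of your books.

```python
import tempfile
from pathlib import Path

SPAM = "generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"

def replace_in_files(folder, old, new="", pattern="*.htm*"):
    """Replace old with new in every matching file; return names of changed files."""
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        data = path.read_bytes()
        if old_b in data:
            # Rewrite the file only when the phrase is actually present.
            path.write_bytes(data.replace(old_b, new_b))
            changed.append(path.name)
    return changed

# Demo on throwaway files, so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "book1.htm").write_text(
        f"<p>Story.</p><p><b>{SPAM}</b></p>", encoding="utf-8"
    )
    Path(tmp, "book2.html").write_text("<p>Clean file.</p>", encoding="utf-8")
    print(replace_in_files(tmp, SPAM))
```

This is an exact, case-sensitive match on the phrase only, so the now-empty tags that wrapped it stay behind; a regex rule is the better tool if those need to go too.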

Helen