MobileRead Forums > E-Book Formats > Kindle Formats

Old 03-29-2009, 05:18 PM   #1
JSWolf
Resident Curmudgeon
Posts: 73,866
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Best way to get clean HTML

Is there a way to get clean HTML out of a Mobipocket eBook that does not have all the junk you get from a Mobipocket HTML file?

Every paragraph is loaded with junk. What I'd like is if this junk could somehow be converted into CSS so it would be easy to edit the CSS instead of having to fool around with all the junk.

Here is an example of what I mean...

Code:
<div style="margin-top: 6"/><div style="text-indent: 1em"><font size="3">“What I was going to ask your boss, Charley, is if there is some good reason you can’t go to Buenos Aires right now.”</font></div><div style="margin-top: 6"/>
Old 03-29-2009, 06:13 PM   #2
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
TidyHTML, maybe?
Old 03-29-2009, 06:37 PM   #3
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by JSWolf View Post
Is there a way to get clean HTML out of a Mobipocket eBook that does not have all the junk you get from a Mobipocket HTML file?
If I remember correctly, this depends on the book. You do not get it for all books, so it is not inherent in the format.
Old 03-29-2009, 06:44 PM   #4
Hadrien
Quote:
Originally Posted by tompe View Post
If I remember correctly, this depends on the book. You do not get it for all books, so it is not inherent in the format.
I disagree. For some things you do not have much of a choice, and you need to use junk with Mobipocket.
Old 03-31-2009, 07:55 AM   #5
tompe
Quote:
Originally Posted by Hadrien View Post
I disagree. For some things you do not have much of a choice, and you need to use junk with Mobipocket.
Yes, if you want to force a specific formatting. But for a straightforward book formatted as a standard paperback, you should be able to use clean HTML.
Old 03-31-2009, 08:08 AM   #6
JSWolf
I think it's time Mobipocket & AZW all went away. The mess they make of well-formatted HTML is not nice.
Old 03-31-2009, 02:21 PM   #7
All4Fun
Zealot
Posts: 149
Karma: 937
Join Date: Mar 2009
Device: iPad, Blackberry Bold
Quote:
Originally Posted by Hadrien View Post
TidyHTML, maybe?
So will TidyHTML do the trick?
Old 03-31-2009, 04:53 PM   #8
=X=
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Quote:
Originally Posted by JSWolf View Post
... What I'd like is if this junk could somehow be converted into CSS so it would be easy to edit the CSS ...
Yes, there is a tool called cssutils that parses the styles out of the HTML and creates a CSS file.

=X=
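
To illustrate the idea of moving inline styles into a stylesheet, here is a minimal sketch using only the Python standard library. It does not use cssutils; the function names and the generated class names (`s1`, `s2`, ...) are hypothetical. It assumes double-quoted `style` attributes and elements that do not already carry a `class` attribute, and it copies values verbatim, so a unitless `margin-top: 6` still needs a unit chosen by hand.

```python
import re

# Matches a double-quoted style attribute, but not e.g. data-style="...".
STYLE_ATTR = re.compile(r'(?<![\w-])style="([^"]*)"')


def normalize(style):
    """Turn 'text-indent:1em ;margin-top: 6' into a sorted 'name: value' string."""
    decls = []
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls.append(f"{name.strip().lower()}: {value.strip()}")
    return "; ".join(sorted(decls))


def extract_styles(html, prefix="s"):
    """Replace each inline style with a class; return (new_html, css_text)."""
    classes = {}  # normalized declarations -> generated class name

    def swap(match):
        key = normalize(match.group(1))
        if not key:
            return match.group(0)  # leave empty or unparsable styles alone
        name = classes.setdefault(key, f"{prefix}{len(classes) + 1}")
        return f'class="{name}"'

    new_html = STYLE_ATTR.sub(swap, html)
    css = "\n".join(f".{name} {{ {decls}; }}" for decls, name in classes.items())
    return new_html, css
```

Identical style strings share one class, so a book with thousands of paragraphs typically ends up with a handful of rules that can then be renamed and edited in one place.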
Old 03-31-2009, 04:56 PM   #9
tompe
Quote:
Originally Posted by JSWolf View Post
I think it's time Mobipocket & AZW all went away. The mess they make of well-formatted HTML is not nice.
Yes, a good idea. When DRM stops working on all Mobipocket files, that will be the final death of DRM.
Old 04-01-2009, 04:45 AM   #10
ericshliao
Guru
Posts: 976
Karma: 687
Join Date: Nov 2007
Device: Dell X51v; iLiad v2
Quote:
Originally Posted by JSWolf View Post
Is there a way to get clean HTML out of a Mobipocket eBook that does not have all the junk you get from a Mobipocket HTML file?
This problem has been bothering me for some time, too. I just want a clean HTML file with simple HTML tags, such as <H1>, <H2>, and <P>, from an MS Word file.
Old 04-01-2009, 06:00 AM   #11
kacir
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
HTML Tidy and demoroniser
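
For anyone who wants to try HTML Tidy from a script, the following is a rough sketch that builds a Tidy command line and runs it if a `tidy` executable is on the PATH. The file names and function names are hypothetical. The `-clean` option is the relevant one here: according to Tidy's documentation it replaces FONT, NOBR and CENTER tags with CSS. How well the result matches what is wanted will depend on the book.

```python
import shutil
import subprocess


def tidy_command(src, dst):
    """Build an HTML Tidy command line that writes cleaned XHTML to dst."""
    return [
        "tidy",
        "-q",        # quiet: suppress non-essential output
        "-asxhtml",  # convert the input to well-formed XHTML
        "-clean",    # replace FONT, NOBR and CENTER tags with CSS
        "-utf8",     # treat input and output as UTF-8
        "-o", dst,   # write the result to this file instead of stdout
        src,
    ]


def run_tidy(src, dst):
    """Run Tidy if it is installed; return its exit code, or None if missing."""
    if shutil.which("tidy") is None:
        return None
    # Tidy exits with 1 for warnings and 2 for errors, so check=True is avoided.
    return subprocess.run(tidy_command(src, dst)).returncode
```

A call such as `run_tidy("book.html", "book-clean.html")` leaves the input untouched and writes the cleaned copy next to it.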
Old 04-01-2009, 07:32 AM   #12
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by Hadrien View Post
I disagree. For some things you do not have much of a choice, and you need to use junk with Mobipocket.
Even so, the amount of junk you have to use (and which is recognized by Mobipocket readers) is very limited. The normal <P>, <DIV>, <I>, etc. tags plus properties like WIDTH, HEIGHT and ALIGN are often enough. Add <FONT> with SIZE and COLOR and I think that's about the only needed junk.
Old 04-01-2009, 07:57 AM   #13
Nate the great
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
If the input and output are consistent, I could write a specific cleanup program for it. Anyone interested?
Old 04-01-2009, 09:34 AM   #14
Sweetpea
Grand Sorcerer
Posts: 9,707
Karma: 32763414
Join Date: Dec 2008
Location: Krewerd
Device: Pocketbook Inkpad 4 Color; Samsung Galaxy Tab S6
I generally use regular expression search and replace...

To take your example (I replaced your weird characters with plain quotes for readability):

Code:
<div style="margin-top: 6"/>
<div style="text-indent: 1em"><font size="3">"What I was going to ask your boss, Charley, is if there is some good reason you can't go to Buenos Aires right now."</font></div>
<div style="margin-top: 6"/>
In my style:

.emptyLine { margin-top: 6em; }
p { text-indent: 1em; font-size: medium; }


<div style="margin-top: 6" /> would be replaced with <div class="emptyLine" />
<div style="text-indent: 1em"><font size="3"> would be replaced with <p>
</font></div> would be replaced by </p>



I generally start with headers and other exceptions (there are generally fewer headers than paragraphs). Then I create an EPUB out of it, check it, fix any errors, and repeat the checking process until it's clean.
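
The three replacements above could be scripted along these lines. This is only a sketch: the rule list and the function name `clean` are hypothetical, and the patterns are tied to the exact markup in the example, so a different book will need its own rules.

```python
import re

# (pattern, replacement) pairs mirroring the three substitutions above.
# Order matters: rules are applied top to bottom.
RULES = [
    (re.compile(r'<div style="margin-top: 6"\s*/>'), '<div class="emptyLine" />'),
    (re.compile(r'<div style="text-indent: 1em"><font size="3">'), "<p>"),
    (re.compile(r"</font></div>"), "</p>"),
]


def clean(html):
    """Apply every rule in turn and return the rewritten markup."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html
```

Note that the last rule rewrites every `</font></div>` pair, including those whose opening tags were not matched by the second rule. That is one reason to handle headers and other exceptions first, before the general paragraph rules run.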
Old 04-01-2009, 12:31 PM   #15
=X=
Quote:
Originally Posted by Nate the great View Post
If the input and output are consistent, I could write a specific cleanup program for it. Anyone interested?
Yes, I would be very interested.

=X=


All times are GMT -4.

