Old 03-29-2009, 05:18 PM   #1
JSWolf
Suspended
Posts: 35,392
Karma: 16147088
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
Best way to get clean HTML

Is there a way to get clean HTML out of a Mobipocket eBook, without all the junk you get in a Mobipocket HTML file?

Every paragraph is loaded with junk. What I'd like is if this junk could somehow be converted into CSS so it would be easy to edit the CSS instead of having to fool around with all the junk.

Here is an example of what I mean...

Code:
<div style="margin-top: 6"/><div style="text-indent: 1em"><font size="3">“What I was going to ask your boss, Charley, is if there is some good reason you can’t go to Buenos Aires right now.”</font></div><div style="margin-top: 6"/>
Old 03-29-2009, 06:13 PM   #2
Hadrien
Feedbooks.com Co-Founder
Posts: 2,265
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
HTML Tidy, maybe?
Old 03-29-2009, 06:37 PM   #3
tompe
Grand Sorcerer
Posts: 6,886
Karma: 2753841
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 4, iPad 2, Notion Ink Adam Qi, Kindle WiFi, Kindle PW
Quote:
Originally Posted by JSWolf View Post
Is there a way to get clean HTML out of a Mobipocket eBook, without all the junk you get in a Mobipocket HTML file?
If I remember correctly, this depends on the book. You do not get the junk with every book, so it is not inherent in the format.
Old 03-29-2009, 06:44 PM   #4
Hadrien
Feedbooks.com Co-Founder
Posts: 2,265
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by tompe View Post
If I remember correctly, this depends on the book. You do not get the junk with every book, so it is not inherent in the format.
I disagree. For some things you do not have much of a choice, and you need to use junk with Mobipocket.
Old 03-31-2009, 07:55 AM   #5
tompe
Grand Sorcerer
Posts: 6,886
Karma: 2753841
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 4, iPad 2, Notion Ink Adam Qi, Kindle WiFi, Kindle PW
Quote:
Originally Posted by Hadrien View Post
I disagree. For some things you do not have much of a choice, and you need to use junk with Mobipocket.
Yes, if you want to force a specific formatting. But for a straightforward book formatted as a standard paperback, you should be able to use clean HTML.
Old 03-31-2009, 08:08 AM   #6
JSWolf
Suspended
Posts: 35,392
Karma: 16147088
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
I think it's time Mobipocket & AZW all went away. The mess they make of well-formatted HTML is not nice.
Old 03-31-2009, 02:21 PM   #7
All4Fun
Zealot
Posts: 149
Karma: 937
Join Date: Mar 2009
Device: iPad, Blackberry Bold
Quote:
Originally Posted by Hadrien View Post
HTML Tidy, maybe?
So will HTML Tidy do the trick?
Old 03-31-2009, 04:53 PM   #8
=X=
Wizard
Posts: 3,672
Karma: 12205306
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Quote:
Originally Posted by JSWolf View Post
... What I'd like is if this junk could somehow be converted into CSS so it would be easy to edit the CSS ...
Yes, there is a tool called cssutils that parses the styles out of HTML and creates a CSS file.

=X=
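
To make the idea of turning inline styles into a stylesheet concrete, here is a minimal sketch. It does not use cssutils; it relies only on Python's standard `re` module, and the function name `extract_styles` and the generated class names (`s1`, `s2`, ...) are made up for illustration. It assumes every style is written as a double-quoted `style="..."` attribute and that the elements carry no `class` attribute already.

```python
import re

# Matches a double-quoted inline style attribute and captures its value.
STYLE_ATTR = re.compile(r'style="([^"]*)"')


def extract_styles(html):
    """Swap each distinct inline style for a generated class.

    Returns (new_html, css). Style values are copied verbatim, so a
    unitless value such as "margin-top: 6" still needs a unit added by hand.
    """
    classes = {}  # inline style text -> generated class name

    def to_class(match):
        style = match.group(1).strip()
        # Reuse the class if this exact style text was seen before.
        name = classes.setdefault(style, "s%d" % (len(classes) + 1))
        return 'class="%s"' % name

    new_html = STYLE_ATTR.sub(to_class, html)
    css = "\n".join(".%s { %s }" % (name, style) for style, name in classes.items())
    return new_html, css


sample = (
    '<div style="margin-top: 6"/>'
    '<div style="text-indent: 1em">Hello</div>'
    '<div style="margin-top: 6"/>'
)
new_html, css = extract_styles(sample)
print(new_html)
print(css)
```

The generated class names are meaningless, so in practice you would rename them (for example to something like `emptyLine`) once you can see which styles the book actually uses.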
Old 03-31-2009, 04:56 PM   #9
tompe
Grand Sorcerer
Posts: 6,886
Karma: 2753841
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 4, iPad 2, Notion Ink Adam Qi, Kindle WiFi, Kindle PW
Quote:
Originally Posted by JSWolf View Post
I think it's time Mobipocket & AZW all went away. The mess they make of well-formatted HTML is not nice.
Yes, a good idea. If DRM stopped working on all Mobipocket files, that would be the final death of DRM.
Old 04-01-2009, 04:45 AM   #10
ericshliao
Guru
Posts: 973
Karma: 687
Join Date: Nov 2007
Device: Dell X51v; iLiad v2
Quote:
Originally Posted by JSWolf View Post
Is there a way to get clean HTML out of a Mobipocket eBook, without all the junk you get in a Mobipocket HTML file?
This problem has been bothering me for some time, too. I just want a clean HTML file with simple HTML tags, such as <H1>, <H2>, and <P>, from an MS Word file.
Old 04-01-2009, 06:00 AM   #11
kacir
Wizard
Posts: 2,680
Karma: 2799391
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
HTML Tidy and demoroniser
Old 04-01-2009, 07:32 AM   #12
Jellby
frumious Bandersnatch
Posts: 5,794
Karma: 4027751
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon
Quote:
Originally Posted by Hadrien View Post
I disagree. For some things you do not have much of a choice, and you need to use junk with Mobipocket.
Even so, the amount of junk you have to use (and which is recognized by Mobipocket readers) is very limited. The normal <P>, <DIV>, <I>, etc. tags plus attributes like WIDTH, HEIGHT and ALIGN are often enough. Add <FONT> with SIZE and COLOR and I think that's about the only junk needed.
Old 04-01-2009, 07:57 AM   #13
Nate the great
Sir Penguin of Edinburgh
Posts: 10,327
Karma: 2897207
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
If the input and output are consistent, I could write a specific cleanup program for it. Anyone interested?
Old 04-01-2009, 09:34 AM   #14
Sweetpea
Grand Sorcerer
Posts: 7,914
Karma: 22621808
Join Date: Dec 2008
Location: Krewerd
Device: HTC Flyer; BBMini; Sony PRS650
I generally use regular expression search and replace...

To take your example (I replaced your weird characters with plain quotes for readability):

Code:
<div style="margin-top: 6"/>
<div style="text-indent: 1em"><font size="3">"What I was going to ask your boss, Charley, is if there is some good reason you can't go to Buenos Aires right now."</font></div>
<div style="margin-top: 6"/>
in my style:

.emptyLine { margin-top: 6em; }
p { text-indent: 1em; font-size: medium; }


<div style="margin-top: 6" /> would be replaced with <div class="emptyLine" />
<div style="text-indent: 1em"><font size="3"> would be replaced with <p>
</font></div> would be replaced by </p>



I generally start with headers and other exceptions (there are generally fewer headers than paragraphs). Then I create an EPUB out of it, check it, fix any errors, and repeat the checking process until it's clean.
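
The three replacements above can be sketched as a small script. This is only an illustration, written in Python with the standard `re` module; the post does not say which editor or tool is used, and the names `RULES` and `clean` are made up here. The first pattern allows optional whitespace before `/>` because the original sample has `"/>` with no space.

```python
import re

# The three search-and-replace rules from the post, as (pattern, replacement) pairs.
RULES = [
    # Empty spacer div -> div with a class (optional space before "/>").
    (re.compile(r'<div style="margin-top: 6"\s*/>'), '<div class="emptyLine" />'),
    # Opening of an indented paragraph -> plain <p>.
    (re.compile(r'<div style="text-indent: 1em"><font size="3">'), '<p>'),
    # Matching close -> </p>.
    (re.compile(r'</font></div>'), '</p>'),
]


def clean(html):
    """Apply every rule in turn and return the rewritten markup."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


sample = (
    '<div style="margin-top: 6"/>'
    '<div style="text-indent: 1em"><font size="3">"What I was going to ask your boss."</font></div>'
    '<div style="margin-top: 6"/>'
)
cleaned = clean(sample)
print(cleaned)
```

The third rule is blunt: it rewrites every `</font></div>` pair, including any that close headers or other exceptions, which is one reason to handle those first.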
Old 04-01-2009, 12:31 PM   #15
=X=
Wizard
Posts: 3,672
Karma: 12205306
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Quote:
Originally Posted by Nate the great View Post
If the input and output are consistent, I could write a specific cleanup program for it. Anyone interested?
Yes, I would be very interested.

=X=