MobileRead Forums > E-Book Software > Calibre
Old 09-23-2010, 01:02 PM   #1
purecharger
Junior Member
purecharger began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Sep 2010
Device: Kindle
HTML eBook punctuation

I have an eBook in HTML format that uses numeric tags for many kinds of punctuation. Here is an example:

The driver<92>s side door blew wide

Where <92> is supposed to be an apostrophe.

This doesn't seem to follow any standard, and the HTML document does not specify an encoding.

Naturally when I convert this to MOBI using Calibre, there is no punctuation in the output.

Has anyone seen this before? And how should I handle conversion?
Old 09-23-2010, 01:10 PM   #2
cmdahler
Addict
cmdahler can name that song in three notes
 
Posts: 292
Karma: 24688
Join Date: Aug 2009
Device: Sony PRS-505, iPad
Open the HTML file in your favorite text editor and do a simple find and replace. If you are comfortable with a straight single quote, find <92>, replace it with ', and choose Replace All. Save the file and then run it through Calibre.

If you want a curly apostrophe (the right single quotation mark, ’), replace the <92> with "& # 1 4 6 ;" (without the quotes, and without the spaces I inserted between the characters) and you should be good to go.

For a handy table of HTML codes for various extended ASCII characters, go here.
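
If you'd rather script the find and replace than do it in an editor, here is a minimal Python sketch. It assumes that the "<92>" your editor shows is really a single raw byte with hex value 92 in the file, not the four literal characters "<92>". The function name and the sample text are only for illustration, and the curly variant uses &#8217; (the Unicode number for the same character) rather than 146.

```python
def replace_byte_92(raw: bytes, replacement: bytes = b"'") -> bytes:
    """Replace every raw 0x92 byte with the given replacement bytes."""
    return raw.replace(b"\x92", replacement)


# Sample line from the first post, with the apostrophe stored as byte 0x92.
sample = b"The driver\x92s side door blew wide"

# Option 1: a plain straight apostrophe.
straight = replace_byte_92(sample)

# Option 2: an HTML numeric reference for the curly apostrophe (U+2019).
curly = replace_byte_92(sample, b"&#8217;")

print(straight.decode("ascii"))
print(curly.decode("ascii"))
```

To run it on a real book you would read the file in binary mode ("rb"), pass the bytes through the function, and write the result back in binary mode ("wb"), ideally to a new file so the original stays untouched.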

Last edited by cmdahler; 09-23-2010 at 01:12 PM.
Old 09-23-2010, 01:11 PM   #3
purecharger
Junior Member
purecharger began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Sep 2010
Device: Kindle
Quote:
Originally Posted by cmdahler View Post
Open the HTML file in your favorite text editor and do a simple find and replace. If you are comfortable with a straight single quote, just find <92> and replace with ' and just hit find and replace all. Save the file and then run it through Calibre.

If you want a curled left quote, replace the <92> with "’" (without the quote marks) and you should be good to go.

For a handy table of HTML codes for various extended ASCII characters, go here.
Thanks for the reply. I can/will do that, but I'm also curious if anyone has seen this before? 92 is not the code for apostrophe in anything I've seen.
Old 09-23-2010, 01:33 PM   #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
 
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That's an apostrophe encoded in cp1252. See http://calibre-ebook.com/user_manual...r-smart-quotes for how to handle these kinds of files.
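
For anyone who wants to see what "encoded in cp1252" means in practice, here is a rough Python sketch that decodes the bytes as cp1252 and re-encodes them as UTF-8. This is not what calibre does internally, just an illustration of the idea; the function name and sample markup are made up for the example.

```python
def cp1252_to_utf8(raw: bytes, errors: str = "strict") -> bytes:
    """Decode bytes as cp1252 and re-encode the text as UTF-8.

    A few byte values (for example 0x81) are undefined in Python's cp1252
    codec, so with errors="strict" such a file raises UnicodeDecodeError.
    """
    text = raw.decode("cp1252", errors=errors)
    return text.encode("utf-8")


# Byte 0x92 is the curly apostrophe in cp1252.
sample_html = b"<p>The driver\x92s side door blew wide</p>"
converted = cp1252_to_utf8(sample_html)

print(converted.decode("utf-8"))
```

After a conversion like this the file is UTF-8, so any encoding you tell calibre about (or declare in the HTML itself) would need to be UTF-8 rather than cp1252.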
Old 09-23-2010, 01:43 PM   #5
purecharger
Junior Member
purecharger began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Sep 2010
Device: Kindle
Quote:
Originally Posted by kovidgoyal View Post
That's an apostrophe encoded in cp1252. See http://calibre-ebook.com/user_manual...r-smart-quotes for how to handle these kinds of files.
Kovid, that was exactly it! I had seen this FAQ and followed the directions incorrectly (specifying encoding during conversion, not import). Works perfectly now.
Old 09-23-2010, 02:57 PM   #6
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by purecharger View Post
Thanks for the reply. I can/will do that, but I'm also curious if anyone has seen this before? 92 is not the code for apostrophe in anything I've seen.
It's the code for a "smart" apostrophe (a curly single quote, the right single quotation mark ’) in several character encodings. ASCII uses 27 (hex) for the generic straight apostrophe, but CP1252 uses 92 (hex) for the curly one.
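
You can check those two hex values yourself with a few lines of Python. This is only a quick sketch; the describe() helper is a made-up name for the example. The third call decodes the same byte as Latin-1 (ISO-8859-1), where 92 (hex) is an unprintable control character, which is one plausible reason the punctuation can vanish when the wrong encoding is assumed.

```python
import unicodedata


def describe(byte_value: int, encoding: str) -> str:
    """Decode one byte in the given encoding and report the resulting character."""
    ch = bytes([byte_value]).decode(encoding)
    # Control characters have no Unicode name, so supply a fallback label.
    name = unicodedata.name(ch, "<no name>")
    return f"0x{byte_value:02X} in {encoding}: U+{ord(ch):04X} {name}"


print(describe(0x27, "ascii"))    # the generic straight apostrophe
print(describe(0x92, "cp1252"))   # the curly apostrophe
print(describe(0x92, "latin-1"))  # same byte, different encoding
```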