Old 01-06-2011, 02:36 AM   #1
cybmole
Wizard
Posts: 2,774
Karma: 1089170
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
<TT>–</TT>

Same regex book as in another thread, but this question got buried at the end of that thread, so I am reposting it for help:

- I am losing the hyphen from this line
"using the – metacharacter"
That is a copy + paste from the CHM source - but no matter how I do the conversion to epub or to mobi, I end up with this when I view the output:
"using the metacharacter"

I've tried ticking transliterate, tried cp1252 encoding ....

using view source on the chm I see this
Code:
 using the <TT>–</TT> metacharacter
In the output epub .xhtml the dash is simply missing: all the TT markup is gone and I see only plain text.

but if I convert to mobi with same settings and send to Kindle then I see a question mark inside a box character, where the dash should be !

how do I get the line to convert correctly into epub ?

PS - what is even more puzzling is that elsewhere in the book, what seems to be the same html DOES convert OK - e.g. this line converted OK into epub.
- (hyphen) is a special metacharacter
source code followed by epub code - all correct
Code:
class="docText"><TT>-</TT> (hyphen) is a special metacharacter
(the calibre13 bit just sets a colour)
Code:
<p class="docText1"><tt class="calibre13">-</tt> (hyphen)
Aha - when I look really closely, the 2 examples have hyphens of different lengths. That explains why the 2nd example is OK, but not what the special character in the 1st example is, or how to convert it. But I think it shows that the issue is with the hyphen, not with the TT markup?

Last edited by cybmole; 01-06-2011 at 03:01 AM.
Old 01-06-2011, 04:54 AM   #2
toddos
Guru
Posts: 694
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
Calibre doesn't like soft hyphens and tends to strip them out. If you can edit the chm source, try changing the character that's being used to a different type of hyphen.
Old 01-06-2011, 04:59 AM   #3
cybmole
Quote:
Originally Posted by toddos View Post
Calibre doesn't like soft hyphens and tends to strip them out. If you can edit the chm source, try changing the character that's being used to a different type of hyphen.
what programs will open/edit .chm ?
Old 01-06-2011, 05:31 AM   #4
toddos
Quote:
Originally Posted by cybmole View Post
what programs will open/edit .chm ?
No idea, as I've never had to do that. Google shows some options. You could also try using Calibre's debug output (on the Conversion dialog, choose the Debug section on the left and give it a path). That will save the intermediate output steps that Calibre goes through during conversion. The resulting HTML might not yet have had the soft hyphens removed, in which case you could take a copy of the HTML output, edit it appropriately, and use that as input for an epub conversion.
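As a rough sketch of the "edit the HTML and feed it back in" step: assuming the debug output folder contains plain HTML files that still hold the odd dash, something like the following could swap it for an ordinary hyphen before reconverting. The folder name, the UTF-8 encoding, and the function names are assumptions made for illustration, not anything calibre provides.

```python
from pathlib import Path

EN_DASH = "\u2013"  # the character suspected of going missing

def replace_en_dashes(text, replacement="-"):
    """Swap every literal en dash for a replacement (plain hyphen by default)."""
    return text.replace(EN_DASH, replacement)

def patch_html_dir(root, encoding="utf-8"):
    """Rewrite .htm/.html/.xhtml files under root; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() not in {".htm", ".html", ".xhtml"}:
            continue
        original = path.read_text(encoding=encoding)
        patched = replace_en_dashes(original)
        if patched != original:
            path.write_text(patched, encoding=encoding)
            changed += 1
    return changed

# Hypothetical usage, pointing at a copy of the debug folder:
# patch_html_dir("debug_output_copy")
```

Work on a copy of the folder, since the files are overwritten in place. If the files are not UTF-8, pass the right encoding instead.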
Old 01-06-2011, 06:15 AM   #5
cybmole
Well, I can patch it up manually with Sigil. I think it is just 1 occurrence in 1 book.
I was hoping to learn how to auto-fix it, but that is looking unlikely.

the epub conversion is much easier to read/scroll through on pc than the original .chm

Any idea why calibre strips out this soft hyphen (if that is what it is) only when converting to epub, and not when converting to mobi?
Old 01-06-2011, 11:34 AM   #6
kovidgoyal
creator of calibre
Posts: 25,810
Karma: 5006091
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
because many readers do not handle soft hyphens correctly.
Old 01-06-2011, 01:33 PM   #7
sourcejedi
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: FBReader/OpenInkpot on Hanvon N516.
huh? Why would this be a soft hyphen?

Soft hyphens are used to indicate possible hyphenation points within a word, e.g. count-ing. The idea is they'll only be rendered if the reader has to break the word at that point.

If this was a soft hyphen, there'd be no reason to expect it to display at all, because it's not inside a word!

Python says that the characters in this thread are not soft hyphens:

>>> import unicodedata
>>> unicodedata.name(u"-")
'HYPHEN-MINUS'
>>> unicodedata.name(u"–")
'EN DASH'

Maybe something mangled it on the way, but I can't imagine why it would use an en-dash instead of a normal hyphen.
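For comparison, here is the same kind of check extended to a real soft hyphen (U+00AD), as a small sketch. The describe helper is just an illustrative name; the unicodedata calls are standard library.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch, "<no name>"),
        unicodedata.category(ch),
    )

# hyphen-minus, en dash, soft hyphen
for ch in ("-", "\u2013", "\u00ad"):
    print(describe(ch))
```

The first two are in category Pd (dash punctuation), while the soft hyphen is Cf (a format character), which fits the point that it is a hint to the renderer rather than something that is normally displayed.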
Old 01-06-2011, 04:45 PM   #8
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
You're right - the character you initially posted is an en-dash, not a soft-hyphen. That said, PHPBB may be doing something as well - you should double-check the hyphen/dash displayed in the original source doc. Calibre often strips soft-hyphens during conversion, but en-dashes should be preserved.
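One way to double-check the source at the byte level is sketched below. It assumes, and nobody in this thread has confirmed it, that the HTML inside the CHM uses a single-byte Windows code page, where an en dash is stored as byte 0x96. The RAW bytes and the dash_report name are made up for illustration. The same byte decodes to an en dash under cp1252 but to an unnamed control character under Latin-1, which is the kind of thing a converter or reader might drop or show as a box. That is a guess at a possible cause, not a diagnosis.

```python
import unicodedata

# Hypothetical source bytes: the line from the book with a 0x96 byte as the dash.
RAW = b"using the <TT>\x96</TT> metacharacter"

def dash_report(raw, encoding):
    """Decode raw bytes and list every non-ASCII character with its name."""
    text = raw.decode(encoding)
    return [
        (hex(ord(ch)), unicodedata.name(ch, "<control or unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]

print(dash_report(RAW, "cp1252"))   # 0x96 read as Windows-1252
print(dash_report(RAW, "latin-1"))  # 0x96 read as ISO-8859-1
```

Running the same report on the real bytes extracted from the .chm would show which code point is actually there, independent of what the forum software did to the pasted text.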

Wikipedia article to explain which type to use when:
http://en.wikipedia.org/wiki/Dash
Old 01-07-2011, 07:41 AM   #9
cybmole
what i posted was obtained by view source ( a right click option within the displayed CHM page ) - copy - paste to thread
so should be an accurate reproduction of what is in the .chm source.

I buy into the idea that it is actually an en-dash. It looks like one and it explains what I see on Kindle - Kindle does not do en-dash, so it does the question-mark-in-a-box substitution.

That leaves us with the question of why calibre's chm to epub conversion is discarding an en-dash. I don't know how to build a simple test .chm.

Maybe it's a bug which is specific to .chm sources. If it occurred with, say, html source, it would surely have been noticed and reported already.
All times are GMT -4. The time now is 09:56 AM.

