Go Back   MobileRead Forums > E-Book Software > Calibre

Old 08-01-2010, 01:47 AM   #1
toddos
Guru
Posts: 694
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
Stripping out dashes on epub convert?

Running 0.7.12, I'm seeing conversion to epub strip dashes (and sometimes insert a paragraph break where a dash was). The source was originally a PDF, but I captured the debug output and inspected the HTML, and the dashes were present in the intermediate steps. I tried importing the "processed" HTML and converting that to epub, and the dashes were still stripped from the resulting epub.
Old 08-01-2010, 02:58 AM   #2
kovidgoyal
creator of calibre
Posts: 26,438
Karma: 5383257
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Are these dashes normal hyphens, or a special character such as an em dash or the like?
Old 08-01-2010, 03:32 AM   #3
toddos
Guru
It looks like pdftohtml converted them to 0x00AD (soft hyphen) rather than 0x002D (hyphen-minus). They're then stripped out somewhere between the final debug output and the actual epub creation.

The paragraph breaks at some of these hyphens appear to be caused by bad line unwrapping during PDF conversion. I could play with the line unwrapping setting to get a better PDF conversion and then manually convert the soft hyphens to regular hyphens.
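
For anyone wanting to script the "convert the soft hyphens to regular hyphens" step instead of doing it by hand, here is a minimal sketch in Python. It is not part of calibre; the function names and the sample string are made up for illustration, and it assumes the soft hyphens appear in the intermediate HTML either as the raw U+00AD character or as one of the common entity spellings.

```python
import re

# Matches a soft hyphen written as the raw character U+00AD or as an
# HTML entity (&shy;, &#173;, &#xAD;). Hypothetical helper, not calibre code.
_SOFT_HYPHEN = re.compile("\u00ad|&shy;|&#0*173;|&#x0*ad;", re.IGNORECASE)


def count_soft_hyphens(html: str) -> int:
    """Return how many soft hyphens (raw or entity form) the text contains."""
    return len(_SOFT_HYPHEN.findall(html))


def soft_to_hyphen_minus(html: str) -> str:
    """Replace every soft hyphen with a plain hyphen-minus (U+002D)."""
    return _SOFT_HYPHEN.sub("-", html)


# Illustrative input only: one raw soft hyphen and one entity.
sample = "a well\u00adknown, self&shy;contained example"
fixed = soft_to_hyphen_minus(sample)
print(count_soft_hyphens(sample), "soft hyphens replaced")
print(fixed)
```

Note that this replaces every soft hyphen, so it only makes sense when all of them stand for real dashes, as they did in this book. In a file that also uses soft hyphens as genuine optional break points, it would leave stray hyphens in the middle of words.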
Old 08-01-2010, 11:39 AM   #4
kovidgoyal
creator of calibre
Ah, well, that explains it. IIRC calibre strips soft hyphens because various readers render them incorrectly as normal hyphens, making the text unreadable.
Old 08-01-2010, 04:27 PM   #5
toddos
Guru
I'm not sure why pdftohtml converted those to soft hyphens. I guess that's how the PDF was made?

Manually fixing them worked, but of course it was a pain (turn on debugging, convert PDF to epub, grab the HTML output, modify that, import it, convert HTML to epub, clean up the poor PDF line unwrapping in the epub with Sigil). I know it's my fault for wanting to convert PDF, but I hate hate hate PDF as a format.
Old 08-01-2010, 04:29 PM   #6
kovidgoyal
creator of calibre
With PDF, everything is black magic. I've never seen such a messy format. My attitude is that if pdftohtml can't handle it, then I don't care about it.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.