Old 08-01-2010, 12:47 AM   #1
toddos
Guru
Posts: 692
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
Stripping out dashes on epub convert?

Running 0.7.12, I'm seeing conversion to epub strip dashes (and sometimes insert a paragraph break where they were). The source was originally a PDF, but I captured the debug output and inspected the HTML, and the dashes were still present in the intermediate steps. I tried importing the "processed" HTML and converting that to epub, and the dashes were still stripped in the resulting epub.
Old 08-01-2010, 01:58 AM   #2
kovidgoyal
creator of calibre
Posts: 24,762
Karma: 4369667
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Are these dashes normal hyphens, or a special character such as an em dash or the like?
Old 08-01-2010, 02:32 AM   #3
toddos
Guru
It looks like pdftohtml converted them to 0x00AD (soft hyphen) rather than 0x002D (hyphen-minus). They're then stripped out somewhere between the final debug output and the actual epub creation.
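For anyone wanting to check which character their own file contains, here is a minimal Python sketch that counts dash-like code points in a string. The function name `count_dashes` and the list of code points are illustrative assumptions, not anything calibre provides.

```python
import unicodedata
from collections import Counter

# Dash-like code points that are easy to confuse on screen.
# This set is illustrative, not exhaustive.
DASH_LIKE = {
    "\u002d",  # HYPHEN-MINUS (the ordinary keyboard hyphen)
    "\u00ad",  # SOFT HYPHEN (an invisible line-break hint)
    "\u2010",  # HYPHEN
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
}


def count_dashes(text):
    """Map 'U+XXXX NAME' to the number of times that character occurs."""
    counts = Counter(ch for ch in text if ch in DASH_LIKE)
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": n
        for ch, n in counts.items()
    }


if __name__ == "__main__":
    sample = "well\u00adknown and self-made"
    print(count_dashes(sample))
```

Running it over the text of the debug HTML (read with the correct encoding) should show whether the dashes are U+00AD or U+002D.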

The paragraph breaks at some of these hyphens appear to be bad line unwrapping on PDF conversion. I could play with line unwrapping to get a better PDF conversion and then manually convert the soft hyphens to regular hyphens.
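The manual conversion mentioned above could be scripted roughly as follows. This is only a sketch: the helper names `soft_to_hard_hyphens` and `convert_file` are hypothetical, and it assumes the HTML is UTF-8 and that every soft hyphen in it really should be a visible hyphen.

```python
SOFT_HYPHEN = "\u00ad"   # U+00AD
HYPHEN_MINUS = "\u002d"  # U+002D


def soft_to_hard_hyphens(html):
    """Replace soft hyphens (literal or as the &shy; entity) with hyphen-minus."""
    html = html.replace("&shy;", HYPHEN_MINUS)
    return html.replace(SOFT_HYPHEN, HYPHEN_MINUS)


def convert_file(src, dst, encoding="utf-8"):
    """Read src, convert its soft hyphens, and write the result to dst."""
    with open(src, encoding=encoding) as f:
        html = f.read()
    with open(dst, "w", encoding=encoding) as f:
        f.write(soft_to_hard_hyphens(html))
```

Note that this replaces every soft hyphen, including any that were genuine line-break hints, so the output may need a quick check for unwanted hyphens in the middle of words.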
Old 08-01-2010, 10:39 AM   #4
kovidgoyal
creator of calibre
Ah, well that explains it. IIRC calibre strips soft hyphens because various readers render them incorrectly as normal hyphens, making text unreadable.
Old 08-01-2010, 03:27 PM   #5
toddos
Guru
I'm not sure why pdftohtml converted those to soft hyphens. I guess that's how the PDF was made?

Manually fixing them worked, but of course it was a pain (turn on debugging, convert PDF to epub, grab the HTML output, modify that, import it, convert HTML to epub, clean up poor PDF line unwrapping in the epub with Sigil). I know it's my fault for wanting to convert PDF, but I hate hate hate PDF as a format.
Old 08-01-2010, 03:29 PM   #6
kovidgoyal
creator of calibre
With PDF everything is black magic. I've never seen such a messy format. My attitude is that if pdftohtml can't handle it, then I don't care about it.