MobileRead Forums > E-Book Software > Calibre > Conversion
05-18-2010, 12:00 PM   #1
TheFakeMoonMan
Junior Member
TheFakeMoonMan began at the beginning.
 
Posts: 5
Karma: 10
Join Date: May 2010
Device: Sony PRS300
Problem with double L's converting PDF to EPUB

Hello,

I checked the FAQs, but couldn't find anything on this problem.

I converted a few books from PDFs to EPUBs and it seems that some of the double L's in certain words (Examples: carefully, filled, and well) don't convert properly. The second L in the word is missing and in place of it is a space.

However, it's not all of the double L's. I found some words like 'totally' and 'falling' which worked perfectly.

This isn't a huge problem as I can typically figure out what the word is, but still, if anyone knows how to fix this I'd greatly appreciate it.

Thanks,
Jeff

05-18-2010, 12:02 PM   #2
TheFakeMoonMan
Junior Member
TheFakeMoonMan began at the beginning.
 
Posts: 5
Karma: 10
Join Date: May 2010
Device: Sony PRS300
Oh, I forgot to mention this: I checked the PDF version on my computer and it looks fine, but the EPUB version, again checked on my computer, had the problem.

05-18-2010, 12:09 PM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TheFakeMoonMan
The second L in the word is missing and in place of it is a space. However, it's not all of the double L's.
It may be that the double l is a ligature, in which two letters are combined into a single glyph. Where it worked, they just used two l's. I'm not enough of a font expert to know how to fix this correctly. If you can search and replace in a format that you like, that's where I'd start. You can also try opening the file in a hex editor (I use UltraEdit) to find what code is used and then search and replace there.
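To know what byte sequence to look for in a hex editor, here is a minimal Python sketch (not from the original posts) that prints the code point and UTF-8 bytes of the standard Latin ligature characters in Unicode's Alphabetic Presentation Forms block. It assumes the converted file is UTF-8 encoded. Unicode has no standard ll ligature in this block, so an ll glyph may show up under some other, font-specific code, or may have been lost entirely.

```python
# Standard Latin ligature characters from Unicode's Alphabetic Presentation Forms block.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    # One line per ligature, e.g. "U+FB01 (fi): EF AC 81"
    for lig, letters in LIGATURES.items():
        print(f"U+{ord(lig):04X} ({letters}): {utf8_hex(lig)}")
```

If the hex editor shows one of these byte sequences where a letter pair should be, that sequence is what to search and replace.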

05-18-2010, 12:11 PM   #4
TheFakeMoonMan
Junior Member
TheFakeMoonMan began at the beginning.
 
Posts: 5
Karma: 10
Join Date: May 2010
Device: Sony PRS300
Thanks for your quick reply!

What file should I try opening in the hex editor?

05-18-2010, 12:32 PM   #5
chaley
"chaley", not "charley"
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 5,639
Karma: 1137414
Join Date: Jan 2010
Location: France
Device: Many android devices
Quote:
Originally Posted by TheFakeMoonMan
I converted a few books from PDFs to EPUBs and it seems that some of the double L's in certain words (Examples: carefully, filled, and well) don't convert properly. The second L in the word is missing and in place of it is a space.
The document is using what are called ligatures, in which certain character sequences are combined into a single glyph. Common ones are ff, fi, fl, ffl, and ffi. There are others, such as the one you noted, ll.

IIRC, Kovid said that the new PDF conversion engine handles ligatures correctly. I don't know its release status, though.
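At the character level, a ligature such as fi can be a single Unicode code point whose compatibility decomposition is the plain letter sequence. This small Python sketch (standard library only; not part of calibre) prints the name and decomposition of the common ligatures listed above. There is no ll entry among them, which is one plausible reason an ll glyph in a PDF might not map back to text cleanly, though that is not confirmed for this particular file.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a one-line summary: code point, Unicode name, compatibility decomposition."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)}"


if __name__ == "__main__":
    # ff, fi, fl, ffi, ffl, each as a single ligature character
    for lig in "\ufb00\ufb01\ufb02\ufb03\ufb04":
        print(describe(lig))
```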

Last edited by chaley; 05-18-2010 at 12:33 PM. Reason: note ll as a ligature.

05-18-2010, 12:47 PM   #6
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TheFakeMoonMan
Thanks for your quick reply!

What file should I try opening in the hex editor?
Which file shows the ligatures? Wait - I see it's the pdf. That's not an easy format to handle. Can you convert to any other format that shows the ligatures? If not, look at the EPUB contents. (Add .zip to the end of the file name and extract it; it's just a zip file.) Sometimes the "spaces" where the ligatures were will be codes you can search and replace.

You'll have to try various formats, and even then, this may not work. Also try exporting from the pdf as text using Acrobat. There are lots of things that might work, but none I can say with certainty will.
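Instead of renaming and extracting the EPUB by hand, a short Python sketch (illustrative only, standard library) can read the EPUB directly as a zip archive and count every non-ASCII character in its HTML/XHTML files. That should surface ligature characters or private-use codes if any survived conversion. It assumes the content files are UTF-8; if the missing letter was turned into an ordinary space, nothing distinctive will show up.

```python
import sys
import unicodedata
import zipfile
from collections import Counter


def unusual_chars(epub_path: str) -> Counter:
    """Count non-ASCII characters in the (X)HTML files inside an EPUB.

    An EPUB is a zip archive, so zipfile can open it without renaming it to .zip.
    """
    counts: Counter = Counter()
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            # Only look at the book's text content, not images, CSS or metadata.
            if not name.lower().endswith((".html", ".htm", ".xhtml")):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            counts.update(ch for ch in text if ord(ch) > 127)
    return counts


if __name__ == "__main__":
    # Usage: python find_chars.py book.epub
    for ch, n in unusual_chars(sys.argv[1]).most_common():
        # Private-use characters have no Unicode name, hence the default.
        name = unicodedata.name(ch, "<no name>")
        print(f"U+{ord(ch):04X} {name}: {n}")
```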

05-18-2010, 02:52 PM   #7
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,125
Karma: 5381911
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by chaley
The document is using what are called ligatures, which make certain character pairs smaller. Common are ff, fi, fl, ffl, ffi. There are others, such as the one you noted, ll.

IIRC, Kovid said that the new PDF conversion engine handles ligatures correctly. I don't know its release status, though.
Yes, the new engine does (at least on the test pdf docs I have tried). But it's going to be a while before I can find the time to finish it.

05-18-2010, 05:09 PM   #8
dmapr
Addict
dmapr composes epic poetry in binary.
 
Posts: 259
Karma: 90958
Join Date: Sep 2009
Device: PRS-950, Kobo Aura HD
IMHO, the easiest way to handle that is to use a font that contains the necessary ligatures and embed it in your epub file (or, if you're up to hacking your reader, replace the default fonts on the reader). Alternatively, you can convert from PDF to HTML first and clean it up a bit, including the ligatures.
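For the PDF to HTML route, the ligature clean-up step can be scripted. This is a minimal Python sketch, assuming the ligatures survived as the standard Unicode ligature characters U+FB00 to U+FB06; a font-specific or private-use code would need its own mapping, and a ligature already turned into a space cannot be recovered this way. The script rewrites the given UTF-8 HTML files in place, so run it on copies.

```python
import sys
import unicodedata
from pathlib import Path


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature characters (U+FB00..U+FB06) with plain letters.

    NFKC normalization is applied only to characters in that range, so other
    compatibility characters in the book (superscripts, fractions, etc.) are kept.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in text
    )


if __name__ == "__main__":
    # Usage: python expand_ligatures.py chapter1.html chapter2.html ...
    for arg in sys.argv[1:]:
        path = Path(arg)
        cleaned = expand_ligatures(path.read_text(encoding="utf-8"))
        path.write_text(cleaned, encoding="utf-8")
```

Normalization is restricted to the ligature range on purpose: applying NFKC to the whole book would also turn characters such as ² into 2.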

05-20-2010, 07:20 PM   #9
TheFakeMoonMan
Junior Member
TheFakeMoonMan began at the beginning.
 
Posts: 5
Karma: 10
Join Date: May 2010
Device: Sony PRS300
Quote:
Originally Posted by dmapr
IMHO, the easiest way to handle that is to use a font that contains the necessary ligatures and embed it in your epub file (or, if you're up to hacking your reader, replace the default fonts on the reader). Alternatively, you can convert from PDF to HTML first and clean it up a bit, including the ligatures.
Thanks for your reply, but I don't have a clue about how to do this. Is there a guide out there outlining this process?

Thanks,
Jeff

05-20-2010, 07:39 PM   #10
dmapr
Addict
dmapr composes epic poetry in binary.
 
Posts: 259
Karma: 90958
Join Date: Sep 2009
Device: PRS-950, Kobo Aura HD
Quote:
Originally Posted by TheFakeMoonMan
Thanks for your reply, but I don't have a clue about how to do this. Is there a guide out there outlining this process?

Thanks,
Jeff
Jeff,

check out these threads here:
http://www.mobileread.com/forums/showthread.php?t=36361
http://www.mobileread.com/forums/showthread.php?t=66102

05-20-2010, 10:00 PM   #11
TheFakeMoonMan
Junior Member
TheFakeMoonMan began at the beginning.
 
Posts: 5
Karma: 10
Join Date: May 2010
Device: Sony PRS300
Okay, fantastic!

I've successfully replaced the font in the EPUB with Fontin (like the guy used in his tutorial).

However, I still have the problem.

How should I go about finding a font that will work? Just trial and error?

Thanks again

05-21-2010, 08:47 AM   #12
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TheFakeMoonMan
How should I go about finding a font that will work? Just trial and error?
It appears you see the ligatures in the pdf? If so, you can identify the font used in the original pdf with Acrobat. Here are instructions for Acrobat 9 Professional:

1. Open the PDF, and choose Advanced > Print Production > Output Preview

2. Select "Object Inspector" for the Preview.

3. Click on the ligature, and the font that was used should be displayed in the Output Preview panel.

Then you can try to find that font elsewhere. This is the "correct" method, and works for most normal fonts. However, there's no guarantee that you'll be able to find the font you need for the EPUB.

05-21-2010, 10:20 AM   #13
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,125
Karma: 5381911
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The next release of calibre will automatically convert ligatures to normal characters.

05-21-2010, 03:32 PM   #14
dmapr
Addict
dmapr composes epic poetry in binary.
 
Posts: 259
Karma: 90958
Join Date: Sep 2009
Device: PRS-950, Kobo Aura HD
Quote:
Originally Posted by TheFakeMoonMan
Okay, fantastic!

I've successfully replaced the font in the EPUB with Fontin (like the guy used in his tutorial).

However, I still have the problem.

How should I go about finding a font that will work? Just trial and error?

Thanks again
I've found that the Cambria font has all the ligatures I have ever run into in PDF to EPUB conversion, and it looks pretty good on the reader. Do a Google search for cambria.ttf if you don't have one handy.

Another option is to check dafont.com, which lets you preview fonts. Copy and paste the ligatures from the PDF and see which fonts support them.

Last edited by dmapr; 05-21-2010 at 03:42 PM.

06-15-2010, 02:56 PM   #15
WHA1949
Junior Member
WHA1949 began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: PALM Zire
PDF with Minion Pro Font gives problems

I just want to add that I have a similar problem when I convert a PDF containing the Minion Pro font. The original file is in German and could easily be mapped to a standard character set.

But combinations like Th, ff, or fi cause the conversion to HTML to go wrong.

I am looking forward to the new PDF conversion.

I am using calibre 0-72.

Tags
conversion, epub, letter, letter l, ligatures, pdf

Similar Threads
Thread Thread Starter Forum Replies Last Post
double l's jjansen Conversion 33 04-29-2013 07:52 AM
Problem converting pdf to epub (size) using calibre abadguy PDF 6 03-23-2012 05:33 AM
Problem with accents converting PDF to EPUB madeira Calibre 0 07-09-2010 05:15 PM
Problem converting PDF to EPUB in calibre adgpro Calibre 2 07-09-2010 01:10 AM
Problem converting pdf to epub smartin Calibre 3 05-02-2010 06:55 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.