Old 01-24-2011, 05:26 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
mysterious case of lost double Ls - and other letters

I have a PDF which is not converting properly to EPUB.

e.g. the PDF source has the name Ellen, but in the EPUB it has become El en.

I cannot see why.

Code:
 <p class="calibre3">One flight down, on the fourth landing, ear pressed against the door of 4B, El en Swenson clasped a hand over her mouth, suppressing the urge to cal out to her husband, Mike, who dozed sporadically in their apartment, behind their currently unlocked door. El en had left
some instances of Ellen are fine, but there are several like the above.

I cannot paste the source text directly, as it lands on the clipboard as an image.

Is calibre attempting OCR and glitching on the double Ls? Is there a workaround?

Another example, from the calibre EPUB viewer. See all the double-L issues, which are not in the PDF source:

Safely on the other side of the stairwel housing, Ruth tilted her head up and let the cataract wash over her cataracts. She’d been scheduled to have phacoemulsification the week after martial law was declared. Now she was stuck with cloudy vision of a cloudy sky. She pul ed some matted strands of hair away from her eyes, her fingers straying up her forehead, which seemed to go al the way to the back of her head. Maybe it was better she couldn’t see that wel . In her mind she could stil picture herself as she was. Abe, too.

fixing this with regex would be a nightmare
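
To illustrate why a plain regex is not enough here, below is a minimal sketch of a dictionary-assisted repair. It assumes the damage is always "second l replaced by a space", which is only a guess from the samples above. The function name `repair_lost_l` and the tiny `KNOWN_WORDS` list are hypothetical; this is not something calibre or pdftoepub provides.

```python
import re

# Hypothetical mini word list for illustration; a real run would need a full
# dictionary plus the book's own proper names (e.g. "Ellen").
KNOWN_WORDS = {
    "ellen", "call", "all", "well", "still", "stairwell", "pulled",
    "until", "out", "the", "housing", "picture",
}


def repair_lost_l(text, known_words=KNOWN_WORDS):
    """Guess where a dropped second 'l' belongs, using a word list."""

    def join_halves(match):
        # "El en" -> "Ellen", but only if the joined form is a known word.
        joined = match.group(1) + "l" + match.group(2)
        return joined if joined.lower() in known_words else match.group(0)

    def append_l(match):
        # "cal" -> "call", but only if "cal" is unknown and "call" is known.
        word = match.group(1)
        if word.lower() not in known_words and word.lower() + "l" in known_words:
            return word + "l"  # the optional space before punctuation is dropped
        return match.group(0)

    # Pass 1: rejoin words that were split in the middle ("El en", "pul ed").
    text = re.sub(r"\b(\w*l) (\w+)\b", join_halves, text)
    # Pass 2: restore a trailing "l" ("cal", "wel .", "stil").
    return re.sub(r"\b(\w*l)\b( (?=[.,;:!?]))?", append_l, text)


print(repair_lost_l("El en had to cal out until the stairwel housing."))
```

Even this sketch shows the problem: the pattern alone cannot tell "until the" (fine as it is) from "cal out" (broken), and with a full dictionary "cal out" could also be joined into "callout". Every guess depends on the word list, so the results would still need proofreading.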

Last edited by cybmole; 01-24-2011 at 05:33 AM.
Old 01-24-2011, 05:42 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
This is called ligatures. It's a typographical feature. A simple forum search should turn up multiple threads offering solutions on how to deal with them.
Old 01-24-2011, 05:51 AM   #3
cybmole
Quote:
Originally Posted by Manichean View Post
This is called ligatures. It's a typographical feature. A simple forum search should turn up multiple threads offering solutions on how to deal with them.
Does not compute. The name Ellen occurs hundreds of times in the text, and only about 1% of the time is it transformed to El en by the conversion. Surely the word is typeset the same way every time.
I will go and search on ligatures, as you are often annoyingly correct. Meanwhile I've sent the file to Amazon to send to my Kindle, to see how their conversion software gets on.

I am also running it through pdftoepub as I type. That program shows a table of "pdf glyphs" and what it plans to do with them before it gets going; I did not see anything odd in there.

PS: a quote from Kovid, last May: "the next release of calibre will automatically convert ligatures to normal characters."

Last edited by cybmole; 01-24-2011 at 05:54 AM.
Old 01-24-2011, 06:09 AM   #4
cybmole
OK, so I ran it through pdftoepub, opened the result in Sigil, and searched for El en: no hits. I searched for text from the above samples: they have converted OK.

Do I file a calibre bug? The PDF book is 1.5 MB; not sure if that will attach to a bug report.
Old 01-24-2011, 06:25 AM   #5
Manichean
Quote:
Originally Posted by cybmole View Post
Does not compute. The name Ellen occurs hundreds of times in the text, and only about 1% of the time is it transformed to El en by the conversion. Surely the word is typeset the same way every time.
I will go and search on ligatures, as you are often annoyingly correct. Meanwhile I've sent the file to Amazon to send to my Kindle, to see how their conversion software gets on.
Well, excuse me for saying so, but those are, as I said, ligatures. Look it up. I'm not that firm on typography, but I imagine either your document is badly laid out or it is not usual to contract every possible occurrence into a ligature.
Also, just because Kovid said that calibre would automatically convert those doesn't mean it does so correctly in every case. Maybe you found a bug. Or maybe you ticked "keep ligatures" in the layout options.
Anyway, I'm going to shut up now before I annoy you further. Good luck in solving your issue.
Old 01-24-2011, 06:30 AM   #6
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by cybmole View Post
PS: a quote from Kovid, last May: "the next release of calibre will automatically convert ligatures to normal characters."
Did you check the box marked Keep ligatures under look and feel during conversion? This doesn't always work but it is worth a shot.

Quote:
Originally Posted by cybmole View Post
do I file a calibre bug ? the pdf book is 1.5MB -not sure if that will attach to bug report
You can file a bug report and attach the file, otherwise it will get lost and never have a chance of being corrected.
Old 01-24-2011, 06:45 AM   #7
cybmole
Quote:
Originally Posted by Manichean View Post
Well, excuse me for saying so, but those are, as I said, ligatures. Look it up. I'm not that firm on typography, but I imagine either your document is badly laid out or it is not usual to contract every possible occurrence into a ligature.
Also, just because Kovid said that calibre would automatically convert those doesn't mean it does so correctly in every case. Maybe you found a bug. Or maybe you ticked "keep ligatures" in the layout options.
Anyway, I'm going to shut up now before I annoy you further. Good luck in solving your issue.
OK
I did not tick keep ligatures.
I did look stuff up.
I do have a good conversion via the other program.
Your advice was helpful, if brusque.
I now know what a ligature is.

I suspect it's a combination of a poor-quality source and maybe a residual bug in calibre. I'll file a bug: ticket 8459.

Last edited by cybmole; 01-24-2011 at 06:49 AM.
Old 01-24-2011, 06:51 AM   #8
cybmole
PS: the "keep ligatures" box

The default is not ticked.
I take that to mean "replace ligatures with normal character pairs".

That seems the right choice for maximum compatibility with other e-reader hardware and software?

I'm done experimenting for now; I want to go read the book on the Kindle.
But googling "ligature ll" suggests a wrong diagnosis above.
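
For reference, here is a minimal sketch of what "replace ligatures with normal character pairs" can look like, using Python's standard `unicodedata` module. This is an illustration only and not calibre's actual implementation; the names `LIGATURE_MAP` and `expand_ligatures` are made up for the example. It covers the Latin ligatures in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06), and that block has no "ll" entry, which fits the doubt about the diagnosis.

```python
import unicodedata

# Map each Latin ligature code point (U+FB00..U+FB06) to its plain-letter
# expansion, as given by Unicode compatibility normalization (NFKC).
LIGATURE_MAP = {
    cp: unicodedata.normalize("NFKC", chr(cp)) for cp in range(0xFB00, 0xFB07)
}


def expand_ligatures(text):
    """Replace single-glyph ligatures such as U+FB01 with 'fi', 'fl', etc."""
    return text.translate(LIGATURE_MAP)


# Show the mapping: ff, fi, fl, ffi, ffl and two st forms, but no "ll".
for cp, plain in LIGATURE_MAP.items():
    print(f"U+{cp:04X} -> {plain}")

print(expand_ligatures("\ufb01nal \ufb02ight"))
```

A font can still carry its own private "ll" glyph that is not a Unicode ligature at all. A mapping like this cannot see such a glyph, so a converter may lose the character if the PDF does not say which letters the glyph stands for.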

Last edited by cybmole; 01-24-2011 at 11:36 AM.
Old 01-24-2011, 03:53 PM   #9
Jabby
Jr. - Junior Member
Posts: 586
Karma: 2000358
Join Date: Aug 2010
Location: Alabama
Device: Archos, Asus, HP, Lenovo, Nexus and Samsung tablets in 7,8 and 10"
Quote:
Originally Posted by cybmole View Post
PS: the "keep ligatures" box

The default is not ticked.
I take that to mean "replace ligatures with normal character pairs".

That seems the right choice for maximum compatibility with other e-reader hardware and software?

I'm done experimenting for now; I want to go read the book on the Kindle.
But googling "ligature ll" suggests a wrong diagnosis above.
There might be a problem with the calibre conversion to mobi. I just had to drop back to 7.40 from 7.42 to get a clean epub to mobi conversion.

In my case the header wasn't centered and the font size was out of whack. As soon as I dropped back, all was well.

Suggest you drop back to 7.40 and try converting again to see what happens.

Regards -John
Old 01-24-2011, 09:35 PM   #10
DoctorOhh
Quote:
Originally Posted by Jabby View Post
There might be a problem with the calibre conversion to mobi. I just had to drop back to 7.40 from 7.42 to get a clean epub to mobi conversion.
It is always nice to reach out and lend a hand and I hope you keep on helping others. This thread though was specifically about a known problem that has been around forever when converting some PDFs to anything else. This thread hasn't even mentioned converting books to Mobi.

I would suggest that you please create a ticket with the info that you have gathered so this mobi conversion bug can be corrected. Please attach to the ticket the epub, the resultant mobi file and the job details available by clicking the jobs icon in the lower right area of calibre. This will be of immense help to everyone who is using calibre to convert to mobi.
Old 03-05-2011, 02:58 PM   #11
alexandroid
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2011
Device: iPad
The calibre bug for this was filed a long time ago (http://bugs.calibre-ebook.com/ticket/1366) and is waiting on a fix in the underlying pdf2html engine that calibre uses. The pdf2html bug is also old and has been sitting there since 2008: https://sourceforge.net/tracker/?fun...39&atid=444239