#1
J
Posts: 205
Karma: 12590
Join Date: Mar 2009
Location: Canada
Device: SONY PRS 505/300, iRex DR800SG, Nook
|
double l's
Like everyone else in this forum I have to say that Calibre is GREAT. I have referred every friend I have that owns an ereader and told them that they need to help keep Calibre in business. Thanks for the great product.
I have two quick how-to questions that I haven't figured out yet. I should probably post two threads, but they are short questions.

1) I just converted a PDF to epub and nearly all my double l's (lowercase LL) get converted to a single l and a space. I have seen this before and I always just assumed that it was a problem with the PDF. I checked the PDF today and it looked fine. Any thoughts?

2) I have an epub file with some images in it. I would like to remove those images (for space, not decency reasons!!). I have that option when I convert from PDF to epub. Is there any way to do this when my starting point is epub?

Thanks to everyone who makes this a great forum............Jackie
#2
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Litagures under look and feel for the double l's?
|
#3
J
Posts: 205
Karma: 12590
Join Date: Mar 2009
Location: Canada
Device: SONY PRS 505/300, iRex DR800SG, Nook
|
Litagures??? According to Google this word is used when discussing Egyptian mathematics, child porn, rectal wash and/or artificial teeth. In the context of double l's I am stuck.
|
#4
US Navy, Retired
Posts: 9,889
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
You didn't check any dictionaries, did you? Quote:
|
||
#5
Guru
Posts: 692
Karma: 27532
Join Date: Dec 2007
Device: Ebookwise 1150 / 1200
|
|
#6
US Navy, Retired
Posts: 9,889
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
|
#7
J
Posts: 205
Karma: 12590
Join Date: Mar 2009
Location: Canada
Device: SONY PRS 505/300, iRex DR800SG, Nook
|
Well if nothing else I learned a new word. Ligatures didn't fix the problem. I found out that my document had been written in another country. I tried removing the metadata from the pdf and lost half my letters. I assume that it is some sort of code page problem but I don't know for sure. I converted to RTF and fixed some of the really obvious ones myself and left the rest. Thanks for pointing that feature out to me...........Jackie
|
#8
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Coming here in frustration from my other thread. I cannot find any Google references to double-l being a ligature, only ff, fi and some other combinations.
My bug report was thrown out as a duplicate issue, and others are for sure having problems with ll becoming l+space in PDF conversion. I have tried converting to txt (the problem still exists), examined the pdftohtml intermediate output via the calibre S&R wizard (the damage is already done at that stage), and tried ligatures on/off (no difference). I examined all the PDF glyphs in the source with another program and found no special ll characters. I also searched the forum (hence arriving here), though I cannot use l or ll as search terms, which does not help.
This must be a calibre bug, but I cannot find the original bug ticket, which presumably is still open and being worked on by someone? And yes, I know this is an old thread, but it's still open; old thread week, I believe... It could be an OCR issue, so if someone could kindly tell me whether calibre is using OCR in PDF conversion, that would be progress, of sorts.
#9
creator of calibre
Posts: 45,198
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#10
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Calibre does not use OCR. Your PDF may have OCR text behind images of words. Calibre converts text, not images of text, so if the text is wrong, Calibre's conversion is wrong. The text may have a single l, while the image of the text has a double l. Or you may have ligatures. The engine Calibre uses can't handle all ligatures. Start by selecting the double-l word in your PDF, copy it, and paste it into Notepad to see whether that word pastes in with a double or a single l.
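For anyone who would rather script that copy-and-paste check than eyeball it in Notepad, here is a minimal Python sketch. It assumes you have already pasted the copied text into a string; the function names `find_ligatures` and `expand_ligatures` are made up for this illustration and are not part of calibre. It only looks for the Latin ligature code points that Unicode defines (U+FB00 to U+FB06: ff, fi, fl, ffi, ffl and the two st forms). There is no "ll" ligature in that block, so a clean result here does not rule out a font-specific glyph.

```python
import unicodedata

# U+FB00..U+FB06 are the Latin ligatures in Unicode's
# "Alphabetic Presentation Forms" block (ff, fi, fl, ffi, ffl, long-s t, st).
LIGATURE_FIRST = "\ufb00"
LIGATURE_LAST = "\ufb06"


def find_ligatures(text):
    """Return (index, character, Unicode name) for every ligature code point."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if LIGATURE_FIRST <= ch <= LIGATURE_LAST
    ]


def expand_ligatures(text):
    """Replace each ligature code point with its plain-letter decomposition."""
    return "".join(
        unicodedata.normalize("NFKD", ch)
        if LIGATURE_FIRST <= ch <= LIGATURE_LAST
        else ch
        for ch in text
    )


# Hypothetical pasted text: the first character is the single "fi" ligature.
pasted = "\ufb01sh and stairwell"
print(find_ligatures(pasted))
print(expand_ligatures(pasted))
```

If `find_ligatures` comes back empty for a word that still converts badly, the pasted text itself is fine and the problem lies further down the conversion chain.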
|
#11
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
(Google, OTOH, has nothing relevant to say about "ll glyph".) But I have examined a list of all source document glyphs with another tool and see nothing resembling this case, so is there another possible cause? I have also converted with other software and all the LLs converted OK, but the other overall conversion was inferior to calibre's in other ways (like losing images, wrapping stuff in div tags, not scaling fonts well).
|
#12
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
Using Acrobat, not my default pdfexch viewer, I can search for and extract text. Here is the paragraph that I used at the start of the (mysterious case of disappearing Ls) thread, pasted from the source PDF. The double Ls all look fine, yet see the posted epub conversion below.

Safely on the other side of the stairwell housing, Ruth tilted her head up and let the cataract wash over her cataracts. She’d been scheduled to have phacoemulsification the week after martial law was declared. Now she was stuck with cloudy vision of a cloudy sky. She pulled some matted strands of hair away from her eyes, her fingers straying up her forehead, which seemed to go all the way to the back of her head.

I've added the italics here, for thread clarity. The above text looks fine if I paste it into Notepad. calibre converts it to this:

Safely on the other side of the stairwel housing, Ruth tilted her head up and let the cataract wash over her cataracts. She’d been scheduled to have phacoemulsification the week after martial law was declared. Now she was stuck with cloudy vision of a cloudy sky. She pul ed some matted strands of hair away from her eyes, her fingers straying up her forehead, which seemed to go al the way to the back of her head. Maybe it was better she couldn’t see that wel . In her mind she could stil picture herself as she was. Abe, too.

So:
all ----al
pulled ----pul ed
still -----stil
stairwell ----stairwel

Last edited by cybmole; 01-24-2011 at 01:36 PM.
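Until the converter itself is fixed, the damage pattern above (ll turning into l plus a space) is regular enough that a clean-up pass over the converted text can catch much of it. The following is a rough Python sketch of that idea, not anything calibre does: the function name `repair_double_l` and the tiny `known_words` set are assumptions for illustration, and in practice you would load a real word list. It only rejoins text when the result is a known word, so it will miss words that are not in the list and should be checked by eye afterwards.

```python
import re


def repair_double_l(text, known_words):
    """Undo 'll' -> 'l ' damage wherever the repaired form is a known word."""

    def fix_inner(match):
        # Split inside a word, e.g. "pul ed": try "pul" + "l" + "ed".
        head, tail = match.group(1), match.group(2)
        if (head + "l" + tail).lower() in known_words:
            return head + "l"  # drop the stray space; the tail stays in place
        return match.group(0)

    def fix_final(match):
        # Word that lost its last "l", e.g. "stil" or "wel ." before punctuation.
        word = match.group(1)
        if word.lower() not in known_words and (word + "l").lower() in known_words:
            return word + "l"  # also drops a stray space before punctuation
        return match.group(0)

    # The lookahead reads the next word without consuming it, so back-to-back
    # damaged words are each examined.
    text = re.sub(r"\b(\w*l) (?=(\w+)\b)", fix_inner, text)
    text = re.sub(r"\b(\w*l)\b( (?=[.,;:!?]))?", fix_final, text)
    return text


# Illustrative word list only; a real run needs a proper dictionary.
known_words = {"all", "pulled", "still", "stairwell", "well"}
damaged = "the stairwel housing. She pul ed some hair, al the way. see that wel . she could stil picture"
print(repair_double_l(damaged, known_words))
```

The check against `known_words` in both directions is deliberate: a word that is already valid with a single l is left alone even if the doubled form also exists.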
|
#13
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
You might want to cut out a page from your pdf with your double ll and paste it here.
|
#14
creator of calibre
Posts: 45,198
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's no point. This is not going to be fixed until the new calibre pdf engine is ready. It is a bug in the poppler pdftohtml program calibre uses. The new pdf engine has a replacement for pdftohtml that I wrote that fixes this issue, but until it is ready, there is no fix.
|
#15
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
PDF has some tricky oddball ways of doing things. Essentially, it is a bunch of text characters with rules for where to put those characters on the page. It's based on a printer language, where that makes some sense, but it's not easy for a conversion engine, as there's no way to even be sure you've found the end of a sentence. You have to guess from where the text is put on the page and from the text itself. It's possible that the double l's are saved as a single l with a rule for putting in two of them. Calibre's conversion engine (not written by Kovid) does not know all the rules. Acrobat does know them all.
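To make the "guess from where the text is put on the page" part concrete, here is a toy Python sketch. It is not how pdftohtml or calibre actually works; the `Glyph` class, the `gap_ratio` threshold and the coordinates are all invented for illustration. It rebuilds a line from positioned characters and inserts a space wherever the horizontal gap looks too wide. If the extractor never sees the second l, for whatever reason, the hole it leaves is wide enough to be mistaken for a space, which matches the "pul ed" symptom reported in this thread.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge on the page, in points (made-up values)
    width: float  # advance width, in points (made-up values)


def glyphs_to_text(glyphs, gap_ratio=0.3):
    """Join glyphs left to right, adding a space when the gap after the
    previous glyph exceeds gap_ratio times that glyph's width."""
    out = []
    prev = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


# "pulled" laid out with no gaps between letters.
full_word = [
    Glyph("p", 0.0, 6.0), Glyph("u", 6.0, 6.0), Glyph("l", 12.0, 3.0),
    Glyph("l", 15.0, 3.0), Glyph("e", 18.0, 6.0), Glyph("d", 24.0, 6.0),
]
# The same word if the second "l" (at x=15) never reaches the extractor.
missing_l = [g for g in full_word if g.x != 15.0]

print(glyphs_to_text(full_word))
print(glyphs_to_text(missing_l))
```

The threshold is the whole trick: set it too low and tight kerning produces spurious spaces, too high and real word breaks vanish, which is why different converters can disagree on the same PDF.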
|