Old 03-28-2011, 03:18 AM   #1
cybmole
Wizard
cybmole ought to be getting tired of karma fortunes by now.
 
Posts: 2,793
Karma: 1089170
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
pdf to epub - trouble with FL

I tried converting on default settings & see all words like flights being messed up. It seems that f is replaced with a strange compound character.

Thus flights becomes ϐlights.

If I tick the keep ligatures option then flights becomes blights.

So is there an option which will fix this?

PS: some f characters are OK, thus some f words are OK. I cannot yet suss the rule that causes some f to be plain f and others not, in the PDF.

I can go ahead & patch up with Sigil, as this source is potentially better than my existing LIT source of the same novel, but there could be other weird characters that I've not spotted yet.

Sigil finds & replaces 1050 instances of ϐ - what is this ϐ thing anyway???

Last edited by cybmole; 03-28-2011 at 03:44 AM.
Old 03-28-2011, 05:23 AM   #2
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
fl is sometimes a ligature, the same as ff, ll, etc. i.e. just like those others it's broken and there's nothing to be done about it.
Old 03-28-2011, 05:45 AM   #3
cybmole
Wizard
Hmm, I understand the double letter ligature thing (we had a long discussion on that a while back),

but in this book, words like fight convert to ϐight, so it's an issue with (some) single letters also - yet fill, flip are OK!

Fixing up with Sigil seems to have done the trick though, unless I come across other glitches once I am reading the conversion.
Old 03-28-2011, 07:26 AM   #4
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by cybmole
[...] what is this ϐ thing anyway ???
It's a contraction of s and z into one letter, called "esszett" in German and typically used in the German language. It's used in places where a sharp 's' occurs, although its use has been reduced since the orthographic "reform" of the '90s. I believe the letter originated as a ligature itself, although it's beyond me why Calibre would replace a ligature of 'fl' by one of 'sz'.

Hey, you asked
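
Editorial aside: one way to settle which character this actually is, is to ask Python's standard `unicodedata` module for its code point and official name. This is only a sketch. It assumes the stray character pasted in this thread is U+03D0, which is how it renders here; the helper name `describe` is made up for illustration. Under that assumption Unicode calls it GREEK BETA SYMBOL, which is a different character from the German sharp s at U+00DF.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"


# Assumption: the odd character in the converted text is U+03D0.
print(describe("\u03d0"))  # the character seen in place of "f"
print(describe("\u00df"))  # German sharp s, for comparison
print(describe("\ufb02"))  # the standard "fl" ligature code point
```

Pasting the real character from the EPUB into `describe()` would confirm or refute the U+03D0 assumption.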
Old 03-28-2011, 07:43 AM   #5
ldolse
Wizard
Quote:
Originally Posted by Manichean
although it's beyond me why Calibre would replace a ligature of 'fl' by one of 'sz'.

Hey, you asked
Most likely because the pdf used a custom font which mapped to a proprietary code point which happened to map to ϐ in the real world.
Old 03-28-2011, 08:14 AM   #6
cybmole
Wizard
Quote:
Originally Posted by ldolse
Most likely because the pdf used a custom font which mapped to a proprietary code point which happened to map to ϐ in the real world.
Makes sense - calibre conversion is not replacing fl - it is replacing (some but not all) instances of f at the start of words.

I think if I went back and re-ran the conversion with a correct search & replace regex - find ϐ, replace with f - then I could get it to work in calibre

- but I've patched it in Sigil now.
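
Editorial aside: the find-and-replace described above can be sketched in Python. This is a minimal illustration, not calibre's own implementation. It assumes the stray character is U+03D0 and that the book contains no legitimate use of that character; `BAD_F` and `fix_stray_f` are names invented for the example.

```python
import re

# Assumption: the character that replaced "f" is U+03D0.
BAD_F = "\u03d0"


def fix_stray_f(text: str) -> str:
    """Replace every occurrence of the stray character with a plain 'f'."""
    return re.sub(BAD_F, "f", text)


sample = "There was no smoking on shuttle \u03d0lights."
print(fix_stray_f(sample))
```

Words that already had a plain f, such as fill and flip, pass through unchanged, since only the one stray code point is matched.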
Old 03-28-2011, 06:22 PM   #7
dwig
Guru
dwig ought to be getting tired of karma fortunes by now.
Posts: 969
Karma: 1382338
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Dell Venue 8 Pro, Kindle 3/WiFi - Retired:Clie UX50, T415, ...
Quote:
Originally Posted by cybmole
makes sense - calibre conversion is not replacing fl - it is replacing ( some but not all ) instances of f at start of words. ...
It might not be that Calibre is replacing characters.

There are some PDFs that are created by scanning a book and using OCR to create a text layer while retaining the scanned image layer. When you view these in Acrobat Reader you see the scanned layer, but when you do a word search or selection, you access the "hidden" text layer.

Try saving the PDF as text from Acrobat Reader and examine the resulting file. If the f's are betas or esszetts then the fault is in the PDF and its creation.
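
Editorial aside: examining the exported text by eye can miss things, so a small script that lists every non-ASCII character with its Unicode name and count may help spot other odd substitutions. This is a sketch under assumptions: the sample string stands in for the exported file, and the function names `unusual_characters` and `report` are invented for the example.

```python
import unicodedata
from collections import Counter


def unusual_characters(text: str) -> Counter:
    """Count every character outside the plain ASCII range."""
    return Counter(ch for ch in text if ord(ch) > 127)


def report(text: str) -> list:
    """Return one line per unusual character, most frequent first."""
    lines = []
    for ch, count in unusual_characters(text).most_common():
        name = unicodedata.name(ch, "UNNAMED")
        lines.append(f"U+{ord(ch):04X} {name}: {count}")
    return lines


# Stand-in for the text exported from the PDF (assumed content).
sample = "\u03d0lights and \u03d0ights, but fill and flip"
for line in report(sample):
    print(line)
```

To run it on a real export, the text would be read from the saved file first, using whatever encoding that file actually has.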
Old 03-29-2011, 02:03 AM   #8
cybmole
Wizard
Good call - let me test & report...

OK, I did 3 tests using the same sentence - copy from Adobe, paste
1) to Word,
2) to Notepad++
3) directly to here.

In all 3 tests the leading f simply vanished from my test sentence, i.e. flights became lights:
There was no smoking on shuttle
lights.

(c.f. flights became ϐlights in the calibre EPUB output.)

So calibre actually did a better job than Word, in that the calibre output was fixable with a regex, whereas distinguishing a lost f in Word would be impossible.

So would ALL PDF conversion programs fall at this hurdle???

Last edited by cybmole; 03-29-2011 at 02:08 AM.
Old 03-29-2011, 07:45 AM   #9
dwig
Guru
Quote:
Originally Posted by cybmole
...

okI did 3 tests using the same sentence - copy from adobe, paste
1) to Word,
2) to notepad++
3) directly to here.

...
Sounds like the problem is in the PDF and not being caused by the conversion.

Another experiment:

1. open the txt file exported from Acrobat Reader in Notepad++
2. change the encoding to UTF-8 using the entry on the Encoding menu
3. save the file as TXT
4. convert using Calibre.
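
Editorial aside: steps 1 to 3 above can also be done without Notepad++. The sketch below reads a text file in one encoding and writes it back out as UTF-8. The source encoding is an assumption (cp1252 is a common default for Windows text exports, but the thread does not say what Acrobat Reader produced here), and `reencode_to_utf8` is a name invented for the example.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    # Assumption: the exported file really is in source_encoding.
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")
```

Called as `reencode_to_utf8(Path("export.txt"), Path("export-utf8.txt"))` with hypothetical file names, the result can then be fed to Calibre as in step 4. If the guessed source encoding is wrong, the output will contain the wrong characters, so it is worth spot-checking a few lines.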
Old 03-29-2011, 09:32 AM   #10
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,051
Karma: 777825
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Originally Posted by cybmole
so would ALL pdf convert programs fall at this hurdle ???
On this particular file I would say that the answer is yes.

It sounds like the file in question was created by scanning in an image of the book and then applying OCR technology to create the underlying text. In this case some characters were not recognised correctly at the OCR stage. Unless you had some way of re-applying the OCR step (and doing a better job than the original program), all conversion programs are going to fail with this PDF file.

Many (possibly the majority of) PDF files are created from the original word-processed document. In such a case the PDF file does not have the overlaying image and the underlying text is complete, so a conversion program has a chance. However, PDF conversion is still a little fraught even with files created this way, because of tricks that PDF does (ligatures, absolute placement of text, special symbols, etc.) that a conversion program can struggle to understand and convert sensibly.
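
Editorial aside on the ligature point: when a PDF uses the standard Unicode ligature code points (U+FB00 to U+FB04), expanding them back into plain letters is mechanical. The sketch below does that with a translation table. It is not what calibre does internally, the names `LIGATURES` and `expand_ligatures` are invented for the example, and it would not help with a file like the one in this thread, where a custom font mapping or a faulty text layer produced a non-ligature character.

```python
# Standard Latin ligature code points and their plain-letter expansions.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace standard Unicode ligature characters with their letters."""
    return text.translate(TABLE)


print(expand_ligatures("shuttle \ufb02ights from the o\ufb03ce"))
```

An explicit table is used here rather than a blanket Unicode normalization so that only ligatures are touched and other characters are left exactly as they were.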