Old 03-28-2011, 03:18 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
pdf to epub - trouble with FL

I tried converting with default settings and see words like flights being messed up. It seems that the f is replaced with a strange compound character.

Thus flights becomes ϐlights.

If I tick the keep ligatures option, then flights becomes blights.

So is there an option which will fix this?

PS: some f characters are OK, thus some f words are OK. I cannot yet suss the rule in the PDF that causes some f to be plain f and others not.

I can go ahead and patch it up with Sigil, as this source is potentially better than my existing LIT source of the same novel, but there could be other weird characters that I've not spotted yet.

Sigil finds and replaces 1050 instances of ϐ. What is this ϐ thing anyway?

Last edited by cybmole; 03-28-2011 at 03:44 AM.
Old 03-28-2011, 05:23 AM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
fl is sometimes a ligature, the same as ff, ll, etc. Just like those others, it's broken and there's nothing to be done about it.
Old 03-28-2011, 05:45 AM   #3
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Hmm, I understand the double-letter ligature thing (we had a long discussion on that a while back),

but in this book, words like fight convert to ϐight, so it's an issue with (some) single letters also. Yet fill and flip are OK!

Fixing up with Sigil seems to have done the trick though, unless I come across other glitches once I am reading the conversion.
Old 03-28-2011, 07:26 AM   #4
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by cybmole View Post
[...] what is this ϐ thing anyway ???
It's a contraction of s and z into one letter, called "Eszett" in German and typically used in the German language. It's used in places where a sharp 's' occurs, although its use has been reduced since the orthographic "reform" of the '90s. I believe the letter originated as a ligature itself, although it's beyond me why Calibre would replace a ligature of 'fl' by one of 'sz'.

Hey, you asked
Old 03-28-2011, 07:43 AM   #5
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Manichean View Post
although it's beyond me why Calibre would replace a ligature of 'fl' by one of 'sz'.

Hey, you asked
Most likely because the pdf used a custom font which mapped to a proprietary code point which happened to map to ϐ in the real world.
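
For anyone wanting to check which code point actually ended up in the converted text, Python's unicodedata module can name it. This is only a minimal sketch: it assumes the odd character pasted in this thread is U+03D0, and `describe` is a made-up helper name.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"

# Assumption: the stray character in the EPUB is U+03D0.
# The others are listed only for comparison: the German sharp s,
# the real "fl" ligature, and a plain f.
for ch in ("\u03d0", "\u00df", "\ufb02", "f"):
    print(describe(ch))
```

If the character in your own file reports as GREEK BETA SYMBOL rather than LATIN SMALL LETTER SHARP S, it is a curled beta and not the German letter, which would fit the idea of a font-private code that merely collides with an unrelated Unicode character.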
Old 03-28-2011, 08:14 AM   #6
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by ldolse View Post
Most likely because the pdf used a custom font which mapped to a proprietary code point which happened to map to ϐ in the real world.
Makes sense. The calibre conversion is not replacing fl; it is replacing (some but not all) instances of f at the start of words.

I think if I went back and re-ran the conversion with a correct search & replace regex (find ϐ, replace with f), then I could get it to work in calibre,

but I've patched it in Sigil now.
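
The find ϐ / replace with f idea can be tried outside calibre or Sigil too. A rough sketch in Python, assuming every U+03D0 in the converted text really stands for a plain f (the function name `fix_f` is just for illustration):

```python
import re
from typing import Tuple

# Assumption: each U+03D0 in the converted text should have been "f".
BAD_F = re.compile("\u03d0")

def fix_f(text: str) -> Tuple[str, int]:
    """Replace each U+03D0 with 'f'; return the fixed text and the number of hits."""
    return BAD_F.subn("f", text)

fixed, count = fix_f("There was no smoking on shuttle \u03d0lights.")
print(fixed, count)
```

Returning the hit count alongside the text makes it easy to compare against the number of replacements Sigil reports for the same book.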
Old 03-28-2011, 06:22 PM   #7
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by cybmole View Post
makes sense - calibre conversion is not replacing fl - it is replacing ( some but not all ) instances of f at start of words. ...
It might not be that Calibre is replacing characters.

There are some PDFs that are created by scanning a book and using OCR to create a text layer while retaining the scanned image layer. When you view these in Acrobat Reader you see the scanned layer, but when you do a word search or selection, you access the "hidden" text layer.

Try saving the PDF as text from Acrobat Reader and examine the resulting file. If the f's are betas or Eszetts, then the fault is in the PDF and its creation.
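
Examining the exported text by eye is tedious for a whole novel, so here is a hedged sketch that tallies every non-ASCII character instead. It assumes the export is UTF-8; `export.txt` and the helper names are placeholders, not anything Acrobat produces by default.

```python
from collections import Counter
import unicodedata

def unusual_chars(text: str) -> Counter:
    """Count every non-ASCII character in the text."""
    return Counter(ch for ch in text if ord(ch) > 127)

def report(text: str) -> list:
    """One line per non-ASCII character: count, code point, Unicode name."""
    lines = []
    for ch, n in unusual_chars(text).most_common():
        name = unicodedata.name(ch, "UNNAMED")
        lines.append(f"{n:5d}  U+{ord(ch):04X}  {name}")
    return lines

# Hypothetical usage with a file saved from the PDF viewer:
# with open("export.txt", encoding="utf-8") as fh:
#     print("\n".join(report(fh.read())))
sample = "\u03d0ight and \u03d0lights, but fill"
print("\n".join(report(sample)))
```

Anything unexpected that shows up with a high count is a candidate for a search & replace pass. Characters that were dropped entirely will not show up in this tally.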
Old 03-29-2011, 02:03 AM   #8
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
good call - let me test & report....

OK, I did 3 tests using the same sentence: copy from Adobe, then paste
1) to Word,
2) to Notepad++,
3) directly to here.

In all 3 tests the leading f simply vanished from my test sentence, i.e. flights became lights:

There was no smoking on shuttle
lights.

(cf. flights became ϐlights in the calibre EPUB output.)

So calibre actually did a better job than Word, in that the calibre output was fixable with a regex, whereas finding the lost f's in the Word output would be impossible.

So would ALL PDF conversion programs fall at this hurdle?

Last edited by cybmole; 03-29-2011 at 02:08 AM.
Old 03-29-2011, 07:45 AM   #9
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by cybmole View Post
...

OK, I did 3 tests using the same sentence: copy from Adobe, then paste
1) to Word,
2) to notepad++
3) directly to here.

...
Sounds like the problem is in the PDF and not being caused by the conversion.

Another experiment:

1. open the txt file exported from Acrobat Reader in Notepad++
2. change the encoding to UTF-8 using the entry on the Encoding menu
3. save the file as TXT
4. convert using Calibre.
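
Steps 1 to 3 can also be scripted if Notepad++ is not to hand. This is only a sketch: the source encoding is a guess (cp1252 here) and has to match whatever the export really used, and the file names are placeholders.

```python
from pathlib import Path

def reencode_to_utf8(src: str, dst: str, src_encoding: str = "cp1252") -> None:
    """Read src in the assumed source encoding and write it back out as UTF-8."""
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Hypothetical usage (file names are made up):
# reencode_to_utf8("export.txt", "export-utf8.txt")
```

The resulting file can then be added to calibre as in step 4, or passed to calibre's command-line converter, e.g. `ebook-convert export-utf8.txt book.epub`.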
Old 03-29-2011, 09:32 AM   #10
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Originally Posted by cybmole View Post
so would ALL pdf convert programs fall at this hurdle ???
On this particular file I would say that the answer is yes.

It sounds like the file in question was created by scanning in an image of the book and then applying OCR technology to create the underlying text. In this case some characters were not recognised correctly at the OCR stage. Unless you had some way of re-applying the OCR step (and doing a better job than the original program), all conversion programs are going to fail with this PDF file.

Many (possibly the majority of) PDF files are created from the original word-processed document. In such a case the PDF file does not have the overlaying image, and the underlying text is complete, so a conversion program has a chance. However, PDF conversion is still a little fraught even with files created this way, because of tricks that PDF does (ligatures, absolute placement of text, special symbols, etc.) that a conversion program can struggle to understand and convert sensibly.
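
On the ligature point: when the text layer contains genuine Unicode ligature characters (such as U+FB02 for fl), they can be expanded back to plain letters with Unicode compatibility normalization. A small sketch in Python; the helper name is illustrative, and this is not a description of how calibre handles ligatures internally.

```python
import unicodedata

def expand_ligatures(text: str) -> str:
    """Apply NFKC normalization, which expands presentation-form ligatures."""
    return unicodedata.normalize("NFKC", text)

# U+FB02 = fl, U+FB01 = fi, U+FB03 = ffi
print(expand_ligatures("\ufb02ights of \ufb01ne o\ufb03ces"))
```

Two caveats: NFKC also rewrites other compatibility characters, so it changes more than ligatures, and it cannot help when a custom font has mapped a glyph to an unrelated code point. A stray U+03D0, for example, does not come back as f.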