10-12-2008, 12:14 PM | #1 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Typos during conversion
Hi guys.
I usually buy MS Reader books at Fictionwise, run them through ConvertLit, then use Calibre to convert them to LRF. Yesterday I noticed that one of my books had typos in the LRF file: all the "fi" combinations were gone. They are in the LIT file and in the EPUB file, but not in the LRF. "first" becomes "rst", "field" becomes "eld", etc. Also, the pictures were missing from the LRF file but present in the EPUB. I was surprised to see how often these two letters occur together. This is not a big problem; I just wanted to share it with you.
David
|
10-12-2008, 12:37 PM | #2 |
creator of calibre
Posts: 43,739
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You're saying fi is in the LIT and the EPUB, but not the LRF?
|
10-12-2008, 01:27 PM | #3 | |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Quote:
So Calibre is converting it correctly; the Sony is just not displaying it. The EPUB is fine on the reader.
Last edited by ddavtian; 10-12-2008 at 01:37 PM.
|
10-12-2008, 01:52 PM | #4 |
creator of calibre
Posts: 43,739
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That's because the LIT file uses a special Unicode character (a ligature) to represent "fi". The default font in SONY's LRF viewer can't display that character.
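To make that concrete, here is a minimal Python sketch of what such a character looks like to a program. It assumes the character in question is U+FB01, the standard Unicode "fi" ligature. The `drop_undisplayable` helper is hypothetical: it only imitates a font with no glyph for that character, which matches the "first" becomes "rst" symptom reported above but is not a claim about how the SONY viewer works internally.

```python
import unicodedata

FI_LIGATURE = "\ufb01"  # one code point that is drawn as the two letters "fi"

def describe(ch):
    """Return (code point, official Unicode name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

def drop_undisplayable(text, missing):
    """Hypothetical stand-in for a font that has no glyph for `missing`."""
    return "".join(ch for ch in text if ch not in missing)

word = FI_LIGATURE + "rst"  # "first" as it might be stored in the book
print(describe(FI_LIGATURE))
print(len(word))  # 4 characters, not 5
print(drop_undisplayable(word, {FI_LIGATURE}))
```

The word is stored as four characters, so when the first one cannot be drawn, only "rst" is left on screen.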
|
10-12-2008, 01:57 PM | #5 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Thanks Kovid. I'll be reading books in epub then :-)
|
10-12-2008, 05:38 PM | #6 |
Grand Sorcerer
Posts: 19,832
Karma: 11844413
Join Date: Jan 2007
Location: Tampa, FL USA
Device: Kindle Touch
|
|
10-12-2008, 07:23 PM | #7 |
creator of calibre
Posts: 43,739
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It could, but since there are a very large number of possible ligatures, this would just slow things down a lot.
|
10-12-2008, 09:59 PM | #8 | |
Grand Sorcerer
Posts: 19,832
Karma: 11844413
Join Date: Jan 2007
Location: Tampa, FL USA
Device: Kindle Touch
|
Quote:
I think if you build a hash table of ligatures, it would be quick to do a lookup. However, I'm not sure how you are doing the conversion. Is this a stream-based process? In .NET you would do this with stream readers and writers. The reader's output is a stream which can be the input to the next reader/writer. The overhead is very small, as the stream is processed from end to end and passed from adapter to adapter as it is processed. Is there anything like that in Python?
BOb
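As a rough illustration of that idea in Python: generators can be chained much like stream readers and writers, and a `dict` is the built-in hash table. This is only a sketch under assumptions. The table contents, the function names, and the chunk size are all made up for the example, and nothing here reflects how calibre actually does its conversion.

```python
import io

# Hypothetical lookup table: code point -> plain-letter replacement.
LIGATURES = {
    0xFB00: "ff",
    0xFB01: "fi",
    0xFB02: "fl",
    0xFB03: "ffi",
    0xFB04: "ffl",
}

def read_chunks(stream, size=4096):
    """Yield successive text chunks from a file-like object."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk

def expand_ligatures(chunks, table=LIGATURES):
    """Replace ligature characters in each chunk via dict lookup."""
    for chunk in chunks:
        yield chunk.translate(table)

def convert(src, dst):
    """Pipe src through the stages and write the result to dst."""
    for chunk in expand_ligatures(read_chunks(src)):
        dst.write(chunk)

src = io.StringIO("The \ufb01rst \ufb01eld")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())  # The first field
```

Each ligature is a single character, so a chunk boundary in a text stream can never split one in half, and each stage only ever holds one chunk at a time.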
|
10-12-2008, 10:15 PM | #9 |
creator of calibre
Posts: 43,739
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The actual replacing happens in C. The problem is the number of ligatures, not the speed of Python.
|
10-20-2008, 12:15 AM | #10 |
Resident Curmudgeon
Posts: 73,510
Karma: 126422064
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
One solution: after the LIT has had the DRM removed, use lit2oeb to explode it. Then take the HTML, fix the ligatures, and use html2lrf with the --use-spine option on the OPF file, and all will be well.
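The "fix the ligatures" step could be scripted roughly like this. It is a sketch under assumptions: the function name is made up, the exploded files are assumed to be UTF-8 and to end in .htm or .html, and only the five common f-ligatures are handled. Ligatures written as numeric character references (for example `&#64257;`) are not touched.

```python
from pathlib import Path

# The five common f-ligatures and their plain-letter spellings.
TABLE = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})

def fix_html_tree(root, encoding="utf-8"):
    """Rewrite every .htm/.html file under root; return the paths changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".htm", ".html"}:
            continue
        original = path.read_text(encoding=encoding)
        fixed = original.translate(TABLE)
        if fixed != original:  # leave untouched files alone
            path.write_text(fixed, encoding=encoding)
            changed.append(path)
    return changed
```

Calling `fix_html_tree("exploded_book")`, where `exploded_book` stands for whatever directory lit2oeb wrote to, rewrites the files in place, so it is worth keeping a copy of the directory first.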
|
10-20-2008, 12:35 AM | #11 | |
Junior Member
Posts: 9
Karma: 10
Join Date: Oct 2008
Device: prs-505
|
Quote:
I think that, for this kind of ligature, only a handful are common in English. 0xfb00 to 0xfb04 are VERY common and often used, since they look so much better than the individual characters. 0xfb05 is rare; I even think there are additional rules under which it is only used for some specific words. It is so rare that I can't even be bothered to google the specific rules associated with it... Just having an automatic translation of these 5 ligatures would probably cover the vast majority of ligatures a calibre user will ever encounter. Best, of course, would be to have the reader updated somehow so that it supports these 5 ligatures, but that might be difficult. There is a reason these ligatures are used in printed media and books.
regards
ronnie sahlberg
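For reference, a short Python sketch that lists those five code points and derives their plain-letter spellings from the Unicode data itself instead of typing the table by hand. Using NFKD compatibility decomposition for this is a choice made for the example; the thread does not say how calibre builds its table, and `ligature_table` is a made-up name.

```python
import unicodedata

def ligature_table(first=0xFB00, last=0xFB04):
    """Map each code point in [first, last] to its plain-letter spelling.

    NFKD compatibility decomposition splits a ligature into its letters.
    """
    return {cp: unicodedata.normalize("NFKD", chr(cp))
            for cp in range(first, last + 1)}

table = ligature_table()
for cp, letters in table.items():
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}  ->  {letters}")

print("o\ufb03ce \ufb02oor".translate(table))  # office floor
```

The five entries come out as ff, fi, fl, ffi and ffl. Limiting the range to these five keeps the table tiny, which sidesteps the concern about the total number of possible ligatures.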
|
10-20-2008, 12:57 AM | #12 |
creator of calibre
Posts: 43,739
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
OK, I've added code to replace those five ligatures.
|