09-11-2011, 04:08 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2009
Device: Sony PRS600
|
mobi to epub conversions have spelling errors
Hi,
I've found when converting a mobi book to an epub book using Calibre (including the latest version) that spelling errors get introduced where the spelling was correct before. If I delete the new incorrect file and convert the mobi to epub again I get the same spelling errors. It's the odd error here and there, could be a letter missing or two words jumbled up. Has anyone got a remedy for this? I have removed the DRM and checked that the spelling is still accurate on the mobi so I don't think it's anything to do with that. Kind regards |
09-11-2011, 04:15 PM | #2 |
The Grand Mouse 高貴的老鼠
Posts: 71,504
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
There are two possibilities that I can think of, apart from you being mistaken.
(i) The calibre conversion might be doing something wrong with non-ASCII characters, interpreting UTF-8 as Windows Latin-1 or vice versa.
(ii) The calibre decoding of the mobi might be getting the trailing characters in each segment of the Mobipocket file wrong.
But it seems unlikely to me that either thing would be wrong, so I'm guessing it must be a third problem I can't think of. You could try extracting the contents of the Mobipocket file with Mobiunpack to get an HTML file, add that to calibre, and see if that will convert cleanly. Also, checking whether the errors appear in the Mobipocket file when viewed with the calibre file viewer would help pin down the problem. Best would be to find a DRM-free sample file that shows this problem, so that the calibre developers could see the problem for themselves. |
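For anyone unsure what an encoding mix-up of type (i) looks like on the page, here is a minimal Python sketch. It is only an illustration and is not taken from calibre's code; the two function names are made up for this example. It shows that this kind of error damages only non-ASCII characters (accents, curly quotes, dashes) and leaves plain ASCII words alone.

```python
def misread_utf8_as_cp1252(text):
    """Encode as UTF-8, then wrongly decode as Windows-1252 (Windows Latin-1)."""
    return text.encode("utf-8").decode("cp1252")


def misread_cp1252_as_utf8(text):
    """Encode as Windows-1252, then wrongly decode as UTF-8.

    Bytes that are not valid UTF-8 become U+FFFD, the replacement character.
    """
    return text.encode("cp1252").decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "café"
    print(misread_utf8_as_cp1252(sample))   # the accented letter turns into two characters
    print(misread_cp1252_as_utf8(sample))   # the accented letter turns into a replacement mark
    print(misread_utf8_as_cp1252("plain ASCII text"))  # unchanged
```

With this sketch "café" comes out as "cafÃ©" in the first case. If the errors in the book are missing letters or jumbled words in ordinary English text, an encoding mix-up of this kind is probably not the whole story.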
09-11-2011, 08:12 PM | #3 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
calibre's viewer internally converts MOBI to EPUB before displaying. Actually, calibre's viewer is an EPUB viewer and converts everything to EPUB before displaying it.
|
09-11-2011, 09:50 PM | #4 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
I'd actually say the most likely possibility is that he converted a Topaz file and is seeing the embedded topaz text OCR errors - don't assume it's mobi just because it's from Amazon. At least 10% of their content is Topaz.
Go to the Edit Metadata window and see if the source file is really a .mobi - I'll bet it shows up as a .zip. |
09-12-2011, 12:08 AM | #5 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
@dawnybros I link to some interesting posts about the origin of Topaz in this old post. |
|
09-12-2011, 02:57 AM | #6 | |
The Grand Mouse 高貴的老鼠
Posts: 71,504
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
|
|
09-16-2011, 07:53 AM | #7 |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2009
Device: Sony PRS600
|
Hi,
I don't know if I'm using a Topaz file or not, but the problem happens even on newly released books where the file extension is .mobi, so I'm assuming not. The files are legally bought and downloaded from Amazon in Kindle for PC, then I remove the DRM and check the de-DRM'd files (still in .mobi format) and they look fine. I then import the file into Calibre, run convert, and the output epub file is the one with errors. The original .mobi file is just fine. I've checked the UTF-8, Windows Latin, etc., but it's not just a matter of transposing characters it doesn't recognise; it also sometimes garbles or joins words or parts of words, or you'll find a partially closed div (/div>) sitting on the page. It's just here and there, but I'd say there are probably errors such as this every few pages. I'm using a Sony PRS600 reader and I see the same problems on my PC, so it's not a reader error either. These are small errors, but too frequent, and they 'jar' you out of what you are reading. Any other ideas please? Last edited by dawnybros; 09-16-2011 at 07:57 AM. |
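To get a rough count of how often fragments like the stray /div> turn up, the converted EPUB can be scanned directly, since an EPUB is a zip archive of (X)HTML files. The following Python sketch is one possible approach; the function names and the short list of tag names are assumptions chosen for illustration, not anything calibre provides. It looks for a closing-tag fragment that has lost its leading "<", in both the raw form and the escaped form ("/div&gt;") that a serializer may write when the fragment ends up as visible text.

```python
import re
import zipfile

# A closing-tag fragment with no "<" in front of it, e.g. "/div>" or "/div&gt;".
# The tag names listed here are just examples; extend as needed.
STRAY_CLOSE = re.compile(r"(?<!<)/(?:div|p|span)(?:>|&gt;)")


def find_stray_tags(markup):
    """Return the list of stray closing-tag fragments found in a markup string."""
    return STRAY_CLOSE.findall(markup)


def scan_epub(path):
    """Map each HTML file inside the EPUB to its number of stray fragments."""
    hits = {}
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                text = book.read(name).decode("utf-8", errors="replace")
                count = len(find_stray_tags(text))
                if count:
                    hits[name] = count
    return hits
```

Calling `scan_epub("book.epub")` (the file name is a placeholder) returns a dictionary of file names and counts, which makes it easier to point at specific chapters when reporting the problem.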
09-16-2011, 08:27 AM | #8 | |
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
DRM removal made some errors decrypting and you are seeing the results. Board policy is we don't discuss how to do this (with or without errors) |
|
09-16-2011, 09:33 AM | #9 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
If what the OP said two posts ago is correct, then it is not due to DRM removal, as he said he checked the post-DRM mobi files and they were still OK.
Note that I have never seen this behaviour, and I've converted many retail .mobi (not Topaz) books to epub. If you could reproduce this error with a book from Amazon's bestselling free books and post just the book title, others could download a free legal copy and test whether they get the same conversion error. Why not post the titles of the books where you have the most annoyances? Someone else may own the same book and be able to check. You could also try converting mobi to rtf or to txt. If, as you claim, the .mobi file in calibre is OK, then you may be able to narrow down where the problem is occurring.
STEP BY STEP - to trace exactly where errors are being introduced, you need to compare:
1. The retail mobi (with DRM), read in Kindle for PC.
2. The de-DRM'd mobi, after importing to calibre, but also read with Kindle for PC (use open path, then open the file within the calibre database).
3. The same file as 2, but read with the calibre viewer (which causes an internal conversion to epub).
4. The same file converted to other checkable formats - txt, rtf...
It could also be to do with specific versions of plug-ins and of Kindle for PC. I followed advice elsewhere and have not allowed Kindle for PC to update beyond v1.4, as plug-ins were reported to have issues with v1.5 and higher. But if your books are OK at step 2 above, then it is NOT due to DRM removal.
PS: I'm not sure whether a .mobi extension is a reliable way to detect Topaz. I think not, and I think you have to examine file headers via Notepad++. I was given that advice in another thread. Posting book titles should not break any board rules. Last edited by cybmole; 09-16-2011 at 09:39 AM. |
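Reading two versions of a whole book side by side is tedious, so the step-by-step comparison can be partly automated once two stages have been exported to plain text (for example, a txt made from the mobi and a txt made from the converted epub). The Python sketch below uses the standard library's difflib to list only the word runs that differ. The function name is made up for this example, and it assumes both texts are already loaded as strings.

```python
import difflib


def changed_words(before, after):
    """Return (old, new) pairs of word runs that differ between two texts."""
    a = before.split()
    b = after.split()
    # autojunk=False stops difflib from ignoring very common words in long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Because the comparison is done word by word, differences in line wrapping between the two exports are ignored, and a joined word such as "inthe" shows up as a pair like ("in the", "inthe"). On a full-length book this may be slow, so comparing one chapter at a time is probably more practical.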
09-16-2011, 09:37 AM | #10 |
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
If posting a book title broke the board's rules, most of us would be banned many times over.
|
09-16-2011, 12:29 PM | #11 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
|
09-16-2011, 01:21 PM | #12 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Assuming for the sake of argument that a file is Topaz, is in calibre, and has been de-DRM'd, what processing does the calibre viewer do when asked to display it?
|
09-16-2011, 02:03 PM | #13 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
The Calibre viewer is an epub viewer - therefore it will have run a conversion to epub before attempting to display it.
|
09-16-2011, 08:09 PM | #14 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
If a book was Topaz and had the DRM stripped, the spelling errors would be in the mobi file in calibre as well as in the converted epub. That is because the spelling errors in a Topaz book are in the underlying OCR code, which you don't see when reading the original DRM'd azw file. The only thing you see when viewing the Topaz azw is image glyphs. The OCR code is used to feed the dictionary and search portion of the Kindle.
It would help if he linked to the exact book (on Amazon) he purchased and is having trouble with. |
09-17-2011, 12:47 AM | #15 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Unless they changed the DRM plugin to do something radically different, it doesn't create a .mobi file from Topaz books; it creates a .zip or .htmlz file, depending on the plugin version. That's why I suggested the OP check the Edit Metadata screen. So to display it, the Calibre viewer would convert it from one of these to ePub.
That said, his description seems to indicate he's stripping the DRM outside of Calibre, so not really sure what he's doing/seeing. |
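For checking what a file really is regardless of its extension, looking at the first few bytes is usually more reliable than the file name. The Python sketch below is a rough illustration under stated assumptions: that Topaz files begin with the bytes "TPZ", that Mobipocket files carry "BOOKMOBI" at byte offset 60 of the header, and that zip-based files (including .htmlz) begin with "PK". These markers are widely reported but are not confirmed anywhere in this thread, and the function names are made up for this example.

```python
def sniff_kindle_format(header):
    """Guess the container type from the first 68 or more bytes of a file."""
    if header[:3] == b"TPZ":            # assumed Topaz marker
        return "topaz"
    if header[60:68] == b"BOOKMOBI":    # assumed Mobipocket type/creator field
        return "mobi"
    if header[:2] == b"PK":             # zip archive (e.g. .zip, .htmlz, .epub)
        return "zip"
    return "unknown"


def sniff_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as handle:
        return sniff_kindle_format(handle.read(68))
```

Running `sniff_file` on the file that was actually imported into calibre (any path works) gives a quick answer to whether the conversion source is a real Mobipocket file or something else wearing a .mobi extension.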
Tags |
calibre, epub conversion errors, mobi conversion |
|