Old 09-11-2011, 04:08 PM   #1
dawnybros
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2009
Device: Sony PRS600
mobi to epub conversions have spelling errors

Hi,

I've found when converting a mobi book to an epub book using Calibre (including the latest version) that spelling errors get introduced where the spelling was correct before. If I delete the new incorrect file and convert the mobi to epub again I get the same spelling errors. It's the odd error here and there, could be a letter missing or two words jumbled up.

Has anyone got a remedy for this? I have removed the DRM and checked that the spelling is still accurate on the mobi so I don't think it's anything to do with that.

Kind regards
Old 09-11-2011, 04:15 PM   #2
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,504
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
There are two possibilities that I can think of, apart from you being mistaken.

(i) The calibre conversion might be doing something wrong with non-ASCII characters, interpreting UTF-8 as Windows Latin-1 or vice versa.
(ii) The calibre decoding of the mobi might be getting the trailing characters in each segment of the mobipocket file wrong.
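
For reference, possibility (i) can be illustrated with a few lines of Python. This is only a sketch with a made-up sample string, not anything taken from calibre itself. It shows that an encoding mix-up damages accented letters and curly quotes but leaves plain ASCII letters alone, so it would not by itself drop letters from ordinary English words or jumble them.

```python
# Sketch: what a UTF-8 / Windows Latin-1 (cp1252) mix-up looks like.
# The sample text is invented for illustration.
sample = "café, naïve, “quoted”"

# Case 1: UTF-8 bytes wrongly decoded as cp1252.
utf8_read_as_cp1252 = sample.encode("utf-8").decode("cp1252", errors="replace")

# Case 2: cp1252 bytes wrongly decoded as UTF-8.
cp1252_read_as_utf8 = sample.encode("cp1252").decode("utf-8", errors="replace")

print(utf8_read_as_cp1252)   # accented letters turn into two- or three-character junk
print(cp1252_read_as_utf8)   # accented letters turn into replacement characters
```

In both cases the unaccented letters come through unchanged, which is one way to tell an encoding problem apart from other kinds of corruption.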

But it seems unlikely to me that either thing would be wrong, so I'm guessing it must be a third problem I can't think of.

You could try extracting the contents of the mobipocket file with Mobiunpack to get an HTML file, adding that to calibre, and seeing whether it converts cleanly.

Also, checking to see whether there are errors in the mobipocket file when viewed with the calibre file viewer would help pin down the problem.

Best would be to find a DRM-free sample file that shows this problem, so that the calibre developers could see the problem for themselves.
Old 09-11-2011, 08:12 PM   #3
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by pdurrant View Post
Also, checking to see whether there are errors in the mobipocket file when viewed with the calibre file viewer would help pin down the problem.
calibre's viewer internally converts MOBI to EPUB before displaying. Actually, calibre's viewer is an EPUB viewer and converts everything to EPUB before displaying it.
Old 09-11-2011, 09:50 PM   #4
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I'd actually say the most likely possibility is that he converted a Topaz file and is seeing the embedded topaz text OCR errors - don't assume it's mobi just because it's from Amazon. At least 10% of their content is Topaz.

Go to the Edit Metadata window and see if the source file is really a .mobi - I'll bet it shows up as a .zip.
Old 09-12-2011, 12:08 AM   #5
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by ldolse View Post
I'd actually say the most likely possibility is that he converted a Topaz file and is seeing the embedded topaz text OCR errors - don't assume it's mobi just because it's from Amazon. At least 10% of their content is Topaz.
You have obviously hit the nail on the head. I attempted to reply to this a few times without sounding snotty, but my ideas fell apart while writing, and you just can't say "Conversions don't cause spelling errors" without saying why the user was seeing what s/he was seeing. I feel much better now that the mystery is solved.

@dawnybros I link to some interesting posts about the origin of Topaz in this old post.
Old 09-12-2011, 02:57 AM   #6
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,504
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by ldolse View Post
I'd actually say the most likely possibility is that he converted a Topaz file and is seeing the embedded topaz text OCR errors - don't assume it's mobi just because it's from Amazon. At least 10% of their content is Topaz.

Go to the Edit Metadata window and see if the source file is really a .mobi - I'll bet it shows up as a .zip.
Ah — I think you have found the third reason I didn't think of.
Old 09-16-2011, 07:53 AM   #7
dawnybros
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2009
Device: Sony PRS600
Hi,
I don't know if I'm using a Topaz file or not, but the problem happens even on newly released books where the file extension is .mobi, so I'm assuming not. The files are legally bought and downloaded from Amazon in Kindle for PC, then I remove the DRM and check the de-DRM'd files (still in .mobi format) and they look fine. I then import the file into Calibre, run convert, and the output epub file is the one with errors. The original .mobi file is just fine.

I've checked the UTF-8, Windows Latin, etc., but it's not just a matter of transposing characters it doesn't recognise; it also sometimes garbles or joins words or parts of words, or you'll find a partially closed div (/div>) sitting on the page. It's just here and there, but I'd say there are probably errors such as this every few pages. I'm using a Sony PRS600 reader and I see the same problems on my PC, so it's not a reader error either.
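
Stray fragments like the "/div>" described above can be searched for mechanically instead of by reading. The following is a rough sketch, not a calibre feature: it relies on the fact that an EPUB is a ZIP archive of (X)HTML files, and it uses a simple heuristic pattern (a closing-tag fragment with no "<" in front of it). The function names are made up for this example, and the pattern can give false positives, for instance on unquoted attribute values.

```python
import re
import zipfile

# A closing-tag fragment such as "/div>" that is NOT preceded by "<".
# Self-closing endings like "<br/>" are skipped because a letter must follow "/".
STRAY_CLOSE = re.compile(r"(?<!<)/[A-Za-z][A-Za-z0-9]*>")


def find_stray_fragments(markup):
    """Return (offset, fragment) pairs for orphaned closing-tag fragments."""
    return [(m.start(), m.group()) for m in STRAY_CLOSE.finditer(markup)]


def scan_epub(path):
    """Yield (member name, offset, fragment) for each hit in an EPUB's HTML files."""
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                text = book.read(name).decode("utf-8", errors="replace")
                for offset, fragment in find_stray_fragments(text):
                    yield name, offset, fragment
```

Running `scan_epub` on the converted file would give a list of places to compare against the same passages in the source book.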

These are small errors, but too frequent, and they 'jar' you out of what you are reading.

Any other ideas please?

Last edited by dawnybros; 09-16-2011 at 07:57 AM.
Old 09-16-2011, 08:27 AM   #8
theducks
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by dawnybros View Post
Hi,
I don't know if I'm using a Topaz file or not, but the problem happens even on newly released books where the file extension is .mobi, so I'm assuming not. The files are legally bought and downloaded from Amazon in Kindle for PC, then I remove the DRM and check the de-DRM'd files (still in .mobi format) and they look fine. I then import the file into Calibre, run convert, and the output epub file is the one with errors. The original .mobi file is just fine.

I've checked the UTF-8, Windows Latin, etc., but it's not just a matter of transposing characters it doesn't recognise; it also sometimes garbles or joins words or parts of words, or you'll find a partially closed div (/div>) sitting on the page. It's just here and there, but I'd say there are probably errors such as this every few pages. I'm using a Sony PRS600 reader and I see the same problems on my PC, so it's not a reader error either.

These are small errors, but too frequent, and they 'jar' you out of what you are reading.

Any other ideas please?
I think you answered your own question.

DRM removal made some errors decrypting and you are seeing the results.
Board policy is we don't discuss how to do this (with or without errors)
Old 09-16-2011, 09:33 AM   #9
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
If what the OP said 2 posts ago is correct, then it is not due to DRM removal, as he said he checked the post-DRM mobi files & they were still OK.

Note that I have never seen this behaviour, & I've converted many retail .mobi (not Topaz) books to epub.

If you could reproduce this error with a book from Amazon's bestselling free books & post just the book title, others could download a free legal copy & test to see if they get the same conversion error.

Why not post the titles of the books where you have the most annoyances? Someone else may own the same book & be able to check.

You could also try converting mobi to rtf or to txt. IF, as you claim, the .mobi file in calibre is OK, then you may be able to narrow down where the problem is occurring.
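
If many books are affected, the extra conversions suggested above can be scripted. The sketch below assumes calibre's command-line tools are installed and that `ebook-convert` is on the PATH; the helper names and the file name `book.mobi` are placeholders for this example only.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(source, formats=("txt", "rtf", "epub")):
    """Build one ebook-convert command per target format."""
    src = Path(source)
    return [
        ["ebook-convert", str(src), str(src.with_suffix("." + fmt))]
        for fmt in formats
    ]


def run_conversions(source, formats=("txt", "rtf", "epub")):
    """Run each conversion in turn; needs calibre's CLI tools on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    for cmd in conversion_commands(source, formats):
        subprocess.run(cmd, check=True)


# Example (not executed here): run_conversions("book.mobi")
```

If the txt and rtf outputs show the same errors as the epub, the problem is more likely in reading the source file than in writing the epub.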

STEP BY STEP - to trace exactly where errors are being introduced, you need to compare:
1. the retail mobi (with DRM), read in Kindle for PC,
2. the de-DRM'd mobi, after importing to calibre, but also read with Kindle for PC (use open path, then open the file within the calibre database),
3. the same file as 2, but read with the calibre viewer (which causes an internal conversion to epub),
4. the same file converted to other checkable formats - txt, rtf....
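
Comparing the stages by eye is tedious, so here is a small sketch of how two plain-text extracts of the same passage could be compared word by word. It uses Python's standard `difflib`; the function name and the two sample sentences are invented for illustration, and getting the text out of each stage is left to whatever tool is at hand.

```python
import difflib
import re


def word_differences(reference, candidate):
    """Return (reference_words, candidate_words) pairs where the texts differ."""
    ref_words = re.findall(r"\S+", reference)
    cand_words = re.findall(r"\S+", candidate)
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Made-up extracts standing in for the same passage at two stages.
earlier_stage = "She walked to the window and looked out."
later_stage = "She walked to thewindow and loked out."

for before, after in word_differences(earlier_stage, later_stage):
    print(repr(before), "->", repr(after))
```

The first stage at which differences appear is the stage that introduces the errors.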

It could also be to do with specific versions of plug-ins and of Kindle for PC. I followed advice elsewhere and have not allowed Kindle for PC to update beyond v1.4, as plug-ins were reported to have issues with v1.5 & higher. But if your books are OK at step 2 above, then it is NOT due to DRM removal.

PS I'm not sure whether a .mobi extension is a reliable way to detect Topaz. I think not, & I think you have to examine file headers via Notepad++. I was given that advice in another thread.
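
Checking the header does not have to be done by hand in an editor. The sketch below reads the first bytes of a file and guesses the container type. The signatures are assumptions based on commonly described format layouts, not something stated in this thread: Topaz files are assumed to start with the bytes "TPZ", Mobipocket files to carry "BOOKMOBI" at byte offset 60 of the Palm database header, and ZIP-based files (EPUB, HTMLZ) to start with "PK\x03\x04". The function names are made up for this example.

```python
def sniff_ebook_type(header):
    """Guess a container type from the first 68 bytes of a file."""
    if header[:3] == b"TPZ":                 # assumed Topaz signature
        return "topaz"
    if header[:4] == b"PK\x03\x04":          # ZIP local file header
        return "zip (EPUB, HTMLZ or plain ZIP)"
    if header[60:68] == b"BOOKMOBI":         # assumed PalmDB type/creator fields
        return "mobipocket"
    return "unknown"


def sniff_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return sniff_ebook_type(handle.read(68))
```

A result of "unknown" only means none of these three signatures matched; it says nothing more about the file.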

Posting book titles should not break any board rules.

Last edited by cybmole; 09-16-2011 at 09:39 AM.
Old 09-16-2011, 09:37 AM   #10
theducks
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
If posting a book title breaks the boards rules, most of us would be banned many times over
Old 09-16-2011, 12:29 PM   #11
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by cybmole View Post
PS I'm not sure whether a .mobi extension is a reliable way to detect Topaz. I think not, & I think you have to examine file headers via Notepad++. I was given that advice in another thread.
This is correct.
Old 09-16-2011, 01:21 PM   #12
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Assuming for the sake of argument that a file is Topaz, is in calibre, & has been de-DRM'd, what processing does the calibre viewer do when asked to display it?
Old 09-16-2011, 02:03 PM   #13
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
The Calibre viewer is an epub viewer - therefore it will have run a conversion to epub before attempting to display it.
Old 09-16-2011, 08:09 PM   #14
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
If a book was Topaz and had the DRM stripped, the spelling errors would be in the mobi file in calibre as well as in the converted epub, because the spelling errors in a Topaz book are in the underlying OCR text, which you don't see when reading the original DRM'd azw file. The only thing you see when viewing the Topaz azw is image glyphs. The OCR text is used to feed the dictionary and search functions of the Kindle.

It would help if he linked to the exact book (on Amazon) he purchased and is having trouble with.
Old 09-17-2011, 12:47 AM   #15
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Unless they changed the DRM plugin to do something radically different, it doesn't create a .mobi file from Topaz books; it creates a .zip or .htmlz file, depending on the plugin version. That's why I suggested the OP check the Edit Metadata screen. So to view it in the Calibre viewer, it would be converted from one of these to ePub.

That said, his description seems to indicate he's stripping the DRM outside of Calibre, so not really sure what he's doing/seeing.

Tags
calibre, epub conversion errors, mobi conversion


All times are GMT -4.