11-04-2021, 07:09 PM | #1 |
Wizard
Posts: 1,165
Karma: 4917718
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Converted eBooks
I have only been editing ebooks for a year or so, to clean up my library, and am self-taught. I am really only familiar with epub. I don't know the inner structure and workings of other formats like mobi or KFX (which calibre does not edit. Why?), and the one time I edited an AZW3, it was the same as editing an epub; I could not see any difference.
So a couple of questions:
1. Why, when converting from any other format to epub, is so much excess and useless code added? Looking through the css file, everything is "calibre_1", "calibre_2" etc., and there can be up to 50 styles, most of them the same or similar anyway. Then there is an excess of <span> tags throughout each xhtml page. There are even <span>s around the spaces between words. I can't imagine the original book was built this way. So why is there so much garbage code after conversion?
2. When using DeDRM, why does it automatically convert to mobi? How can you change it to convert directly to epub? Is there any way, or are there better plugins to use, that avoids the above? |
11-04-2021, 08:12 PM | #2 | |
Addict
Posts: 389
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
If you are getting books from amateur publishers or "free" sites, then the sky isn't even close to the limit. I've seen books where every paragraph had a different style name, and the CSS had about 15,000 lines in it. Huge numbers of <span>s are not uncommon. Books like this may have been made for a particular audience, and/or gone through many hands before you. So don't be surprised at this nonsense; very little of it is due to Calibre.
The styles "calibre1", "calibre2" and so on are Calibre taking whatever it is given and trying to make a readable book out of it. Give it a nice, clean, well-coded book and it will do a very good job, with a minimum of "calibre" styles. Give it some weirdly coded book, and who knows what comes out. If you get an "original" book in epub or azw3, try opening it in the editor before you convert it, and look at the coding. Some are so weird you can't even understand them (until after Calibre has done a lot of clean-up with a conversion).
DeDRM, by the way, does no conversion at all; it just decrypts. If you are getting automatic conversions, you have set Calibre to do that, probably in Preferences > Adding books. |
|
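For anyone who wants to check how many of the "calibre" classes in a converted stylesheet really are duplicates of each other, here is a minimal Python sketch. It is only an illustration, not something calibre provides: the name `duplicate_classes` is made up, and it assumes simple `.classname { ... }` rules with no comments and no semicolons inside values.

```python
import re
from collections import defaultdict

# Matches a class selector followed by its declaration block, e.g. ".calibre1 { ... }"
RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")

def _normalise(body):
    """Turn a declaration block into an order-independent set of (property, value)."""
    decls = set()
    for decl in body.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls.add((prop.strip().lower(), " ".join(value.split())))
    return frozenset(decls)

def duplicate_classes(css):
    """Return lists of class names whose declarations are identical."""
    groups = defaultdict(list)
    for name, body in RULE.findall(css):
        groups[_normalise(body)].append(name)
    return [names for names in groups.values() if len(names) > 1]

css = """
.calibre1 { margin: 0; text-indent: 1em }
.calibre2 { text-indent:1em; margin:0; }
.calibre3 { font-style: italic }
"""
print(duplicate_classes(css))
```

Classes reported together are candidates for merging into one; the xhtml files would still need their `class` attributes updated by hand or with a search and replace.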
11-04-2021, 08:24 PM | #3 | ||||
Grand Sorcerer
Posts: 6,566
Karma: 84810789
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
The calibre editor is primarily an EPUB 2 editor. KF8 (azw3) format has basically the same content as EPUB 2 but packaged differently so it is not a big stretch to support that also.
MOBI is very similar to KF8 but is based on the ancient HTML 3 standard. I don't know why it isn't supported, perhaps because it is such an outdated format. KFX, on the other hand, isn't based on HTML; it is a proprietary Amazon format. Editing that would be next to impossible due to a lack of any documentation of how it works.
There is an option in calibre to automatically convert books to another format upon import (Preferences > Adding books > Adding actions > "Automatically convert added books to the preferred output format"). You may want to check that you have not enabled this by mistake.
|
||||
11-04-2021, 08:48 PM | #4 |
Well trained by Cats
Posts: 29,976
Karma: 56143930
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Conversion does not have an AI (but it does quite well in many cases).
GIGO applies to ebooks given to Calibre to convert. I've seen phrases inside spans, with punctuation outside, then another span. Not sure what they were after: a non-italic comma or quote? |
11-05-2021, 08:53 PM | #5 | |||||
Wizard
Posts: 1,165
Karma: 4917718
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Hi all, thank you for the responses.
I wanted to avoid the double conversion, as I thought that was what added all the extra tags: azw(?) > mobi > epub.
I have a small library of 240 books. Most of the Amazon books are decent, but could do with some cleanup because of the excess code. Other books from elsewhere are just so poorly formatted. I have 6 novels where all the paragraphs have been stripped out and each chapter is just one paragraph. Grrrrr. They belong in the "elsewhere" group. |
|||||
11-05-2021, 09:02 PM | #6 |
Wizard
Posts: 1,165
Karma: 4917718
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Follow up question...
Which format would you use as your master copy, i.e. the format that you edit and fix, and from which you convert to other formats depending on the ereader in use? Would epub be the best choice, or another format? |
11-05-2021, 09:03 PM | #7 |
Running with scissors
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
|
I don't remember if I looked at the Kindle book in the editor, but I had one that I converted to epub and it had a span around every word. I also think you can get more stuff added depending on what you've set in the calibre conversion settings, so you might try tweaking those if the extra stuff really bothers you.
In my case I think most, if not all, of that extra stuff is in the css file. I recently tweaked my conversion settings and now a bunch of the css classes have these declarations added to them. Code:
border-bottom-color: currentColor; border-bottom-style: none; border-bottom-width: 0; border-left-color: currentColor; border-left-style: none; border-left-width: 0; border-right-color: currentColor; border-right-style: none; border-right-width: 0; border-top-color: currentColor; border-top-style: none; border-top-width: 0; |
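If those border lines get in the way, something like the following Python sketch could strip them from a declaration block (the text between the braces of one rule). It rests on an assumption: that the declarations only restate default border behaviour, so removing them changes nothing unless another rule sets a visible border on the same element. The name `strip_noop_borders` is made up for illustration, and the splitting is naive (it would break on semicolons inside quoted values).

```python
SIDES = ("top", "right", "bottom", "left")
# (property, value) pairs treated as no-ops; values are compared in lower case
NOOP = {
    (f"border-{side}-{what}", value)
    for side in SIDES
    for what, value in (("color", "currentcolor"), ("style", "none"), ("width", "0"))
}

def strip_noop_borders(declarations):
    """Drop the redundant per-side border declarations, keep everything else."""
    kept = []
    for decl in declarations.split(";"):
        if ":" not in decl:
            continue  # skip empty fragments between or after semicolons
        prop, value = (part.strip() for part in decl.split(":", 1))
        if (prop.lower(), value.lower()) in NOOP:
            continue
        kept.append(f"{prop}: {value}")
    return "; ".join(kept) + (";" if kept else "")

block = ("margin: 0; border-top-style: none; border-top-width: 0; "
         "border-top-color: currentColor; text-indent: 1em;")
print(strip_noop_borders(block))
```

Because whole declarations are compared rather than prefixes, a real border such as `border-left-width: 0.5em` is left alone.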
11-05-2021, 09:11 PM | #8 |
Running with scissors
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
|
You're in for a treat when you get a book that uses div tags instead of p tags. Luckily there's a plugin that can fix that in one go (editing toolbag I think it's called).
|
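For readers without the plugin, the div-to-p fix can be roughly approximated with a regex. This is a sketch under assumptions, not how the plugin works: it only converts a `<div>` that contains no other block-level tag, so wrapper divs are left alone, and the name `divs_to_paragraphs` and the list of block tags are made up for illustration.

```python
import re

# Tags that must not end up inside a <p>; the list is an assumption, extend as needed
BLOCK_TAGS = r"(?:div|p|h[1-6]|ul|ol|li|table|blockquote|section|hr|pre)"

# A <div ...> whose content contains no opening or closing block-level tag
LEAF_DIV = re.compile(
    r"<div(\b[^>]*)>((?:(?!</?" + BLOCK_TAGS + r"\b).)*?)</div>",
    re.IGNORECASE | re.DOTALL,
)

def divs_to_paragraphs(xhtml):
    """Rename leaf <div> elements to <p>, keeping their attributes."""
    return LEAF_DIV.sub(r"<p\1>\2</p>", xhtml)

print(divs_to_paragraphs('<div class="calibre1">Hello <i>world</i></div>'))
```

Work on a copy of the book and check the result in the editor preview, since a regex cannot validate the markup it rewrites.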
11-05-2021, 09:13 PM | #9 | |
Wizard
Posts: 1,165
Karma: 4917718
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
They are a little bit bothersome, but with a few different regex runs I can clean most of it. It's just an extra step, and I was curious why it happens. |
|
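The kind of regex runs mentioned above might look like this in Python. It is a sketch, not the poster's actual expressions: it unwraps `<span>` elements that carry no attributes, and merges neighbouring spans that have the same class and are separated only by whitespace. The name `tidy_spans` is made up, and the patterns assume the class is the span's only attribute.

```python
import re

# <span> with no attributes, wrapping content that contains no other span tag
BARE_SPAN = re.compile(r"<span>((?:(?!</?span\b).)*?)</span>", re.DOTALL)

# Two spans with the same class, separated only by whitespace
ADJACENT = re.compile(
    r'<span class="([^"]*)">((?:(?!</?span\b).)*?)</span>(\s*)<span class="\1">',
    re.DOTALL,
)

def tidy_spans(xhtml):
    """Repeat both clean-ups until the text stops changing."""
    previous = None
    while previous != xhtml:
        previous = xhtml
        xhtml = BARE_SPAN.sub(r"\1", xhtml)
        xhtml = ADJACENT.sub(r'<span class="\1">\2\3', xhtml)
    return xhtml

print(tidy_spans("<p><span>The</span> <span>cat</span></p>"))
print(tidy_spans('<span class="i">a</span> <span class="i">b</span> <span class="i">c</span>'))
```

Spans separated by punctuation are deliberately not merged, so a comma that was left outside an italic span stays outside it.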
11-05-2021, 09:25 PM | #10 |
Running with scissors
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
|
One of the things that I think confuses people is that there are what I think of as two "flavors" of MOBI: the MOBI container and the MOBI format. What I'm calling the MOBI format is the very old MobiPocket format, which is essentially a version of HTML from before there was CSS. The MOBI container can hold a book in multiple formats, both the old MobiPocket format and the newer format that uses CSS; that is convenient for Amazon, because one file works on very old Kindles as well as the newer ones. So don't be dismayed when you see that the ebook has a .mobi extension; it's just a container.
|
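To see what is inside a given .mobi or .azw3 file, the header can be inspected directly. The Python sketch below is based on the community-documented layout of the format, not on any official Amazon specification, so the offsets are assumptions: the PalmDB type/creator at bytes 60 to 68, the first record offset at byte 78, and the MOBI header's file version 36 bytes into record 0. The name `mobi_file_version` is made up.

```python
import struct

def mobi_file_version(data):
    """Return the file version from the MOBI header, or None if not a MOBI container."""
    # PalmDB header: the type and creator fields should read "BOOKMOBI"
    if len(data) < 86 or data[60:68] != b"BOOKMOBI":
        return None
    # The record list starts at offset 78; its first 4 bytes locate record 0
    (record0,) = struct.unpack_from(">I", data, 78)
    if record0 + 40 > len(data) or data[record0 + 16:record0 + 20] != b"MOBI":
        return None
    (version,) = struct.unpack_from(">I", data, record0 + 36)
    return version

# Synthetic header for demonstration only; a real check would read the book file's bytes
demo = bytearray(200)
demo[60:68] = b"BOOKMOBI"
struct.pack_into(">I", demo, 78, 100)   # record 0 starts at byte 100
demo[116:120] = b"MOBI"
struct.pack_into(">I", demo, 136, 8)    # file version field
print(mobi_file_version(bytes(demo)))
```

As far as I know, a version of 8 indicates the newer KF8 content. A lower version in record 0 does not rule out a combined file that also carries a KF8 section further in; this sketch does not look for that.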
11-05-2021, 09:39 PM | #11 |
Grand Sorcerer
Posts: 6,566
Karma: 84810789
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
It doesn’t default to MOBI. It doesn’t change the format. You get MOBI out because that is what you are putting in.
That explains why you are getting MOBI format. The second generation Kindle is twelve years old and does not support any of the newer formats. If you download a book targeting that device you will get MOBI format even though the book is available in newer and better formats when delivered to a newer device. Much of the original book formatting provided by the publisher is lost in MOBI format. You would be far better off installing the Kindle for PC/Mac app and using that to download Kindle books. |
11-05-2021, 09:44 PM | #12 |
Well trained by Cats
Posts: 29,976
Karma: 56143930
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
DeDRM does not 'convert', it decrypts. So you started and ended with MOBI (probably based upon your registered device).
Master format is always the original (sans DRM) for me. EPUB/AZW are my preferred ones. Most PDFs are left 'as is'; they are usually user guides (yes, I keep copies of all my manuals and instructions in Calibre). |
11-05-2021, 09:49 PM | #13 | ||
Wizard
Posts: 1,165
Karma: 4917718
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Yep, I will install the PC version of Kindle and use that instead. I assume the books are fairly easy to locate once downloaded? Probably in %appdata%, at a guess. |
||
11-05-2021, 09:51 PM | #14 |
Wizard
Posts: 1,165
Karma: 4917718
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Yes, that is an important distinction. I am seeing where my errors are and how to fix them now.
Ok, great. I was hoping epub was one of the better formats for this. It's what I have been using so far. |
11-05-2021, 09:51 PM | #15 |
Grand Sorcerer
Posts: 6,566
Karma: 84810789
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
|
|