07-27-2012, 01:17 PM | #16 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
Something does not work... looking into it.
|
07-27-2012, 01:18 PM | #17 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
Ah, easy, you killed the 'return mi' at the end of get_metadata, perhaps because you put it into read_cover.
|
07-27-2012, 01:44 PM | #18 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
I fixed it in my branch and optimized it a bit (no read_cover if opf.nocover).
|
07-27-2012, 03:12 PM | #19 |
creator of calibre
Posts: 43,930
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
merged.
|
07-28-2012, 03:12 AM | #20 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
I started to work on conversion quirks. One particular problem now is that a conversion duplicates the cover image.
My call:
Code:
ebook-convert book.odt book.epub --output-profile sony300 \
    --preserve-cover-aspect-ratio \
    --no-svg-cover --no-default-epub-cover \
    --filter-css margin-top,margin-right,margin-left,margin-bottom,position,top,width
I suspect I caused this by returning the cover image in get_metadata. The only logical way to work around it would be to strip the cover image markup from the source during the input process. This would imply that a cover image must not be in the markup of the book. Or is there something I am missing that would let me work around it? |
07-28-2012, 03:30 AM | #21 |
creator of calibre
Posts: 43,930
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is no general fix for this. You can always end up with duplicated images when converting formats that don't have the concept of a cover. I have committed an (untested) fix that changes the ODT input plugin to not use the first image as the cover. However, you can still get duplicated images if, for instance, a user adds an ODT to calibre, which takes its first image as the cover, and then converts it, which sets that extracted first image as the cover. For these kinds of problems, there is the --remove-first-image option.
|
07-28-2012, 03:45 AM | #22 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
Hmmm... if I understand it right, your change calls get_metadata with extract_cover set to False. But this still returns the cover href (as I programmed the ODT get_metadata) and only omits cover_data. A test also shows that it changes nothing.
So the first question is: if extract_cover is False, should get_metadata return neither cover nor cover_data? If yes, I need to change this.
But: I want it to have the cover href, so that it is set as the cover in the content.opf. So inhibiting the detection might be the wrong way, unless metadata import/convert is separated from content import/convert (sorry, I'm not yet that deep into the code).
I still think selective removal of the detected cover image would be the better way to do it. I can identify this image exactly in the ODT source, so I could remove exactly this image. |
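To make the extract_cover question concrete, here is a minimal sketch of the behaviour described in the post above: the cover href is always reported, and only the image bytes are skipped when extract_cover is False. MetaInfo and get_metadata_sketch are hypothetical names invented for illustration; they are not calibre's classes or functions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MetaInfo:
    """Hypothetical stand-in for a metadata object (not calibre's class)."""
    title: str
    cover: Optional[str] = None                     # href of the cover image
    cover_data: Optional[Tuple[str, bytes]] = None  # (image type, raw bytes)


def get_metadata_sketch(images, cover_href, extract_cover=True):
    """images maps href -> raw bytes; cover_href names the detected cover."""
    mi = MetaInfo(title="Untitled")
    if cover_href in images:
        mi.cover = cover_href  # the reference is reported in both modes
        if extract_cover:
            # Only the (potentially large) image payload is optional.
            ext = cover_href.rsplit(".", 1)[-1].lower()
            mi.cover_data = (ext, images[cover_href])
    return mi


images = {"Pictures/cover.jpg": b"\xff\xd8"}
quick = get_metadata_sketch(images, "Pictures/cover.jpg", extract_cover=False)
full = get_metadata_sketch(images, "Pictures/cover.jpg")
```

With this split, a caller that only wants to mark the cover in the OPF can use the quick variant, while a caller that needs the image itself uses the full one.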
07-28-2012, 03:56 AM | #23 | |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
The option text is interesting:
Quote:
|
|
07-28-2012, 04:28 AM | #24 | |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
Quote:
The option that would help me is --dont-create-extra-titlepage. It would give much better control over the result than --remove-first-image.
In retrospect, I don't even know what the transform stage of the conversion pipeline expects to get from the input plugin. As I understand it, it should be defined whether a cover image belongs in two places or only in one (the two places being the metadata and the actual markup). Since it is currently in both, it is no wonder that the process creates another cover; it seems to think: hey, there is a cover image in the metadata, it is certainly not in the markup, so I need to add it.
From this it seems to me that the conversion pipeline does not expect the cover to be in the markup if it is in the metadata. On the other hand, this takes away control, because you can't put your cover on the second page (I don't really know whether anybody wants to do this, it's hypothetical). So in my eyes you need to be able to forbid the creation of a titlepage altogether. |
|
07-28-2012, 05:32 AM | #25 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
FYI: I think there is something missing in customize.builtins or metadata.odt directly so that the quick_metadata hack works.
But I still don't think that this is the solution. Looking into oeb.transform.cover, it seems to me that as soon as a cover href is set in the metadata, the titlepage will be generated. And unless insert_cover is executed, no OPF meta cover will be set on the item that is the cover image. Is this correct? I see a difference between marking an image as the cover in the metadata and adding content to a document.
BTW: I'm just rattling off my thoughts here. Please don't see this as urging you to fix something; I'm happy to do the coding as soon as I'm sure how it is supposed to work. |
07-28-2012, 07:50 AM | #26 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
A bit more code reading: a conversion to ePub works differently from, for example, one to MOBI (hah, surprise!).
So there is not really one place to remove the extra titlepage (as I understand it, MOBI always displays the cover first, without extra markup, and this titlepage.xhtml is only generated during ePub generation). So removing the image (frame, paragraph) from the ODT seems to be the most compatible way of handling this.
I added code for the image removal. It's a very easy solution: I just need to remove the parent text element from the document. After this, everything looks as I expect with both MOBI and ePub. But it's not committed yet; I have to do some more tests to make it more robust.
One question: is it OK to use the mi object to pass the frame id from get_metadata back to the ODT import? I would set it as mi._odf_cover_frame |
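The removal step described above (drop the paragraph that holds the cover frame) can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree rather than whatever ODF library the input plugin actually uses, the sample markup is a reduced made-up fragment, and remove_cover_frame is a hypothetical helper name. It also assumes the frame is identified by its draw:name attribute and sits directly inside a text:p.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
}

# Reduced, made-up fragment standing in for the body of content.xml.
SAMPLE = """<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0">
  <text:p><draw:frame draw:name="cover"><draw:image/></draw:frame></text:p>
  <text:p>First paragraph of the book.</text:p>
</office:text>"""


def remove_cover_frame(root, frame_name):
    """Remove the text:p containing the draw:frame named frame_name.

    Returns True if a paragraph was removed, False otherwise.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    name_attr = "{%s}name" % NS["draw"]
    for frame in root.iter("{%s}frame" % NS["draw"]):
        if frame.get(name_attr) != frame_name:
            continue
        para = parent_of.get(frame)
        if para is None or para.tag != "{%s}p" % NS["text"]:
            return False  # unexpected nesting: leave the document untouched
        holder = parent_of.get(para)
        if holder is None:
            return False
        holder.remove(para)
        return True
    return False


root = ET.fromstring(SAMPLE)
removed = remove_cover_frame(root, "cover")
remaining = root.findall("text:p", NS)
```

Refusing to act on unexpected nesting is a deliberate choice here: a paragraph that mixes the frame with other content would need more careful handling than wholesale removal.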
07-28-2012, 09:07 AM | #27 |
creator of calibre
Posts: 43,930
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The conversion pipeline expects the input plugin to supply it with a book whose cover is not part of the spine. For formats like ODT, where there is no concept of a cover, the input plugin has to guess. In theory the input plugin could set the cover and remove the image from the content, but this is wrong, because when it gets it wrong (i.e. the first image is not a cover) the result is data loss instead of simple duplication. Which is why there is a --remove-first-image option, which the user can apply after verifying manually that the first image is indeed a cover and should be removed.
So, in short, there is no way to build a robust automated solution for the general case. In your specific case, you can have the input plugin remove the first image if it is specifically identified as the cover via the custom opf.cover metadata. This is the approach that the EPUB input plugin takes. EPUB is also in the situation where it may or may not have a well-defined cover. |
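The policy in the reply above boils down to a small decision rule: remove the image only when it is explicitly identified as the cover, or when the user asks for it. A minimal sketch, assuming hypothetical names (should_remove_first_image and its parameters are invented for illustration and are not part of calibre):

```python
def should_remove_first_image(first_image_href, explicit_cover_href=None,
                              remove_first_image=False):
    """Decide whether the first image may be dropped from the content.

    explicit_cover_href: cover named by document metadata (e.g. opf.cover),
                         or None if the document does not name one
    remove_first_image:  the user asked for removal after checking manually
    """
    if remove_first_image:
        return True  # the user has taken responsibility for the decision
    # Only an explicit match is safe; acting on a guess risks deleting
    # real content, which is worse than a duplicated image.
    return (explicit_cover_href is not None
            and explicit_cover_href == first_image_href)
```

A guessed cover therefore never triggers removal on its own; the worst outcome of a wrong guess stays at duplication.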
07-28-2012, 12:38 PM | #28 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
Ok, sounds reasonable.
I need a clean way to transport the frame name from get_metadata to the ODT input function. But custom attributes on Metadata are not copied by smart_update in meta._get_metadata. Is there a way that I am not seeing? |
07-28-2012, 12:47 PM | #29 |
creator of calibre
Posts: 43,930
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use get_metadata from metadata/odt.py directly instead of the full get_metadata(). Note that if you do this, you must make sure to set the title and author of the mi object to some reasonable values if they are not set by get_metadata().
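The fallback mentioned above could look roughly like this. It is a sketch under assumptions: ensure_title_and_authors is a hypothetical helper, the metadata object is modelled with a plain SimpleNamespace, and the attribute names title and authors as well as the file-name fallback are choices made for this example rather than confirmed calibre behaviour.

```python
import os
from types import SimpleNamespace


def ensure_title_and_authors(mi, path, unknown="Unknown"):
    """Fill in title/authors on a metadata object if the reader left them empty."""
    if not getattr(mi, "title", None):
        # Fall back to the file name without its extension.
        mi.title = os.path.splitext(os.path.basename(path))[0] or unknown
    if not getattr(mi, "authors", None):
        mi.authors = [unknown]
    return mi


# Usage: a metadata object that came back without title or authors.
mi = ensure_title_and_authors(SimpleNamespace(title=None, authors=[]),
                              "/tmp/my book.odt")
```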
|
07-28-2012, 01:59 PM | #30 |
Enthusiast
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
|
Ok. It works, but I need to do some more tests, with different paragraph mixes, to be sure to catch all cases.
BTW: While doing my tests, the most annoying thing was the many 'Unknown' lines in the MOBI conversions. Line 318 in ebooks/mobi/utils.py is the reason that every empty paragraph in my source gets replaced by 'Unknown'. Well, at least this beast does the replacement; there has to be a place that calls it for every string. Perhaps there should be a switch for this, so that Unknown is only substituted for meaningful tags like title?
Edit: it's the call to utf8_text in ebooks/mobi/writer2/serializer.py:383
Edit 2: As I read the commit that changed this (#12785), it is not about empty strings, but only about accented characters. So perhaps the best solution would be to add an empty keyword to utf8_text that defaults to False and to make the replacement with Unknown depend on it.
Edit 3: Fixed this in my branch with rev 12795.
Last edited by olig; 07-28-2012 at 02:36 PM. |
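The proposal in Edit 2 can be sketched as below. This is not the code from the branch or from calibre; utf8_text_sketch is a hypothetical stand-in, and both the NFC normalization and the placeholder parameter are assumptions made so the example is self-contained.

```python
import unicodedata


def utf8_text_sketch(text, empty=False, placeholder="Unknown"):
    """Return normalized UTF-8 bytes for text.

    With empty=False (the default) blank or missing input becomes the
    placeholder; with empty=True it becomes b"" instead.
    """
    if text and text.strip():
        # Normalize so that composed and decomposed accents encode alike.
        return unicodedata.normalize("NFC", text.strip()).encode("utf-8")
    return b"" if empty else placeholder.encode("utf-8")
```

A serializer writing paragraph text would pass empty=True so blank paragraphs stay blank, while callers handling fields such as the title keep the default and still get a non-empty value.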
|