Old 07-27-2012, 01:17 PM   #16
olig
Enthusiast
olig began at the beginning.
 
Posts: 32
Karma: 12
Join Date: Jul 2012
Device: Kindle 4nt 4.1.3 jailbreak
Something does not work... looking into it.
Old 07-27-2012, 01:18 PM   #17
olig
Ah, easy, you killed the 'return mi' at the end of get_metadata, perhaps because you put it into read_cover.
Old 07-27-2012, 01:44 PM   #18
olig
I fixed it in my branch and optimized it a bit (no read_cover if opf.nocover).
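For readers following along, a minimal sketch of the bug and the optimization described in the last two posts. The `Metadata` class, the `props` dictionary and the `'opf.nocover'` key are stand-ins invented for illustration; this is not the actual code in metadata/odt.py.

```python
class Metadata:
    """Toy stand-in for the metadata object (hypothetical, simplified)."""

    def __init__(self, title):
        self.title = title
        self.cover = None
        self.cover_data = (None, None)


def read_cover(props, mi):
    """Fill in the cover fields on mi in place; returns nothing."""
    href = props.get('cover_href')
    if href:
        mi.cover = href
        mi.cover_data = ('jpg', props.get('cover_bytes', b''))


def get_metadata(props):
    mi = Metadata(props.get('title', 'Unknown'))
    # Optimization: skip the cover lookup when the document opts out of a cover.
    if not props.get('opf.nocover'):
        read_cover(props, mi)
    # The line that went missing: without it every caller receives None.
    return mi
```

Because `read_cover` only mutates `mi`, moving the `return mi` into it silently turns `get_metadata` into a function that returns `None`.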
Old 07-27-2012, 03:12 PM   #19
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
merged.
Old 07-28-2012, 03:12 AM   #20
olig
I started to work on conversion quirks. One particular problem now is that a conversion duplicates the cover image.

My call:
Code:
ebook-convert book.odt book.epub --output-profile sony300 \
  --preserve-cover-aspect-ratio \
  --no-svg-cover --no-default-epub-cover \
  --filter-css margin-top,margin-right,margin-left,margin-bottom,position,top,width
This results in an extra titlepage.xhtml generated by the conversion, placed right in front of the title page that is already in the XHTML converted from the ODT document.

I suspect I caused this by returning the cover image in get_metadata. The only logical workaround would be to strip the cover image markup from the source during the input process. This would imply that a cover image must not be in the markup of the book.

Or is there something I am missing to work around it?
Old 07-28-2012, 03:30 AM   #21
kovidgoyal
There is no general fix for this. You can always end up with duplicated images when converting formats that don't have the concept of a cover. I have committed an (untested) fix to change the ODT input plugin to not use the first image as cover. However, you can still get duplicated images if, for instance, a user adds an ODT to calibre, which takes its first image as the cover, and then converts it, which sets that extracted first image as the cover. For these kinds of problems, there is the --remove-first-image option.
Old 07-28-2012, 03:45 AM   #22
olig
Hmmm... if I understand it right, your change calls get_metadata with extract_cover set to False. But this still returns the cover href (that is how I programmed the ODT get_metadata) and only omits cover_data. A test also shows that it changes nothing.

Here the first question is: if extract_cover is False, should get_metadata return neither cover nor cover_data? If yes, I need to change this.

But: I want it to have the cover href, so that it is set as cover in the content.opf. So inhibiting the detection might be the wrong way, unless metadata import/convert is separated from content import/convert (sorry, I'm not yet that deep into the code).

I still think selective removal of the detected cover image would be the better way to do it. I can also exactly identify this image in the ODT source, so I could also remove exactly this image.
Old 07-28-2012, 03:56 AM   #23
olig
The option text is interesting:
Quote:

--remove-first-image

Remove the first image from the input ebook. Useful if the input document has a cover image that is not identified as a cover. In this case, if you set a cover in calibre, the output document will end up with two cover images if you do not specify this option.
So that implies that if the cover image is detected (it is, at least in get_metadata) there should be no duplication. As a result the question is: if the cover image is detected, who removes it from the resulting XHTML in the input process (assuming it is real markup and not just some meta information)?
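To make the option text concrete, here is a rough sketch of what "remove the first image" means on a piece of XHTML. This is only an illustration using the standard library; it is not how calibre implements --remove-first-image, and the function name is made up.

```python
import xml.etree.ElementTree as ET

XHTML_IMG = '{http://www.w3.org/1999/xhtml}img'


def remove_first_image(xhtml):
    """Drop the first <img> in document order, whatever it happens to show.

    Returns (root, src_of_removed_image); src is None if nothing was removed.
    """
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    first = next(root.iter(XHTML_IMG), None)
    if first is None or first not in parent_of:
        return root, None
    parent = parent_of[first]
    siblings = list(parent)
    idx = siblings.index(first)
    # Keep any text that followed the image instead of deleting it as well.
    if first.tail:
        if idx == 0:
            parent.text = (parent.text or '') + first.tail
        else:
            prev = siblings[idx - 1]
            prev.tail = (prev.tail or '') + first.tail
    parent.remove(first)
    return root, first.get('src')
```

Note that the function has no idea whether the image it removes is a cover. That is exactly why the option is left to the user.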
Old 07-28-2012, 04:28 AM   #24
olig
Quote:
Originally Posted by olig View Post
Hmmm... if I understand it right, your change does call get_metadata with extract_cover set to False. But this still returns the cover href (as I programmed odt get_metadata) and only does not return cover_data. A test also shows that it changes nothing.
I read some more code... and it still looks wrong: as I understand it, it removes the cover from the data that is given to OPFCreator. So as a result I would expect it to disappear from the OPF, not from the markup. This is exactly what I don't want.

The option that would help me is --dont-create-extra-titlepage. This gives much better control over what the result should be than --remove-first-image.

In retrospect I don't even know what the transform stage of the conversion pipeline expects to get from the input plugin. As I understand it, it should be defined whether a cover image should be in two places or only in one (the two places are the metadata and the actual markup).

As it is currently in two places, it is no wonder that the process creates another cover, as it seems to think: hey, there is a cover image in the metadata, it is certainly not in the markup, so I need to add it. From this it seems to me that the convert pipeline does not expect the cover to be in the markup if it is in the metadata.
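The suspected behaviour can be modelled in a few lines. This is a toy model of my reading of the pipeline, not calibre code: the spine is a plain list of dictionaries and both function names are hypothetical.

```python
def add_epub_titlepage(spine, cover_href):
    """Toy model: prepend a generated titlepage whenever metadata names a cover."""
    out = [dict(page) for page in spine]
    if cover_href:
        out.insert(0, {'name': 'titlepage.xhtml', 'images': [cover_href]})
    return out


def cover_occurrences(spine, cover_href):
    """Count how often the cover image is referenced across all pages."""
    return sum(page['images'].count(cover_href) for page in spine)


cover = 'Pictures/cover.jpg'
# The ODT markup already shows the cover on its first page.
odt_pages = [{'name': 'index.xhtml', 'images': [cover, 'Pictures/fig1.png']}]
print(cover_occurrences(add_epub_titlepage(odt_pages, cover), cover))
```

With the cover present in both the metadata and the markup, the model counts it twice. It is counted once only if the input step strips it from the markup or the titlepage is not generated.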

On the other hand this takes away control, because you can't put your cover on the second page (I don't really know if somebody wants to do this, it's hypothetical). So in my eyes you need to be able to forbid the creation of a titlepage altogether.
Old 07-28-2012, 05:32 AM   #25
olig
FYI: I think there is something missing in customize.builtins or metadata.odt directly so that the quick_metadata hack works.

But I still don't think that this is the solution.

Looking into oeb.transform.cover it seems to me that as soon as there is a cover href set in the metadata, the titlepage will be generated. And without insert_cover being executed, there will be no OPF meta cover set to the item that is the cover image. Is this correct?

I see a difference between marking an image as cover in the metadata and adding content to a document.

BTW: I'm just rattling down my thoughts here. Please don't see this as urging you to fix something. I'm happy to do the coding as soon as I'm sure how it is supposed to be.
Old 07-28-2012, 07:50 AM   #26
olig
A bit more code reading: a conversion to ePub works differently than, for example, one to MOBI (hah, surprise!).

So there is not really one place to remove the extra titlepage (as I understand it, MOBI always displays the cover first, without extra markup, and this titlepage.xhtml is only generated in the ePub generation).

So removing the image (frame, paragraph) from the ODT seems to be the most compatible way of handling this.

I added code for the image removal. It's a very easy solution, I just need to remove the parent text element from the document. After this everything looks like I expect it both with MOBI and ePub.
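Roughly, the removal looks like the sketch below. It assumes the cover frame sits directly inside a text:p element and is found by its draw:name; the function name is made up and the real plugin code may differ. It works on the raw content.xml with the standard library only.

```python
import xml.etree.ElementTree as ET

TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
DRAW_NS = 'urn:oasis:names:tc:opendocument:xmlns:drawing:1.0'


def remove_cover_frame(content_xml, frame_name):
    """Remove the paragraph that holds the named draw:frame.

    Returns (root, removed). Nothing is touched unless the frame is found
    directly inside a text:p, so other images are never dropped by accident.
    """
    root = ET.fromstring(content_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for frame in root.iter('{%s}frame' % DRAW_NS):
        if frame.get('{%s}name' % DRAW_NS) != frame_name:
            continue
        para = parent_of.get(frame)
        if para is None or para.tag != '{%s}p' % TEXT_NS:
            return root, False
        container = parent_of.get(para)
        if container is None:
            return root, False
        container.remove(para)
        return root, True
    return root, False
```

Matching on the frame name rather than on "first image" is what keeps this from eating an ordinary figure.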

But it's not committed yet, I have to do some more tests to make this more robust.

One question: is it ok to use the mi object to pass the frame id from get_metadata back to the odt import? I would set it as mi._odf_cover_frame
Old 07-28-2012, 09:07 AM   #27
kovidgoyal
The conversion pipeline expects the input plugin to supply it a book with a cover that is not part of the spine. For formats like ODT where there is no concept of cover, the input plugin has to guess. In theory the input plugin could set the cover and remove the image from the content, but this is wrong, because when it gets it wrong (i.e. the first image is not a cover) it can result in data loss instead of simple duplication. Which is why there is a --remove-first-image option, which the user can apply after verifying manually that the first image is indeed a cover and should be removed.

So, in short, there is no way to build a robust automated solution for the general case. In your specific case, you can have the input plugin remove the first image if it is specifically identified as the cover via the custom opf.cover metadata. This is the approach that the epub input plugin takes. EPUB is also in the situation where it may or may not have a well defined cover.
Old 07-28-2012, 12:38 PM   #28
olig
Ok, sounds reasonable.

I need a clean way of transporting the frame name from get_metadata to the ODT input function. But custom attributes on Metadata are not copied by smart_update in meta._get_metadata. Is there a way I do not see?
Old 07-28-2012, 12:47 PM   #29
kovidgoyal
Use get_metadata from the metadata/odt.py directly instead of the full get_metadata(). Note that if you do this, make sure you set the title and author of the mi object to some reasonable values if they are not set by get_metadata().
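A toy model of why the direct call helps. The classes and functions here are simplified stand-ins written for this illustration (the real Metadata and smart_update do much more), and `_odf_cover_frame` is the ad-hoc attribute proposed earlier in the thread.

```python
class Metadata:
    """Simplified stand-in; only a fixed set of fields is known to it."""
    FIELDS = ('title', 'authors', 'cover')

    def __init__(self, title=None, authors=None):
        self.title = title
        self.authors = authors or []
        self.cover = None

    def smart_update(self, other):
        # Copies known fields only, so ad-hoc attributes on `other` are lost.
        for name in self.FIELDS:
            value = getattr(other, name, None)
            if value:
                setattr(self, name, value)


def odt_get_metadata(stream):
    """Stands in for the format-specific function in metadata/odt.py."""
    mi = Metadata()
    mi.cover = 'Pictures/cover.jpg'
    mi._odf_cover_frame = 'Frame1'  # ad-hoc attribute carrying the frame name
    return mi


def generic_get_metadata(stream):
    """Stands in for the full get_metadata(), which merges via smart_update."""
    base = Metadata('Unknown', ['Unknown'])
    base.smart_update(odt_get_metadata(stream))
    return base


def metadata_for_input_plugin(stream):
    """Call the format-specific reader directly, then fill in the defaults."""
    mi = odt_get_metadata(stream)
    if not mi.title:
        mi.title = 'Unknown'
    if not mi.authors:
        mi.authors = ['Unknown']
    return mi
```

In this model the attribute survives only on the direct path, and the title and author defaults have to be supplied by hand because nothing else fills them in.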
Old 07-28-2012, 01:59 PM   #30
olig
Ok. It works, but I need to do some more tests, with different paragraph mixes, to be sure to catch all cases.

BTW: While doing my tests the most annoying thing was lots of 'Unknown' lines in the MOBI conversions. Line 318 in ebooks/mobi/utils.py is the reason that every empty paragraph in my source gets replaced by 'Unknown'. Well, at least this beast does the replacing; there has to be a place that calls it for every string. Perhaps there should be a switch for this so that Unknown is only substituted for meaningful tags like title?

Edit: it's the call to utf8_text in ebooks/mobi/writer2/serializer.py:383

Edit 2: As I read the commit that changed this (#12785), it is not about empty strings, but only about accented characters. So perhaps the best solution would be to add an empty keyword to utf8_text that defaults to False and make the replacement with Unknown depend on it.
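A sketch of that idea, under assumptions: the body below is a simplified stand-in for utf8_text rather than a copy of the calibre function, the choice of NFC normalization is mine, and I read `empty=True` as "an empty result is allowed".

```python
import unicodedata


def utf8_text(text, empty=False):
    """Return normalized UTF-8 bytes for text.

    With empty=False (the default) blank input becomes b'Unknown', which is
    the behaviour complained about above. With empty=True blank input stays
    blank, which is what paragraph text needs.
    """
    if text and text.strip():
        text = text.strip()
        if isinstance(text, bytes):
            text = text.decode('utf-8', 'replace')
        # Normalization is the part that matters for accented characters.
        return unicodedata.normalize('NFC', text).encode('utf-8')
    return b'' if empty else b'Unknown'
```

The serializer would then pass `empty=True` for body text, while callers that need a non-empty value, such as the title, keep the default.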

Edit 3: Fixed this in my branch with rev 12795.

Last edited by olig; 07-28-2012 at 02:36 PM.