07-04-2014, 11:40 AM | #1 |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2012
Device: none
|
Epub (or other) to HTMLZ attributes renamed
I use the htmlz format because it places all the html into a single file. That is awesome. However, a lot of the class attributes get renamed in the process, and I can't find a way to preserve the original values.
For example, in the epub, this is a tag:
<p class="RM-recipe-method">
But after conversion, this is that same tag:
<p class="intit">
I do the conversions via the command line. I do not care about the css, which would be removed anyway, but I need the original class names, because I convert the html to tagged text. The original class name tells me what the text is, while the converted one has no meaning. I have tried several ways to control it, but the original names are always changed. Is there a way to preserve these attributes (on the command line)? |
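For readers wondering what "convert the html to tagged text" depends on, here is a minimal sketch of that kind of post-processing, using only Python's standard library. It is not the poster's actual tool: the `ClassTagger` and `tag_text` names and the sample markup are hypothetical. It shows why a renamed class loses information, since the class value is the only label the text carries.

```python
from html.parser import HTMLParser


class ClassTagger(HTMLParser):
    """Collect (class name, text) pairs for every <p> element."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (class, text) pairs
        self._current = None   # class of the <p> being read, or None
        self._buffer = []      # text fragments inside that <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A missing class attribute is recorded as an empty string.
            self._current = dict(attrs).get("class") or ""
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            self.pairs.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def tag_text(html):
    """Return the (class, text) pairs found in an HTML string."""
    parser = ClassTagger()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical input: the same paragraph before and after conversion.
before = '<p class="RM-recipe-method">Stir well.</p>'
after = '<p class="intit">Stir well.</p>'
print(tag_text(before))
print(tag_text(after))
```

With the original markup the label is "RM-recipe-method"; after conversion the same text is labelled "intit", which tells a downstream tool nothing about what the paragraph is.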
07-04-2014, 11:52 AM | #2 |
creator of calibre
Posts: 44,029
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, class names are not preserved by conversion.
|
07-04-2014, 01:09 PM | #3 |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2012
Device: none
|
So, could a change be made that preserves the original class AND adds something for Calibre to recognize the classes? For example, from my previous post:
<p class="intit_RM-recipe-method">
That would be sufficient for my code to identify the class. Actually, a lot of classes ARE preserved already. It's just not all of them. Last edited by shotsky; 07-04-2014 at 01:16 PM. |
07-04-2014, 01:11 PM | #4 |
creator of calibre
Posts: 44,029
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I have no interest in making such a change.
|
07-05-2014, 12:05 PM | #5 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
@shotsky,
I believe a calibre conversion rationalises the css classes so that no two classes in the output css have exactly the same css properties under different names. The classes in the html tags would then be rationalised to match. Any unused classes would also be removed. Is it possible that your input css for "RM-recipe-method" and "intit" is so similar that one of them is superfluous? |
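To make that guess concrete, here is a small sketch of what merging classes with identical declarations could look like. It only illustrates the behaviour described in this post and is not calibre's actual code; the `rationalise` function and the sample rules are hypothetical.

```python
def rationalise(rules):
    """Merge classes whose CSS declarations are identical.

    `rules` maps a class name to a dict of CSS declarations.
    Returns (kept_rules, rename_map), where rename_map sends every
    original class name to the name that survives.
    """
    kept = {}     # surviving class name -> declarations
    seen = {}     # frozen declarations -> surviving class name
    rename = {}   # original class name -> surviving class name
    for name, decls in rules.items():
        key = frozenset(decls.items())
        if key in seen:
            rename[name] = seen[key]   # duplicate: reuse the earlier class
        else:
            seen[key] = name
            kept[name] = decls
            rename[name] = name
    return kept, rename


# Hypothetical stylesheet in which two classes look exactly the same.
rules = {
    "intit": {"font-style": "italic"},
    "RM-recipe-method": {"font-style": "italic"},
    "INGREDIENT": {"font-weight": "bold"},
}
kept, rename = rationalise(rules)
print(rename)
```

Under this assumption, "RM-recipe-method" would be folded into "intit" simply because the two rules are indistinguishable, while "INGREDIENT" survives because nothing else shares its declarations.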
07-05-2014, 02:15 PM | #6 |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2012
Device: none
|
I wonder if you would reconsider changing class names from the original class names? As it is, some of them do not change, others do, and it is not obvious why some change and others don't. I have about 100 users using my tools, and each one of them also uses Calibre as the ebook conversion tool. Classes often describe what a given entity is, as opposed to how it looks.
If underscores and hyphens are causing the attribute name changes, it would be satisfactory to simply eliminate those characters and use the remaining letters. Case is also unimportant for the attribute name. Numbers could be added to them if needed to keep them organized as well. If I knew how to write that kind of code, I would tackle it myself, but I don't so I have to rely on someone else that is willing to look at it. Please reconsider - I am sure I'm not the only one that post processes the output of Calibre, and retaining attribute names would help us all. Regards, John |
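The renaming scheme proposed in this post (drop hyphens and underscores, ignore case, add numbers where needed) can be sketched in a few lines of Python. This is only an illustration of the proposal, not something calibre does; `normalise_class` and `normalise_all` are hypothetical names.

```python
import re


def normalise_class(name):
    """Drop hyphens and underscores and lowercase what remains."""
    return re.sub(r"[-_]", "", name).lower()


def normalise_all(names):
    """Map distinct class names to normalised names.

    When two names collapse to the same string, later ones get a
    numeric suffix (2, 3, ...) so they stay distinguishable.
    """
    result = {}
    used = {}   # normalised base -> how many times it has been handed out
    for name in names:
        base = normalise_class(name)
        count = used.get(base, 0)
        used[base] = count + 1
        result[name] = base if count == 0 else f"{base}{count + 1}"
    return result


print(normalise_all(["RM-recipe-method", "rm_recipe_method", "INGREDIENT"]))
```

The names stay recognisable after normalisation, which is the property the post is asking for.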
07-05-2014, 02:26 PM | #7 | |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2012
Device: none
|
Quote:
It is possible to do both, simply by separating the two classes with a space. That would look like:
class="RM-recipe-method intit"
Or the reverse, if preferable to Calibre:
class="intit RM-recipe-method"
In this way, the css is certain to remain as wanted by the Calibre converter, yet the 'meaning' of the class is retained. Regards, John |
|
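As a sketch of how a post-processing tool could read such a two-class attribute: the value is split on whitespace and the converter's class is set aside, so the order of the two names does not matter. The `semantic_classes` helper and its default `generated` set are hypothetical, and calibre does not currently emit multiple classes like this.

```python
def semantic_classes(class_attr, generated=("intit",)):
    """Return the class tokens that are not converter-generated.

    `generated` is the caller's own list of names to ignore; it is an
    assumption here, not something calibre reports.
    """
    return [token for token in class_attr.split() if token not in generated]


print(semantic_classes("RM-recipe-method intit"))
print(semantic_classes("intit RM-recipe-method"))
```

Both calls give the same result, so either ordering in the proposal would work for a tool like this.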
07-05-2014, 05:37 PM | #8 | |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Quote:
I think you've had the official answer to your request. I've never seen calibre produce html tags with multiple classes e.g. <p class="xxx yyy"> so I'd be surprised if it started now (thankfully from my POV, I'd hate that). Speaking only for myself, I try to keep my input tags and classes as simple/minimal as possible and have found that calibre seems to retain my input class names during conversion these days (except, of course, when the input html has classless tags like <h1>, <p> etc). Whether this is pure dumb luck or whether I've inadvertently found a 'magic formula', I don't know. I suspect the former |
|
07-11-2014, 08:24 AM | #9 | |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2012
Device: none
|
Quote:
This is similar to an attribute named 'copyright', which Calibre would leave alone, since it is a 'recognized' part of a book. In my case, it is not a recognized part of a book, but it IS a clue to what follows - a direction step in a cookbook. However, in the same book, there is an attribute "INGREDIENT" that shows up in many places but is untouched. I don't think that is a recognized part of a book. Note that this is not MY html in the first place. It is whatever is in the ebook to be converted, and the quality varies greatly. But this is not a quality issue; it is a mystery why Calibre should change attribute names that are perfectly valid in the first place. |
|
07-11-2014, 09:06 AM | #10 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
OK, I'm not in a position to argue as I don't have your source files. All I can say is that when calibre decides it needs to create a new class name in my conversions, it always has a name like "calibrenn". I can't think of any circumstances where calibre would pull the name "intit" out of thin air.
|
07-11-2014, 09:43 AM | #11 | |
Well trained by Cats
Posts: 29,995
Karma: 57259778
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
calibre# or calibre## (I don't think I have seen 3 digits even with the worst Word cr*p) |
|
08-23-2014, 04:18 PM | #12 |
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2011
Device: none
|
Class renaming and Sigil
This issue is why I switched to Sigil for creating my epubs from html, and then I just use Calibre to create .mobi (or .azw3) and pdf versions. I don't find it worthwhile to have to sort through (mild) machine language to figure out what .calibre32 or .calibre14 was if I ever have to update or correct errors in the epub after conversion. Maybe there's a way to keep my "master" version in htmlz, which doesn't rename classes, and then spit out an epub, but it seems I would have to go through and break up the html again into chapters. I would like to use Calibre for everything, including some of its more advanced features, but apparently there is no off switch or workaround for this behavior. If anyone knows a workaround in the software or workflow, I'm all ears. Appreciated.
|
08-27-2014, 12:58 AM | #13 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
I don't know if this counts as a "workaround", but instead of comparing the calibre converter to the Sigil editor, why don't you try comparing the calibre editor to the Sigil editor?
In other words, the calibre editor absolutely does not rename things, as it is meant for </gasp> editing. Conversion, on the other hand, will gleefully rename things as it is NOT repeat NOT meant for editing! |
09-06-2014, 02:32 PM | #14 |
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2011
Device: none
|
sigil vs. calibre editor/converter
So, when I'm doing the initial creation/conversion of an epub from an html file and a css file, if I do it with Sigil my classes are respected. If I do it with Calibre I get "class=calibrenn". If there is a way to use the Calibre editor or converter to import the html and spit out an epub without rewriting the classes, I would be very interested to know.
|
09-06-2014, 07:23 PM | #15 |
null operator (he/him)
Posts: 20,696
Karma: 26966376
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Gunaddho - the editor can be run standalone. In the File menu there's a Create New Epub option; once you have one of those you can add component files - HTML, CSS, Images etc.
See Using calibre's editor independently BR |