#1 |
Addict
Posts: 264
Karma: 9246
Join Date: Feb 2010
Location: Berlin, Germany
Device: Kobo H20, iPhone 6+, Macbook Pro
|
How to avoid manually cleaning an epub?
The quality of the HTML and CSS that publishers ship is a disgrace. A bitter fact.
In the past I cleaned up the code of my epubs manually in Sigil. That takes a lot of time and effort, even though my knowledge of HTML and CSS is very good. I took a three-year break from ebook reading. A few days ago I bought an H2O. A wonderful display, a joy.
Are there patches, tools, hacks, methods, or plugins for Calibre that help avoid manually cleaning an epub when you want to control the following design attributes?
* margins
* font
* justification
* hyphenation
* margin between paragraphs
* first-line indent
I would like to have the same design for every single book. Automatically. I know that such a goal is far from trivial: the mess of CSS and HTML in the original epubs is incredible. My hope is that there are smart heuristics, algorithms, and tools that successfully fight this problem.
Thanks for your help.
#2 |
Resident Curmudgeon
Posts: 78,947
Karma: 144284074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The more you look into the internals of an ePub, the more you'll understand what's going on and the faster you'll be able to fix things.
|
#3 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
Quote:
I don't use it, but there is a Hyphenate This plugin for calibre that adds soft hyphens and may serve your needs. Check out https://www.mobileread.com/forums/sh...d.php?t=208534
Also, while Sigil has come back to life and grown, calibre now has its own similar Editor that can serve very well for any additional touch-ups that you may feel the need to do.
Personally, I generally rely on beating on an ePub with calibre's editor, often simply replacing the CSS files with one of my own stock files. I occasionally do a conversion first to see if that will do well enough. I have calibre set to keep the original ePub as well as the new conversion so I can return to the original to try a different tack when needed.
Last edited by dwig; 02-24-2017 at 06:43 PM.
|
#4 | |
Resident Curmudgeon
Posts: 78,947
Karma: 144284074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
As for hyphenating, do not do so to an ePub, as ADE and most ePub renderers support hyphenation. The problem is that most programs (I have not tried all of them) will not work properly with soft hyphens. So no, don't do it.
|
#5 |
Enthusiast
Posts: 31
Karma: 101916
Join Date: Nov 2013
Device: Kobo Aura HD
|
I use the Calibre plugin "Modify epub" to do most of the things you're looking for, and there are a few other Calibre plugins that do this as well. It uses the Calibre conversion templates without actually converting and gives a good first bulk edit.
But it doesn't discriminate in applying the templates and can cause trouble, e.g. applying a paragraph text-indent adds it to all paragraphs, which can cause drop capitals in leading paragraphs to be oddly placed. I usually under-prescribe with the plugin and use the Calibre editor on individual books to finish.
#6 |
Wizard
Posts: 3,821
Karma: 19162882
Join Date: Nov 2012
Location: Te Riu-a-Māui
Device: Kobo Glo
|
I've generally found that using calibre to convert ePub --> ePub causes as many new problems as it fixes. A conversion can help with some problems that are a lot of trouble to fix manually (such as when files need to be split because they contain multiple chapters), but if I do a conversion I still need to edit the ePub manually to fix the problems that remained plus the new problems caused by the conversion.
|
#7 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
Quote:
I've set calibre so that it retains the original unconverted ePub. If I do the ePub>ePub conversion, I check the results in my (currently) preferred reader. If it's good enough then I'm done. If not, I open the conversion in the Editor and look around. I may find that it's going to be easier to delete the conversion, restore the ORIGINAL and manually edit that rather than editing the conversion.
I generally clean up ePubs for personal reading, not distribution, so often I'll tolerate flaws that I would never accept if I was distributing the ebook. I've found that with simple fiction it's often adequate to completely strip out the original CSS, use the Editor's tools to remove the now unused CSS references and then insert my stock CSS stylesheet.
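For anyone who wants to script the "swap in a stock stylesheet" step instead of doing it in the Editor, here is a minimal Python sketch. It is a cruder variant of what is described above: it overwrites the content of every .css file in the book with one stock stylesheet rather than removing the files and re-linking. The function name and the `STOCK_CSS` content are placeholders of mine, not anybody's actual stock file, and the script writes a new file so the original stays untouched.

```python
import zipfile

# Placeholder stock stylesheet; substitute your own.
STOCK_CSS = """\
body { text-align: left; }
p { margin: 0.5em 0; text-indent: 0; }
"""


def replace_stylesheets(src_path, dst_path, stock_css=STOCK_CSS):
    """Copy an EPUB, overwriting the content of every .css file with stock_css.

    Returns the names of the stylesheets that were overwritten.
    """
    replaced = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        names = src.namelist()
        # The EPUB container expects "mimetype" as the first, uncompressed entry.
        # The sort is stable, so all other entries keep their original order.
        names.sort(key=lambda name: name != "mimetype")
        for name in names:
            data = src.read(name)
            if name.lower().endswith(".css"):
                data = stock_css.encode("utf-8")
                replaced.append(name)
            if name == "mimetype":
                compression = zipfile.ZIP_STORED
            else:
                compression = zipfile.ZIP_DEFLATED
            dst.writestr(name, data, compress_type=compression)
    return replaced
```

Class names in the book's HTML will no longer match anything in the stock file unless it defines them, so this only suits simple fiction of the kind mentioned above.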
|
#8 | |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
Quote:
Last time I looked (some time ago, I admit), the calibre conversion search-and-replace regex rules did not apply to stylesheets. Please tell me I'm wrong, and how to fully automate this now. (At present I do it with a regex that lives in my Sigil recently-used find/replace collection.)
For me, epub to epub usually works, and I routinely do that first before manual inspection and tweaking, but sometimes it will split chapter headings and chapter text into two separate files (because the publisher used "chapter" as a CSS style, I think). That requires a restore and a reconvert with a tweaked XPath....
Last edited by stumped; 02-25-2017 at 01:35 AM.
|
#9 | |
Wizard
Posts: 3,821
Karma: 19162882
Join Date: Nov 2012
Location: Te Riu-a-Māui
Device: Kobo Glo
|
Quote:
The problem is that there are some situations where an explicit line-height is needed and doesn't interfere with the line spacing slider, such as setting the line-height of a raised cap or superscript to zero to prevent it from messing up the line spacing of the containing paragraph. So you have to go back and manually add those in after the conversion.
I really don't think there is any general solution that will work for all books, although many will work for most books, which might be good enough.
In the end it is like the problem of correcting spelling errors caused by poor scanning: there is no way to do it automatically that will work in all situations, because sometimes the correct spelling can only be determined by reading and understanding the book. The book I am reading at the moment (Seasons of Plenty by Colin Greenland, published by Gateway SF) has a character called Iogo (IOGO) which is misspelt in a number of places as logo (LOGO). However, there are also a number of places where the word logo (LOGO) is used correctly, so I don't think it is possible for any algorithm to work out which are correct spellings of LOGO and which are misspellings of IOGO. I just have to go through manually and check each case.
|
#10 |
Addict
Posts: 264
Karma: 9246
Join Date: Feb 2010
Location: Berlin, Germany
Device: Kobo H20, iPhone 6+, Macbook Pro
|
Thanks for your comments and hints.
Indeed, no automation can fix every problem, that is certain. Because the situation is so complex, maybe a user interface that offers several levels of "aggressiveness" for each individual layout feature could be a good answer. And the scripts have to be able to analyse and manipulate the CSS.
As an example for "justification":
Aggressiveness Level 10 (maximum)
[x] Set every part of the book to "left"
...
Example for "line-height":
Aggressiveness Level 10 (maximum)
[x] Set every part of the book to a line-height of 100%
...
#11 |
Wizard
Posts: 3,821
Karma: 19162882
Join Date: Nov 2012
Location: Te Riu-a-Māui
Device: Kobo Glo
|
There is a newish (since version 2.53) conversion setting in Calibre that I haven't looked into much yet: Look & Feel > Transform Styles. It allows you to set more sophisticated rules than simple filters.
https://manual.calibre-ebook.com/edi...css-properties
I haven't worked out how to do it in a way that handles all cases where the publisher has used different types of unit (px, pt, em), but it might be possible to create rules that would only remove line-height properties that set the height strictly greater than 1em and less than about 1.5em, which would be a big improvement on simply removing all instances of line-height.
Last edited by GeoffR; 02-25-2017 at 03:58 AM. Reason: greater than 1em and less than 1.5em
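To make the "only remove line-heights in a range" idea concrete outside of calibre, here is a rough regex-based Python sketch. It is not the Transform Styles feature itself, and the function name and default thresholds are my own choices. It only understands values in em, percent, or unitless multipliers; px and pt values are left untouched, because they cannot be compared with 1em without knowing the font size.

```python
import re

# One "line-height: <number>[em|%]" declaration, up to and including its
# terminating semicolon (or stopping just before a closing brace).
LINE_HEIGHT = re.compile(
    r"(?<![\w-])line-height\s*:\s*(?P<num>\d*\.?\d+)(?P<unit>em|%)?\s*"
    r"(?:;\s*|(?=\})|$)",
    re.IGNORECASE,
)


def drop_mid_line_heights(css, low=1.0, high=1.5):
    """Remove line-height declarations strictly between low and high (in em).

    Values such as 0 (used on superscripts and raised caps) or 2em are kept.
    """
    def replace(match):
        value = float(match.group("num"))
        if (match.group("unit") or "") == "%":
            value /= 100  # 130% is treated like 1.3em
        if low < value < high:
            return ""
        return match.group(0)

    return LINE_HEIGHT.sub(replace, css)
```

For example, `drop_mid_line_heights("p { line-height: 1.2em; margin: 0 } sup { line-height: 0; }")` returns `"p { margin: 0 } sup { line-height: 0; }"`, so the zero line-height on the superscript survives.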
#12 |
Addict
Posts: 264
Karma: 9246
Join Date: Feb 2010
Location: Berlin, Germany
Device: Kobo H20, iPhone 6+, Macbook Pro
|
My first try with "Convert epub" in Calibre, for the book "Richard Dawkins. Die Poesie der Naturwissenschaften."
I set the justification to "left". And:
[x] Remove margin between paragraphs
Indentation: 0em
[x] Insert empty line between paragraphs
Margin: 0.5em
That worked fine in that book. These two layout features are the most important for my eyes. I do not like justified text at all when the hyphenation isn't really perfect. And I need a margin between paragraphs and no indentation at the beginning of a paragraph.
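The same settings can probably be applied to many books at once through calibre's `ebook-convert` command-line tool. A minimal Python sketch follows; the helper names are hypothetical, and the mapping from the GUI checkboxes to flags is my assumption based on the ebook-convert documentation as I understand it, so check `ebook-convert --help` on your installed version before relying on it.

```python
import subprocess


def build_convert_command(src, dst):
    """Build an ebook-convert argument list mirroring the GUI settings above."""
    return [
        "ebook-convert", src, dst,
        "--change-justification", "left",                # justification: left
        "--remove-paragraph-spacing",                    # remove margin between paragraphs
        "--remove-paragraph-spacing-indent-size", "0",   # indentation: 0em
        "--insert-blank-line",                           # insert empty line between paragraphs
        "--insert-blank-line-size", "0.5",               # blank line size: 0.5em
    ]


def convert(src, dst):
    """Run the conversion; requires calibre's command-line tools on PATH."""
    subprocess.run(build_convert_command(src, dst), check=True)
```

Calling `convert("book.epub", "book_clean.epub")` writes a new file and leaves the original in place, which matches the advice earlier in the thread to keep the unconverted ePub.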
#13 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
Quote (GeoffR):
You can set CSS properties to filter in the conversion settings Look & Feel > Styling > Filter Style Information. For example if you set a ...
The problem is that there are some situations where an explicit line-height is needed and doesn't interfere with the line spacing slider, such as setting the line-height of a raise-cap or superscript to zero to prevent it from messing up the line spacing of the containing paragraph. ...
End quote
I did not know you could use zero for that. I would expect zero height to make the superscript invisible. That could be a trick worth learning, as I hate seeing odd line heights due to sub- or superscripts.
I think the regex that someone kindly wrote for me only takes out fixed heights of between 1 and 2 em.
#14 |
Addict
Posts: 264
Karma: 9246
Join Date: Feb 2010
Location: Berlin, Germany
Device: Kobo H20, iPhone 6+, Macbook Pro
|
I'm curious about your views and ideas on the best UI in Calibre, the best usability for dealing with the dilemma of bad CSS/HTML from the publishers.
My own idea is to break the problem down into a simpler use case. At the moment the Kobo software has, in my opinion, a bug in the UI for the layout settings. The user gets no information that the style rules from the publisher interfere with the style settings in the UI. Nor does he get an option to "disable settings from the publisher" for each specific layout feature.
My idea is a plugin for Calibre which "repairs" exactly that narrow software bug of the Kobo. The working title could be "Normalizing layout settings from the publisher" or "Eliminating interference between the style settings of the publisher and Kobo" or ...
Then you get the following options:
1 Font
[x] Remove all style rules from the publisher
Warning: Every part of the book will appear in the font which is set in Kobo
2 Font Size
[x] Remove all style rules from the publisher
Warning: Every part of the book will appear in the size which is set in Kobo
Exception: Superscripts and subscripts will appear at 65% of that size
3 Line Height
[x] Remove all style rules from the publisher
Warning: Every part of the book will appear in the line height which is set in Kobo
4 Page Margins
[x] Remove all style rules from the publisher
Warning: Not only the page margins may be changed but also the margins of paragraphs. If so, and if that bothers you, please disable the option
5 Justification
[x] Remove all style rules from the publisher
Warning: Every part of the book will appear in the justification which is set in Kobo
The main idea behind focusing on those 5 layout features is to create an interface which is simple enough for normal users, who are overwhelmed by all the many other possibilities in that area.
What do you think about the idea? Please regard it just as a first sketch. I would enjoy a discussion about it. Thanks.
#15 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
I think it's harder to automate than you suggest.
E.g. styles: does that include bold, italic, bold italic...
Justification: I like story text to be justified, but that does not mean I want all headings, or poetry verse which may occur in the story, to be ruthlessly changed.
Margins: for narrative purposes it is often necessary for some margins to be deeper than others.
And because CSS is hierarchical you have to consider at what level(s) to apply changes. I get good results (for me) by having calibre add CSS only to the main calibre body style, where it can still be specifically overridden lower down within the CSS if needed. So I add a top-level "don't hyphenate" (as I don't like hyphenation) and top-level widows 1, orphans 1, as I don't personally care for those effects either.
But I am not tweaking solely for Kobo. I often want a book to look good on my Kobo reader, which my son has on semi-permanent loan, but also to look good in the various Android reader apps we use. Mostly that just requires zapping line heights and any forced left-justify commands in body text, then letting the apps or the device sliders do their thing. I only want to have to keep one version per format in calibre.
Android apps like Bookari also add CSS in order to implement their themes, so possibly you end up, in the book, with CSS from the publisher, more CSS from the calibre plugin, and from the reader app, all asserting their importance while being poked at by the Kobo reader sliders... not a pretty sight.
Last edited by stumped; 02-25-2017 at 09:47 AM.