05-10-2019, 10:56 AM | #1 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Large spaces between paragraphs
In the news I download, extremely large spaces appear between paragraphs. This is because Calibre inserts eight <br class="calibre5" /> tags each time. Is there a way to edit the recipe to reduce those large spaces? Be aware that I'm not a technician!
Thanks in advance!
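To confirm how many consecutive <br> tags end up between paragraphs, a small diagnostic like the one below could be run on an HTML file extracted from the downloaded EPUB. It is a minimal sketch that uses only Python's standard re module; the function name br_run_lengths is hypothetical, and it assumes the extra tags are separated by nothing but whitespace.

```python
import re

# One <br> tag in any common spelling: <br>, <br/>, <br />,
# or with attributes such as <br class="calibre5" />.
BR_TAG = r'<br\b[^>]*>'

# A "run" is one or more <br> tags separated only by whitespace.
BR_RUN = re.compile(r'(?:%s\s*)+' % BR_TAG, re.IGNORECASE)


def br_run_lengths(html):
    """Return how many <br> tags each run of consecutive <br> tags contains."""
    return [len(re.findall(BR_TAG, m.group(0), re.IGNORECASE))
            for m in BR_RUN.finditer(html)]


sample = 'First paragraph.' + '<br class="calibre5" />' * 8 + 'Second paragraph.'
print(br_run_lengths(sample))  # [8]
```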
05-10-2019, 09:29 PM | #2 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Add dict(name='br') to remove_tags in the recipe.
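For context, remove_tags = [dict(name='br')] tells calibre to delete every <br> element from the downloaded article. The effect on text that uses <br> as its only line separator can be approximated with the regular expression below. This is a rough stdlib imitation for illustration only, not calibre's own implementation (which operates on the parsed HTML); remove_all_br is a hypothetical name.

```python
import re


def remove_all_br(html):
    """Roughly imitate remove_tags = [dict(name='br')]: drop every <br> tag."""
    return re.sub(r'<br\b[^>]*>', '', html, flags=re.IGNORECASE)


print(remove_all_br('Line 1<br>line 2<br>line 3'))  # Line 1line 2line 3
```

The line breaks vanish entirely, so this approach is only safe when paragraphs are also marked up with <p> tags.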
|
05-11-2019, 07:01 AM | #3 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Thank you, Kovid! But wouldn't this remove all the <br> tags? I only need to remove seven of the eight.
Edit: I forgot to mention that paragraphs in this news source are mainly separated by <br/> tags; only at the end is there regularly a closing </p> tag. Removing all <br/> tags would therefore remove all paragraph breaks.
Last edited by Leonatus; 05-11-2019 at 07:25 AM.
05-11-2019, 10:04 AM | #4 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Then there is no simple solution; you have to write code to do it.
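One way such code could work is to shorten each run of consecutive <br> tags to a fixed maximum before conversion. The sketch below does this with a regular expression; collapse_br_runs and its keep parameter are hypothetical names, and the approach assumes the surplus tags appear back to back (optionally separated by whitespace), as described in the first post.

```python
import re

# A single <br> tag, in any spelling and with any attributes.
BR_TAG = re.compile(r'<br\b[^>]*>', re.IGNORECASE)
# One or more <br> tags separated only by whitespace.
BR_RUN = re.compile(r'(?:<br\b[^>]*>\s*)+', re.IGNORECASE)


def collapse_br_runs(html, keep=1):
    """Shorten every run of consecutive <br> tags to at most `keep` tags."""
    def shorten(match):
        count = len(BR_TAG.findall(match.group(0)))
        return '<br/>' * min(count, keep)
    return BR_RUN.sub(shorten, html)


text = 'Para 1' + '<br class="calibre5" />' * 8 + 'Para 2<br>Para 3'
print(collapse_br_runs(text))          # Para 1<br/>Para 2<br/>Para 3
print(collapse_br_runs(text, keep=2))  # Para 1<br/><br/>Para 2<br/>Para 3
```

Using keep=2 leaves a visible blank line between paragraphs while single line breaks stay untouched, which may suit sources that separate paragraphs with a double <br>.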
|
05-11-2019, 12:24 PM | #5 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Thank you!
|
|
05-12-2019, 06:35 AM | #6 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
I really think it's a regression: for the recipes I use, everything works fine up to calibre version 3.39.1, while later versions (I tested 3.40.1 and 3.42.0) inflate the vertical whitespace.
Version 3.39.1 replaces a single "<br>" tag with a single "<p class="calibre8" style="margin:0pt; border:0pt; height:0pt">*</p>", while the later versions replace each "<br>" tag with four of these "<p..> </p>" sections. I can provide a simple test recipe, along with the good and the broken EPUB files generated from it, if that helps.
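To compare a good and a broken EPUB, one could count the placeholder paragraphs in each. The sketch below opens an EPUB (which is a ZIP archive) with Python's zipfile module and counts <p> elements whose style attribute is exactly the margin:0pt; border:0pt; height:0pt declaration quoted above. The function name is hypothetical, and it assumes your calibre version writes that style string verbatim.

```python
import re
import zipfile

# Placeholder paragraph as quoted in the post; the class name may vary.
PLACEHOLDER = re.compile(
    r'<p\b[^>]*style="margin:0pt; border:0pt; height:0pt"[^>]*>', re.IGNORECASE)


def count_placeholders(epub_path):
    """Count placeholder <p> elements across all (X)HTML files in an EPUB."""
    total = 0
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith(('.html', '.xhtml', '.htm')):
                text = book.read(name).decode('utf-8', errors='replace')
                total += len(PLACEHOLDER.findall(text))
    return total
```

If every placeholder comes from a <br>, running this on the two files built from the same source should show roughly the four-to-one ratio described above.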
05-12-2019, 06:44 AM | #7 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
This is the source HTML:
Code:
Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>
05-12-2019, 09:03 AM | #8 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
A test recipe would be useful.
|
05-12-2019, 12:25 PM | #9 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
That's the recipe in question:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'
    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'
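Applied to this recipe, a workaround could use calibre's preprocess_regexps attribute, which takes a list of (compiled regex, function) pairs; each function receives a match object and returns the replacement text. The snippet below builds such a list and, so that it can be tried without calibre, runs it through a small hypothetical helper, apply_rules, that imitates the substitution step. In the recipe itself only the list would be needed, assigned as preprocess_regexps = BR_RULES in the class body (BR_RULES is a hypothetical name). Collapsing each run to a single <br/> is an assumption about the desired spacing, and since the duplication was later traced to a parser bug, this may reduce rather than remove the extra space.

```python
import re

# preprocess_regexps format: a list of (compiled pattern, function) pairs,
# where each function maps a match object to its replacement string.
BR_RULES = [
    # Collapse any run of two or more <br> tags (any attributes) to one.
    (re.compile(r'(?:<br\b[^>]*>\s*){2,}', re.IGNORECASE),
     lambda match: '<br/>'),
]


def apply_rules(html, rules):
    """Hypothetical helper: apply each (pattern, function) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


raw = 'Paragraph 1' + '<br />' * 8 + 'Paragraph 2<br>Line 3'
print(apply_rules(raw, BR_RULES))  # Paragraph 1<br/>Paragraph 2<br>Line 3
```

Single <br> tags are left alone because the pattern only matches runs of two or more.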
05-12-2019, 01:47 PM | #10 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
This is my test recipe, which created the ebooks shown in the screenshots above.
Code:
#!/usr/bin/env python
# -*- coding: utf-8 mode: python -*-

__license__ = 'GPL v3'
__copyright__ = 'Steffen Siebert <calibre at steffensiebert.de>'
__version__ = '1.0'

""" Create dummy ebook to test navigation elements. """

import re
import string

from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class NavigationTest(BasicNewsRecipe):

    __author__ = 'Steffen Siebert'
    title = 'Navigation Test'
    description = 'Navigation Test'
    publisher = 'Steffen Siebert'
    lang = 'de-DE'
    language = 'de'
    publication_type = 'magazine'
    articles_are_obfuscated = True
    use_embedded_content = False
    no_stylesheets = True
    conversion_options = {'comments': description, 'language': language, 'publisher': publisher}

    feeds = 3
    """ The number of feeds to generate. """

    articles_per_feed = 3
    """ The number of articles to generate for each feed. """

    LOREM_IPSUM = """Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>"""
    """ Dummy text. """

    """ Calibre recipe to create dummy ebook for testing navigation elements.
    """

    def generate_image(self, feed, article):
        try:
            from PIL import Image, ImageDraw, ImageFont
            Image, ImageDraw, ImageFont
        except ImportError:
            import Image, ImageDraw, ImageFont
        font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
        img = Image.new('RGB', (self.MI_WIDTH, self.MI_HEIGHT), 'white')
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype(font_path, 22)
        text = "Image of feed %s article %s" % (feed, article)
        width, height = draw.textsize(text, font=font)
        left = max(int((self.MI_WIDTH - width)/2.), 0)
        top = max(int((self.MI_HEIGHT - height)/2.), 0)
        draw.text((left, top), text, fill=(255,0,0), font=font)
        output = PersistentTemporaryFile('_fa.jpg')
        img.save(output, 'JPEG')
        output.close()
        return output.name

    def get_obfuscated_article(self, url):
        result = re.match("^http://dummy/feed_([0-9]+)/article_([0-9]+).html$", url)
        feed = result.group(1)
        article = result.group(2)
        imageUrl = "file:///%s" % self.generate_image(feed, article)
        # Generate content into new temporary html file.
        html = PersistentTemporaryFile('_fa.html')
        html.write('<html>\n<head>\n<title>Feed %s Article %s</title>\n</head>\n' % (feed, article))
        html.write("<body>\n<h1>Feed %s Article %s</h1>\n" % (feed, article))
        html.write('<p><img src="%s" alt="Image of feed %s article %s"></p>' % (imageUrl, feed, article))
        html.write(self.LOREM_IPSUM)
        html.write("</body>\n</html>\n")
        html.close()
        return html.name

    def parse_index(self):
        feeds = []
        for feed in range(1, self.feeds + 1):
            feedName = "Feed %i" % feed
            articles = []
            for article in range(1, self.articles_per_feed + 1):
                url = "http://dummy/feed_%i/article_%i.html" % (feed, article)
                title = "Feed %i Article %i" % (feed, article)
                articles.append({'title': title, 'url': url, 'date': ''})
            feeds.append((feedName, articles))
        return feeds
05-13-2019, 03:07 AM | #11 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The <br> multiplication is a bug in html5-parser https://github.com/kovidgoyal/html5-...2da97df7b6f7e1
|
05-13-2019, 06:17 AM | #12 | |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
|
|
05-13-2019, 07:08 AM | #13 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
|
05-13-2019, 07:19 AM | #14 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
|
05-29-2019, 08:10 AM | #15 |
Wizard
Posts: 1,023
Karma: 10963125
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
With the latest release of Calibre, the problem has vanished. Thank you!
|
|