#1 |
Wizard
Posts: 1,055
Karma: 11391181
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Large spaces between paragraphs
In the news that I download, extremely large spaces appear between paragraphs. This is because Calibre inserts eight <br class="calibre5" /> tags each time. Is there a way to edit the recipe to reduce those large spaces? Please bear in mind that I'm not a technician!
Thanks in advance! |
#2 |
creator of calibre
Posts: 45,359
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Add dict(name='br') to remove_tags in the recipe. |
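For readers who have not edited a recipe before, the suggestion above could look roughly like the following. This is only a minimal sketch: the class name is hypothetical, and the fallback import is there solely so the snippet can be run outside calibre. Note that it strips every <br> tag, not just the surplus ones.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be run outside calibre; inside calibre
    # the real base class is used.
    BasicNewsRecipe = object


class KathNetWithoutBreaks(BasicNewsRecipe):  # hypothetical recipe name
    title = 'kath.net'
    # Each dict describes tags to strip from the downloaded HTML;
    # this entry matches every <br> element.
    remove_tags = [dict(name='br')]
```

In an existing recipe, only the remove_tags line needs to be added inside the class body.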
#3 |
Wizard
|
Thank you Kovid! But would this remove all the tags? I only need to remove seven of the eight tags.
Edit: I forgot to mention that paragraphs in this news source are mainly separated by <br/> tags, with only a terminating </p> tag at the very end. Removing all <br/> tags would remove all paragraph breaks.
Last edited by Leonatus; 05-11-2019 at 07:25 AM. |
#4 |
creator of calibre
|
Then there is no simple solution; you have to write code to do it.
|
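As a rough idea of what such code might look like, here is a sketch that collapses each run of consecutive <br> tags into a single one. The helper name collapse_br_runs is hypothetical, and the approach assumes the repeated tags are already present in the downloaded HTML rather than being introduced later during conversion, in which case it would not help.

```python
import re

# Matches two or more consecutive <br> tags (with any attributes and an
# optional self-closing slash), allowing whitespace between them.
BR_RUN = re.compile(r'(?:<br\b[^>]*>\s*){2,}', re.IGNORECASE)


def collapse_br_runs(raw_html):
    """Replace every run of consecutive <br> tags with a single <br/>."""
    return BR_RUN.sub('<br/>', raw_html)


# Possible hook inside a recipe class (shown as a comment because it
# only makes sense when running under calibre):
#
#     def preprocess_raw_html(self, raw_html, url):
#         return collapse_br_runs(raw_html)
```

A single <br> is left untouched, so line breaks inside a paragraph survive while the stacked ones are reduced to one.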
#5 |
Wizard
|
Thank you!
|
#6 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
I really think it's a regression: for the recipes I use, everything works fine up to calibre version 3.39.1, while later versions (I tested 3.40.1 and 3.42.0) inflate the vertical whitespace.
Version 3.39.1 replaces a single "<br>" tag with a single "<p class="calibre8" style="margin:0pt; border:0pt; height:0pt">*</p>", while the later versions replace each "<br>" tag with four of these "<p..> </p>" sections. I can provide a simple test recipe together with the good and the broken epub files generated from it, if that helps. |
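One way to put a number on the difference between versions is to count spacer paragraphs in the converted output per <br> in the source. The sketch below is only an illustration: the function name is hypothetical, and it assumes the spacers can be recognised by the "height:0pt" style quoted above, which may differ in other books.

```python
import re

# Any <br> tag in the source HTML.
BR_TAG = re.compile(r'<br\b[^>]*>', re.IGNORECASE)
# Spacer paragraphs, identified by the zero-height style quoted in the post.
SPACER_P = re.compile(r'<p\b[^>]*\bheight:\s*0pt[^>]*>.*?</p>',
                      re.IGNORECASE | re.DOTALL)


def spacer_ratio(source_html, converted_html):
    """Return the number of spacer paragraphs per source <br> tag."""
    br_count = len(BR_TAG.findall(source_html))
    spacer_count = len(SPACER_P.findall(converted_html))
    if br_count == 0:
        return 0.0
    return spacer_count / br_count
```

With the behaviour described above, a good build would be expected to give a ratio of 1 and an affected build a ratio of 4.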
#7 |
Developer
|
This is the source HTML:
Code:
```html
Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>
```
|
#8 |
creator of calibre
|
A test recipe would be useful.
|
#9 |
Wizard
|
That's the recipe in question:
Code:
```python
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'

    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'
```
|
#10 |
Developer
|
This is my test recipe which created the ebooks from the screenshots above.
Code:
```python
#!/usr/bin/env python
# -*- coding: utf-8 mode: python -*-

__license__ = 'GPL v3'
__copyright__ = 'Steffen Siebert <calibre at steffensiebert.de>'
__version__ = '1.0'

""" Create dummy ebook to test navigation elements. """

import re
import string
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class NavigationTest(BasicNewsRecipe):
    __author__ = 'Steffen Siebert'
    title = 'Navigation Test'
    description = 'Navigation Test'
    publisher = 'Steffen Siebert'
    lang = 'de-DE'
    language = 'de'
    publication_type = 'magazine'
    articles_are_obfuscated = True
    use_embedded_content = False
    no_stylesheets = True
    conversion_options = {'comments': description, 'language': language, 'publisher': publisher}

    feeds = 3
    """ The number of feeds to generate. """

    articles_per_feed = 3
    """ The number of articles to generate for each feed. """

    LOREM_IPSUM = """Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>"""
    """ Dummy text. """

    """
    Calibre recipe to create dummy ebook for testing navigation elements.
    """

    def generate_image(self, feed, article):
        try:
            from PIL import Image, ImageDraw, ImageFont
            Image, ImageDraw, ImageFont
        except ImportError:
            import Image, ImageDraw, ImageFont
        font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
        img = Image.new('RGB', (self.MI_WIDTH, self.MI_HEIGHT), 'white')
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype(font_path, 22)
        text = "Image of feed %s article %s" % (feed, article)
        width, height = draw.textsize(text, font=font)
        left = max(int((self.MI_WIDTH - width)/2.), 0)
        top = max(int((self.MI_HEIGHT - height)/2.), 0)
        draw.text((left, top), text, fill=(255,0,0), font=font)
        output = PersistentTemporaryFile('_fa.jpg')
        img.save(output, 'JPEG')
        output.close()
        return output.name

    def get_obfuscated_article(self, url):
        result = re.match("^http://dummy/feed_([0-9]+)/article_([0-9]+).html$", url)
        feed = result.group(1)
        article = result.group(2)
        imageUrl = "file:///%s" % self.generate_image(feed, article)
        # Generate content into new temporary html file.
        html = PersistentTemporaryFile('_fa.html')
        html.write('<html>\n<head>\n<title>Feed %s Article %s</title>\n</head>\n' % (feed, article))
        html.write("<body>\n<h1>Feed %s Article %s</h1>\n" % (feed, article))
        html.write('<p><img src="%s" alt="Image of feed %s article %s"></p>' % (imageUrl, feed, article))
        html.write(self.LOREM_IPSUM)
        html.write("</body>\n</html>\n")
        html.close()
        return html.name

    def parse_index(self):
        feeds = []
        for feed in range(1, self.feeds + 1):
            feedName = "Feed %i" % feed
            articles = []
            for article in range(1, self.articles_per_feed + 1):
                url = "http://dummy/feed_%i/article_%i.html" % (feed, article)
                title = "Feed %i Article %i" % (feed, article)
                articles.append({'title': title, 'url': url, 'date': ''})
            feeds.append((feedName, articles))
        return feeds
```
|
#11 |
creator of calibre
|
The <br> multiplication is a bug in html5-parser https://github.com/kovidgoyal/html5-...2da97df7b6f7e1
|
#15 |
Wizard
|
With the latest release of Calibre, the problem has vanished. Thank you!
|