Large spaces between paragraphs

Leonatus · 05-10-2019, 10:56 AM

In the news that I download, extremely large spaces appear between paragraphs. This is because Calibre inserts eight <br class="calibre5" /> tags each time. Is there a way to edit the recipe in order to reduce those large spaces? Be aware that I'm not a technician!
Thanks in advance!

kovidgoyal · 05-10-2019, 09:29 PM

Add

dict(name='br')

to remove_tags in the recipe
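
For readers unfamiliar with recipes, the suggestion above amounts to adding one class attribute. A minimal sketch, assuming a hypothetical recipe class named ExampleRecipe with a placeholder title; the try/except fallback is only there so the snippet can be run outside calibre and is not needed in a real recipe:

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so this sketch can be run outside calibre; not needed in a real recipe.
    BasicNewsRecipe = object


class ExampleRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Example'  # hypothetical title
    # Drop every <br> element from the downloaded articles.
    remove_tags = [dict(name='br')]
```

Note that this drops every <br> element in the article, not only the surplus ones.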

Leonatus · 05-11-2019, 07:01 AM

Thank you Kovid! But would this remove all tags? I only need to remove seven of the eight tags.
Edit: I forgot to mention that paragraphs, in this news, are mainly controlled by <br/> tags, and only at the end regularly by a terminating </p> tag. Removing all <br/> tags would remove all paragraphs.

kovidgoyal · 05-11-2019, 10:04 AM

Then there is no simple solution; you have to write code to do it.
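
As an illustration of what "writing code" could look like, here is a minimal sketch that collapses each run of consecutive <br> tags into a single one. The function name collapse_br_runs, the keep parameter and the sample string are assumptions made for this example, not part of calibre. Whether this would have worked around the problem discussed in this thread is not established.

```python
import re

# One <br> tag followed by one or more further <br> tags, optionally
# separated by whitespace. Covers <br>, <br/> and <br class="x" />.
BR_RUN = re.compile(r'<br\b[^>]*>(?:\s*<br\b[^>]*>)+', re.IGNORECASE)


def collapse_br_runs(html, keep=1):
    """Replace each run of two or more <br> tags with `keep` plain <br/> tags."""
    return BR_RUN.sub('<br/>' * keep, html)


# Hypothetical sample input: a run of three tags, then a single tag.
sample = 'line 3<br class="calibre5" /><br class="calibre5" />\n<br/>line a<br>line b'
print(collapse_br_runs(sample))
```

Single <br> tags are left untouched, so line breaks that separate paragraphs survive. In a recipe, a compiled pattern and a replacement callable of this kind can be supplied through the preprocess_regexps attribute, which is applied to the downloaded HTML.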

Leonatus · 05-11-2019, 12:24 PM

Thank you!

siebert · 05-12-2019, 06:35 AM

I really think it's a regression: for the recipes I use, everything works fine up to calibre version 3.39.1, while later versions (I tested 3.40.1 and 3.42.0) inflate the vertical whitespace.

A single "<br>" tag is replaced by 3.39.1 with a single "<p class="calibre8" style="margin:0pt; border:0pt; height:0pt">*</p>", while the later versions replace each "<br>" tag with four of these "<p..> </p>" sections.

I can provide a simple test recipe with the good and the broken epub file generated from it, if that helps.

siebert · 05-12-2019, 06:44 AM

This is the source HTML:

Code:

Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi
consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu 
feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>

The first attachment shows what the epub generated by 3.33.1 looks like, the second one the epub generated by 3.42.0.

kovidgoyal · 05-12-2019, 09:03 AM

A test recipe is useful

Leonatus · 05-12-2019, 12:25 PM

That's the recipe in question:

Code:

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'

    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'

siebert · 05-12-2019, 01:47 PM

This is my test recipe which created the ebooks from the screenshots above.

Code:

#!/usr/bin/env  python
# -*- coding: utf-8 mode: python -*-

__license__   = 'GPL v3'
__copyright__ = 'Steffen Siebert <calibre at steffensiebert.de>'
__version__   = '1.0'

""" Create dummy ebook to test navigation elements. """

import re
import string
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class NavigationTest(BasicNewsRecipe):
    __author__ = 'Steffen Siebert'
    title = 'Navigation Test'
    description = 'Navigation Test'
    publisher ='Steffen Siebert'
    lang = 'de-DE'
    language = 'de'
    publication_type = 'magazine'
    articles_are_obfuscated = True
    use_embedded_content = False
    no_stylesheets = True
    conversion_options = {'comments': description, 'language': language, 'publisher': publisher}

    feeds = 3
    """ The number of feeds to generate. """
    articles_per_feed = 3
    """ The number of articles to generate for each feed. """
    
    LOREM_IPSUM = """Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
    <p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et
    dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi
    consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
    Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
    <p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu 
    feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
    augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
    nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>"""
    """ Dummy text. """

    """
    Calibre recipe to create dummy ebook for testing navigation elements.
    """

    def generate_image(self, feed, article):
        try:
            from PIL import Image, ImageDraw, ImageFont
            Image, ImageDraw, ImageFont
        except ImportError:
            import Image, ImageDraw, ImageFont

        font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
        img = Image.new('RGB', (self.MI_WIDTH, self.MI_HEIGHT), 'white')
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype(font_path, 22)
        text = "Image of feed %s article %s" % (feed, article)
        width, height = draw.textsize(text, font=font)
        left = max(int((self.MI_WIDTH - width)/2.), 0)
        top = max(int((self.MI_HEIGHT - height)/2.), 0)
        draw.text((left, top), text, fill=(255,0,0), font=font)
        output = PersistentTemporaryFile('_fa.jpg')
        img.save(output, 'JPEG')
        output.close()
        return output.name

    def get_obfuscated_article(self, url):
        # Raw string with an escaped dot, so ".html" is matched literally.
        result = re.match(r"^http://dummy/feed_([0-9]+)/article_([0-9]+)\.html$", url)
        feed = result.group(1)
        article = result.group(2)
        imageUrl = "file:///%s" % self.generate_image(feed, article)

        # Generate content into new temporary html file.
        html = PersistentTemporaryFile('_fa.html')
        html.write('<html>\n<head>\n<title>Feed %s Article %s</title>\n</head>\n' % (feed, article))
        html.write("<body>\n<h1>Feed %s Article %s</h1>\n" % (feed, article))
        html.write('<p><img src="%s" alt="Image of feed %s article %s"></p>' % (imageUrl, feed, article))
        html.write(self.LOREM_IPSUM)
        html.write("</body>\n</html>\n")
        html.close()

        return html.name

    def parse_index(self):
        feeds = []

        for feed in range(1, self.feeds + 1):
            feedName = "Feed %i" % feed
            articles = []
            for article in range(1, self.articles_per_feed + 1):
                url = "http://dummy/feed_%i/article_%i.html" % (feed, article)
                title = "Feed %i Article %i" % (feed, article)
                articles.append({'title': title, 'url': url, 'date': ''})
            feeds.append((feedName, articles))

        return feeds

kovidgoyal · 05-13-2019, 03:07 AM

The <br> multiplication is a bug in html5-parser https://github.com/kovidgoyal/html5-...2da97df7b6f7e1

Leonatus · 05-13-2019, 06:17 AM

Quote:

Originally Posted by kovidgoyal

The <br> multiplication is a bug in html5-parser https://github.com/kovidgoyal/html5-...2da97df7b6f7e1

Okay, thank you! Could someone please tell me what I have to do now?

siebert · 05-13-2019, 07:08 AM

Quote:

Originally Posted by Leonatus

Okay, thank you! Could someone please tell me what I have to do now?

Just wait for the next calibre release. Or downgrade to 3.39.1, if you're in a hurry.

Leonatus · 05-13-2019, 07:19 AM

Quote:

Originally Posted by siebert

Just wait for the next calibre release. Or downgrade to 3.39.1, if you're in a hurry.

Ah, OK. Now I understand. Sorry for my haste!

Leonatus · 05-29-2019, 08:10 AM

With the latest release of Calibre, the problem has vanished. Thank you!
