|
|
#1 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Large spaces between paragraphs
The news that I download shows extremely large gaps between paragraphs. This is because Calibre inserts eight <br class="calibre5" /> tags each time. Is there a way to edit the recipe to reduce these gaps? Please bear in mind that I'm not a technician!
Thanks in advance! |
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,600
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Add
dict(name='br') to remove_tags in the recipe |
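A minimal sketch of where that entry would go, based on the kath.net recipe posted further down in this thread. The class name KathNetNoBreaks is made up for illustration, and the fallback base class is only an assumption so the snippet can be loaded outside calibre. Note that this removes every <br> tag, not just the surplus ones.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the snippet can be loaded outside calibre
    # (an assumption for illustration, not part of calibre).
    class BasicNewsRecipe(object):
        pass


class KathNetNoBreaks(BasicNewsRecipe):  # hypothetical class name
    title = u'kath.net'
    language = 'de'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'
    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    # Every tag in the downloaded HTML that matches one of these dicts
    # is removed. This single entry removes all <br> tags.
    remove_tags = [dict(name='br')]
```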
|
|
|
|
|
#3 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Thank you, Kovid! But wouldn't this remove all <br> tags? I only need to remove seven of the eight.
Edit: I forgot to mention that paragraphs in this news source are mainly separated by <br/> tags, and only at the very end by a terminating </p> tag. Removing all <br/> tags would remove all paragraph breaks. Last edited by Leonatus; 05-11-2019 at 07:25 AM. |
|
|
|
|
|
#4 |
|
creator of calibre
Posts: 45,600
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Then there is no simple solution; you have to write code to do it.
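One way such code could look, as a rough sketch rather than a tested fix: collapse each run of consecutive <br> tags into a single one with a regular expression. The names BR_RUN and collapse_br_runs are invented for this example. In a recipe the pattern could be hooked in through the preprocess_regexps attribute, which is applied to the downloaded HTML, so it can only collapse runs that are already present in the source page.

```python
import re

# Matches two or more consecutive <br> tags (<br>, <br/>, <br />, with
# optional attributes), allowing whitespace between the tags.
BR_RUN = re.compile(r'(?:<br\b[^>]*>\s*){2,}', re.IGNORECASE)


def collapse_br_runs(html):
    """Replace every run of consecutive <br> tags with a single <br/>."""
    return BR_RUN.sub('<br/>', html)


# Inside a recipe class this could be wired in as:
#     preprocess_regexps = [(BR_RUN, lambda match: '<br/>')]

sample = 'one<br class="calibre5" /><br/>\n<br>two<br>three'
print(collapse_br_runs(sample))  # prints: one<br/>two<br>three
```

Single <br> tags are left alone, so line breaks that stand in for paragraph breaks survive; only the repeated ones are reduced.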
|
|
|
|
|
|
#5 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Thank you!
|
|
|
|
|
|
#6 |
|
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
I really think it's a regression: for the recipes I use, everything works fine up to calibre version 3.39.1, while later versions (I tested 3.40.1 and 3.42.0) inflate the vertical whitespace.
Version 3.39.1 replaces a single <br> tag with a single <p class="calibre8" style="margin:0pt; border:0pt; height:0pt">*</p>, while the later versions replace each <br> tag with four of these <p..> </p> sections. I can provide a simple test recipe, together with the good and the broken EPUB files generated from it, if that helps. |
|
|
|
|
|
#7 |
|
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
This is the source HTML:
Code:
Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>
|
|
|
|
|
#8 |
|
creator of calibre
Posts: 45,600
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
A test recipe would be useful.
|
|
|
|
|
|
#9 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
That's the recipe in question:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'
    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'
|
|
|
|
|
|
#10 |
|
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
This is my test recipe which created the ebooks from the screenshots above.
Code:
#!/usr/bin/env python
# -*- coding: utf-8 mode: python -*-
__license__ = 'GPL v3'
__copyright__ = 'Steffen Siebert <calibre at steffensiebert.de>'
__version__ = '1.0'
""" Create dummy ebook to test navigation elements. """
import re
import string
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class NavigationTest(BasicNewsRecipe):
    __author__ = 'Steffen Siebert'
    title = 'Navigation Test'
    description = 'Navigation Test'
    publisher ='Steffen Siebert'
    lang = 'de-DE'
    language = 'de'
    publication_type = 'magazine'
    articles_are_obfuscated = True
    use_embedded_content = False
    no_stylesheets = True
    conversion_options = {'comments': description, 'language': language, 'publisher': publisher}
    feeds = 3
    """ The number of feeds to generate. """
    articles_per_feed = 3
    """ The number of articles to generate for each feed. """
    LOREM_IPSUM = """Line 1<br>line 2<br>line 3<br><br>line a<br>line b<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi
consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu
feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>"""
    """ Dummy text. """
    """
    Calibre recipe to create dummy ebook for testing navigation elements.
    """

    def generate_image(self, feed, article):
        try:
            from PIL import Image, ImageDraw, ImageFont
            Image, ImageDraw, ImageFont
        except ImportError:
            import Image, ImageDraw, ImageFont
        font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
        img = Image.new('RGB', (self.MI_WIDTH, self.MI_HEIGHT), 'white')
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype(font_path, 22)
        text = "Image of feed %s article %s" % (feed, article)
        width, height = draw.textsize(text, font=font)
        left = max(int((self.MI_WIDTH - width)/2.), 0)
        top = max(int((self.MI_HEIGHT - height)/2.), 0)
        draw.text((left, top), text, fill=(255,0,0), font=font)
        output = PersistentTemporaryFile('_fa.jpg')
        img.save(output, 'JPEG')
        output.close()
        return output.name

    def get_obfuscated_article(self, url):
        result = re.match("^http://dummy/feed_([0-9]+)/article_([0-9]+).html$", url)
        feed = result.group(1)
        article = result.group(2)
        imageUrl = "file:///%s" % self.generate_image(feed, article)
        # Generate content into new temporary html file.
        html = PersistentTemporaryFile('_fa.html')
        html.write('<html>\n<head>\n<title>Feed %s Article %s</title>\n</head>\n' % (feed, article))
        html.write("<body>\n<h1>Feed %s Article %s</h1>\n" % (feed, article))
        html.write('<p><img src="%s" alt="Image of feed %s article %s"></p>' % (imageUrl, feed, article))
        html.write(self.LOREM_IPSUM)
        html.write("</body>\n</html>\n")
        html.close()
        return html.name

    def parse_index(self):
        feeds = []
        for feed in range(1, self.feeds + 1):
            feedName = "Feed %i" % feed
            articles = []
            for article in range(1, self.articles_per_feed + 1):
                url = "http://dummy/feed_%i/article_%i.html" % (feed, article)
                title = "Feed %i Article %i" % (feed, article)
                articles.append({'title': title, 'url': url, 'date': ''})
            feeds.append((feedName, articles))
        return feeds
|
|
|
|
|
|
#11 |
|
creator of calibre
Posts: 45,600
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The <br> multiplication is a bug in html5-parser https://github.com/kovidgoyal/html5-...2da97df7b6f7e1
|
|
|
|
|
|
#12 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
|
|
|
|
|
|
|
#13 |
|
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
|
|
|
|
|
|
#14 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
|
|
|
|
|
|
#15 |
|
Wizard
Posts: 1,081
Karma: 11391183
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
With the latest release of Calibre, the problem has vanished. Thank you!
|
|
|
|