Old 10-04-2011, 01:44 PM   #1
Dizzley
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Amazon Kindle 3
Question WordLive daily bible reading progress

I'm making a recipe to download the daily bible reading from WordLive (UK). I'm glad to say that there are RSS feeds for each of the different daily outputs.

The basic feed is at http://feeds.feedburner.com/org/ELCH?format=xml.

This seems a good start. I'm now tweaking. Later I will add a subscription login so the user can set preferences.

Right now I have a problem with bible verse numbers: Calibre sees the first few as header numbers. They are actually in sup tags, typically:
Code:
<sup class="versenum" id="en-TNIV-25582">1</sup>
<p> Now the tax collectors... </p>
How can I get these verse numbers to pass through untouched by Calibre?

Here's my current recipe:
Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class WordLiveClassicRecipe(BasicNewsRecipe):
    title          = u'WordLive'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    use_embedded_content = True
    oldest_article = 28              # repeated setting; this later value overrides the 7 above
    max_articles_per_feed = 100      # repeated setting
    use_embedded_content = True      # repeated setting
    encoding = 'utf8'
    remove_empty_feeds = True
    # no_stylesheets = True
    remove_javascript = True
    # keep_only_tags = [{'class':'regularitem'}]
    feeds          = [(u'WordLive Classic', u'http://feeds.feedburner.com/org/ELCH?format=xml')]

Last edited by Dizzley; 10-04-2011 at 01:44 PM. Reason: minor brainfade in original
Old 10-05-2011, 10:31 AM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Dizzley View Post
Right now I have a problem with bible verse numbers: Calibre sees the first few as header numbers. They are actually in sup tags,
The sup tags should be untouched. Try running your recipe from the command line to create html files and see if they make it through to the html before Calibre does the conversion to your final format. See here:
https://www.mobileread.com/forums/sho...d.php?t=121439
Specifically the tips on writing recipes using ebook-convert.
Old 10-05-2011, 11:40 AM   #3
Dizzley
Thanks for replying, Starson17.

It might not be the sup tag. The unwanted effect is a line break after the verse number "11". This is visible when viewing the input HTML file for article_1 from the debug directory in a browser.

Yes, I'm already running it on the command line - good idea. It seems that the first verse, or the first few verses, get picked up and break the layout. I suppose the cause could be the sup tag or the p tag, which is classed as "calibre9". Today's feed seems to break only the first verse (11) across lines. Verse 12 onwards is fine.

Here's an extract of the feed XML:
Spoiler:
<h5 class="passage-header">The Parable of the Lost Son</h5>&nbsp;<sup class="versenum" id="en-TNIV-25592">11</sup> Jesus continued: <font class='woj'>“There was a man who had two sons.</font> <font class='woj'><sup class="versenum" id="en-TNIV-25593">12</sup> The younger one said to his father, ‘Father, give me my share of the estate.’ So he divided his property between them.</font> <p />&nbsp;&nbsp;&nbsp;<font class='woj'><sup class="versenum" id="en-TNIV-25594">13</sup> “Not long after that, the younger son got together all he had, set off for a distant country and there squandered his wealth in wild living.</font>


The content of the article seems the same in input, parsed and processed HTML. Here's an extract from today's feed's input HTML debug output:
Spoiler:
<h5 class="passage-header">The Parable of the Lost Son</h5>*<sup class="versenum"
id="en-TNIV-25592">11</sup><p> Jesus continued: </p><font class="woj">“There was a man who
had two sons.</font> <font class="woj"><sup class="versenum" id="en-TNIV-25593">12</sup>
The younger one said to his father, ‘Father, give me my share of the estate.’ So he
divided his property between them.</font> <p></p>***<font class="woj"><sup
class="versenum" id="en-TNIV-25594">13</sup> “Not long after that, the younger son got
together all he had, set off for a distant country and there squandered his wealth in wild
living.</font>


I notice two changes in the input HTML (in bold) -
1) there's a new <p> tag near versenum 11, and
2) some <p> tag changes near versenum 13 (which still renders correctly as a new paragraph).

You can check the feed at http://feeds.feedburner.com/org/ELCH but it changes daily of course.

I'll begin by looking at the CSS.
I'm somewhat Python savvy so I'm willing to do what it takes.

Last edited by Dizzley; 10-05-2011 at 11:41 AM. Reason: typo
Old 10-05-2011, 02:40 PM   #4
Starson17
Quote:
Originally Posted by Dizzley View Post
It might not be the sup tag.
I notice two changes in the input HTML (in bold) -
1) there's a new <p> tag near versenum 11, and
2) some <p> tag changes near versenum 13 (which still renders correctly as a new paragraph).
I'm not sure how calibre would put <p> tags into content other than in the conversion process. To avoid the conversion code entirely, use the method of command line ebook-convert I linked you to. It won't produce the multiple directories from the debugged conversion process, just the raw html.

I also see you're using embedded content from the RSS feed. (You've doubled some lines in the posted recipe, but that shouldn't be a problem.) You might try not using the embedded content.

Still another possibility is that the problem is in the RSS feed, but you're not seeing it because you are looking with a browser (browsers sometimes change the raw source before showing the page). You can print the raw XML soup Calibre sees with:
Code:
    def preprocess_html(self, soup):
        # Dump the parsed article so you can see exactly what the recipe receives.
        print('the Soup is: %s' % soup)
        return soup
Sometimes I do that to make sure I know what Calibre's recipe is actually getting to work with.

Last edited by Starson17; 10-06-2011 at 09:18 AM.
Old 10-06-2011, 09:06 AM   #5
Dizzley
Junior Member
Dizzley began at the beginning.
 
Dizzley's Avatar
 
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Amazon Kindle 3
Thanks for the patience.

Thanks for the feed debug code. Today's feed soup contains:

Spoiler:
<h4>Luke 16</h4>
*<sup class="versenum" id="en-TNIV-25614">1</sup><p> Jesus told his disciples: </p><font class="woj">“There was a rich man whose manager was accused of wasting his possessions.</font> <font class="woj"><sup class="versenum" id="en-TNIV-25615">2</sup> So he called him in and asked him, ‘What is this I hear about you? Give an account of your management, because you cannot be manager any longer.’</font> <p></p>***<font class="woj"><sup class="versenum" id="en-TNIV-25616">3</sup> “The manager said to himself...


So I can now see the feed Calibre is working on.

It does look like the offending text has a <p> tag following the </sup> tag (bolded). Also there are empty <p> tags (red).

I might try cleaning up the sequence </sup><p>text</p> so that it becomes </sup>text.

Also as you suggest, I'll try not using the embedded text.
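
The cleanup idea above (turning </sup><p>text</p> into </sup>text) could be sketched with a plain regular expression, roughly as follows. This is only an illustration that works on an HTML string outside Calibre: the function name unwrap_verse_paragraphs and the pattern are assumptions made for this sketch, not part of Calibre or of the recipe posted earlier, and it has not been tried against the live feed. It deliberately leaves the empty <p></p> tags alone, since those still render as the intended paragraph breaks.

```python
import re

# Hypothetical pattern: a verse-number <sup> immediately followed by a
# <p>...</p> wrapper, as seen in the soup extract in this thread.
VERSE_P = re.compile(
    r'(<sup class="versenum"[^>]*>\d+</sup>)\s*<p>(.*?)</p>',
    re.DOTALL,
)


def unwrap_verse_paragraphs(html):
    """Replace '</sup><p>text</p>' with '</sup>text' after verse numbers."""
    # Keep the <sup> element (group 1) and the paragraph's inner text
    # (group 2); drop only the surrounding <p> and </p> tags.
    return VERSE_P.sub(r'\1\2', html)


if __name__ == '__main__':
    sample = (
        '<sup class="versenum" id="en-TNIV-25614">1</sup>'
        '<p> Jesus told his disciples: </p>'
        '<font class="woj">There was a rich man...</font> <p></p>'
    )
    print(unwrap_verse_paragraphs(sample))
```

Inside a recipe, something like this would presumably have to be applied wherever the article HTML is available as a string; whether that is best done from preprocess_html or elsewhere is left open here.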