Old 02-15-2009, 09:11 PM   #211
Hypernova
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
Recipe for manager.co.th

Hi everyone. This is my first post here. First of all, I want to give a big THANK YOU to Kovid Goyal and everyone who helps make calibre.

I am trying to make a recipe for manager.co.th, a news site in Thai. Here is what I have so far.

Code:
class AdvancedUserRecipe1234529365(BasicNewsRecipe):
    title          = u'Manager Online'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding              = 'cp874'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    #keep_only_tags     = [dict(name='td', attrs={'class':'body'})]

    feeds          = [
                           (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml'),
                           (u'กีฬา', u'http://www.manager.co.th/RSS/Sport/Sport.xml'),
                           (u'อาชญากรรมและกระบวนการยุติธรรม', u'http://www.manager.co.th/RSS/Crime/Crime.xml'),
                           (u'ภูมิภาค', u'http://www.manager.co.th/RSS/Local/Local.xml'),
                           (u'คุณภาพชีวิต', u'http://www.manager.co.th/RSS/QOL/QOL.xml'),
                           (u'เศรษฐกิจ', u'http://www.manager.co.th/RSS/Business/Business.xml'),
                           (u'เกม', u'http://www.manager.co.th/RSS/Game/Game.xml'),
                           (u'วิทยาศาสตร์', u'http://www.manager.co.th/RSS/Science/Science.xml'),
                           (u'ชีวิตในเมือง', u'http://www.manager.co.th/RSS/Metrolife/Metrolife.xml'),
                           (u'ครอบครัว', u'http://www.manager.co.th/RSS/Family/Family.xml'),
                           (u'ชีวิตในรั้วมหาลัย', u'http://www.manager.co.th/RSS/Campus/Campus.xml'),
                           (u'บังเทิง', u'http://www.manager.co.th/RSS/Entertainment/Entertainment.xml'),
                           (u'ผู้จัดกวน', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=1052'),
                           (u'ธรรมะ - ผู้จัดการ', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8101&sourcenewsid=0'),
                           (u'ธรรมะ - ทั่วไป', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8100&sourcenewsid=0')
                      ]

    def print_version(self, url):
        return url.replace('http://www.manager.co.th/asp-bin/mgrview.aspx?', 'http://www.manager.co.th/asp-bin/PrintNews.aspx?')
I should say up front that I know next to nothing about HTML/CSS. As of now, this recipe works fine. However, if I enable the keep_only_tags line that I commented out, it picks up only the article text, which is what I want, but the text becomes center-aligned. Can anyone help me with this? Thank you in advance.
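For anyone following along: the print_version hook in the recipe above is plain string replacement, so it can be checked outside calibre. Below is a minimal sketch, assuming article links in the feeds use the mgrview.aspx form shown in the recipe. The helper name to_print_url and the NewsID value are made up for illustration.

```python
# Standalone version of the recipe's print_version logic (no calibre needed).
VIEW_PREFIX = 'http://www.manager.co.th/asp-bin/mgrview.aspx?'
PRINT_PREFIX = 'http://www.manager.co.th/asp-bin/PrintNews.aspx?'

def to_print_url(url):
    """Swap the article-view prefix for the print-view prefix.

    URLs that do not contain the view prefix come back unchanged,
    because str.replace does nothing when the substring is absent.
    """
    return url.replace(VIEW_PREFIX, PRINT_PREFIX)

# Hypothetical article link; the query string is invented for this example.
example = VIEW_PREFIX + 'NewsID=123'
print(to_print_url(example))
```

One consequence of this approach: a link whose host lacks the www. prefix would pass through untouched and the full web page would be fetched instead of the print page, so it may be worth checking what the feed links actually look like.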
Old 02-16-2009, 07:24 AM   #212
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This is what you need to add to your recipe:

After remove_javascript=True insert this:

Code:
    html2lrf_options = ['--ignore-tables']    
    html2epub_options = 'linearize_tables = True'
And at the end of your script insert this:

Code:
    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(align=True):
            del item['align']
        return soup
The first piece of code linearizes tables, turning table markup into ordinary paragraphs, which is quite important for good EPUB generation.

The second piece of code removes every inline style attribute and every align attribute from the HTML, which should resolve your alignment problem.
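To see what stripping style and align attributes does to a page, here is a standalone sketch that can be run without calibre. It is not the recipe's BeautifulSoup code: it swaps in Python 3's standard html.parser, and the names AttributeStripper and strip_attributes are hypothetical. It also drops comments and doctype declarations, which the recipe hook would leave alone.

```python
from html import escape
from html.parser import HTMLParser

class AttributeStripper(HTMLParser):
    """Re-emit HTML while dropping selected attributes from every tag."""

    def __init__(self, drop=('style', 'align')):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)
        self.out = []

    def _open_tag(self, tag, attrs, closing=''):
        # Rebuild the tag from the attributes that are not being dropped.
        kept = ''.join(
            ' %s' % name if value is None
            else ' %s="%s"' % (name, escape(value, quote=True))
            for name, value in attrs if name not in self.drop
        )
        self.out.append('<%s%s%s>' % (tag, kept, closing))

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, ' /')

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

def strip_attributes(html, drop=('style', 'align')):
    parser = AttributeStripper(drop)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)

sample = '<td align="center" class="body"><p style="text-align:center">text</p></td>'
print(strip_attributes(sample))
```

The class attribute survives because only the names listed in drop are removed, which mirrors the two loops in preprocess_html.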
Old 02-16-2009, 05:47 PM   #213
Hypernova
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
It is still centered for some reason. I'm beginning to think that maybe html2lrf just does that for Thai by default. Is there a way to force left alignment on the HTML before the converter processes it? I am guessing that may help.

The EPUB and MOBI output always crashes both calibre's viewer and the reader. I can only get as far as the table of contents for EPUB, and only the first blank page for MOBI.

And thank you for your help, kiklop74.

Here is the current code (the result is still center-aligned).

Code:
class AdvancedUserRecipe1234529365(BasicNewsRecipe):
    title          = u'Manager Online'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding              = 'cp874'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    html2lrf_options = ['--ignore-tables']    
    html2epub_options = 'linearize_tables = True'
    keep_only_tags     = [dict(name='td', attrs={'class':'body'})]

    feeds          = [
                           (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml')

                      ]

    def print_version(self, url):
        return url.replace('http://www.manager.co.th/asp-bin/mgrview.aspx?', 'http://www.manager.co.th/asp-bin/PrintNews.aspx?')
    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(align=True):
            del item['align']
        return soup
Old 02-16-2009, 06:41 PM   #214
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
After remove_javascript = True, insert this:

Code:
extra_css = 'body{text-align: left}'
BTW, how do you see Thai characters on the reader? The Sony Reader fonts do not support the Thai Unicode range at all.
Old 02-16-2009, 09:59 PM   #215
Hypernova
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
*Edited*: I just realized that maybe I shouldn't be asking for help in this thread. Should I start a new thread, or is this fine? I'm very sorry if this is not an appropriate place to ask.

Still centered. I'm out of ideas. What confuses me is that the first version of the code, which does nothing to the source except strip out CSS and JavaScript, gives a properly left-aligned result. Do you have any suggestions? I tried making a recipe for another Thai newspaper site and it does not have this problem at all. By the way, the table of contents is properly left-aligned.

I flashed the firmware so that the default font includes Thai characters. The result is not that good, unsurprisingly. There are four vertical levels in the Thai writing system, and the reader just puts the upper two in the same place. Line breaking is also poor, but that may be due to the converter rather than the reader itself, since the same thing appears in calibre's viewer. Still readable, though. I'm thinking about telling html2lrf to embed a Thai font if the recipe is actually shared with others.

EDIT2: OK, here's my best try. Since I doubt anyone will use it, I'll just post it here. Thanks, kiklop74, for your help.

Code:
class AdvancedUserRecipe1234529365(BasicNewsRecipe):
    title          = u'Manager Online'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding              = 'cp874'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True

    # Both specs go in one list; assigning remove_tags twice would keep only the last one.
    remove_tags     = [
                        dict(name='td', attrs={'align':'right'}),
                        dict(name='td', attrs={'align':'left'})
                      ]

    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'

    feeds          = [
                           (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml'),
                           (u'กีฬา', u'http://www.manager.co.th/RSS/Sport/Sport.xml'),
                           (u'อาชญากรรมและกระบวนการยุติธรรม', u'http://www.manager.co.th/RSS/Crime/Crime.xml'),
                           (u'ภูมิภาค', u'http://www.manager.co.th/RSS/Local/Local.xml'),
                           (u'คุณภาพชีวิต', u'http://www.manager.co.th/RSS/QOL/QOL.xml'),
                           (u'เศรษฐกิจ', u'http://www.manager.co.th/RSS/Business/Business.xml'),
                           (u'เกม', u'http://www.manager.co.th/RSS/Game/Game.xml'),
                           (u'วิทยาศาสตร์', u'http://www.manager.co.th/RSS/Science/Science.xml'),
                           (u'ชีวิตในเมือง', u'http://www.manager.co.th/RSS/Metrolife/Metrolife.xml'),
                           (u'ครอบครัว', u'http://www.manager.co.th/RSS/Family/Family.xml'),
                           (u'ชีวิตในรั้วมหาลัย', u'http://www.manager.co.th/RSS/Campus/Campus.xml'),
                           (u'บังเทิง', u'http://www.manager.co.th/RSS/Entertainment/Entertainment.xml'),
                           (u'ผู้จัดกวน', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=1052'),
                           (u'ธรรมะ - ผู้จัดการ', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8101&sourcenewsid=0'),
                           (u'ธรรมะ - ทั่วไป', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8100&sourcenewsid=0')
                      ]

    def print_version(self, url):
        return url.replace('http://www.manager.co.th/asp-bin/mgrview.aspx?', 'http://www.manager.co.th/asp-bin/PrintNews.aspx?')
If anyone actually tries to use this, keep in mind that on the Sony Reader you'll need to flash the firmware so that the default font has Thai characters. I use Leelawadee.
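One Python detail worth knowing when writing recipes: if a class body assigns the same attribute more than once, only the last assignment survives. The sketch below shows this with remove_tags, using two throwaway classes (Demo and Combined are hypothetical names, not part of calibre).

```python
class Demo(object):
    # Two assignments to the same class attribute: the second replaces the first.
    remove_tags = [dict(name='td', attrs={'align': 'right'})]
    remove_tags = [dict(name='td', attrs={'align': 'left'})]

class Combined(object):
    # One list holding both specs keeps both filters active.
    remove_tags = [
        dict(name='td', attrs={'align': 'right'}),
        dict(name='td', attrs={'align': 'left'}),
    ]

print(len(Demo.remove_tags))      # number of specs left after the double assignment
print(len(Combined.remove_tags))  # number of specs in the single combined list
```

So a recipe that wants to drop both right-aligned and left-aligned cells needs both dicts in a single list.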
Attached Files
File Type: zip ASTVMGR.zip (954 Bytes, 324 views)

Last edited by Hypernova; 02-17-2009 at 05:09 PM.
Old 02-17-2009, 08:13 AM   #216
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Missing text in custom feed

I am trying to create a custom feed of my local newspaper:

http://rss.cincinnati.com/apps/pbcs....enq01&mime=xml

I can get the feed in epub and I can preview the feed in Calibre. Everything looks fine with the table of contents, etc and when I click on an article the text appears. I then transfer the feed to the PRS-505, and everything looks fine with the table of contents (the article title appears) but when I click on the article all that is shown is a blank page.

Any ideas as to what I am doing wrong? I just entered the feed URL under custom feed; do I need to add something in advanced mode? I'm a total newbie to calibre.

Kovidgoyal suggested I add

html2epub_options = 'linearize_tables = True'

which I did. In calibre's viewer, the text is now not formatted correctly, and it pulls in a lot of the newspaper's graphics, etc. I even tried pulling in the print version by adding

def print_version(self, url):
    return url + '&template=printart'

but that looks even worse in the viewer, and after transfer to the PRS-505 I get nothing beyond the table of contents and a page with the article's title and two lines of its text.

Thanks!
Old 02-17-2009, 09:22 AM   #217
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Your link to the rss is invalid.

Please post valid rss link or at least entire recipe code.
Old 02-17-2009, 09:47 AM   #218
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Sorry about the link, here is the link:

http://rss.cincinnati.com/apps/pbcs....enq01&mime=xml

I am making progress now. Here is what I have so far:

Code:
class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    html2epub_options = 'linearize_tables = True' 
    
    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

    def print_version(self, url):
        return url + '&template=printart'
Now I can get the article on the Sony. However, when I click on the article from the table of contents, the article appears but it is very light. The reader says it is formatting the article and takes several seconds to show it. Occasionally, the reader will automatically go back to the table of contents. The HTML header from the newspaper appears, and I think it is holding up the device. Is there some way to edit this out?

Thanks for your help!

Last edited by kitzj0; 02-17-2009 at 09:51 AM.
Old 02-17-2009, 10:03 AM   #219
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This is a modified version of your recipe that should work better:

Code:
class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    html2epub_options = 'linearize_tables = True'

    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    extra_css = 'body {text-align: left;}'
    encoding = 'cp1252'

    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

    def print_version(self, url):
        return url + '&template=printart'

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
Old 02-17-2009, 10:26 AM   #220
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Thanks so much for your help, kiklop74. I applied your code and transferred the result with calibre. However, I got the same result. When I click on an article title, there is a pause of 10 seconds and then another 20 seconds before the article appears, without the formatting icon appearing in the middle of the screen.

However, when I use Sony's library software to transfer the EPUB file, everything works great. The article appears within a second.

My other feeds transfer over OK with calibre.
Old 02-17-2009, 11:24 AM   #221
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by kitzj0
Thanks so much for your help, kiklop74. I applied your code and transferred the result with calibre. However, I got the same result. When I click on an article title, there is a pause of 10 seconds and then another 20 seconds before the article appears, without the formatting icon appearing in the middle of the screen.

However, when I use Sony's library software to transfer the EPUB file, everything works great. The article appears within a second.

My other feeds transfer over OK with calibre.
I really do not know what else to tell you. I'll check the feed out at home with the device itself.
Old 02-17-2009, 11:36 AM   #222
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
After some testing I discovered that ditching tables before processing does the trick.

Try this recipe:

Code:
class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    encoding = 'cp1252'
   
    keep_only_tags = [dict(name='div', attrs={'class':'padding'})]

    remove_tags = [
                     dict(name=['object','link','table','embed'])
                    ,dict(name='div',attrs={'id':'pluckcomments'})
                    ,dict(name='div',attrs={'class':'articleflex-container'})
                  ]
   
    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
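Roughly speaking, each entry in keep_only_tags and remove_tags describes a tag by its name and attributes. The sketch below is a simplified stand-in written for this thread, not calibre's or BeautifulSoup's actual matching code. The helpers matches_spec and should_remove are hypothetical, and the sketch assumes that name may be a string or a list and that every listed attribute must equal the given value exactly (real class matching is more lenient).

```python
def matches_spec(tag_name, tag_attrs, spec):
    """Return True if a tag (name plus attribute dict) satisfies a spec dict."""
    wanted = spec.get('name')
    if wanted is not None:
        # A spec name can be a single tag name or a list of tag names.
        names = [wanted] if isinstance(wanted, str) else list(wanted)
        if tag_name not in names:
            return False
    # Every attribute listed in the spec must be present with the same value.
    for key, value in spec.get('attrs', {}).items():
        if tag_attrs.get(key) != value:
            return False
    return True

# Same specs as the remove_tags list in the recipe above.
remove_tags = [
    dict(name=['object', 'link', 'table', 'embed']),
    dict(name='div', attrs={'id': 'pluckcomments'}),
    dict(name='div', attrs={'class': 'articleflex-container'}),
]

def should_remove(tag_name, tag_attrs):
    """A tag is removed if it matches any one of the specs."""
    return any(matches_spec(tag_name, tag_attrs, spec) for spec in remove_tags)

print(should_remove('table', {}))
print(should_remove('div', {'id': 'pluckcomments'}))
print(should_remove('div', {'class': 'padding'}))
```

Under these assumptions every table is dropped regardless of its attributes, while the div with class padding (the one named in keep_only_tags) is left alone.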
Old 02-17-2009, 12:16 PM   #223
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Thanks for your time and help kiklop74!

However, that code puts me back to where I was originally. The table of contents shows up, but upon clicking an article in the table of contents, all I get is a blank screen. I appreciate what you have done. It is no problem to use the Sony library software to transfer the feed. I figure it takes about the same amount of time as fetching the paper from outside in the morning.
Old 02-17-2009, 12:19 PM   #224
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
The navigation on the Cincinnati Enquirer website is horrible. My issues probably have something to do with poor website management.
Old 02-17-2009, 12:20 PM   #225
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This smells like some sort of bug in epub generation. I already reported similar behavior with some other epub.

I hope Kovid will have time to investigate this in depth.