|  02-15-2009, 09:11 PM | #211 | 
| Hyperreader            Posts: 130 Karma: 28678 Join Date: Feb 2009 Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360 | 
				
				Recipe for manager.co.th
			 
			
			Hi everyone. This is my first post here. First of all, I want to give a big THANK YOU to Kovid Goyal and everyone who helps make calibre. I am trying to make a recipe for manager.co.th, a news site in Thai. Here is what I have so far. Code:
class AdvancedUserRecipe1234529365(BasicNewsRecipe):
    title          = u'Manager Online'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding              = 'cp874'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    #keep_only_tags     = [dict(name='td', attrs={'class':'body'})]
    feeds          = [
                           (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml'),
                           (u'กีฬา', u'http://www.manager.co.th/RSS/Sport/Sport.xml'),
                           (u'อาชญากรรมและกระบวนการยุติธรรม', u'http://www.manager.co.th/RSS/Crime/Crime.xml'),
                           (u'ภูมิภาค', u'http://www.manager.co.th/RSS/Local/Local.xml'),
                           (u'คุณภาพชีวิต', u'http://www.manager.co.th/RSS/QOL/QOL.xml'),
                           (u'เศรษฐกิจ', u'http://www.manager.co.th/RSS/Business/Business.xml'),
                           (u'เกม', u'http://www.manager.co.th/RSS/Game/Game.xml'),
                           (u'วิทยาศาสตร์', u'http://www.manager.co.th/RSS/Science/Science.xml'),
                           (u'ชีวิตในเมือง', u'http://www.manager.co.th/RSS/Metrolife/Metrolife.xml'),
                           (u'ครอบครัว', u'http://www.manager.co.th/RSS/Family/Family.xml'),
                           (u'ชีวิตในรั้วมหาลัย', u'http://www.manager.co.th/RSS/Campus/Campus.xml'),
                           (u'บันเทิง', u'http://www.manager.co.th/RSS/Entertainment/Entertainment.xml'),
                           (u'ผู้จัดกวน', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=1052'),
                           (u'ธรรมะ - ผู้จัดการ', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8101&sourcenewsid=0'),
                           (u'ธรรมะ - ทั่วไป', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8100&sourcenewsid=0')
                      ]
    def print_version(self, url):
        return url.replace('http://www.manager.co.th/asp-bin/mgrview.aspx?', 'http://www.manager.co.th/asp-bin/PrintNews.aspx?') | 
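A note on how `print_version` behaves, as a minimal standalone sketch outside calibre. It uses the same `str.replace` call as the recipe; the `NewsID` value and the bare-domain article URL are made up for illustration, and the thread does not say which URL form each feed actually links to.

```python
VIEW_PREFIX = 'http://www.manager.co.th/asp-bin/mgrview.aspx?'
PRINT_PREFIX = 'http://www.manager.co.th/asp-bin/PrintNews.aspx?'


def print_version(url):
    # Same replacement as in the recipe. str.replace returns the input
    # unchanged when the search string does not occur in it.
    return url.replace(VIEW_PREFIX, PRINT_PREFIX)


# Hypothetical article URL; the NewsID value is invented.
article = VIEW_PREFIX + 'NewsID=0000000000000'
print(print_version(article))

# Hypothetical URL on the bare domain (no "www."): the prefix does not
# match, so the article would be fetched from the normal page instead.
other = 'http://manager.co.th/asp-bin/mgrview.aspx?NewsID=0000000000000'
print(print_version(other) == other)
```

If some feeds were to link articles without the `www.` prefix, those articles would silently skip the print page, which may be worth checking when output differs between sections.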
|   | 
|  02-16-2009, 07:24 AM | #212 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			This is what you need to add to your recipe. After remove_javascript = True, insert this: Code:
    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'
Then add this method: Code:
    def preprocess_html(self, soup):
        # Drop inline styles from every tag that has one.
        for item in soup.findAll(style=True):
            del item['style']
        # Drop align attributes as well.
        for item in soup.findAll(align=True):
            del item['align']
        return soup
The second piece of code removes every style attribute and every align attribute from the HTML, which should resolve your alignment problem. | 
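To see what stripping `style` and `align` does to a page, here is a rough standalone sketch. It does not use calibre or BeautifulSoup; it is a stand-in built on the standard library's `html.parser`, and the names `AttributeStripper` and `strip_attributes` as well as the sample markup are invented for this illustration. Comments and doctype declarations are dropped by this simplified version.

```python
from html import escape
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit HTML with the named attributes removed from every tag."""

    def __init__(self, drop):
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)
        self.out = []

    def _emit(self, tag, attrs, closing=''):
        # Keep every attribute whose name is not in the drop set.
        kept = ''.join(
            ' %s' % k if v is None else ' %s="%s"' % (k, escape(v, quote=True))
            for k, v in attrs if k not in self.drop
        )
        self.out.append('<%s%s%s>' % (tag, kept, closing))

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, ' /')

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)


def strip_attributes(html, drop=('style', 'align')):
    parser = AttributeStripper(drop)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


# Invented sample resembling a table-based article layout.
sample = '<td align="center" class="body"><p style="text-align:center">ข่าว</p></td>'
print(strip_attributes(sample))
```

In the sample, the `align` and `style` attributes disappear while `class="body"` survives, so a selector such as `keep_only_tags` on that class would still match.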
|   | 
|  02-16-2009, 05:47 PM | #213 | 
| Hyperreader            Posts: 130 Karma: 28678 Join Date: Feb 2009 Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360 | 
			
			It is still centered for some reason. I'm beginning to think that maybe html2lrf just does that for Thai by default. Is there a way to force left alignment on the HTML before the converter processes it? I am guessing that may help. The EPUB and MOBI output always crashes both calibre's viewer and the reader: I can only get as far as the table of contents for EPUB, and only the first blank page for MOBI. And thank you for your help, kiklop74. Here is the current code (the result is center-aligned). Code:
class AdvancedUserRecipe1234529365(BasicNewsRecipe):
    title          = u'Manager Online'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding              = 'cp874'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    html2lrf_options = ['--ignore-tables']    
    html2epub_options = 'linearize_tables = True'
    keep_only_tags     = [dict(name='td', attrs={'class':'body'})]
    feeds          = [
                           (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml')
                      ]
    def print_version(self, url):
        return url.replace('http://www.manager.co.th/asp-bin/mgrview.aspx?', 'http://www.manager.co.th/asp-bin/PrintNews.aspx?')
    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(align=True):
            del item['align']
        return soup | 
|   | 
|  02-16-2009, 06:41 PM | #214 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			After remove_javascript = True, insert this: Code:
    extra_css = 'body{text-align: left}' | 
|   | 
|  02-16-2009, 09:59 PM | #215 | 
| Hyperreader            Posts: 130 Karma: 28678 Join Date: Feb 2009 Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360 | 
			
			*Edited*: I just realized that maybe I shouldn't ask for help in this thread. Should I make a new thread, or is this fine? I'm very sorry if this is not an appropriate place to ask.

			Still centered. I'm out of ideas. What confuses me is that the first code, which does nothing to the source except take out CSS and JavaScript, gives a properly left-aligned result. Do you have any suggestions? I tried to make a recipe for another Thai newspaper site and it does not have this problem at all. By the way, the table of contents is properly left-aligned.

			I flashed the firmware so that I get a default font that has Thai characters. The result is not that good, unsurprisingly. There are four levels in the Thai writing system, and the reader just puts the upper two at the same place. It also doesn't do a good job of deciding where to begin a new line, but that may be due to the converter rather than the reader itself, since the same thing appears in calibre's viewer. Still readable, though. I'm thinking about telling html2lrf to embed a Thai font if the recipe is actually shared with others.

			EDIT2: OK, here's my best try. Since I doubt anyone will use it, I'll just post it here. Thanks, kiklop74, for your help. Code:
class AdvancedUserRecipe1234529365(BasicNewsRecipe):
    title          = u'Manager Online'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding              = 'cp874'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    # Both rules go in one list; two separate assignments would keep only the last one.
    remove_tags     = [
                           dict(name='td', attrs={'align':'right'}),
                           dict(name='td', attrs={'align':'left'})
                      ]
    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'
    feeds          = [
                           (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml'),
                           (u'กีฬา', u'http://www.manager.co.th/RSS/Sport/Sport.xml'),
                           (u'อาชญากรรมและกระบวนการยุติธรรม', u'http://www.manager.co.th/RSS/Crime/Crime.xml'),
                           (u'ภูมิภาค', u'http://www.manager.co.th/RSS/Local/Local.xml'),
                           (u'คุณภาพชีวิต', u'http://www.manager.co.th/RSS/QOL/QOL.xml'),
                           (u'เศรษฐกิจ', u'http://www.manager.co.th/RSS/Business/Business.xml'),
                           (u'เกม', u'http://www.manager.co.th/RSS/Game/Game.xml'),
                           (u'วิทยาศาสตร์', u'http://www.manager.co.th/RSS/Science/Science.xml'),
                           (u'ชีวิตในเมือง', u'http://www.manager.co.th/RSS/Metrolife/Metrolife.xml'),
                           (u'ครอบครัว', u'http://www.manager.co.th/RSS/Family/Family.xml'),
                           (u'ชีวิตในรั้วมหาลัย', u'http://www.manager.co.th/RSS/Campus/Campus.xml'),
                           (u'บันเทิง', u'http://www.manager.co.th/RSS/Entertainment/Entertainment.xml'),
                           (u'ผู้จัดกวน', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=1052'),
                           (u'ธรรมะ - ผู้จัดการ', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8101&sourcenewsid=0'),
                           (u'ธรรมะ - ทั่วไป', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8100&sourcenewsid=0')
                      ]
    def print_version(self, url):
        return url.replace('http://www.manager.co.th/asp-bin/mgrview.aspx?', 'http://www.manager.co.th/asp-bin/PrintNews.aspx?')
			Last edited by Hypernova; 02-17-2009 at 05:09 PM. | 
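One detail about `remove_tags` is worth spelling out: rules for several tags belong in a single list, because in a Python class body a second assignment to the same name replaces the first instead of adding to it. The sketch below shows this with plain classes; `TwoAssignments` and `OneList` are hypothetical names and neither class is a real calibre recipe.

```python
class TwoAssignments:
    # The first value is bound to the name here ...
    remove_tags = [dict(name='td', attrs={'align': 'right'})]
    # ... and is replaced here, so only the align='left' rule remains.
    remove_tags = [dict(name='td', attrs={'align': 'left'})]


class OneList:
    # Both rules live in the same list, so both are kept.
    remove_tags = [
        dict(name='td', attrs={'align': 'right'}),
        dict(name='td', attrs={'align': 'left'}),
    ]


print(len(TwoAssignments.remove_tags))
print(len(OneList.remove_tags))
```

Whether removing both kinds of cell is actually desirable for this site is a separate question that the thread does not settle.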
|   | 
|  02-17-2009, 08:13 AM | #216 | 
| Member  Posts: 13 Karma: 10 Join Date: Feb 2009 Device: PRS-505 | 
				
				Missing text in custom feed
			 
			
			I am trying to create a custom feed of my local newspaper: http://rss.cincinnati.com/apps/pbcs....enq01&mime=xml I can get the feed as an EPUB and preview it in calibre. Everything looks fine with the table of contents, etc., and when I click on an article the text appears. I then transfer the feed to the PRS-505, and the table of contents still looks fine (the article titles appear), but when I click on an article all that is shown is a blank page. Any ideas as to what I am doing wrong? I just entered the feed URL under custom feed; do I need to add something in advanced mode? Total newbie to calibre here. Kovidgoyal suggested I add html2epub_options = 'linearize_tables = True', which I did. In calibre's viewer, the text is then not formatted correctly and it pulls in a lot of the newspaper's graphics, etc. I even tried pulling in the print version by adding def print_version(self, url): return url + '&template=printart', but that looks even worse in the viewer, and after transfer to the PRS-505 I get no text apart from the table of contents and a page with the article title and two lines of the article text. Thanks! | 
|   | 
|  02-17-2009, 09:22 AM | #217 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			Your link to the RSS feed is invalid. Please post a valid RSS link or at least the entire recipe code. | 
|   | 
|  02-17-2009, 09:47 AM | #218 | 
| Member  Posts: 13 Karma: 10 Join Date: Feb 2009 Device: PRS-505 | 
			
			Sorry about the link, here is the link: http://rss.cincinnati.com/apps/pbcs....enq01&mime=xml I am making progress now. Here is what I have so far: Code:
class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    html2epub_options = 'linearize_tables = True' 
    
    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]
    def print_version(self, url):
        return url + '&template=printart'
			Thanks for your help! Last edited by kitzj0; 02-17-2009 at 09:51 AM. | 
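Appending `'&template=printart'` only produces a valid URL when the article link already contains a query string. The thread does not show what the Enquirer's article links look like, so the following is just a cautious sketch: `with_print_template` is a hypothetical helper and the `example.com` URLs are placeholders.

```python
def with_print_template(url, template='printart'):
    # Use '&' when a query string is already present, '?' otherwise.
    separator = '&' if '?' in url else '?'
    return url + separator + 'template=' + template


# Placeholder URLs, not real article links.
print(with_print_template('http://example.com/article?id=1'))
print(with_print_template('http://example.com/article'))
```

Inside a recipe, the same check could sit in the body of `print_version(self, url)`.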
|   | 
|  02-17-2009, 10:03 AM | #219 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			This is a modified version of your recipe that should work better: PHP Code: 
			 | 
|   | 
|  02-17-2009, 10:26 AM | #220 | 
| Member  Posts: 13 Karma: 10 Join Date: Feb 2009 Device: PRS-505 | 
			
			Thanks so much for your help, kiklop74. I applied your code and transferred the result with calibre. However, I got the same result. When I click on an article title, there is a pause of 10 seconds and then another 20 seconds before the article appears, without the formatting icon appearing in the middle of the screen. However, when I transfer the EPUB file with Sony's library software, everything works great: the article appears within a second. My other feeds transfer fine with calibre. | 
|   | 
|  02-17-2009, 11:24 AM | #221 | |
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | Quote: 
 | |
|   | 
|  02-17-2009, 11:36 AM | #222 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			After some testing I discovered that ditching tables before processing does the trick. Try this recipe: Code:
class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    encoding = 'cp1252'
   
    keep_only_tags = [dict(name='div', attrs={'class':'padding'})]
    remove_tags = [
                     dict(name=['object','link','table','embed'])
                    ,dict(name='div',attrs={'id':'pluckcomments'})
                    ,dict(name='div',attrs={'class':'articleflex-container'})
                  ]
   
    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]
    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup | 
|   | 
|  02-17-2009, 12:16 PM | #223 | 
| Member  Posts: 13 Karma: 10 Join Date: Feb 2009 Device: PRS-505 | 
			
			Thanks for your time and help, kiklop74! However, that code puts me back where I was originally. The table of contents shows up, but when I click an article in the table of contents, all I get is a blank screen. I appreciate what you have done. It is no problem to use the Sony Library software to transfer the feed. I figure it takes about the same amount of time as fetching the paper from outside in the morning. | 
|   | 
|  02-17-2009, 12:19 PM | #224 | 
| Member  Posts: 13 Karma: 10 Join Date: Feb 2009 Device: PRS-505 | 
			
			The navigation on the Cincinnati Enquirer website is horrible. My issues probably have something to do with poor website management.
		 | 
|   | 
|  02-17-2009, 12:20 PM | #225 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			This smells like some sort of bug in epub generation. I already reported similar behavior with some other epub. I hope Kovid will have time to investigate this in depth. | 
|   | 