#211
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
Recipe for manager.co.th
Hi everyone. This is my first post here. First of all, I want to give a big THANK YOU to Kovid Goyal and everyone who helps make calibre.
I am trying to make a recipe for manager.co.th, a Thai-language news site. Here is what I have so far. Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1234529365(BasicNewsRecipe):
        title = u'Manager Online'
        oldest_article = 7
        max_articles_per_feed = 100
        encoding = 'cp874'
        no_stylesheets = True
        use_embedded_content = False
        remove_javascript = True
        #keep_only_tags = [dict(name='td', attrs={'class':'body'})]

        feeds = [
            (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml'),
            (u'กีฬา', u'http://www.manager.co.th/RSS/Sport/Sport.xml'),
            (u'อาชญากรรมและกระบวนการยุติธรรม', u'http://www.manager.co.th/RSS/Crime/Crime.xml'),
            (u'ภูมิภาค', u'http://www.manager.co.th/RSS/Local/Local.xml'),
            (u'คุณภาพชีวิต', u'http://www.manager.co.th/RSS/QOL/QOL.xml'),
            (u'เศรษฐกิจ', u'http://www.manager.co.th/RSS/Business/Business.xml'),
            (u'เกม', u'http://www.manager.co.th/RSS/Game/Game.xml'),
            (u'วิทยาศาสตร์', u'http://www.manager.co.th/RSS/Science/Science.xml'),
            (u'ชีวิตในเมือง', u'http://www.manager.co.th/RSS/Metrolife/Metrolife.xml'),
            (u'ครอบครัว', u'http://www.manager.co.th/RSS/Family/Family.xml'),
            (u'ชีวิตในรั้วมหาลัย', u'http://www.manager.co.th/RSS/Campus/Campus.xml'),
            (u'บังเทิง', u'http://www.manager.co.th/RSS/Entertainment/Entertainment.xml'),
            (u'ผู้จัดกวน', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=1052'),
            (u'ธรรมะ - ผู้จัดการ', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8101&sourcenewsid=0'),
            (u'ธรรมะ - ทั่วไป', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8100&sourcenewsid=0')
        ]

        def print_version(self, url):
            # Fetch the printer-friendly page instead of the normal article view
            return url.replace(
                'http://www.manager.co.th/asp-bin/mgrview.aspx?',
                'http://www.manager.co.th/asp-bin/PrintNews.aspx?')
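For anyone following along, the `print_version` hook in this recipe is a plain string substitution on the article URL. Below is a minimal standalone sketch of that rewrite, written outside any recipe class so it runs without calibre. The `NewsID=123` query string is invented for illustration and is not taken from the site.

```python
# Standalone sketch of the URL rewrite that print_version performs.
ARTICLE_PREFIX = 'http://www.manager.co.th/asp-bin/mgrview.aspx?'
PRINT_PREFIX = 'http://www.manager.co.th/asp-bin/PrintNews.aspx?'


def print_version(url):
    """Swap the article viewer page for the printer-friendly page."""
    return url.replace(ARTICLE_PREFIX, PRINT_PREFIX)


# Hypothetical article link; the NewsID value is made up for this example.
example = 'http://www.manager.co.th/asp-bin/mgrview.aspx?NewsID=123'
print(print_version(example))
```

A link that does not contain the first prefix (for instance one without `www.`) comes back unchanged, so such an article would be fetched in its normal layout.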
#212
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This is what you need to add to your recipe.
After remove_javascript = True insert this: Code:

    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'

Then add this method to the recipe class: Code:

    def preprocess_html(self, soup):
        # Drop inline styles
        for item in soup.findAll(style=True):
            del item['style']
        # Drop align attributes
        for item in soup.findAll(align=True):
            del item['align']
        return soup

The second piece of code removes every style and align attribute from the HTML, which should resolve your alignment problem.
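To see the effect of stripping those attributes without running calibre, here is a rough stand-in that uses only Python's standard library `html.parser`. It is not how calibre does it (the recipe hook receives a BeautifulSoup tree); the class name `AttributeStripper`, the helper `strip_attributes`, and the sample markup are all made up for this sketch.

```python
from html import escape
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit tags and text while dropping the attributes named in `drop`.

    Comments and doctype declarations are not re-emitted by this sketch.
    """

    def __init__(self, drop=('style', 'align')):
        super().__init__(convert_charrefs=True)
        self.drop = set(drop)
        self.parts = []

    def _open_tag(self, tag, attrs, closing='>'):
        kept = []
        for name, value in attrs:
            if name in self.drop:
                continue  # this is the attribute removal itself
            if value is None:
                kept.append(' %s' % name)
            else:
                kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        return '<%s%s%s' % (tag, ''.join(kept), closing)

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs, ' />'))

    def handle_endtag(self, tag):
        self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def strip_attributes(html):
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)


# Made-up fragment resembling a centered table cell
sample = '<td align="center"><p style="text-align:center">news</p></td>'
print(strip_attributes(sample))
```

Other attributes, such as `href`, pass through untouched, so links inside the article keep working.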
#213
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
It is still centered for some reason. I'm beginning to think that maybe html2lrf just does that for Thai by default. Is there a way to force left alignment on the HTML before the converter processes it? I am guessing that may help.
The EPUB and MOBI output always crash both calibre's viewer and the reader. I can only get as far as the table of contents for EPUB, and only the first blank page for MOBI. And thank you for your help, kiklop74. Here is the current code (the result is center-aligned). Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1234529365(BasicNewsRecipe):
        title = u'Manager Online'
        oldest_article = 7
        max_articles_per_feed = 100
        encoding = 'cp874'
        no_stylesheets = True
        use_embedded_content = False
        remove_javascript = True
        html2lrf_options = ['--ignore-tables']
        html2epub_options = 'linearize_tables = True'
        keep_only_tags = [dict(name='td', attrs={'class':'body'})]

        feeds = [
            (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml')
        ]

        def print_version(self, url):
            # Fetch the printer-friendly page instead of the normal article view
            return url.replace(
                'http://www.manager.co.th/asp-bin/mgrview.aspx?',
                'http://www.manager.co.th/asp-bin/PrintNews.aspx?')

        def preprocess_html(self, soup):
            # Drop inline styles and align attributes before conversion
            for item in soup.findAll(style=True):
                del item['style']
            for item in soup.findAll(align=True):
                del item['align']
            return soup
#214
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
After remove_javascript = True insert this:
Code:

    extra_css = 'body{text-align: left}'
#215
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
*Edited*: I just realized that maybe I shouldn't ask for help in this thread. Should I start a new thread, or is this fine? I'm very sorry if this is not an appropriate place to ask.
Still centered. I'm out of ideas. What confuses me is that the first version of the code, which does nothing to the source except strip CSS and JavaScript, gives a properly left-aligned result. Do you have any suggestions? I tried making a recipe for another Thai newspaper site and it does not have this problem at all. By the way, the table of contents is properly left-aligned.
I flashed the firmware so that the default font has Thai characters. The result is not that good, unsurprisingly. The Thai writing system has four vertical levels, and the reader just puts the upper two in the same place. Line breaking is also poor, but that may be due to the converter rather than the reader itself, since the same thing appears in calibre's viewer. It is still readable though. I'm thinking about telling html2lrf to embed a Thai font if the recipe is actually shared with others.
EDIT2: OK, here's my best try. Since I doubt anyone will use it, I'll just post it here. Thanks, kiklop74, for your help. Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1234529365(BasicNewsRecipe):
        title = u'Manager Online'
        oldest_article = 7
        max_articles_per_feed = 100
        encoding = 'cp874'
        no_stylesheets = True
        use_embedded_content = False
        remove_javascript = True
        # Both filters go in one list; a second remove_tags assignment
        # would replace the first instead of adding to it
        remove_tags = [
            dict(name='td', attrs={'align':'right'}),
            dict(name='td', attrs={'align':'left'})
        ]
        html2lrf_options = ['--ignore-tables']
        html2epub_options = 'linearize_tables = True'

        feeds = [
            (u'การเมือง', u'http://www.manager.co.th/RSS/Politics/Politics.xml'),
            (u'กีฬา', u'http://www.manager.co.th/RSS/Sport/Sport.xml'),
            (u'อาชญากรรมและกระบวนการยุติธรรม', u'http://www.manager.co.th/RSS/Crime/Crime.xml'),
            (u'ภูมิภาค', u'http://www.manager.co.th/RSS/Local/Local.xml'),
            (u'คุณภาพชีวิต', u'http://www.manager.co.th/RSS/QOL/QOL.xml'),
            (u'เศรษฐกิจ', u'http://www.manager.co.th/RSS/Business/Business.xml'),
            (u'เกม', u'http://www.manager.co.th/RSS/Game/Game.xml'),
            (u'วิทยาศาสตร์', u'http://www.manager.co.th/RSS/Science/Science.xml'),
            (u'ชีวิตในเมือง', u'http://www.manager.co.th/RSS/Metrolife/Metrolife.xml'),
            (u'ครอบครัว', u'http://www.manager.co.th/RSS/Family/Family.xml'),
            (u'ชีวิตในรั้วมหาลัย', u'http://www.manager.co.th/RSS/Campus/Campus.xml'),
            (u'บังเทิง', u'http://www.manager.co.th/RSS/Entertainment/Entertainment.xml'),
            (u'ผู้จัดกวน', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=1052'),
            (u'ธรรมะ - ผู้จัดการ', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8101&sourcenewsid=0'),
            (u'ธรรมะ - ทั่วไป', u'http://manager.co.th/rss/getRSS.aspx?browsenewsid=8100&sourcenewsid=0')
        ]

        def print_version(self, url):
            # Fetch the printer-friendly page instead of the normal article view
            return url.replace(
                'http://www.manager.co.th/asp-bin/mgrview.aspx?',
                'http://www.manager.co.th/asp-bin/PrintNews.aspx?')

Last edited by Hypernova; 02-17-2009 at 05:09 PM.
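A side note on `remove_tags` for anyone adapting this recipe: Python keeps only the last value assigned to a name inside a class body, so two separate `remove_tags = [...]` lines do not add up. The sketch below shows this with plain classes, no calibre needed. The class names `TwoAssignments` and `OneList` are made up for the illustration.

```python
class TwoAssignments:
    remove_tags = [dict(name='td', attrs={'align': 'right'})]
    # This second assignment replaces the list above
    remove_tags = [dict(name='td', attrs={'align': 'left'})]


class OneList:
    # Both filters in a single list, so both stay in effect
    remove_tags = [
        dict(name='td', attrs={'align': 'right'}),
        dict(name='td', attrs={'align': 'left'}),
    ]


print(len(TwoAssignments.remove_tags))
print(len(OneList.remove_tags))
```

With two assignments only the left-aligned filter survives, which may or may not be what a given page layout needs.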
#216
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Missing text in custom feed
I am trying to create a custom feed for my local newspaper:
http://rss.cincinnati.com/apps/pbcs....enq01&mime=xml
I can fetch the feed as EPUB and preview it in calibre. Everything looks fine: the table of contents is there, and when I click on an article the text appears. I then transfer the feed to the PRS-505, and the table of contents still looks fine (the article titles appear), but when I click on an article all that is shown is a blank page. Any ideas as to what I am doing wrong? I just entered the feed URL under custom feed; do I need to add something in advanced mode? I'm a total newbie to calibre.
Kovidgoyal suggested I add html2epub_options = 'linearize_tables = True', which I did. In calibre's viewer the text is now not formatted correctly, and it pulls in a lot of the newspaper's graphics, etc. I even tried pulling in the print version by adding def print_version(self, url): return url + '&template=printart', but that looks even worse in the viewer, and after transfer to the PRS-505 I get nothing beyond the table of contents and a page with the article title and two lines of the article text. Thanks!
#217
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Your link to the RSS feed is invalid.
Please post a valid RSS link or at least the entire recipe code.
#218
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Sorry about the link. Here it is again:
http://rss.cincinnati.com/apps/pbcs....enq01&mime=xml
I am making progress now. Here is what I have so far: Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1234144423(BasicNewsRecipe):
        title = u'Cincinnati Enquirer'
        oldest_article = 7
        language = _('English')
        __author__ = 'Joseph Kitzmiller'
        max_articles_per_feed = 100
        html2epub_options = 'linearize_tables = True'

        feeds = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

        def print_version(self, url):
            # Ask the site for its printer-friendly article template
            return url + '&template=printart'

Thanks for your help! Last edited by kitzj0; 02-17-2009 at 09:51 AM.
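One thing to keep in mind with `url + '&template=printart'`: it only yields a valid address when the article link already has a query string. A slightly more defensive variant, sketched here with the standard library's `urllib.parse`, picks `?` or `&` as needed. The function name `add_print_template` and the `example.com` article links are made up for this illustration; whether the site's links ever lack a query string is not known from the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def add_print_template(url):
    """Append template=printart, adding '?' or '&' as the URL requires."""
    parts = urlsplit(url)
    extra = 'template=printart'
    query = parts.query + '&' + extra if parts.query else extra
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Hypothetical article links, not taken from the real feed
print(add_print_template('http://example.com/apps/pbcs.dll/article?AID=1'))
print(add_print_template('http://example.com/apps/pbcs.dll/article'))
```

Inside a recipe, the same logic would sit in `print_version(self, url)`.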
#219
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This is a modified version of your recipe that should work better:
PHP Code:
#220
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Thanks so much for your help, kiklop74. I applied your code and transferred the result with calibre. However, I got the same result. When I click on an article title, there is a pause of 10 seconds and then another 20 seconds before the article appears, without the formatting icon appearing in the middle of the screen.
However, when I transfer the EPUB file with Sony's library software instead, everything works great: the article appears within a second. My other feeds transfer fine with calibre.
#221
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
#222
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
After some testing I discovered that ditching tables before processing does the trick.
Try this recipe: Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1234144423(BasicNewsRecipe):
        title = u'Cincinnati Enquirer'
        oldest_article = 7
        language = _('English')
        __author__ = 'Joseph Kitzmiller'
        max_articles_per_feed = 100
        no_stylesheets = True
        use_embedded_content = False
        remove_javascript = True
        encoding = 'cp1252'

        # Keep only the article container, then drop tables and other clutter
        keep_only_tags = [dict(name='div', attrs={'class':'padding'})]
        remove_tags = [
            dict(name=['object','link','table','embed']),
            dict(name='div', attrs={'id':'pluckcomments'}),
            dict(name='div', attrs={'class':'articleflex-container'})
        ]

        feeds = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

        def preprocess_html(self, soup):
            # Drop inline styles and font face attributes before conversion
            for item in soup.findAll(style=True):
                del item['style']
            for item in soup.findAll(face=True):
                del item['face']
            return soup
#223
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
Thanks for your time and help kiklop74!
However, that code puts me back where I was originally. The table of contents shows up, but when I click an article in the table of contents, all I get is a blank screen. I appreciate what you have done. It is no problem to use the Sony Library software to transfer the feed. I figure it takes about the same amount of time as fetching the paper from outside in the morning.
#224
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
The navigation on the Cincinnati Enquirer website is horrible. My issues probably have something to do with poor website management.
#225
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This smells like some sort of bug in EPUB generation. I already reported similar behavior with another EPUB.
I hope Kovid will have time to investigate this in depth.