06-18-2011, 07:43 AM | #1 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Recipe: Out and About : Camping and Caravan - News and Reviews
******* WARNING***********
I've just found out that the epub created, which displays fine in calibre, crashes my prs300. If I convert to LRF, that's fine. I've looked at the html in the epub and it looks basic enough. Maybe someone can shed some light?
**************************
Code:
import time, re
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1306061239(BasicNewsRecipe):
    title = u'Out and about live'
    description = 'Camping and Caravan - News and Reviews'
    author = 'Dave Asbury'
    cover_url = 'http://www.outandaboutlive.co.uk/img/template/footer/illustration_3.jpg'
    masthead_url = 'http://www.outandaboutlive.co.uk/img/template/cloud_logo.gif'
    oldest_article = 56
    max_articles_per_feed = 100
    remove_empty_feeds = True
    remove_javascript = True
    no_stylesheets = True

    # Blank out the "Other News" and "Magazines" text in the fetched HTML
    preprocess_regexps = [
        (re.compile(r'Other News'), lambda h2: ''),
        (re.compile(r'Magazines'), lambda h4: ''),
    ]

    # Keep only the main article container
    keep_only_tags = [
        dict(attrs={'class': ['Content']})
    ]

    remove_tags = [
        dict(attrs={'class': ['ItemSummary', 'Buttons', 'jcarousel-skin-oal_magselector']})
        #,dict(name='h4', attrs={'Magazines'})
    ]

    remove_attributes = ['Other News']

    feeds = [
        (u'Camping News', u'http://feeds.feedburner.com/OAL/News/Camping'),
        (u'Camping Features', u'http://feeds.feedburner.com/OAL/Features/Camping'),
        (u'Camping Reviews', u'http://feeds.feedburner.com/OAL/Reviews/Camping'),
        (u'Caravan News', u'http://feeds.feedburner.com/OAL/News/Caravans'),
        (u'Caravan Features', u'http://feeds.feedburner.com/OAL/Features/Caravans'),
        (u'Caravan Reviews', u'http://feeds.feedburner.com/OAL/Reviews/Caravans'),
    ]
Last edited by scissors; 06-18-2011 at 08:11 AM. |
06-18-2011, 09:16 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
We'd need your prs300. Some bad html from the site will cause those devices to crash. You need to split up the html until you find the bad code, then modify the recipe to remove/fix it.
|
06-18-2011, 12:54 PM | #3 | |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Quote:
I removed various tags until all I had left was the calibre menus, and the article pages consisted only of the calibre headers and footers. Same result: the click-through menus load, but it crashes when an actual article is selected. I'm pretty much stuck, as the html in the epub all seemed okay. It also crashes the Sony library preview on the PC. Last edited by scissors; 06-18-2011 at 01:21 PM. |
|
06-18-2011, 01:32 PM | #4 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Figuring out what is causing ADE to crash is a bit of an art. The way I would approach it is:
1) Use --test to generate a smaller epub
2) Delete the html files from the epub one by one until you find the one that is causing the crash
3) Comment out/delete parts of that html file until you isolate what in the html is causing the crash
Since you say you have already done 3, try running the epub against an epub validator like flightcrew and see if there are errors in the OPF/NCX file. |
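Step 2 can be partly automated, since an epub is a zip archive. The following is only a rough sketch using the Python standard library, not anything calibre provides: the function name and the output file names are made up for illustration. It writes one copy of the epub per html file, each copy leaving that one file out, so the copies can be tried on the reader in turn.

```python
import os
import zipfile

HTML_EXTS = ('.html', '.xhtml', '.htm')


def write_variants(epub_path, out_dir):
    """Write one copy of the epub per html file, each omitting that file.

    Returns a list of (omitted_name, variant_path) pairs.
    """
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(epub_path) as src:
        # Read everything up front so the source archive is only touched once.
        entries = [
            (info.filename, info.compress_type, src.read(info.filename))
            for info in src.infolist()
            if not info.is_dir()
        ]
    html_names = [name for name, _, _ in entries
                  if name.lower().endswith(HTML_EXTS)]
    variants = []
    for n, omitted in enumerate(html_names):
        variant_path = os.path.join(out_dir, 'without_%03d.epub' % n)
        with zipfile.ZipFile(variant_path, 'w') as dst:
            for name, ctype, data in entries:
                if name != omitted:
                    # Keep the original order and compression, so the
                    # 'mimetype' entry stays first and stays uncompressed.
                    dst.writestr(name, data, compress_type=ctype)
        variants.append((omitted, variant_path))
    return variants
```

The copies still list the omitted file in the OPF, so a validator will complain about them; the only point is to see which copy stops crashing.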
06-18-2011, 02:39 PM | #5 | |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Quote:
Both show lots of "attribute 'property' is not declared for element 'meta'" errors. I'm guessing these come from lines such as <meta property="og:type" content="article"/>
The other errors are:
Out and about live [Sat, 18 Jun 2011] - calibre.epub/feed_0/article_6/index.html NA This OPS document is reachable but not present in the OPF <spine>.
I thought that was the problem, a reference to a non-existent file, as there is no article_6 folder in the epub. Then again, I looked at the index htmls and couldn't find a reference to the folder; however, in some of the other articles the calibre navbar has "../article_6/index.html" in the source. When I browse these sources in Firefox, clicking on the various article_x/index.html links opens a new source, except that the article_6 links throw a file not found error. Last edited by scissors; 06-18-2011 at 02:42 PM. |
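Rather than clicking through the navbar links by hand, the epub (a zip archive) can be scanned for relative href targets that are not in the archive. This is a rough sketch with the Python standard library only; the function name is made up for illustration, and it assumes double-quoted href values that are not percent-encoded.

```python
import posixpath
import re
import zipfile

HTML_EXTS = ('.html', '.xhtml', '.htm')
# Capture the href target up to any fragment or query string.
HREF_RE = re.compile(r'href="([^"#?]+)')


def dangling_links(epub_path):
    """Return sorted (source_file, missing_target) pairs for local links."""
    missing = set()
    with zipfile.ZipFile(epub_path) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            if not name.lower().endswith(HTML_EXTS):
                continue
            text = zf.read(name).decode('utf-8', 'replace')
            for href in HREF_RE.findall(text):
                # Skip external links; only paths inside the epub matter here.
                if '://' in href or href.startswith('mailto:'):
                    continue
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(name), href))
                if target not in names:
                    missing.add((name, target))
    return sorted(missing)
```

For an epub like the one described, this would be expected to report the article_6 link from each article whose navbar points at it.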
|
06-18-2011, 02:47 PM | #6 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Unreferenced and/or missing files don't cause a crash. Try removing the meta tags. In general, remove the entire <head> section.
|
06-18-2011, 03:33 PM | #7 | |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Quote:
I added a remove tag with name = head. No difference. Unfortunately I've now spent the best part of a day on this and my missus is proper miffed. I'm now spending more time twiddling than actually reading. Looks like I'll be setting my default output back to LRF. I'm off for a beer. Thanks for a great program, BTW |
|
06-19-2011, 02:49 AM | #8 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
I've noticed the link for the print version is the same as the standard one, with "Print-" sandwiched in after the last slash. I thought I might try that instead....
I did some trawling and found Starson's thread regarding the join/split by slash method. It seems ideal, as I counted the slashes and there were 6; only the string to be inserted changed. The two links end as...
Cornish-campsite/_ch3_nw1433
Cornish-campsite/Print-_ch3_nw1433
So I thought it was just a case of adding code after the feeds=[] command, such as
def print_version(self, url):
    segments = url.split('/')
    printURL = '/'.join(segments[0:6]) + '/Print-' + ''.join(segments[6:])
    return printURL
I thought this took the url, split it, added '/Print-' and then added the end of the url back on. It appears to do nothing and the original feeds are still fetched. I've obviously misunderstood how this works. Can anyone help? Also, is it possible to echo a string to screen so you can test what's going on?
Edit: I just used the BBC print_version example, pasted in after the feeds, i.e.:
def print_version(self, url):
    return url.replace('http://', 'http://newsvote.bbc.co.uk/mpapps/pagetools/print/')
By my reasoning this should have failed, as each article would be for example http://newsvote.bbc.co.uk/mpapps/pag...te/_ch3_nw1433 which wouldn't exist, but the original article is still fetched. I'm baffled now. Any help? Thanks guys. Last edited by scissors; 06-19-2011 at 06:16 AM. |
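A variation that does not depend on counting slashes is to split only at the last one. This is a sketch of that different approach, written as a plain function so it can be tried outside calibre; inside a recipe it would presumably become the body of print_version(self, url). The function name and the example.com URL in the usage line are made up, and only the "Print-" prefix comes from the links above.

```python
def to_print_url(url):
    """Insert 'Print-' after the last slash of the article URL."""
    head, sep, tail = url.rpartition('/')
    if not sep:
        # No slash at all: leave the URL alone rather than guess.
        return url
    return head + '/Print-' + tail


# Hypothetical article URL, used only to show the shape of the result.
print(to_print_url('http://example.com/News/Cornish-campsite/_ch3_nw1433'))
```

Splitting at the last slash keeps working if the site adds or drops a path level, which a fixed index of 6 would not.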
06-19-2011, 09:45 AM | #9 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
1) Use CODE tags when posting here (the hash). They are needed to be sure you are using correct indents. (Are you?)
2) Post the complete recipe so we can test it.
3) Use SPOILER tags here to make the thread easier to read (the X-ed eye).
4) In your recipe, insert print statements. They will print to the log.
Code:
# define variable here
print('The variable is: %s' % variable)
Code:
ebook-convert test.recipe test_1 --test -vv > Test.txt
I suspect you haven't got def print_version(self, url): correctly indented, so it doesn't run. |
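To illustrate the indentation point: a def that starts in column 0 is a separate module-level function, not a method of the recipe class, so nothing on the class ever calls it. The sketch below uses a made-up stand-in base class instead of calibre's BasicNewsRecipe so that it runs on its own. The stand-in's default behaviour is an assumption for the demo, not a description of calibre.

```python
class StandInRecipe(object):
    """Hypothetical stand-in for a recipe base class (illustration only)."""

    def print_version(self, url):
        # Assumed default for this demo: keep the original URL.
        return url


class Misindented(StandInRecipe):
    feeds = []


# Starts in column 0, so this is NOT part of Misindented and is never
# used when print_version is called on a Misindented instance.
def print_version(self, url):
    return url.replace('/_ch', '/Print-_ch')


class Indented(StandInRecipe):
    feeds = []

    # Indented one level inside the class body, so it overrides the default.
    def print_version(self, url):
        return url.replace('/_ch', '/Print-_ch')


print(Misindented().print_version('Cornish-campsite/_ch3_nw1433'))
print(Indented().print_version('Cornish-campsite/_ch3_nw1433'))
```

Only the second call returns the "Print-" form; the first silently falls back to the inherited method, which matches the symptom of the original articles still being fetched.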
|
06-19-2011, 12:20 PM | #10 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Hi Starson.
Sorry, I never noticed all those controls. Hopefully this is better. I finally got the code to replace the URL with the print version, but it made no difference. Following on from Kovid's and your tips, I loaded the various articles into Notepad++ and deleted the entire contents between <head></head>. This stopped the crash on the articles where it's removed. (I've attached a copy of the resultant epub with the header from the first article removed.) The only other effect is that the calibre-generated "previous, next, section and main" header is in larger text (and the Sony doesn't crash). Here is the recipe as it stands Spoiler:
Here are the contents of the header I removed from the first article Spoiler:
I did at one point remove the <head> from the second article, and it too stopped crashing. Can a post-process be done to remove the <head></head> contents, a second run so to speak? Is it possible there is a bug in calibre? (I'm on a course for 2 weeks from tomorrow, so replies may be difficult.) Edit: forgot to attach epub, attached in next message. Last edited by scissors; 06-19-2011 at 02:49 PM. |
06-19-2011, 02:09 PM | #11 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Attached is the epub in which the first article has its entire <head> section removed, and it does not crash the prs300.
The following articles, with <head> intact, do crash the Sony. How can I make calibre not generate this header, or at least generate one that won't crash the prs300? |
06-19-2011, 02:11 PM | #12 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Add
preprocess_regexps = [(re.compile(r'<head.*?</head>', re.DOTALL), lambda m:'')] |
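The pattern can be tried outside calibre with plain re. This is a small sketch on a made-up HTML sample. Note that the r (raw string) prefix has to sit outside the quotes: written inside them, the pattern would look for the literal text "r<head" and match nothing.

```python
import re

# Raw-string prefix outside the quotes; DOTALL lets '.' span newlines.
HEAD_RE = re.compile(r'<head.*?</head>', re.DOTALL)

# Made-up sample page, standing in for one fetched article.
sample = (
    '<html><head>\n'
    '<meta property="og:type" content="article"/>\n'
    '<title>Example</title>\n'
    '</head><body><p>Article text</p></body></html>'
)

stripped = HEAD_RE.sub(lambda m: '', sample)
print(stripped)

# With the 'r' misplaced inside the quotes, nothing is removed.
misplaced = re.compile('r<head.*?</head>', re.DOTALL)
print(misplaced.sub('', sample) == sample)
```

Since the recipe earlier in the thread already assigns preprocess_regexps, the new tuple would presumably go into that same list; a second assignment in the class body would replace the first one.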
06-19-2011, 02:46 PM | #13 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
|
06-19-2011, 02:57 PM | #14 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
no_stylesheets = True
|
06-19-2011, 03:11 PM | #15 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
|