#2116 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
That is as far as I can help you. This is starting to be really complicated and my time is required elsewhere. In your place I'd just leave the links, they do not obstruct the main text so much.
|
#2117 | |||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Sorry, but no you're not.
Quote:
Quote:
Quote:
Read this and this and this and this. (Particularly the last one, on BeautifulSoup.)
Last edited by Starson17; 06-16-2010 at 03:13 PM. |
|||
#2118 | |
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
|
Quote:
I skimmed through that last link and I kind of understand what you mean. I only did a little Java at university, and I see I am biting off more than I can chew here. I will leave it as it is and submit a ticket to replace the old recipe with my own, which at least works 95%. |
|
#2119 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
#2120 | |
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
|
Quote:
1. Open a print dialog box that will print the current page as it shows, with all the icons, comments, menus and other garbage.
2. Open a pop-up window saying there's been a bad server request.

So, not very useful.

Also, the RSS is awful. Sometimes it gives links as www.sociedade.publico.pt, sometimes as www.publico.pt/sociedade, sometimes as www.publico.pt, etc. I cannot make heads or tails of it, really.

I know, this newspaper website is a mess structurally and otherwise. But it's my favourite Portuguese newspaper (very popular there, too) and I gotta keep learning that beautiful language. |
|
#2121 | |
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
|
Quote:
I have just uploaded the (mostly) working recipe to the tracker: http://bugs.calibre-ebook.com/ticket/5854 |
|
#2122 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Well, if you decide you want to dip into the soup, let us know. Other than that, I don't see any way to deal with the structure at that site. Even if you decide to go that way, it's quite likely they will change the site and break all your work. The less organized and more random the site organization, the harder it is to make reliable recipes.
|
#2123 |
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
|
I am afraid I found some more problems. I don't really mind issues 2-4, but would like to solve them if it's easy. Issue 1, however, is more of a critical error.
Issue 1: Some articles show up with completely garbled text (see "gardbledText.jpg"), both in Calibre and on my PRS-300. Every time I download the news, different articles show up corrupted, so it's not an issue with a specific article. Is it a problem with the server?

Issue 2: I had to delete the "Ecosfera" feed from the recipe because it was making my PRS-300 freeze and reboot, although the articles from that feed displayed just fine in Calibre. As a result, some articles from the main feed (which conform to the "Ecosfera" structure) show up empty in the resulting ebook. This also happens with articles from other feeds, which are completely empty, such as http://desporto.publico.pt/noticia.aspx?id=1442218
Is there an EASY way to say, "if you find an empty article, delete it from the book and from the TOC"?

Issue 3: Sometimes the feed provides the same article twice. For instance, "Proposta de composição no exame do 9º ano provocou mais um corrupio nas escolas" under the "Educação" section appears twice, with the same URL, the same title and exactly the same content.
Is there an EASY way to say, "if you find repeated articles, delete all of them except for the newest one"?

Issue 4: Some articles have the "Next" link disabled. On the PRS-300, I cannot navigate to them. In Calibre, clicking on them makes no difference. This happens with the "Australiano Tim Cahill suspenso por um jogo" (9th) article from the "Desporto" section, for instance.
Any EASY way to solve this?

I ran the recipe with the debugging parameters as follows:
ebook-convert publico_pt_test.recipe .epub -vv --debug-pipeline p --extract-to x

I ran the resulting ePUB through Adobe's Epubcheck (http://code.google.com/p/epubcheck/) and it returned hundreds of errors. Is this normal?

Attached:
1. parsing_debug.zip > Results of debugging with -vv
2. ebook-convert_log.txt > Terminal messages from debugging
3. epubcheck_log.txt > Results of epubcheck for compliance
4. gardbledText.jpg > Garbled text on my Reader
5. publico_pt_test.epub > ePUB with today's news
6. publico_pt_test.txt > Current recipe |
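For Issues 2 and 3, the filtering logic itself might look something like the sketch below, written as plain Python outside calibre. The helper name `clean_articles`, the dictionary keys (`url`, `date`, `content`) and the sample entries are all assumptions made for illustration, not calibre's API. In a real recipe the same idea would have to be applied to whatever article objects the recipe exposes, and an empty article can only be detected once its text has actually been downloaded.

```python
def clean_articles(articles):
    """Keep one copy per URL (the newest) and drop articles with no body text.

    `articles` is a list of dicts with hypothetical keys 'url', 'date'
    (an ISO-8601 string, so plain string comparison orders by time) and 'content'.
    """
    newest_by_url = {}
    for article in articles:
        if not article['content'].strip():
            continue  # empty article: leave it out of the book and the TOC
        seen = newest_by_url.get(article['url'])
        if seen is None or article['date'] > seen['date']:
            newest_by_url[article['url']] = article
    # dicts keep the position of the first insertion, so feed order is preserved
    return list(newest_by_url.values())


# Hypothetical sample data: one duplicate URL and one empty article.
sample = [
    {'url': 'http://example.invalid/educacao/1', 'date': '2010-06-17T09:00', 'content': 'Exam story'},
    {'url': 'http://example.invalid/desporto/2', 'date': '2010-06-17T09:15', 'content': '   '},
    {'url': 'http://example.invalid/educacao/1', 'date': '2010-06-17T10:30', 'content': 'Exam story'},
    {'url': 'http://example.invalid/desporto/3', 'date': '2010-06-17T11:00', 'content': 'Match report'},
]

kept = clean_articles(sample)
print([a['url'] for a in kept])
```

Keying on the URL matches the duplicate described in Issue 3 (same URL, same title, same content); if the feed ever repeated an article under two different URLs, the key would need to be something else, such as the title.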
#2124 |
Member
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
|
Cyanide And Happiness?
Any progress on the Cyanide & Happiness request?
Here are the links... The website is http://www.explosm.net/comics/ and the RSS is: http://feeds.feedburner.com/Explosm I would really, really, really appreciate it if someone could help me with this. Thanks so much!!! |
#2125 | |||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
There's a lot there. I'll take an initial stab at it.
Quote:
Quote:
Quote:
Quote:
link, are you following to the next page or the next article? If the former, I'd be looking at multipage code. If the latter, I'd hope the article was already in the feed.
Quote:
Sorry I can't help more. |
|||||
#2126 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
#2127 | ||
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
|
As always, thanks a lot for your help, Starson17.
Quote:
Quote:
It's just going to the next article, there's no multipage used in these feeds. Yes, the article is already in the feed, as I can get there one pageturn at a time. |
||
#2128 | |||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
You're welcome. Be aware, I'm no expert, but I've been able to make the recipes do anything I've really tried to get them to do, so I've wandered through many different parts.
Quote:
Code:
print 'The preprocess soup is: ', soup
Quote:
Quote:
Three methods of stripping I typically use:
1) Use remove_tags, keep_only_tags, etc. This is easy.
2) Use preprocess_html(soup), find your tag, and use .extract(). This is only a bit harder.
3) Get down and dirty with preprocess_regexps. You provide a list of regexp substitution rules to run on the downloaded HTML. Each element of the list is a two-element tuple. The first element of the tuple is a compiled regular expression and the second is a callable that takes a single match object and returns a string to replace the match. It's basically text-based, not tag-based, search and replace in the HTML. You can remove tags, change tags, fix broken tags, change links, etc. It's very flexible for difficult situations. |
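To make the third method concrete, here is a minimal sketch of what such a rule list looks like and how it behaves on a piece of HTML. The two rules and the sample markup are made up for illustration, and `apply_rules` is a hypothetical stand-in for the substitution pass calibre runs itself; in a recipe you would only define the `preprocess_regexps` list as a class attribute.

```python
import re

# Each rule is a two-element tuple:
# (compiled regular expression, callable taking a match and returning the replacement).
preprocess_regexps = [
    # Illustrative rule: drop whole <script>...</script> blocks.
    (re.compile(r'<script\b.*?</script>', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
    # Illustrative rule: turn <b> and </b> into <strong> and </strong>.
    (re.compile(r'<(/?)b>', re.IGNORECASE),
     lambda match: '<' + match.group(1) + 'strong>'),
]


def apply_rules(html, rules):
    """Hypothetical helper: run each substitution rule over the raw HTML, in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


sample = '<p>Hello <b>world</b></p><script type="text/javascript">track();</script>'
cleaned = apply_rules(sample, preprocess_regexps)
print(cleaned)
```

Because the rules work on text rather than on parsed tags, they still apply when the markup is too broken for the tag-based methods, but the patterns need to be tight enough not to match unrelated parts of the page.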
|||
#2129 | ||
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
|
Quote:
I tried limiting simultaneous_downloads to 1, but that didn't solve the issue.
Quote:
I downloaded the news to LRF instead and noticed that the "Next" text did not even have link formatting in Calibre, while in the ePUB it did have link formatting but didn't work. It's as if there is no link at all, rather than a non-active link. |
||
#2130 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
The last "Next" link is invalid if there is no next feed. I suppose it's a bug, but not one I notice, as I don't use the navbar. If your next article's index.html isn't built, that would make an invalid Next link. |
||