#1

Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only

Hi All!
I'm trying to clean up the really messy HTML of a newspaper site's pages, which use tables heavily. In my recipe I was able to find the needed content and extract it via keep_only_tags and remove_tags.
But the articles sit inside an inner table (table/thead, table/tr/td), and that doesn't look good when I convert the recipe to MOBI for my Kindle: only the first screen is filled with text, and the second page is empty. So I tried to get rid of the unnecessary tags, but without luck.
I tried postprocess_html, but it gave me a TypeError.
Then I tried preprocess_regexps, but that gave me empty article pages.
The recipe in its current state (which works fine if you are creating e.g. PDF output) can be found here: https://github.com/zsoltika/.hu-reci...0_1_nap.recipe

So my question is: after cleaning up the article HTML via keep_only_tags and remove_tags, how does one replace some tags (in my case table, thead, tfoot, tr, td) with another tag name (e.g. span)? Only the tag names should change, not their contents!

And one more thing popped into my mind: wouldn't it be nicer if the various API callables/overrides etc. at http://calibre-ebook.com/user_manual/news_recipe.html were numbered? I can't tell which of preprocess_html, preprocess_regexps, keep_only_tags and remove_tags is applied earlier in the process.

Thanks for any help!
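For comparison, the regex route can also rename tags without touching their contents, as long as the pattern matches only the tag name. This is a minimal sketch, assuming calibre's documented `preprocess_regexps` format: a list of `(compiled_regex, callable)` pairs, where the callable receives a match object and returns the replacement string. The helper `apply_rules` is hypothetical and only mimics applying such a list, so the rule can be tried outside calibre. One possible cause of the empty pages (an assumption, not checked against this recipe): as far as I understand calibre's fetcher, `preprocess_regexps` runs on the raw downloaded HTML before `keep_only_tags` is applied, so if `keep_only_tags` selects a `table`, renaming the tables first leaves it nothing to keep.

```python
import re

# Matches the start of an opening or closing table-related tag, e.g. "<td" or "</tr".
# Group 1 captures the optional "/" so closing tags stay closing tags; \b stops
# "<tr" from matching "<track", and attributes after the name are left untouched.
TABLE_TAG = re.compile(r'<(/?)(?:table|thead|tbody|tfoot|tr|td|th)\b', re.IGNORECASE)

# Same shape as a calibre preprocess_regexps list: (compiled regex, callable on match).
preprocess_regexps = [
    (TABLE_TAG, lambda match: '<' + match.group(1) + 'span'),
]


def apply_rules(html, rules):
    """Hypothetical helper: apply each (regex, callable) rule in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


if __name__ == '__main__':
    sample = '<table class="article"><tr><td>First <b>para</b></td></tr></table>'
    print(apply_rules(sample, preprocess_regexps))
```

Keep in mind that inside calibre such a rule runs over the whole page, so it would also flatten layout tables outside the article.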
#2

Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
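Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T

Code:

    conversion_options = {'linearize_tables': True}

Code:

    def postprocess_html(self, soup, first_fetch):
        # Rename table-related tags to div; only the names change, contents stay
        for t in soup.findAll(['table', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th']):
            t.name = 'div'
        # postprocess_html must return the (modified) soup
        return soup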
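To check what the renaming does to an article's markup without running a whole calibre fetch, here is a standalone sketch. It swaps in Python's standard-library `html.parser` for the BeautifulSoup object calibre hands to `postprocess_html`, so it is not calibre's API; `TagRenamer` and `rename_table_tags` are hypothetical names. Like the recipe code, it changes only the tag names and leaves attributes and contents in place.

```python
from html import escape
from html.parser import HTMLParser

# Table-related tags to rename; only the names change, the contents stay.
TABLE_TAGS = {'table', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th'}


class TagRenamer(HTMLParser):
    """Re-serializes HTML, renaming TABLE_TAGS to new_name (sketch only)."""

    def __init__(self, new_name='div'):
        # convert_charrefs=False: entities such as &amp; reach handle_entityref
        super().__init__(convert_charrefs=False)
        self.new_name = new_name
        self.parts = []

    def _name(self, tag):
        return self.new_name if tag in TABLE_TAGS else tag

    def _open(self, tag, attrs, close=''):
        # HTMLParser unescapes attribute values, so escape them again on output
        rendered = ''.join(
            f' {k}' if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        return f'<{self._name(tag)}{rendered}{close}>'

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._open(tag, attrs, close=' /'))

    def handle_endtag(self, tag):
        self.parts.append(f'</{self._name(tag)}>')

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f'&{name};')

    def handle_charref(self, name):
        self.parts.append(f'&#{name};')

    def handle_comment(self, data):
        self.parts.append(f'<!--{data}-->')


def rename_table_tags(html, new_name='div'):
    """Return html with table-related tag names replaced by new_name."""
    parser = TagRenamer(new_name)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


if __name__ == '__main__':
    print(rename_table_tags('<table><tr><td>text</td></tr></table>', 'span'))
```

Running it prints `<span><span><span>text</span></span></span>`: the nesting is kept, so an e-reader sees ordinary block or inline elements instead of a table it has to fit on one screen.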
#3

Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only

Worked like a charm, thank you!

Tags: recipes, replacewith, tables

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Replacing my Sony with K3? | cognym | Amazon Kindle | 61 | 02-02-2011 05:02 PM |
| Replacing my new Kobo - again! | objectman | Kobo Reader | 7 | 09-20-2010 09:00 PM |
| Replacing the battery | AprilHare | Sony Reader | 12 | 04-29-2009 02:08 PM |
| Replacing ¬ | PieOPah | Workshop | 5 | 12-17-2008 05:25 PM |
| iLiad Replacing the contentlister | tribble | iRex Developer's Corner | 21 | 06-22-2007 04:58 PM |