#1
Enthusiast
Posts: 35
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
I'm trying to clean up a really messy HTML page from a newspaper site that relies heavily on tables. In my recipe I was able to find the needed content and extract it via keep_only_tags and remove_tags.
But the articles sit inside an inner table (table/thead or table/tr/td), which doesn't look good when I convert the recipe to MOBI for my Kindle: only the first screen is filled with text, and the second page is empty. So I tried to get rid of the unnecessary tags, but without luck. I tried postprocess_html, but it gave me a TypeError.
Then I tried preprocess_regexps, but it gave me empty article pages.
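The regexes from the spoiler weren't preserved, so the exact cause of the empty pages is unknown. One likely culprit is a rule that matches from an opening <table ...> through the closing </table>: replacing that match removes the article along with the markup. Below is a minimal sketch of rules in the shape calibre documents for preprocess_regexps (a list of two-element tuples: a compiled regular expression and a callable that takes the match object and returns the replacement string) that rewrite only the tag names. The apply_rules helper is a hypothetical stand-in for applying the rules outside calibre, and the greedy rule is there only for contrast.

```python
import re

# Table-structure tag names to flatten; only the tags are rewritten, never their contents.
TABLE_TAGS = r'(?:table|thead|tbody|tfoot|tr|th|td)'

# Rules in the (compiled regex, callable(match) -> str) shape used by calibre's
# preprocess_regexps. In a recipe this list would be the class attribute itself.
preprocess_regexps = [
    # Opening tags such as <table border="0"> or <TD width="50%"> become <div>.
    # \b stops <title> or <track> from matching; '>' inside attribute values is not handled.
    (re.compile(r'<' + TABLE_TAGS + r'\b[^>]*>', re.IGNORECASE),
     lambda match: '<div>'),
    # Closing tags such as </td> or </TABLE> become </div>.
    (re.compile(r'</' + TABLE_TAGS + r'\s*>', re.IGNORECASE),
     lambda match: '</div>'),
]

# For contrast: a rule that matches the whole table throws away the article too.
greedy = [
    (re.compile(r'<table.*?</table>', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
]


def apply_rules(html, rules):
    """Hypothetical helper: apply each (pattern, callable) rule in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


raw = '<table border="0"><tr><td><p>Article text</p></td></tr></table>'
print(apply_rules(raw, preprocess_regexps))
# <div><div><div><p>Article text</p></div></div></div>
print(repr(apply_rules(raw, greedy)))
# ''
```

One caveat, hedged: as far as I can tell preprocess_regexps runs on the raw HTML before keep_only_tags and remove_tags are applied, so if a keep_only_tags rule matches on table, tr or td, renaming those tags at this stage leaves that rule with nothing to find, which would also produce empty pages.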
The recipe in its current state (which works fine if you create, e.g., PDF output) is available here: https://github.com/zsoltika/.hu-reci...0_1_nap.recipe

So my question is: after cleaning up the article HTML via keep_only_tags and remove_tags, how does one replace some tags (in my case table, thead, tfoot, tr, td) with another tag name (e.g. </?span>), changing only the tag names and not their contents?

One more thing popped into my mind: wouldn't it be nicer if the various API callables/overrides at http://calibre-ebook.com/user_manual/news_recipe.html were numbered in the order they are applied? I can't tell which of ['preprocess_html', 'preprocess_regexps', 'keep_only_tags', 'remove_tags'] applies earlier in the process.

Thanks for any help!
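On the ordering question: as far as I can tell from the calibre user manual and the fetcher source (worth checking against the current docs), preprocess_regexps runs on the raw downloaded HTML before it is parsed, keep_only_tags and then remove_tags are applied to the parsed page, preprocess_html is called after that cleanup, and postprocess_html runs last, after links and images have been handled. The sketch below is only a hypothetical summary of that order; HOOK_ORDER and runs_before are illustrative names, not calibre code. The practical consequence: renaming tags in preprocess_html or postprocess_html happens after keep_only_tags and remove_tags have already matched the table tags, whereas renaming them with preprocess_regexps happens before.

```python
# Hypothetical summary (not calibre code) of the order in which the recipe
# hooks discussed here act on each downloaded article page, and what each
# one works on.
HOOK_ORDER = [
    ('preprocess_regexps', 'raw HTML string, before parsing'),
    ('keep_only_tags', 'parsed page: body replaced by the matching tags'),
    ('remove_tags', 'parsed page: matching tags deleted'),
    ('preprocess_html', 'parsed page after the cleanup above; must return the soup'),
    ('postprocess_html', 'parsed page after links and images are handled; must return the soup'),
]


def runs_before(first, second):
    """Return True if hook `first` acts on the page before hook `second`."""
    names = [name for name, _ in HOOK_ORDER]
    return names.index(first) < names.index(second)


for position, (name, works_on) in enumerate(HOOK_ORDER, start=1):
    print(f'{position}. {name}: {works_on}')

# A tag renamed by preprocess_regexps is already renamed when keep_only_tags looks for it.
print(runs_before('preprocess_regexps', 'keep_only_tags'))  # True
```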
#2
Wizard
Posts: 3,994
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
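Code:
'linearize_tables' : True

Code:
def postprocess_html(self, soup, first_fetch):
    # Rename the table-structure tags to div; children and text stay in place.
    for t in soup.findAll(['table', 'thead', 'tfoot', 'tr', 'td']):
        t.name = 'div'
    # postprocess_html has to hand the modified soup back to calibre.
    return soup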
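For trying the tag renaming outside calibre, here is a stdlib-only sketch of the same idea as the postprocess_html override: only the names of table-structure tags change, while attributes and contents are kept, as with t.name = 'div' on a BeautifulSoup tag. It uses Python's html.parser instead of the BeautifulSoup soup that calibre passes to postprocess_html, and TagRenamer and flatten_tables are hypothetical names for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Table-structure tags whose names are swapped; everything else passes through.
RENAME = {'table', 'thead', 'tbody', 'tfoot', 'tr', 'th', 'td'}


class TagRenamer(HTMLParser):
    """Re-emit HTML, renaming table-structure tags to `new_name`.

    Attributes and contents are kept; only the tag name changes.
    """

    def __init__(self, new_name='div'):
        # Keep entity references like &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.new_name = new_name
        self.out = []

    def _name(self, tag):
        return self.new_name if tag in RENAME else tag

    def _attrs(self, attrs):
        # Attribute values arrive unescaped, so escape them again on output.
        return ''.join(
            f' {k}' if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f'<{self._name(tag)}{self._attrs(attrs)}>')

    def handle_startendtag(self, tag, attrs):
        self.out.append(f'<{self._name(tag)}{self._attrs(attrs)} />')

    def handle_endtag(self, tag):
        self.out.append(f'</{self._name(tag)}>')

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.out.append(f'<!{decl}>')


def flatten_tables(html, new_name='div'):
    """Hypothetical helper: rename table-structure tags in an HTML string."""
    parser = TagRenamer(new_name)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(flatten_tables('<table width="100%"><tr><td>Article &amp; text</td></tr></table>'))
# <div width="100%"><div><div>Article &amp; text</div></div></div>
```

The span variant from the question would be flatten_tables(html, 'span'). Renaming to div is probably the better fit for e-reader output, since div is a block element and each former cell still starts on its own line, whereas span is inline and would run the cells together.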
#3
Enthusiast
Posts: 35
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
Worked like a charm, thank you!