03-28-2011, 08:34 AM, #1
Join Date: Dec 2010
Device: Kindle 3 Wifi only
Replacing tags after using them
I'm trying to clean up a newspaper site's pages, which use really messy HTML and rely heavily on tables.
In my recipe I was able to find the needed content and extract it via keep_only_tags and remove_tags.
But the article text sits inside an inner table (thead, or tr/td). When I convert the recipe to MOBI for my Kindle, this doesn't look good: only the first screen is filled with text, and the second page is empty.
So I tried to get rid of the unnecessary tags, but without luck.
I tried postprocess_html:
But it gave me a TypeError:
Then I tried preprocess_regexps, but that gave me empty article pages.
The recipe in its current state (which works fine if you create, e.g., PDF output) is available here: https://github.com/zsoltika/.hu-reci...0_1_nap.recipe
So my question is: after cleaning up the article HTML via keep_only_tags and remove_tags, how does one replace some tags (in my case table, thead, tfoot, tr, td) with another tag name (e.g. </?span>), changing only the tag names and not their contents?
One more thing that came to mind: wouldn't it be nicer if the various API callables/overrides etc. at http://calibre-ebook.com/user_manual/news_recipe.html were numbered? I can't tell which of ['preprocess_html', 'preprocess_regexps', 'keep_only_tags', 'remove_tags'] is applied earlier in the process.
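For reference, calibre documents preprocess_regexps as a list of two-element tuples, a compiled regex plus a callable that takes the match object and returns the replacement string, run on the downloaded HTML. Below is a minimal standalone sketch (plain Python `re`, no calibre) of patterns that rename only the table-related tag names and leave everything between them alone. `TABLE_TAGS`, `rename_regexps` and `apply_regexps` are hypothetical names; `apply_regexps` only imitates applying such a list and is not calibre's own code. One plausible, unconfirmed reason for empty pages: because these substitutions run on the raw page, renaming tables here could leave keep_only_tags nothing to match if it selects the article by its table tag.

```python
import re

# Hypothetical pattern for the tag names to rename (case-insensitive below).
TABLE_TAGS = r'(?:table|thead|tbody|tfoot|tr|td|th)'

# Same (compiled_regex, callable) shape as a recipe's preprocess_regexps.
rename_regexps = [
    # Opening tags, with or without attributes: <td class="x"> -> <span>
    # \b stops partial matches such as <trace> or <tdx>.
    (re.compile(r'<' + TABLE_TAGS + r'\b[^>]*>', re.IGNORECASE), lambda m: '<span>'),
    # Closing tags: </td> -> </span>
    (re.compile(r'</' + TABLE_TAGS + r'\s*>', re.IGNORECASE), lambda m: '</span>'),
]


def apply_regexps(html, regexps):
    """Apply each (pattern, callable) pair in order via pattern.sub."""
    for pattern, func in regexps:
        html = pattern.sub(func, html)
    return html


sample = '<table><tr><td class="txt">Hello <b>world</b></td></tr></table>'
print(apply_regexps(sample, rename_regexps))
# -> <span><span><span>Hello <b>world</b></span></span></span>
```

Regexes over raw HTML are fragile (comments, attributes containing `>`), so renaming on the parsed soup is usually the safer route.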
Thanks for any help!
03-28-2011, 09:04 AM, #2
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
'linearize_tables' : True
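def postprocess_html(self, soup, first_fetch):
    # Rename table-related tags to div; their contents stay in place
    for t in soup.findAll(['table', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th']):
        t.name = 'div'
    # postprocess_html must return the processed soup
    return soup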
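The postprocess_html override only runs inside calibre, where soup is the page already parsed by BeautifulSoup. To illustrate the same idea outside calibre (rename the tag, keep its children), here is a sketch using only Python's standard html.parser. `TagRenamer` and `rename_tags` are hypothetical names; unlike the BeautifulSoup version, this sketch drops the attributes of renamed tags and also drops HTML comments.

```python
from html import escape
from html.parser import HTMLParser


class TagRenamer(HTMLParser):
    """Rebuild HTML, renaming selected tags but keeping their contents."""

    def __init__(self, names, new_name='div'):
        super().__init__(convert_charrefs=True)
        self.names = {n.lower() for n in names}
        self.new_name = new_name
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased.
        if tag in self.names:
            self.out.append('<%s>' % self.new_name)  # attributes dropped
        else:
            self.out.append(self.get_starttag_text())  # original text kept

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        name = self.new_name if tag in self.names else tag
        self.out.append('</%s>' % name)

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))


def rename_tags(html, names, new_name='div'):
    parser = TagRenamer(names, new_name)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(rename_tags('<table><tr><td class="t">A &amp; B<br/></td></tr></table><p>x</p>',
                  ['table', 'tr', 'td']))
# -> <div><div><div>A &amp; B<br/></div></div></div><p>x</p>
```

Renaming to div rather than span keeps block-level content inside a block element, which tends to reflow better on e-readers.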