05-18-2011, 10:03 PM | #1
Junior Member
Posts: 4
Karma: 10
Join Date: May 2011
Device: none
Lists getting changed in recipe processing
Hi all,
I'm new here. I had a look around but could not find anything on this problem. I am working on recipes to scrape WordPress sites, and I am running into problems with calibre v0.8.1 changing the HTML of the pages.

For example, using this recipe:

https://bitbucket.org/wwmm/schtml/sr...tsefton.recipe

with this command:

ebook-convert ptsefton.recipe .epub --debug-pipeline d --test

the recipe fetches the first page, which contains this code:

<ul><li><a href="#id2">Immediate future</a></li><li><a href="#id3">The future</a></li></ul>

I know this code is still intact when postprocess_html returns the HTML, but in the debug output, in the parsed directory, it has changed to this:

<ul/><li/><a href="#id2">Immediate future</a><li/><a href="#id3">The future</a>

Does anyone have any idea why this would be happening?

Thanks,
Peter
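One way to check the debug output for this symptom is to scan the converted files for list elements that have been emptied out. The sketch below is a hypothetical helper, not part of calibre. It assumes the files in the parsed directory are well-formed XHTML that Python's standard xml.etree.ElementTree can read, and it only reports empty ul, ol and li elements.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# List tags that should never end up empty after conversion.
LIST_TAGS = {"ul", "ol", "li"}


def local_name(tag):
    """Strip an XML namespace such as {http://www.w3.org/1999/xhtml} from a tag."""
    return tag.rsplit("}", 1)[-1]


def find_empty_list_elements(markup):
    """Return the names of ul/ol/li elements that have no children and no text.

    `markup` may be str or bytes and must be well-formed XML/XHTML.
    """
    root = ET.fromstring(markup)
    empty = []
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments/processing instructions if a parser kept them
        name = local_name(elem.tag)
        if name in LIST_TAGS and len(elem) == 0 and not (elem.text or "").strip():
            empty.append(name)
    return empty


def scan_parsed_dir(directory):
    """Map each (X)HTML file under `directory` to the empty list elements it contains."""
    report = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm", ".xhtml"}:
            found = find_empty_list_elements(path.read_bytes())
            if found:
                report[str(path)] = found
    return report
```

Wrapped in a div, the broken output quoted above gives ['ul', 'li', 'li'], while the original markup gives an empty list. Pointing scan_parsed_dir at the parsed folder inside the debug directory (assuming that layout) should list the affected files; a file that is not well-formed will raise a parse error rather than be skipped.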
05-18-2011, 10:15 PM | #2 |
creator of calibre
Posts: 43,775
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Look for ASCII control codes in the raw HTML; they usually cause this sort of thing.
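A quick way to follow this advice is to search the raw HTML for C0 control characters. The sketch below makes an assumption about which characters matter: it flags every ASCII control code except tab, line feed and carriage return, which are exactly the C0 characters that XML 1.0 does not allow. It also offers a helper to strip them. The function names are hypothetical.

```python
import re

# C0 control characters except tab (0x09), line feed (0x0A) and carriage
# return (0x0D). These are the C0 characters XML 1.0 forbids; DEL (0x7F)
# is legal in XML 1.0 and is deliberately left alone here.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_chars(raw_html):
    """Return (offset, code point) pairs for every stray control character."""
    return [(m.start(), ord(m.group())) for m in CONTROL_CHARS.finditer(raw_html)]


def strip_control_chars(raw_html):
    """Return raw_html with the stray control characters removed."""
    return CONTROL_CHARS.sub("", raw_html)
```

In a recipe, the same compiled pattern could probably be passed through calibre's preprocess_regexps attribute, which takes (compiled regex, function of the match) pairs and applies them to the downloaded HTML, for example preprocess_regexps = [(CONTROL_CHARS, lambda m: '')]. That recipe usage is untested here.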
05-18-2011, 10:40 PM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: May 2011
Device: none
Thanks @kovidgoyal for the prompt reply.
It turned out not to be control characters: I was returning only a div element instead of the whole page in the soup variable. Solved.
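The fix described in the reply above is to trim the page but still hand back the whole document. In a real recipe the soup is a BeautifulSoup object; the sketch below uses the standard library's xml.etree.ElementTree as a stand-in so it runs without calibre, and the function name and the content_id parameter are hypothetical. It reduces the body to one div but returns the document root.

```python
import xml.etree.ElementTree as ET


def keep_only_div(root, content_id):
    """Reduce <body> to the div with the given id and return the WHOLE document.

    Hypothetical helper: the point is the return value. Returning `content`
    (just the div) instead of `root` is the mistake described above.
    """
    body = root.find("body")
    content = body.find(f".//div[@id='{content_id}']") if body is not None else None
    if content is None:
        return root  # nothing to trim; still return the full page
    for child in list(body):
        body.remove(child)  # ElementTree nodes keep no parent link, so `content` survives
    body.append(content)
    return root
```

With BeautifulSoup the same idea would be to modify soup in place and end with return soup. calibre's keep_only_tags recipe attribute can also do this kind of trimming declaratively, which avoids the problem altogether.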
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Heuristic processing | saxondawg | Conversion | 6 | 01-21-2018 07:43 PM |
Word Processing on the Kindle 3 | cow_trix | Amazon Kindle | 41 | 05-17-2011 03:22 AM |
Trying to use Textile processing | getajob | Conversion | 18 | 03-09-2011 07:34 AM |
Comic File Processing | wonderboy | Other formats | 1 | 08-08-2009 04:17 AM |
Perl processing | alexxxm | Sony Reader | 3 | 11-26-2007 06:13 AM |