MobileRead Forums > E-Book Software > Calibre > Recipes

Old 05-18-2011, 10:03 PM   #1
ptsefton
Lists getting changed in recipe processing

Hi all,

I'm new here - I had a look around but could not find anything on this problem.

I am working on recipes to scrape WordPress sites and I am running into problems with Calibre v0.8.1 changing the HTML format of pages.

For example, using this recipe: https://bitbucket.org/wwmm/schtml/sr...tsefton.recipe

With this command:
ebook-convert ptsefton.recipe .epub --debug-pipeline d --test
The recipe fetches the first page, which contains this code:
<ul><li><a href="#id2">Immediate future</a></li><li><a href="#id3">The future</a></li></ul>

I know that this code is still intact when postprocess_html returns the HTML, but in the debug output in the parsed directory it has changed to this:
<ul/><li/><a href="#id2">Immediate future</a><li/><a href="#id3">The future</a>

Does anyone have any idea why this would be happening?

Thanks,
Peter
Old 05-18-2011, 10:15 PM   #2
kovidgoyal
creator of calibre
Look for ASCII control codes in the raw HTML; they usually cause this sort of thing.
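
One way to run that check is sketched below, using only the standard library. The helper names `find_control_chars` and `strip_control_chars` are hypothetical, and treating everything except tab, newline, and carriage return as suspicious is an assumption; this is not code from calibre.

```python
import re

# ASCII control characters, excluding tab (\x09), newline (\x0a),
# and carriage return (\x0d), which are normal in HTML source.
# Assumption: these are the codes worth flagging.
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')


def find_control_chars(raw_html):
    """Return a list of (offset, code point) pairs, one per control character."""
    return [(m.start(), ord(m.group())) for m in CONTROL_CHARS.finditer(raw_html)]


def strip_control_chars(raw_html):
    """Return raw_html with the flagged control characters removed."""
    return CONTROL_CHARS.sub('', raw_html)


# Example: a form feed (\x0c) hidden inside a list item.
sample = '<ul><li>Immediate\x0c future</li></ul>'
hits = find_control_chars(sample)
cleaned = strip_control_chars(sample)
```

If `hits` comes back empty, as it eventually did in this thread, control characters can be ruled out and the cause lies elsewhere.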
Old 05-18-2011, 10:40 PM   #3
ptsefton
Thanks @kovidgoyal for the prompt reply.

It turned out not to be control characters - I was returning only a div element instead of the whole page in the soup variable.

Solved.
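
A minimal sketch of the fix described above, assuming the recipe trims the page inside `postprocess_html`. The `entry-content` class name and the `main-content` id are hypothetical. In a real recipe the function would be a method on a `BasicNewsRecipe` subclass and take `self` as its first argument.

```python
def postprocess_html(soup, first_fetch):
    """Edit the parsed page in place and return the whole document."""
    # Hypothetical selector: the class name is an assumption about the
    # WordPress theme being scraped.
    content = soup.find('div', attrs={'class': 'entry-content'})
    if content is not None:
        # Make any changes on the fragment itself (hypothetical edit).
        content['id'] = 'main-content'
    # Return the full soup. Returning `content` here would hand back
    # only the div, which is the mistake reported in this thread.
    return soup
```

The point of the sketch is the last line: the fragment can be modified freely, but the object returned is the complete page.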
All times are GMT -4.