#1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
|
Trouble with internal links
Hey there! I'm having trouble getting internal links working with a BasicNewsRecipe. I'm creating a recipe for The Codeless Code. Almost everything works, except for internal links. Basically, it grabs every article that has a URL that resembles this:

    http://thecodelesscode.com/case/171

and adds that to the feed, all the way from 1 up to 184, putting them into a nice book.

In preprocess_html, I am stripping out all of the links, except the ones that begin with /case/. However, no matter what I do, I can't seem to get those links within the book to work at all.

If I leave them alone (i.e., href="/case/152"), it generates this error:

    Referenced file u'/case/152' not found

If I change it to the full URI (i.e., http://thecodelesscode.com/case/152), then it works fine, but it leaves a hyperlink to the website, not to the chapter inside the ebook.

If I change it to a relative URI (i.e., href="152"), it will just say that it can't find u'152'.

Is there a trick to what I'm trying to do? Or is the BasicNewsRecipe just not intended for this sort of thing? Thanks!
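For anyone following along, the link filtering described above (keep only hrefs that begin with /case/, drop the rest) might look roughly like this. It is only a standalone sketch: it uses a plain regular expression on a string instead of the soup object that preprocess_html receives in a real recipe, it assumes anchors are never nested, and keep_case_links is a made-up helper name.

```python
import re

# Matches one <a ...>...</a> element. Naive: assumes anchors are not nested.
ANCHOR_RE = re.compile(r'<a\b([^>]*)>(.*?)</a>', re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def keep_case_links(html):
    """Unwrap every anchor whose href does not start with /case/."""
    def unwrap(match):
        href = HREF_RE.search(match.group(1))
        if href and href.group(1).startswith('/case/'):
            return match.group(0)  # internal link: keep the whole element
        return match.group(2)      # anything else: keep only the link text
    return ANCHOR_RE.sub(unwrap, html)


sample = ('<p>See <a href="/case/152">case 152</a> or '
          '<a href="http://example.com/">elsewhere</a>.</p>')
print(keep_case_links(sample))
# <p>See <a href="/case/152">case 152</a> or elsewhere.</p>
```

The surviving /case/ links are still site-relative at this point, which is why the book cannot resolve them yet.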
#2 |
creator of calibre
Posts: 45,192
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Downloaded articles are named according to a particular scheme, as feed_n/article_n/index.html.
You need to convert your internal links to refer to those names. There is no easy way to do that, since the recipe download system is not designed for it. Essentially, you need to override create_opf() in your recipe class to store a mapping of article.orig_url -> filename. Then implement postprocess_book() to replace the links in the downloaded articles using the previously stored mapping.
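To make the naming scheme and the mapping idea concrete, here is a minimal standalone sketch. It does not touch calibre at all: build_mapping and relative_href are hypothetical helpers, the feeds are just lists of URLs, and the zero-based numbering of feeds and articles is an assumption. In a real recipe the mapping would be collected inside create_opf() as described above.

```python
import posixpath


def build_mapping(feeds):
    """Map each article's original URL to its downloaded file name.

    `feeds` is a list of feeds, each a list of article URLs in download
    order. Zero-based numbering is assumed here.
    """
    mapping = {}
    for f, urls in enumerate(feeds):
        for a, url in enumerate(urls):
            mapping[url] = 'feed_%d/article_%d/index.html' % (f, a)
    return mapping


def relative_href(from_file, to_file):
    """Path to `to_file` as seen from the directory holding `from_file`."""
    return posixpath.relpath(to_file, posixpath.dirname(from_file))


feeds = [['http://thecodelesscode.com/case/%d' % n for n in range(1, 4)]]
mapping = build_mapping(feeds)
src = mapping['http://thecodelesscode.com/case/1']
dst = mapping['http://thecodelesscode.com/case/3']
print(dst)                      # feed_0/article_2/index.html
print(relative_href(src, dst))  # ../article_2/index.html
```

Computing the href relative to the linking article's own directory is what lets one article point at another inside the book.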
#3 |
Junior Member
|
Awesome, thank you.
#4 |
Junior Member
|
Okay, so I thought I had everything entirely figured out. I've generated the proper mappings in create_opf without any issue.
And in postprocess_book, I can even find every HTML file and fix the hrefs. Code:

    import os
    # bs is assumed to be calibre's bundled BeautifulSoup
    from calibre.ebooks.BeautifulSoup import BeautifulSoup as bs

    def postprocess_book(self, oeb, opts, log):
        feed_dir = os.path.join(self.output_dir, 'feed_0')
        for output in self.path_remappings.values():
            path = os.path.join(feed_dir, output)
            # Load the HTML file
            with open(path, 'rb') as f:
                soup = bs(f.read())
            # Point every /case/ anchor at the matching downloaded article
            for anchor in soup.findAll('a', href=True):
                href = anchor['href']
                if '/case/' in href and href in self.path_remappings:
                    anchor['href'] = '../' + self.path_remappings[href]
            # Write it back out (unicode() is Python 2; use str() on Python 3)
            with open(path, 'wb') as f:
                f.write(unicode(soup).encode('utf-8'))

For example, this changes:

    their <a href="/case/174">newly appointed</a> master-in-training Zjing decided that they should work in separate shifts -- Landhwa by day, Wangohan by night.</p>

To:

    their <a href="../article_5/index.html">newly appointed</a> master-in-training Zjing decided that they should work in separate shifts -- Landhwa by day, Wangohan by night.</p>

However, in the very final file (article_5/index_u1.html), it ends up like this:

    their <a href="../..//case/174">newly appointed</a> master-in-training Zjing decided that they should work in separate shifts -- Landhwa by day, Wangohan by night.</p>

Am I going about this the wrong way, by messing with the HTML files in the output directory? Should I instead be mucking around with some internal structure in oeb?
Last edited by marumari; 04-09-2015 at 02:57 PM.
#5 |
Junior Member
|
Never mind, I think I figured out how it's internally represented in memory. I'll post the final recipe when I'm all done.
#6 |
creator of calibre
|
You want to work with the oeb object, like this:
Code:

    for item in oeb.spine:
        for a in item.data.xpath('//*[local-name()="a" and @href]'):
            href = a.get('href')
            a.set('href', mapping[href])
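One thing to watch with a loop like the one above: if mapping is a plain dict, mapping[href] raises KeyError for any link that is not in it (external links, for instance). Below is a small standalone sketch of a guarded version. It uses the standard library's ElementTree as a stand-in for the parsed tree in item.data, so the tree type, the sample document and the rewrite_links helper are all assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'


def rewrite_links(root, mapping):
    """Rewrite hrefs found in `mapping`; leave every other link untouched."""
    changed = 0
    for a in root.iter('{%s}a' % XHTML):
        href = a.get('href')  # None when the anchor has no href
        if href in mapping:
            a.set('href', mapping[href])
            changed += 1
    return changed


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>'
    '<a href="/case/174">newly appointed</a> '
    '<a href="http://example.com/">external</a> '
    '<a name="top">no href</a>'
    '</p></body></html>'
)
mapping = {'/case/174': '../article_5/index.html'}
print(rewrite_links(doc, mapping))  # 1
```

Only the /case/174 anchor is rewritten; the external link and the anchor without an href pass through unchanged.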
#7 |
Junior Member
|
Finished! It generates very nice epub files, and pretty darned nice mobi files:
https://github.com/marumari/codeless...esscode.recipe
Thanks again for your help in pointing me in the right direction.
#8 |
Junior Member
|
I'm having a bit of trouble with having it automatically resize the images that it fetches. I've set:

    scale_news_images = (600, 400)

But when I run ebook-convert, and go into feed_0, I see:

    april@machine(feed_0)$ find . -name '*.jpg' -exec exiftool {} \; | grep Height
    Image Height : 402
    Image Height : 389
    Image Height : 446
    Image Height : 557
    Image Height : 196
    Image Height : 400
    Image Height : 424

And so it didn't resize them at all. Is there something I'm missing here? Thanks!
#9 | |
Grand Sorcerer
Posts: 13,306
Karma: 78876004
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Are you setting compress_news_images to True?
From the documentation: Quote:
|
|
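As a rough sketch of what the two settings look like side by side in a recipe class: the class name and title here are made up, and the try/except is only there so the fragment can be read and run outside calibre. As I understand the documentation, the scaling settings are ignored unless compress_news_images is switched on; it may also be worth checking scale_news_images_to_device there, since the output profile's screen size can take precedence over a fixed tuple.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: a stand-in so the fragment can still be run.
    BasicNewsRecipe = object


class CodelessCode(BasicNewsRecipe):  # hypothetical class name
    title = 'The Codeless Code'
    # Scaling and compression settings only apply when this is True.
    compress_news_images = True
    # Maximum (width, height) to scale images to.
    scale_news_images = (600, 400)


print(CodelessCode.compress_news_images, CodelessCode.scale_news_images)
```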
#10 |
Junior Member
|
Derp! Thanks!
|
#11 |
Junior Member
|
Okay, so I think I've gotten the recipe to pretty much a "final" state. It produces really nice EPUB and MOBI files now, without some of the superfluous stuff that comes with the BasicNewsRecipe (i.e., article listings, duplicate indexes, etc.).
How would I go about getting it included in the next version of Calibre? Thanks!
#12 |
creator of calibre
|
You can send a pull request (put your recipe in the recipes folder)
|
#13 |
creator of calibre
|
You might want to update your recipe to take advantage of this
https://github.com/kovidgoyal/calibr...eb35357eefc698 |