Old 04-09-2015, 11:20 AM   #1
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Trouble with internal links

Hey there! I'm having trouble getting internal links to work in a BasicNewsRecipe. I'm creating a recipe for The Codeless Code, and almost everything works except for internal links. The recipe grabs every article whose URL looks like this:

http://thecodelesscode.com/case/171

It adds each one to the feed, all the way from 1 up to 184, and puts them into a nice book. In preprocess_html, I strip out all of the links except the ones that begin with /case/. However, no matter what I do, I can't get those links to work within the book.

If I leave them alone (i.e., href="/case/152"), it generates this error:
Referenced file u'/case/152' not found

If I change it to the full URI (i.e., http://thecodelesscode.com/case/152), the conversion works, but the result is a hyperlink to the website, not to the chapter inside the ebook.

If I change it to a relative URI (i.e., href="152"), it just says that it can't find u'152'.

Is there a trick to what I'm trying to do? Or is the BasicNewsRecipe just not intended for this sort of thing?

Thanks!
Old 04-09-2015, 11:31 AM   #2
kovidgoyal
creator of calibre
Posts: 45,192
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Downloaded articles are named according to a particular scheme, as feed_n/article_n/index.html

You need to convert your internal links to refer to those names. There is no easy way to do that, since the recipe download system is not designed for it. Essentially, you need to override create_opf() in your recipe class to store a mapping of article.orig_url -> filename

Then implement postprocess_book() to replace the links in the downloaded articles using that stored mapping.
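A minimal sketch of the mapping step, written as plain Python so it runs outside calibre. The helper name build_link_mapping, the list-of-lists input, and the zero-based numbering are assumptions for illustration; inside a recipe the same dictionary would be filled from the articles available in create_opf().

```python
from urllib.parse import urlparse


def build_link_mapping(feeds):
    """Map each article URL (and its path) to feed_n/article_n/index.html.

    feeds is a list of feeds; each feed is a list of article URLs in
    download order. Zero-based numbering is an assumption.
    """
    mapping = {}
    for feed_num, urls in enumerate(feeds):
        for article_num, url in enumerate(urls):
            filename = 'feed_%d/article_%d/index.html' % (feed_num, article_num)
            mapping[url] = filename
            # Also key on the site-relative form, e.g. '/case/152'
            mapping[urlparse(url).path] = filename
    return mapping


feeds = [['http://thecodelesscode.com/case/%d' % n for n in (1, 2, 3)]]
mapping = build_link_mapping(feeds)
print(mapping['/case/2'])  # feed_0/article_1/index.html
```

Keying on both the full URL and the bare path means a link can be looked up whichever form it takes in the article HTML.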
Old 04-09-2015, 11:34 AM   #3
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Awesome, thank you.
Old 04-09-2015, 02:32 PM   #4
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Okay, so I thought I had everything entirely figured out. I've generated the proper mappings in create_opf without any issue.

And in postprocess_book, I can even find every HTML file and fix the hrefs, for example:


Code:
    def postprocess_book(self, oeb, opts, log):
      # Every output file recorded in the mapping built in create_opf()
      output_files = list(self.path_remappings.values())

      for output in output_files:
        path = self.output_dir + '/feed_0/' + output

        # Load the HTML file in
        with open(path) as f:
          soup = bs(f)

        # Replace all the anchors that point at another case
        for anchor in soup.findAll('a', href=True):
          href = anchor['href']
          if '/case/' in href and href in self.path_remappings:
            anchor['href'] = '../' + self.path_remappings[href]

        # Write it back out; the with block closes the file
        with open(path, "wb") as f:
          f.write(unicode(soup).encode('utf-8'))
Looking at the Soup, I see that the href went from:
their <a href="/case/174">newly appointed</a> master-in-training Zjing decided that they should work in separate shifts -- Landhwa by day, Wangohan by night.</p>

To:
their <a href="../article_5/index.html">newly appointed</a> master-in-training Zjing decided that they should work in separate shifts -- Landhwa by day, Wangohan by night.</p>

However, in the very final file (article_5/index_u1.html), it ends up like this:
their <a href="../..//case/174">newly appointed</a> master-in-training Zjing decided that they should work in separate shifts -- Landhwa by day, Wangohan by night.</p>

Am I going about this the wrong way, by messing with the HTML files in the output directory? Should I instead be mucking around with some internal structure in oeb?

Last edited by marumari; 04-09-2015 at 02:57 PM.
Old 04-09-2015, 05:30 PM   #5
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Never mind, I think I figured out how it's internally represented in memory. I'll post the final recipe when I'm all done.
Old 04-09-2015, 08:31 PM   #6
kovidgoyal
creator of calibre
Posts: 45,192
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You want to work with the oeb object, like this:

Code:
for item in oeb.spine:
    for a in item.data.xpath('//*[local-name()="a" and @href]'):
        href = a.get('href')
        # Only rewrite links that have an entry in the mapping
        if href in mapping:
            a.set('href', mapping[href])
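The same rewrite can be tried outside calibre. The sketch below uses the standard library's ElementTree on a small XHTML string as a stand-in for item.data; the function name rewrite_links and the sample mapping are assumptions for illustration, and inside postprocess_book() the loop over oeb.spine shown above would be used instead.

```python
import xml.etree.ElementTree as ET


def rewrite_links(root, mapping):
    """Rewrite href on every <a> whose target is in mapping; return the count."""
    changed = 0
    for el in root.iter():
        # Drop any '{namespace}' prefix to get the local tag name
        local = el.tag.rsplit('}', 1)[-1]
        href = el.get('href')
        if local == 'a' and href in mapping:
            el.set('href', mapping[href])
            changed += 1
    return changed


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>their <a href="/case/174">newly appointed</a> master</p>'
    '<p><a href="http://example.com/">external</a></p>'
    '</body></html>'
)
mapping = {'/case/174': '../article_5/index.html'}  # sample values only
count = rewrite_links(doc, mapping)
print(count)  # 1
```

Links with no entry in the mapping, such as the external one here, are left untouched.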
Old 04-10-2015, 12:52 AM   #7
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Finished! It generates very nice epub files, and pretty darned nice mobi files:

https://github.com/marumari/codeless...esscode.recipe

Thanks again for your help in pointing me in the right direction.
Old 04-10-2015, 01:49 PM   #8
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
I'm having a bit of trouble getting it to automatically resize the images that it fetches. I've set:

scale_news_images = (600, 400)

But when I run ebook-convert, and go into feed_0, I see:

april@machine(feed_0)$ find . -name '*.jpg' -exec exiftool {} \; | grep Height
Image Height : 402
Image Height : 389
Image Height : 446
Image Height : 557
Image Height : 196
Image Height : 400
Image Height : 424

And so it didn't resize them at all. Is there something I'm missing here? Thanks!
Old 04-10-2015, 02:21 PM   #9
PeterT
Grand Sorcerer
Posts: 13,306
Karma: 78876004
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Are you setting compress_news_images to True?

From the documentation:
Quote:
compress_news_images = False
Set this to False to ignore all scaling and compression parameters and pass images through unmodified. If True and the other compression parameters are left at their default values, jpeg images will be scaled to fit in the screen dimensions set by the output profile and compressed to size at most (w * h)/16 where w x h are the scaled image dimensions.
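A minimal sketch of the two settings together, assuming the tuple is (width, height) and that scaling only takes effect once compression is switched on, as the quoted documentation indicates. The class name ImageSettings is hypothetical; in a real recipe these attributes would sit directly in the BasicNewsRecipe subclass.

```python
class ImageSettings(object):
    # Hypothetical holder class; in a recipe, put these two attributes
    # in the class that subclasses BasicNewsRecipe.

    # Per the quoted documentation, False passes images through unmodified
    # and ignores all scaling parameters, so this must be True.
    compress_news_images = True

    # Maximum dimensions to scale images to (assumed to be width, height).
    scale_news_images = (600, 400)
```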
Old 04-10-2015, 02:35 PM   #10
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Derp! Thanks!
Old 04-10-2015, 06:23 PM   #11
marumari
Junior Member
marumari began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Apr 2015
Device: Kindle Voyage
Okay, so I think I've gotten the recipe to pretty much a "final" state. It produces really nice EPUB and MOBI files now, without some of the superfluous stuff that comes with the BasicNewsRecipe (i.e., article listings, duplicate indexes, etc.).

How would I go about getting it included in the next version of Calibre?

Thanks!
Old 04-10-2015, 10:40 PM   #12
kovidgoyal
creator of calibre
Posts: 45,192
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can send a pull request (put your recipe in the recipes folder)
Old 04-11-2015, 03:40 AM   #13
kovidgoyal
creator of calibre
Posts: 45,192
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You might want to update your recipe to take advantage of this:

https://github.com/kovidgoyal/calibr...eb35357eefc698