Custom recipes (archive, read-only) - Page 147

bhandarisaurabh · 06-23-2010, 10:27 PM

Can someone help me with the recipe for Forbes India? A recipe has already been made, but I want one that does not use feeds and instead uses
http://business.in.com/magazine/magazinearchive/1/1
It should fetch the latest issue listed in this archive, not the actual latest issue, because most of the links for the newest issue get activated very late.
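
A feed-less recipe is usually built around calibre's parse_index() hook, which returns the article list directly instead of reading RSS. The sketch below is standalone Python 3 (standard library only), not a finished recipe: it collects links from a page and arranges them in the list-of-tuples shape that parse_index() is documented to return. The `/article/` URL filter, the sample URLs in the usage note, and the helper names `LinkCollector` and `build_index` are assumptions for illustration; the real markup of the archive page has not been checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None


def build_index(html, base_url, feed_title, must_contain):
    """Return [(feed_title, [article dicts])] for links whose href
    contains `must_contain` (a hypothetical filter, adjust per site)."""
    parser = LinkCollector()
    parser.feed(html)
    articles = [
        {'title': text, 'url': urljoin(base_url, href),
         'description': '', 'date': ''}
        for text, href in parser.links
        if text and must_contain in href
    ]
    return [(feed_title, articles)]
```

Inside a real recipe the same two steps would probably run twice: once on the archive page to pick the first issue link, and once on that issue's page to list its articles. Recipes normally fetch pages with self.index_to_soup(url) and walk the resulting soup; the plain parser here is only so the sketch runs without calibre.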

nook.life · 06-24-2010, 08:19 PM

OK, so I have been trying to get the rotate-images code into the recipe Starson17 created, and I have got it to work. The recipe would be perfect, except that some images are still getting clipped at the top. See the image at the bottom of this post to see what I am talking about. Is there any way to justify the image so that the extra space at the bottom (or, I guess, the left once it is rotated) is deleted? Any help would be greatly appreciated. Thanks once again!

Code:

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre import strftime, __appname__, __version__
from ctypes import byref  # used by pw.MagickGetException() in postprocess_html
import calibre.utils.PythonMagickWand as pw

class Explosm(BasicNewsRecipe):
    title               = 'Explosm 3'
    __author__          = 'Starson17'
    description         = 'Explosm'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    linearize_tables      = True
    oldest_article      = 24
    remove_javascript   = True
    remove_empty_feeds    = True
    max_articles_per_feed = 10

    feeds = [
             (u'Explosm Feed', u'http://feeds.feedburner.com/Explosm')
             ]

    def get_article_url(self, article):
        return article.get('link', None)

    keep_only_tags     = [dict(name='div', attrs={'id':'maincontent'})]

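    def preprocess_html(self, soup):
        # Drop the second table on the page, if there is one
        table_tags = soup.findAll('table')
        if len(table_tags) > 1:
            table_tags[1].extract()
        # Drop the block that holds the "« First" navigation link, if found
        NavTag = soup.find(text='&laquo; First')
        if NavTag is not None:
            NavTag.parent.parent.extract()
        return soup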

    def postprocess_html(self, soup, first):
        #process all the images. assumes that the new html has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            print 'resizing image' + iurl
            with pw.ImageMagick():
                img = pw.NewMagickWand()
                p = pw.NewPixelWand()
                if img < 0:
                    raise RuntimeError('Out of memory')
                if not pw.MagickReadImage(img, iurl):
                    severity = pw.ExceptionType(0)
                    msg = pw.MagickGetException(img, byref(severity))
                    raise IOError('Failed to read image from: %s: %s'
                        %(iurl, msg))
                
                width = pw.MagickGetImageWidth(img)
                height = pw.MagickGetImageHeight(img)

                if( width > height ) :
                    print 'Rotate image'
                    pw.MagickRotateImage(img, p, 90)

                if not pw.MagickWriteImage(img, iurl):
                    raise RuntimeError('Failed to save image to %s'%iurl)
                pw.DestroyMagickWand(img)


        return soup

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''

Here is the image of what I am talking about with the clipping:
http://picturepush.com/public/3684920

2see.eye3 · 06-24-2010, 08:30 PM

Hello,

I have a Sony PRS-505 which I would like to use to read RFC documents from IETF (http://www.ietf.org/). The challenge I find is that the RFCs are already formatted with a certain line width. When I try to create an e-book file using calibre, the result is either that all the formatting is lost, or the line width exceeds the width of the PRS screen.

I've tried a few workarounds with various degrees of success, for example: changing the base font, editing the HTML before importing. I've tried to use Sigil before importing the files, and edit the CSS. I've also tried to print the RFCs to a PDF file.

All those workarounds have some problems, and it looks like calibre already has some of the tools necessary to get the file I would like. I am not sure though how to use those tools. If someone more experienced has any ideas, or has already done this, I would appreciate the help.

Basically, with the RFCs, the challenge arises when, for example, there are ASCII diagrams in the document. Also, they already have page numbers that of course don't match up with the pages on the Sony Reader.

I have a crude process to get the RFCs in a nicer format on the Reader, but it is time consuming and not very efficient. I am sure that someone with programming experience could get this done in a much more efficient way.

My process is like this:

Step 1. Download the txt version of the RFC. For example: http://www.rfc-editor.org/rfc/rfc4271.txt

Step 2. Clean up the document:
In Notepad++, use "Find and replace" to search for each of the following regexes and replace it with nothing:

.*\[Page .+\] - this will remove the existing page numbers
^RFC \d+.* - this will remove the recurring page title at each page break
\f - this will remove the form feed (FF) characters at each page break

Step 3. TextFX Edit: Delete surplus blank lines

Step 4. To preserve the formatting, I turn the file into HTML:

Add <body> <pre> at the beginning and </pre> </body> at the end of the file

Step 5. Open the new HTML file in Firefox, and use the Developer toolbar to add CSS:

body { margin: 0; font-weight: 900; font-size: 13px;}

This is because the fixed-width fonts appear too light on the Reader.

Step 6. File > print preview, shrink to fit, portrait, and print to a PDF file with a customized page size.

As you can see, the process is very clumsy. The end result is not bad, it does allow me to read the RFCs and it only takes about 10 minutes to go through all the steps. However, I feel that someone with more experience might be able to improve it much quicker and automate a lot of the steps, for example the Regex search could be done in calibre.
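
The regex cleanup, blank-line removal, and <pre> wrapping described above can be scripted. Here is a minimal sketch in standalone Python 3 (standard library only, not calibre-specific), assuming the RFC has already been downloaded as text. The regexes are the ones from the steps above; the function name `rfc_text_to_html` is made up for this example.

```python
import re
from html import escape

# Same patterns as the Notepad++ searches, applied line by line
PAGE_FOOTER = re.compile(r'^.*\[Page .+\][ \t]*$', re.MULTILINE)
PAGE_HEADER = re.compile(r'^RFC \d+.*$', re.MULTILINE)


def rfc_text_to_html(text):
    """Strip page furniture from RFC text and wrap the rest in <pre>."""
    text = text.replace('\r\n', '\n')
    # Remove form feeds first so the running header starts at column 0
    text = text.replace('\f', '')
    text = PAGE_FOOTER.sub('', text)
    text = PAGE_HEADER.sub('', text)
    # Collapse runs of blank lines left behind by the removals
    text = re.sub(r'\n{3,}', '\n\n', text)
    # Escape <, > and & so ASCII diagrams survive inside the HTML
    body = escape(text.strip('\n'), quote=False)
    return '<html><body><pre>\n%s\n</pre></body></html>' % body
```

Typical use would be to read rfc4271.txt, pass its contents to rfc_text_to_html(), write the result to an .html file, and feed that file to calibre. Note that the header pattern also removes the "RFC nnnn" line on the first page and any body line that happens to start with "RFC" plus a number in column 0.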

Also, by using this method, there is no TOC generated. The RFC document already has its own TOC, but that would have to be deleted and re-generated to reflect the new page numbers. If this were possible, with automated links to the appropriate section in the document, it would be fantastic.
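
Regenerating the TOC could start from the section headings themselves. This is a rough sketch in standalone Python 3, assuming headings start in column 0 with a dotted section number (as in "1.  Introduction" or "4.1.  Message Header Format"), which holds for many RFCs but not all of them. `find_sections` and `build_toc` are hypothetical helper names, and the `sec-` anchor prefix is arbitrary.

```python
import re
from html import escape

# A heading: section number at column 0, optional trailing dot, then a title
HEADING = re.compile(r'^(\d+(?:\.\d+)*)\.?[ \t]+(\S.*?)[ \t]*$')


def find_sections(text):
    """Return [(number, title)] for every line that looks like a heading."""
    sections = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1), match.group(2)))
    return sections


def build_toc(sections):
    """Build an HTML list whose links point at #sec-<number> anchors."""
    items = [
        '<li><a href="#sec-%s">%s %s</a></li>' % (num, num, escape(title))
        for num, title in sections
    ]
    return '<ul>\n%s\n</ul>' % '\n'.join(items)
```

The body would still need a matching anchor (for example an element with id "sec-4.1") inserted at each heading for the links to resolve. Entries in the RFC's original TOC are indented, so the column-0 rule skips them.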

On the IETF's website, the RFCs can also be downloaded as HTML. Maybe that is a better option than the text file. In fact, the idea to customize the font weight to make it more legible on the Reader came from looking at the source code of their HTML.

Anyway, if someone has any ideas on how to bring RFCs to life on the Sony PRS-505, without the painful process I have described, it would be great. The news fetching feature of calibre caught my eye as a good way to download the RFCs and have them automatically converted, but I don't know how to take this further. Any assistance would be greatly appreciated.

I've also attached some sample files, please have a look.

Thanks,

Vladimir

kovidgoyal · 06-24-2010, 09:26 PM

@nook.life: What output profile are you using?

nook.life · 06-25-2010, 02:35 AM

Quote:

Originally Posted by kovidgoyal

@nook.life: What output profile are you using?

I am using the normal output profile for the nook with epub... I didn't change anything (that I'm aware of...). Do you think that is what is causing the problem? Is there a way to revert to the standard output profile just to make sure? Thanks for looking into this.

kovidgoyal · 06-25-2010, 03:04 AM

No, the nook output profile should be fine. Looking at your screenshot, the image has some margin on the left. You probably just need to process the html to remove that margin, so that the image fits on the screen (with the nook output profile, calibre will automatically resize the image to be no larger than the nook screen).

pubolab · 06-25-2010, 04:54 AM

zaobao.com updated their html layout.

Attached is the updated recipe

rty · 06-25-2010, 08:01 AM

Quote:

Originally Posted by pubolab

zaobao.com updated their html layout.

Attached is the updated recipe

Hi pubo, thanks for fixing the zaobao recipe. Would you like to take a look at this discussion https://www.mobileread.com/forums/sho...d.php?p=978239 on using the Droidfont? It might be a better solution.

Starson17 · 06-25-2010, 09:54 AM

Quote:

Originally Posted by nook.life

Ok so I have been trying to get the rotate images code into the code Starson17 created and have got it to work. The recipe would be perfect, except that some images are still getting clipped at the top.

Run your recipe with this:

Code:

ebook-convert explosm.recipe explosm --test -vv > explosm.txt

Then use FireFox to look at the results. You'll see why you have margin problems. You've got your main image in a table, and you're putting other images above it (to the left once you've rotated your device). You don't want that iphone app image or the two black/white images. Look at the images directory in the explosm folder created with the above and look at the page source. I'd strip the extra images and remove the table.

nook.life · 06-25-2010, 03:34 PM

Quote:

Originally Posted by Starson17

Run your recipe with this:

Code:

ebook-convert explosm.recipe explosm --test -vv > explosm.txt

Then use FireFox to look at the results. You'll see why you have margin problems. You've got your main image in a table, and you're putting other images above it (to the left once you've rotated your device). You don't want that iphone app image or the two black/white images. Look at the images directory in the explosm folder created with the above and look at the page source. I'd strip the extra images and remove the table.

Ok, so I am pretty new at this and really do not know much about coding, so it took me about an hour just to figure out where on earth to enter the ebook-convert command (first I placed it in the code, then tried to place it somewhere in the settings, then spent forever trying to figure out how to locate the calibre command line until I realized you simply had to start cmd). However, I still can't get it to work, as I keep getting a "ValueError: Failed to find builtin recipe:explosm". I then realized I probably had to be in the right folder, but neither the calibre library nor the calibre2\resources program folder worked.

Sorry for asking, I know it's a stupid question, and I feel really lame. I'm just having a hard time grasping how all this works; I've never dealt with code or even the cmd in my life (figuring out that I needed to use the cd command to go into folders also took a while...). Thanks for all the help.

Starson17 · 06-25-2010, 03:50 PM

Quote:

Originally Posted by nook.life

Ok, so I am pretty new at this

Don't feel lame - you're doing great. Your initial post was sophisticated enough, I didn't add a lot of details, figuring you'd ask - and you did.

OK, the details are here. Start by saving your recipe as a text file named explosm.recipe in any folder - say c:\recipes\explosm

Then make a batch file and save it there called run_recipe.bat and put this in it:

Code:

c:
cd \recipes\explosm
ebook-convert explosm.recipe explosm_1 --test -vv > explosm.txt

Save and run the batch file, and it will run the recipe and write the downloaded content as html files into the folder explosm_1 in that directory.

Then you can look at the error messages (if any) in the explosm.txt file and the images in the html files. Fundamentally, you want to just clean up your recipe.

All the stuff above is just to make writing recipes easier. Once you have it debugged, you just switch it into Calibre.

BTW, I wrote an explosm recipe, but IIRC, I never posted it. Someone wanted it, but never responded to a couple questions I posted (hope it wasn't you). It didn't do any rotation, but it did clean up most of what you need cleaned. If you want it, I'll dig it up later tonight or tomorrow and post it.

nook.life · 06-25-2010, 07:26 PM

Quote:

Originally Posted by Starson17

BTW, I wrote an explosm recipe, but IIRC, I never posted it. Someone wanted it, but never responded to a couple questions I posted (hope it wasn't you). It didn't do any rotation, but it did clean up most of what you need cleaned. If you want it, I'll dig it up later tonight or tomorrow and post it.

Yeah, unfortunately that was me (although a Google search of the forum shows that three other people requested the explosm feed as well, but I don't think you are referring to them). I apologized for overlooking your post in a previous post; I don't know how I missed it. Thanks for the support. I can't claim any credit for anything: the recipe code is yours, Starson (you posted it two posts after asking the questions). All I did was insert the rotate code from another poster, and it happened to work. Like I said, I really have never dealt with code before, so everything I do seems to take hours and gets me nowhere.

So I did what you said, and I was able to get the file, look at the images, and look at the code in Notepad++. The problem is that the images from the site are labeled sequentially (i.e. sometimes the actual cartoon is img3, other times img8, etc.), so I don't really know how I would go about cleaning it up. Also, how do you remove the table? I tried simply taking it out, but it broke the recipe. I tried using something like

Code:

    remove_tags = [
        dict(name='div', attrs={'id':['logobottom']})
        # dict(name='span', attrs={'id':'logobottom'})
        # dict(name='table')
    ]

right after the keep_only_tags command. I've been trying to figure out how the command works, but it has been rough. If you could look for the recipe you already made that would be great. It would sure be nice to see how the code looks when it works so I try to figure out how you did it. Thanks so much.

Starson17 · 06-25-2010, 09:12 PM

Quote:

Originally Posted by nook.life

Yeah unfortunately that was me

I was probably in a grumpy mood that day

Whatever I posted, it wasn't the final recipe, as what you were working with still had lots of junk in it. This is closer to the final I came up with, but my earlier version had some text that identified the comic. You want it rotated, so I removed the text above to give more room for the comic.

Try this:

Code:

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
import re
from ctypes import byref  # used by pw.MagickGetException() in postprocess_html
import calibre.utils.PythonMagickWand as pw

class Explosm(BasicNewsRecipe):
    title               = 'Explosm'
    __author__          = 'Starson17'
    description         = 'Explosm'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
    oldest_article      = 24
    remove_javascript   = True
    remove_empty_feeds    = True
    max_articles_per_feed = 10

    feeds = [
             (u'Explosm Feed', u'http://feeds.feedburner.com/Explosm')
             ]

    keep_only_tags     = [dict(name='div', attrs={'align':'center'})]
    remove_tags = [dict(name='span'),
                   dict(name='table')]

    def postprocess_html(self, soup, first):
        #process all the images. assumes that the new html has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            print 'resizing image' + iurl
            with pw.ImageMagick():
                img = pw.NewMagickWand()
                p = pw.NewPixelWand()
                if img < 0:
                    raise RuntimeError('Out of memory')
                if not pw.MagickReadImage(img, iurl):
                    severity = pw.ExceptionType(0)
                    msg = pw.MagickGetException(img, byref(severity))
                    raise IOError('Failed to read image from: %s: %s'
                        %(iurl, msg))
                width = pw.MagickGetImageWidth(img)
                height = pw.MagickGetImageHeight(img)
                if( width > height ) :
                    print 'Rotate image'
                    pw.MagickRotateImage(img, p, 90)
                if not pw.MagickWriteImage(img, iurl):
                    raise RuntimeError('Failed to save image to %s'%iurl)
                pw.DestroyMagickWand(img)
        return soup

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''

rty · 06-26-2010, 03:38 AM

@Kovid,

If you open the STYLESHEET.CSS of the news EPUB generated by Calibre, you will see that the section called ".articledescription", always has a fixed "Font-family: sans" and for some reason, I cannot replace it through the Extra_CSS in my recipe.

Code:

.articledescription {
    display: block;
    font-family: sans;
    font-size: 0.7em;
    text-indent: 0
    }

I need to get the FONT-FAMILY replaced with the line below

Code:

    font-family: "DroidFont", serif;

Calibre seems to ignore the Extra_CSS in my recipe to replace the FONT-FAMILY from the default "SANS" to "DROIDFONT" in this ".articledescription" section. Is this a bug?

Here is my recipe written for BBC Chinese. Please look at the Extra_css with which I try to overwrite the font-family of ".articledescription".

Spoiler:

If only this works, all the XML feeds of websites encoded in UTF-8 Chinese will display well on Nook (and most probably Sony etc)

Thanks.

ps. I have already opened a bug ticket for this (Ticket #5982). Is there really such a font name called "SANS"?

Related discussion: https://www.mobileread.com/forums/sho...925#post979925

Update: @Kovid, if it is too hard to fix, can you please simply remove the line "Font-Family: Sans" from ".articledescription" of the stylesheet.css?

nook12 · 06-26-2010, 12:00 PM

Can I request a recipe for pharmacistletter.com and medscape.com?
