#2191 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
|
Can someone help me with the recipe for Forbes India? A recipe already exists, but I would like one that does not use feeds and instead takes the latest issue listed in this archive:
http://business.in.com/magazine/magazinearchive/1/1
I mean the latest issue in the archive, not the actual latest issue, because most of the links in the newest issue get activated very late.
#2192 |
Member
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
|
OK, so I have been trying to get the rotate-images code into the recipe Starson17 created, and I have got it to work. The recipe would be perfect, except that some images are still getting clipped at the top. See the image at the bottom of this post for what I am talking about. Is there any way to justify the image so that the extra space at the bottom (or, I guess, the left once it is rotated) is deleted? Any help would be greatly appreciated. Thanks once again!
Code:

    from ctypes import byref  # needed by MagickGetException below

    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup
    from calibre import strftime, __appname__, __version__
    import calibre.utils.PythonMagickWand as pw

    class Explosm(BasicNewsRecipe):
        title = 'Explosm 3'
        __author__ = 'Starson17'
        description = 'Explosm'
        language = 'en'
        use_embedded_content = False
        no_stylesheets = True
        linearize_tables = True
        oldest_article = 24
        remove_javascript = True
        remove_empty_feeds = True
        max_articles_per_feed = 10

        feeds = [
            (u'Explosm Feed', u'http://feeds.feedburner.com/Explosm')
        ]

        def get_article_url(self, article):
            return article.get('link', None)

        keep_only_tags = [dict(name='div', attrs={'id':'maincontent'})]

        def preprocess_html(self, soup):
            # drop the second table and the navigation block
            table_tags = soup.findAll('table')
            table_tags[1].extract()
            NavTag = soup.find(text='« First')
            NavTag.parent.parent.extract()
            return soup

        def postprocess_html(self, soup, first):
            # process all the images. assumes that the new html has the correct path
            for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
                iurl = tag['src']
                print 'resizing image ' + iurl
                with pw.ImageMagick():
                    img = pw.NewMagickWand()
                    p = pw.NewPixelWand()
                    if img < 0:
                        raise RuntimeError('Out of memory')
                    if not pw.MagickReadImage(img, iurl):
                        severity = pw.ExceptionType(0)
                        msg = pw.MagickGetException(img, byref(severity))
                        raise IOError('Failed to read image from: %s: %s' % (iurl, msg))
                    width = pw.MagickGetImageWidth(img)
                    height = pw.MagickGetImageHeight(img)
                    # rotate landscape images by 90 degrees
                    if width > height:
                        print 'Rotate image'
                        pw.MagickRotateImage(img, p, 90)
                    if not pw.MagickWriteImage(img, iurl):
                        raise RuntimeError('Failed to save image to %s' % iurl)
                    pw.DestroyMagickWand(img)
            return soup

        extra_css = '''
            h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
            h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
            p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
            body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
            '''

Here is the image of what I am talking about with the clipping: http://picturepush.com/public/3684920
#2193 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: Sony PRS-505
|
Recipe for IETF RFCs
Hello,
I have a Sony PRS-505 which I would like to use to read RFC documents from the IETF (http://www.ietf.org/). The challenge is that the RFCs are already formatted with a fixed line width. When I try to create an e-book file using calibre, either all the formatting is lost or the line width exceeds the width of the PRS screen.

I've tried a few workarounds with varying degrees of success: changing the base font, editing the HTML before importing, editing the CSS in Sigil before importing, and printing the RFCs to a PDF file. All of those workarounds have some problems, and it looks like calibre already has some of the tools necessary to get the file I would like. I am not sure, though, how to use those tools. If someone more experienced has any ideas, or has already done this, I would appreciate the help.

The challenge with the RFCs comes when, for example, there are ASCII diagrams in the document. They also already have page numbers, which of course don't match up with the pages on the Sony Reader. I have a crude process to get the RFCs into a nicer format on the Reader, but it is time consuming and not very efficient. I am sure that someone with programming experience could get this done in a much more efficient way. My process is like this:

Step 1. Download the txt version of the RFC. For example: http://www.rfc-editor.org/rfc/rfc4271.txt

Step 2. Clean up the document: in Notepad++, use "Find and replace" to search for each of the following regexes and replace it with blank.
.*\[Page .+\] - this will remove the existing page numbers
^RFC \d+.* - this will remove the recurring page title at each page break
\f - this is to remove the FFLF (form feed) characters
TextFX Edit: Delete surplus blank lines

Step 4. To preserve the formatting, I turn the file into HTML: add <body> <pre> at the beginning and </pre> </body> at the end of the file.

Step 5. Open the new HTML file in Firefox, and use the Developer toolbar to add CSS: body { margin: 0; font-weight: 900; font-size: 13;} This is because the fixed-width fonts appear too light on the Reader.

Step 6. File > print preview, shrink to fit, portrait, and print to a PDF file with a customized page size.

As you can see, the process is very clumsy. The end result is not bad: it does allow me to read the RFCs, and it only takes about 10 minutes to go through all the steps. However, I feel that someone with more experience might be able to improve it quickly and automate a lot of the steps; for example, the regex search could be done in calibre.

Also, with this method no TOC is generated. The RFC document already has its own TOC, but that would have to be deleted and regenerated to reflect the new page numbers. If this were possible, with automated links to the appropriate section in the document, it would be fantastic.

On the IETF's website, the RFCs can also be downloaded as HTML. Maybe that is a better option than the text file. In fact, the idea to customize the font weight to make it more legible on the Reader came from looking at the source code of their HTML.

Anyway, if someone has any ideas on how to bring RFCs to life on the Sony PRS-505 without the painful process I have described, it would be great. The news fetching feature of calibre caught my eye as a good way to download the RFCs and have them automatically converted, but I don't know how to take this further. Any assistance would be greatly appreciated. I've also attached some sample files; please have a look.

Thanks,
Vladimir
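The clean-up in Steps 2 and 4 above could probably be scripted. What follows is only a minimal sketch in plain Python, not a calibre recipe: the function names `clean_rfc_text` and `rfc_to_html` are made up for illustration, the regexes are the ones from the post (anchored to whole lines), and the form feeds are removed first on the assumption that each one sits on its own line just before the page title. The style rule reuses the margin and font weight from Step 5 and leaves out the font size.

```python
import html
import re

# Regexes from the post above, anchored so each one blanks a whole line.
PAGE_FOOTER = re.compile(r"^.*\[Page .+\].*$", re.MULTILINE)  # existing page numbers
PAGE_HEADER = re.compile(r"^RFC \d+.*$", re.MULTILINE)        # recurring page title


def clean_rfc_text(text):
    """Strip page footers, page headers, form feeds and surplus blank lines."""
    text = text.replace("\r\n", "\n")
    # Remove form feeds first so a page title always starts its own line.
    text = text.replace("\f", "")
    text = PAGE_FOOTER.sub("", text)
    text = PAGE_HEADER.sub("", text)
    # Collapse runs of blank lines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip("\n") + "\n"


def rfc_to_html(text):
    """Wrap the cleaned text in <pre> so the fixed-width layout survives."""
    body = html.escape(clean_rfc_text(text), quote=False)
    return (
        "<html><head><style>body { margin: 0; font-weight: 900; }</style></head>\n"
        "<body>\n<pre>\n" + body + "</pre>\n</body></html>\n"
    )
```

To try it, read the downloaded rfc4271.txt into a string, pass it to `rfc_to_html`, and write the result to an .html file that can then be added to calibre. It does not rebuild the TOC.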
#2194 |
creator of calibre
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@nook.life: What output profile are you using?
|
#2195 |
Member
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
|
I am using the normal output profile for the Nook with EPUB... I didn't change anything (that I'm aware of...). Do you think that is what is causing the problem? Is there a way to revert to the standard output profile just to make sure? Thanks for looking into this.
#2196 |
creator of calibre
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, the Nook output profile should be fine. Looking at your screenshot, the image has some margin on the left. You probably just need to process the HTML to remove that margin so that the image fits on the screen (with the Nook output profile, calibre will automatically resize the image to be no larger than the Nook screen).
|
#2197 |
Member
Posts: 17
Karma: 10
Join Date: May 2008
Device: CASIO pocket viewer S1600, Sony PRS-505 and Cybook Gen 3
|
zaobao.com recipe update
zaobao.com updated their HTML layout.
Attached is the updated recipe.
#2198 | |
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
|
Quote:
|
|
#2199 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
ebook-convert explosm.recipe explosm --test -vv > explosm.txt |
|
#2200 | |
Member
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
|
Quote:
Sorry for asking. I know it's a stupid question, and I feel really lame. I'm just having a hard time grasping how all this works; I've never dealt with code or even the cmd in my life (figuring out that I needed to use the cd command to go into folders also took a while...). Thanks for all the help.
|
#2201 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Don't feel lame - you're doing great. Your initial post was sophisticated enough, I didn't add a lot of details, figuring you'd ask - and you did.
OK, the details are here. Start by saving your recipe as a text file named explosm.recipe in any folder - say c:\recipes\explosm

Then make a batch file and save it there called run_recipe.bat and put this in it:
Code:

    c:
    cd \recipes\explosm
    ebook-convert explosm.recipe explosm_1 --test -vv > explosm.txt

Then you can look at the error messages (if any) in the explosm.txt file and the images in the html files.

Fundamentally, you want to just clean up your recipe. All the stuff above is just to make writing recipes easier. Once you have it debugged, you just switch it into Calibre.

BTW, I wrote an explosm recipe, but IIRC, I never posted it. Someone wanted it, but never responded to a couple questions I posted (hope it wasn't you). It didn't do any rotation, but it did clean up most of what you need cleaned. If you want it, I'll dig it up later tonight or tomorrow and post it.
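If the batch file feels awkward, the same test run can be driven from a small Python script instead. This is just a sketch, assuming ebook-convert is on the PATH; the helper names `build_command` and `run_recipe` are made up for illustration, while the file names and the --test and -vv switches are the ones from the batch file above. Unlike the batch file, it also sends error output to the log.

```python
import subprocess
from pathlib import Path


def build_command(recipe, out_dir):
    """Return the ebook-convert test command used in the batch file above."""
    return ["ebook-convert", str(recipe), str(out_dir), "--test", "-vv"]


def run_recipe(folder, recipe="explosm.recipe", out_dir="explosm_1",
               log_name="explosm.txt"):
    """Run the recipe inside `folder` and save everything it prints to a log."""
    folder = Path(folder)
    log_path = folder / log_name
    with open(log_path, "w") as log:
        result = subprocess.run(
            build_command(recipe, out_dir),
            cwd=str(folder),
            stdout=log,
            stderr=subprocess.STDOUT,  # keep error messages in the same file
        )
    return result.returncode, log_path
```

Calling `run_recipe(r"c:\recipes\explosm")` would then leave explosm.txt and the explosm_1 output folder next to the recipe, as with the batch file.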
#2202 | |
Member
Posts: 12
Karma: 10
Join Date: May 2010
Device: Nook
|
Quote:
So I did what you said, and I was able to get the file, look at the images, and look at the code in Notepad++. The problem is that the images are labeled sequentially (i.e. sometimes the actual cartoon is img3, other times img8, etc.), so I don't really know how I would go about cleaning it up. Also, how do you remove the table? I tried simply taking it out, but it broke the recipe. I tried using something like
Code:

    remove_tags =[
        dict(name='div', attrs={'id':['logobottom']})
    #   dict(name='span', attrs={'id':'logobottom'})
    #   dict(name='table')
    ]

Last edited by nook.life; 06-25-2010 at 06:30 PM.
|
#2203 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I was probably in a grumpy mood that day.
Whatever I posted, it wasn't the final recipe, as what you were working with still had lots of junk in it. This is closer to the final I came up with, but my earlier version had some text that identified the comic. You want it rotated, so I removed the text above to give more room for the comic. Try this:
Code:

    from ctypes import byref  # needed by MagickGetException below
    import re

    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup
    import calibre.utils.PythonMagickWand as pw

    class Explosm(BasicNewsRecipe):
        title = 'Explosm'
        __author__ = 'Starson17'
        description = 'Explosm'
        language = 'en'
        use_embedded_content = False
        no_stylesheets = True
        oldest_article = 24
        remove_javascript = True
        remove_empty_feeds = True
        max_articles_per_feed = 10

        feeds = [
            (u'Explosm Feed', u'http://feeds.feedburner.com/Explosm')
        ]

        keep_only_tags = [dict(name='div', attrs={'align':'center'})]
        remove_tags = [dict(name='span'), dict(name='table')]

        def postprocess_html(self, soup, first):
            # process all the images. assumes that the new html has the correct path
            for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
                iurl = tag['src']
                print 'resizing image ' + iurl
                with pw.ImageMagick():
                    img = pw.NewMagickWand()
                    p = pw.NewPixelWand()
                    if img < 0:
                        raise RuntimeError('Out of memory')
                    if not pw.MagickReadImage(img, iurl):
                        severity = pw.ExceptionType(0)
                        msg = pw.MagickGetException(img, byref(severity))
                        raise IOError('Failed to read image from: %s: %s' % (iurl, msg))
                    width = pw.MagickGetImageWidth(img)
                    height = pw.MagickGetImageHeight(img)
                    # rotate landscape images by 90 degrees
                    if width > height:
                        print 'Rotate image'
                        pw.MagickRotateImage(img, p, 90)
                    if not pw.MagickWriteImage(img, iurl):
                        raise RuntimeError('Failed to save image to %s' % iurl)
                    pw.DestroyMagickWand(img)
            return soup

        extra_css = '''
            h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
            h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
            p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
            body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
            '''
#2204 |
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
|
@Kovid,
If you open the stylesheet.css of the news EPUB generated by calibre, you will see that the section called ".articledescription" always has a fixed "font-family: sans", and for some reason I cannot replace it through the extra_css in my recipe.
Code:

    .articledescription {
        display: block;
        font-family: sans;
        font-size: 0.7em;
        text-indent: 0
    }

Code:

    font-family: "DroidFont", serif;

Here is my recipe written for BBC Chinese. Please look at the extra_css with which I try to overwrite the font-family of ".articledescription".
Spoiler:

If only this worked, all the XML feeds of websites encoded in UTF-8 Chinese would display well on the Nook (and most probably Sony etc.). Thanks.

P.S. I have already opened a bug ticket for this (Ticket #5982). Is there really such a font name called "SANS"?
Related discussion: https://www.mobileread.com/forums/sho...925#post979925

Update: @Kovid, if it is too hard to fix, can you please simply remove the line "font-family: sans" from ".articledescription" in stylesheet.css?

Last edited by rty; 07-03-2010 at 03:00 AM.
#2205 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Device: nook
|
Can I request a recipe for pharmacistletter.com and medscape.com?
|