#1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2009
Device: kindle2
|
Renewed interest in a recipe for the Mexican newspaper La Jornada. My recipe below has two problems:
1. The photos go to the bottom of each article. Why?
2. I would like to include the front cover, but I don't know how. The source provides a front cover PDF; here's my bash script to download it:

Code:

    Y=`date +%Y`  # Year, 4 digits
    m=`date +%m`  # Month, 2 digits
    d=`date +%d`  # Day, 2 digits
    wget -q "http://www.jornada.unam.mx/$Y/$m/$d/portada.pdf" -O - | convert pdf:- png:-

Code:

    __license__ = 'GPL v3'
    __copyright__ = '2009, Pablo Funes <pablo at imprentaluz.com>'
    '''
    La Jornada
    '''
    # TODO: Pictures should go to the top, not the bottom of each article.
    # TODO: Front cover?

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1262065387(BasicNewsRecipe):
        title = u'La Jornada'
        oldest_article = 7
        max_articles_per_feed = 100

        # One RSS feed per newspaper section
        feeds = [
            ('opinion', 'http://www.jornada.unam.mx/rss/opinion.xml'),
            ('politica', 'http://www.jornada.unam.mx/rss/politica.xml'),
            ('economia', 'http://www.jornada.unam.mx/rss/economia.xml'),
            ('mundo', 'http://www.jornada.unam.mx/rss/mundo.xml'),
            ('estados', 'http://www.jornada.unam.mx/rss/estados.xml'),
            ('capital', 'http://www.jornada.unam.mx/rss/capital.xml'),
            ('sociedad', 'http://www.jornada.unam.mx/rss/sociedad.xml'),
            ('ciencias', 'http://www.jornada.unam.mx/rss/ciencias.xml'),
            ('cultura', 'http://www.jornada.unam.mx/rss/cultura.xml'),
            ('gastronomia', 'http://www.jornada.unam.mx/rss/gastronomia.xml'),
            ('espectaculos', 'http://www.jornada.unam.mx/rss/espectaculos.xml'),
            ('deportes', 'http://www.jornada.unam.mx/rss/deportes.xml'),
            ('cartones', 'http://www.jornada.unam.mx/rss/cartones.xml'),
        ]

        # Keep only the summary, headline, body text and photo blocks
        keep_only_tags = [
            dict(name='div', attrs={'class': ["sumarios", "cabeza", "text", "foto"]}),
        ]
#2 |
creator of calibre
Posts: 45,305
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can use the PDF metadata reader in calibre to convert the PDF to a JPEG for the cover.
Not sure about the pictures going to the bottom; the website probably uses JavaScript to move them to the top. You can do something similar by overriding the postprocess_html method in the recipe.
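A minimal sketch of the reordering idea, under stated assumptions: it uses the standard library's `xml.etree.ElementTree` on a well-formed fragment instead of the soup object that calibre passes to `postprocess_html`, and it assumes the photo block is a direct child `div` with class `foto` (the class name comes from the recipe's `keep_only_tags`). The helper name `move_photos_first` is hypothetical.

```python
import xml.etree.ElementTree as ET


def move_photos_first(article):
    """Move every direct child <div class="foto"> to the front of `article`.

    The photos keep their relative order; all other children keep theirs.
    """
    photos = [
        child for child in list(article)
        if child.tag == 'div' and 'foto' in child.get('class', '').split()
    ]
    # Detach the photo blocks, then re-insert them at the top in order.
    for photo in photos:
        article.remove(photo)
    for position, photo in enumerate(photos):
        article.insert(position, photo)
    return article


# Made-up article fragment in which the photo comes last.
sample = ET.fromstring(
    '<body>'
    '<div class="cabeza">Headline</div>'
    '<div class="text">Body text</div>'
    '<div class="foto">Photo</div>'
    '</body>'
)
move_photos_first(sample)
order = [div.get('class') for div in sample]
print(order)
```

Inside a recipe, the same detach-and-reinsert logic would have to be written against the parsed page that `postprocess_html` receives, and the real page structure may differ from this fragment.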
#3 |
Member
Posts: 14
Karma: 87858
Join Date: Dec 2009
Device: Kindle 3
|
PDF metadata reader
|
#4 |
creator of calibre
Posts: 45,305
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look at the file calibre/ebooks/metadata/pdf.py in the calibre source code.
|
#5 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2009
Device: kindle2
|
Hi Rogelio, I saw your La Jornada script as well... It doesn't have the pictures-at-the-bottom problem, but it loses all formatting; perhaps that's too radical. Perhaps not. I've sent you an email as well. You do seem to know more Python than me ;-)
We need to figure out several steps: how to generate the YYYY/MM/DD field, how to download the PDF, and how to inject it as the calibre doc cover.
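The first of those steps, generating the YYYY/MM/DD part, could be sketched in Python as follows. This mirrors the bash script from post #1; the function name `portada_url` is hypothetical, and the file name `portada.pdf` is an assumption about what the site serves.

```python
from datetime import date

BASE_URL = 'http://www.jornada.unam.mx'


def portada_url(day=None):
    """Build the dated front-page PDF URL; defaults to today's date."""
    day = day or date.today()
    # strftime zero-pads month and day, like `date +%m` and `date +%d`.
    return '%s/%s/portada.pdf' % (BASE_URL, day.strftime('%Y/%m/%d'))


print(portada_url(date(2009, 12, 30)))
```

In a recipe, a string like this could be returned from the `get_cover_url` method so that calibre does the download itself; whether a PDF is accepted there depends on the calibre version.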
#6 |
Member
Posts: 14
Karma: 87858
Join Date: Dec 2009
Device: Kindle 3
|
New version
Hi,
This is the new version (rename it to .recipe). It uses the per-section feeds and a cover.jpg available on the site.
Next version:
* Use portada.pdf to get a better-resolution cover (following kovidgoyal's recommendation)
* Get the URL for portada.* in a more reliable way (I'm using ugly code right now)
* Do not lose all the article formatting
* 'Cartones' section!

Last edited by rogeliodh; 12-30-2009 at 05:04 PM.
#7 | |
Member
Posts: 14
Karma: 87858
Join Date: Dec 2009
Device: Kindle 3
|
Add conversion to jpg to covers
Quote:
But I think it would be better if calibre itself automatically converted cover files to a reliable format (such as jpg). So I implemented a change in BasicNewsRecipe so that recipes can tell calibre that the cover needs conversion to jpg. Attached is the change and the recipe using it. This is my first time digging into calibre's source code and I'm a Python beginner, so any comments are welcome. I hope this change can be included in a future calibre release.
Best regards,
Rogelio
|
#8 |
creator of calibre
Posts: 45,305
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Using ImageMagick to render PDF is not very portable, as IIRC it requires Ghostscript. Here's a patch to feeds/news.py that will automatically handle a PDF cover. Test it and let me know.

Code:

    === modified file 'src/calibre/web/feeds/news.py'
    --- src/calibre/web/feeds/news.py 2009-12-29 16:04:28 +0000
    +++ src/calibre/web/feeds/news.py 2009-12-30 22:47:18 +0000
    @@ -823,6 +823,14 @@
             cpath = os.path.join(self.output_dir, 'cover.'+ext)
             with nested(open(cpath, 'wb'), closing(self.browser.open(cu))) as (cfile, r):
                 cfile.write(r.read())
    +        if ext.lower() == 'pdf':
    +            from calibre.ebook.metadata.pdf import get_metadata
    +            stream = open(cpath, 'rb')
    +            mi = get_metadata(stream)
    +            cpath = None
    +            if mi.cover_data and mi.cover_data[1]:
    +                cpath = os.path.join(self.output_dir, 'cover.png')
    +                open(cpath, 'wb').write(mi.cover_data[1])
             self.cover_path = cpath
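For readers without the calibre source at hand, the control flow of the patch can be sketched as a standalone function. This is only an illustration under assumptions: `resolve_cover` is a hypothetical name, and `extract_pdf_cover` is a hypothetical stand-in for calibre's PDF metadata reader (any callable that takes a binary stream and returns image bytes or None). Unlike the patch, the sketch closes its file handles explicitly.

```python
import os


def resolve_cover(cover_file, output_dir, extract_pdf_cover):
    """Return the path of a usable cover image, or None if none is available.

    Non-PDF covers are returned unchanged. For a PDF, the image bytes
    produced by `extract_pdf_cover` are written to cover.png in output_dir.
    """
    ext = os.path.splitext(cover_file)[1].lstrip('.').lower()
    if ext != 'pdf':
        return cover_file
    with open(cover_file, 'rb') as stream:
        image = extract_pdf_cover(stream)
    if not image:
        # No cover image could be extracted from the PDF.
        return None
    png_path = os.path.join(output_dir, 'cover.png')
    with open(png_path, 'wb') as out:
        out.write(image)
    return png_path
```

As in the patch, a PDF that yields no image leaves the recipe without a cover instead of handing the raw PDF to the rest of the pipeline.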
#9 |
Member
Posts: 14
Karma: 87858
Join Date: Dec 2009
Device: Kindle 3
|
It works perfectly! Just a little typo:

Code:

    - from calibre.ebook.metadata.pdf import get_metadata
    + from calibre.ebooks.metadata.pdf import get_metadata

Thanks,
Rogelio
#10 |
creator of calibre
Posts: 45,305
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It will be in the next release
|
#12 |
Member
Posts: 14
Karma: 87858
Join Date: Dec 2009
Device: Kindle 3
|
It is now included in calibre 0.7.15 :-)
|