Old 12-29-2009, 08:17 PM   #1
pablofunes
Junior Member
pablofunes began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2009
Device: kindle2
La Jornada

Renewed interest in a recipe for Mexican newspaper La Jornada. My recipe below has two problems:

1. The photos go to the bottom of each article. Why?

2. I would like to include the front cover but I don't know how to. The source provides a front cover PDF, here's my bash script to download it:

Code:
Y=$(date +%Y)  # four-digit year
m=$(date +%m)  # two-digit month
d=$(date +%d)  # two-digit day
# Fetch the front-page PDF and convert it to PNG on stdout (convert is ImageMagick)
wget -q "http://www.jornada.unam.mx/$Y/$m/$d/portada.pdf" -O - | convert pdf:- png:-
Here's my basic recipe:

Code:
__license__   = 'GPL v3'
__copyright__ = '2009, Pablo Funes <pablo at imprentaluz.com>'
'''
La Jornada
'''

# TODO: Pictures should go to the top, not the bottom of each article.  
# TODO: Front cover? 

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1262065387(BasicNewsRecipe):
    title          = u'La Jornada'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds          = [
                ('opinion','http://www.jornada.unam.mx/rss/opinion.xml'),
                ('politica','http://www.jornada.unam.mx/rss/politica.xml'),
                ('economia','http://www.jornada.unam.mx/rss/economia.xml'),
                ('mundo','http://www.jornada.unam.mx/rss/mundo.xml'),
                ('estados','http://www.jornada.unam.mx/rss/estados.xml'),
                ('capital','http://www.jornada.unam.mx/rss/capital.xml'),
                ('sociedad','http://www.jornada.unam.mx/rss/sociedad.xml'),
                ('ciencias','http://www.jornada.unam.mx/rss/ciencias.xml'),
                ('cultura','http://www.jornada.unam.mx/rss/cultura.xml'),
                ('gastronomia','http://www.jornada.unam.mx/rss/gastronomia.xml'),
                ('espectaculos','http://www.jornada.unam.mx/rss/espectaculos.xml'),
                ('deportes','http://www.jornada.unam.mx/rss/deportes.xml'),
                ('cartones','http://www.jornada.unam.mx/rss/cartones.xml'),

                ]


    keep_only_tags = [
                        dict(name='div', attrs={'class':["sumarios","cabeza","text","foto"]}),
                          ]
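
Since every feed URL in the recipe follows the same pattern, the list could probably also be generated from the section names. This is only a sketch and has not been run against the live site; FEED_URL, SECTIONS and build_feeds are names made up for the example, and only the URL pattern and the section names come from the recipe above.

```python
# Hypothetical helper names; the URL pattern and section names are
# the ones used in the recipe above.
FEED_URL = 'http://www.jornada.unam.mx/rss/%s.xml'

SECTIONS = [
    'opinion', 'politica', 'economia', 'mundo', 'estados', 'capital',
    'sociedad', 'ciencias', 'cultura', 'gastronomia', 'espectaculos',
    'deportes', 'cartones',
]


def build_feeds(sections=SECTIONS):
    """Return (title, url) pairs, the shape used for the recipe's feeds list."""
    return [(name, FEED_URL % name) for name in sections]


# Inside the recipe class this would become: feeds = build_feeds()
feeds = build_feeds()
```

Adding or dropping a section then means editing one word instead of a whole URL, which also avoids the line-wrapping accidents that long literals invite.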
Old 12-29-2009, 08:26 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 45,305
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can use the pdf metadata reader in calibre to convert the PDF to a jpeg for the cover.

I'm not sure why the pictures go to the bottom; the website probably uses JavaScript to move them to the top. You can do something similar by overriding the postprocess_html method in the recipe.
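
A rough sketch of that idea, assuming the photos sit in the div with class "foto" that the recipe keeps, and that putting them at the very start of the body is acceptable. move_photos_to_top and LaJornadaPhotosFirst are names made up for this example, and the fallback base class is only there so the snippet can be loaded outside calibre. It has not been tested against the real site.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # outside calibre: stand-in so the sketch still loads
    class BasicNewsRecipe(object):
        pass


def move_photos_to_top(soup):
    """Move every <div class="foto"> to the front of <body>, keeping their order."""
    body = soup.find('body')
    if body is None:
        return soup
    photos = body.findAll('div', attrs={'class': 'foto'})
    for index, photo in enumerate(photos):
        photo.extract()             # detach the photo from its current place
        body.insert(index, photo)   # re-attach it at the start of the body
    return soup


class LaJornadaPhotosFirst(BasicNewsRecipe):  # hypothetical recipe name
    title = u'La Jornada'

    def postprocess_html(self, soup, first_fetch):
        # Called by calibre on each downloaded page after parsing.
        return move_photos_to_top(soup)
```

If the headline should stay above the photo, the insert position would have to be worked out relative to the "cabeza" div instead of using the start of the body.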
Old 12-30-2009, 10:44 AM   #3
rogeliodh
Member
rogeliodh knows better than to ask about the Gravitic Imploder Lance.
 
Posts: 14
Karma: 87858
Join Date: Dec 2009
Device: Kindle 3
PDF metadata reader

Quote:
Originally Posted by kovidgoyal
You can use the pdf metadata reader in calibre to convert the PDF to a jpeg for the cover.
Where can I find out how to use this PDF metadata reader?
Old 12-30-2009, 11:17 AM   #4
kovidgoyal
look at the file calibre/ebooks/metadata/pdf.py in the calibre source code
Old 12-30-2009, 11:27 AM   #5
pablofunes
PDF or not

Hi Rogelio, I saw your La Jornada script as well... It doesn't have the pictures-at-the-bottom problem, but it loses all formatting; perhaps that's too radical. Perhaps not. I've sent you an email as well. You do seem to know more Python than me ;-)

We need to figure out several steps: how to generate the YYYY/MM/DD part of the URL, how to download the PDF, and how to inject it as the cover of the calibre document.
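
For the first step, a minimal Python sketch, assuming the URL layout from the bash script in the first post and that the file is called portada.pdf. cover_url_for and LaJornadaWithCover are names made up for this example. get_cover_url is the recipe hook calibre asks for a cover URL; whether calibre can use a PDF there is a separate question that this sketch does not answer.

```python
import datetime

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # outside calibre: stand-in so the sketch still loads
    class BasicNewsRecipe(object):
        pass

# Assumed URL layout: http://www.jornada.unam.mx/YYYY/MM/DD/portada.pdf
COVER_URL = 'http://www.jornada.unam.mx/%Y/%m/%d/portada.pdf'


def cover_url_for(day=None):
    """Build the front-page PDF URL for `day` (defaults to the local date)."""
    if day is None:
        day = datetime.date.today()
    # strftime fills in the zero-padded year, month and day.
    return day.strftime(COVER_URL)


class LaJornadaWithCover(BasicNewsRecipe):  # hypothetical recipe name
    title = u'La Jornada'

    def get_cover_url(self):
        # Hand calibre the URL of today's front page.
        return cover_url_for()
```

Note that date.today() uses the local clock of the machine running calibre, so close to midnight it may point at an edition that is not published yet.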
Old 12-30-2009, 12:56 PM   #6
rogeliodh
New version

Hi,

This is the new version (rename it to .recipe). It uses the per-section feeds and a cover.jpg available on the site.

Next version:
* Use portada.pdf to get a better resolution cover (using kovidgoyal's recommendation)
* Get URL for portada.* in a more reliable way (I'm using ugly code right now)
* Do not lose all the article formatting
* 'Cartones' section!
Attached Files
File Type: txt la_jornada.txt (2.6 KB, 205 views)

Last edited by rogeliodh; 12-30-2009 at 05:04 PM.
Old 12-30-2009, 04:52 PM   #7
rogeliodh
Add conversion to jpg to covers

Quote:
Originally Posted by kovidgoyal
look at the file calibre/ebooks/metadata/pdf.py in the calibre source code
I didn't understand how to use the metadata reader :-(

But I think it would be better if calibre itself automatically converted cover files to a reliable format (such as JPG). So I implemented a change in BasicNewsRecipe that lets recipes tell calibre that the cover needs conversion to JPG.

Attached are the change and the recipe using it. This is my first time digging into calibre's source code and I'm a Python beginner, so any comments are welcome. I hope this change can be included in a future calibre release.

Best regards.
Rogelio
Attached Files
File Type: txt la_jornada.txt (2.6 KB, 249 views)
File Type: txt my-changes.txt (3.5 KB, 260 views)
Old 12-30-2009, 05:49 PM   #8
kovidgoyal
Using ImageMagick to render PDF is not very portable, as IIRC it requires Ghostscript. Here's a patch to feeds/news.py that will automatically handle a PDF cover. Test it and let me know.

Code:
=== modified file 'src/calibre/web/feeds/news.py'
--- src/calibre/web/feeds/news.py       2009-12-29 16:04:28 +0000
+++ src/calibre/web/feeds/news.py       2009-12-30 22:47:18 +0000
@@ -823,6 +823,14 @@
             cpath = os.path.join(self.output_dir, 'cover.'+ext)
             with nested(open(cpath, 'wb'), closing(self.browser.open(cu))) as (cfile, r):
                 cfile.write(r.read())
+            if ext.lower() == 'pdf':
+                from calibre.ebook.metadata.pdf import get_metadata
+                stream = open(cpath, 'rb')
+                mi = get_metadata(stream)
+                cpath = None
+                if mi.cover_data and mi.cover_data[1]:
+                    cpath = os.path.join(self.output_dir, 'cover.png')
+                    open(cpath, 'wb').write(mi.cover_data[1])
             self.cover_path = cpath
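
For trying the same step outside news.py, here is a standalone sketch of what the added lines do, with the files closed explicitly. pdf_cover_to_png is a made-up name, and get_metadata is passed in as a parameter so the function can be exercised without calibre; inside calibre it would be the get_metadata function from calibre/ebooks/metadata/pdf.py. The sketch assumes, as the patch does, that cover_data is a pair whose second item holds the raw image bytes.

```python
import os


def pdf_cover_to_png(pdf_path, output_dir, get_metadata):
    """Write the cover image that get_metadata finds in the PDF to cover.png.

    Returns the path of the written file, or None when no cover data is found.
    The 'cover.png' file name mirrors the patch above.
    """
    with open(pdf_path, 'rb') as stream:
        mi = get_metadata(stream)
    # Assumption taken from the patch: cover_data[1] holds the image bytes.
    cover_data = getattr(mi, 'cover_data', None)
    if not cover_data or not cover_data[1]:
        return None
    cover_path = os.path.join(output_dir, 'cover.png')
    with open(cover_path, 'wb') as out:
        out.write(cover_data[1])
    return cover_path
```

Returning None when nothing could be extracted matches the patch, which sets cpath to None in that case so that no broken cover is used.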
Old 12-31-2009, 12:07 PM   #9
rogeliodh
It works perfectly! Just a little typo:

Code:
-                from calibre.ebook.metadata.pdf import get_metadata
+                from calibre.ebooks.metadata.pdf import get_metadata
Please let me know if this will be included in a future version.

Thanks,
Rogelio
Old 12-31-2009, 12:25 PM   #10
kovidgoyal
It will be in the next release.
Old 08-19-2010, 11:09 PM   #11
rogeliodh
I've created a new recipe for La Jornada. You can find it here
Old 08-20-2010, 06:02 PM   #12
rogeliodh
It is now included in calibre 0.7.15 :-)