MobileRead Forums > E-Book Software > Calibre > Recipes
04-27-2011, 08:25 AM #1
hiperlink
Questions about skipping re-downloading and parse_index browse to next page

Hi all,

I'm planning to set up a calibre server for my friends, who happen to use different e-readers.

Is there a way to reuse the already downloaded news/HTML files for conversion to different output formats without downloading them again? (E.g. I create a .mobi file from the recipe via the CLI (ebook-convert), then want to create an .epub from the same recipe without wasting bandwidth, since this currently runs on an EC2 instance.)

My second question: how does one write a recipe (parse_index) for a site that has no RSS feed AND spreads a section over multiple pages? E.g. a site has a technology section whose last link is "next page" (on every page except the last), and I want to add the "h2" article items that share the same article date from every page to the feed...

Thanks for any advice!
04-27-2011, 02:02 PM #2
Starson17
Quote:
Originally Posted by hiperlink
Is there a way to reuse the already downloaded news/HTML files for conversion to different output formats without downloading them again? (E.g. I create a .mobi file from the recipe via the CLI (ebook-convert), then want to create an .epub from the same recipe without wasting bandwidth, since this currently runs on an EC2 instance.)
This is not an uncommon request. The usual answer is:
1) Use Windows Task Scheduler or cron
2) to run a script or batch file
3) that first runs ebook-convert to create the recipe-created ebook,
4) then runs ebook-convert to convert that ebook to the 2nd, 3rd, 4th formats,
5) and finally runs calibredb with the add command to add the books to calibre.
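A minimal sketch of steps 2 to 5 as a Python script (Python only because calibre recipes are written in it; a shell script or batch file works just as well). It assumes the recipe file is `some.recipe` and that the outputs share the base name `some`; `build_commands` and `run_all` are hypothetical helpers, and only the plain `ebook-convert <input> <output>` and `calibredb add <files>` command forms are used.

```python
import subprocess


def build_commands(recipe, base_name, formats):
    """Build the command lines for one scheduled run (hypothetical helper).

    The first format is produced directly from the recipe; every other
    format is converted from that file, so the news is downloaded once.
    """
    first, *others = formats
    primary = f"{base_name}.{first}"
    outputs = [primary] + [f"{base_name}.{fmt}" for fmt in others]

    # Step 3: the only command that touches the network (runs the recipe).
    commands = [["ebook-convert", recipe, primary]]
    # Step 4: local conversions of the recipe-created ebook.
    for out in outputs[1:]:
        commands.append(["ebook-convert", primary, out])
    # Step 5: add the produced files to the calibre library.
    commands.append(["calibredb", "add", *outputs])
    return commands


def run_all(commands):
    """Run the commands in order; check=True aborts on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A cron job (or a Task Scheduler task) would then call `run_all(build_commands("some.recipe", "some", ["mobi", "epub", "pdf"]))`. Whether `calibredb add` stores several formats of the same title as one book or skips them as duplicates depends on its options and version, so check `calibredb add --help` before relying on it.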
Quote:
My second question: how does one write a recipe (parse_index) for a site that has no RSS feed AND spreads a section over multiple pages? E.g. a site has a technology section whose last link is "next page" (on every page except the last), and I want to add the "h2" article items that share the same article date from every page to the feed...

Thanks for any advice!
parse_index is for sites without an RSS feed. If you have an RSS feed, you don't need parse_index. To do what you want, just grab the 2nd and 3rd pages with index_to_soup as well and build the feed from all of them.
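One way to read this suggestion, as a sketch: calibre documents `parse_index()` as returning a list of `(section title, list of article dicts)` tuples, with keys such as `'title'`, `'url'`, `'date'` and `'description'`. The function below only assembles that structure from several index pages. `build_index` and `fetch_articles` are hypothetical; `fetch_articles` stands in for `index_to_soup(url)` plus the site-specific extraction and is injected so the sketch runs without calibre.

```python
def build_index(section_title, page_urls, fetch_articles):
    """Merge articles from several index pages into one parse_index()-style feed.

    fetch_articles(url) is a hypothetical callable returning a list of dicts
    with at least 'title' and 'url' (optionally 'date' and 'description').
    """
    articles = []
    for url in page_urls:  # page 1, page 2, page 3, ...
        for item in fetch_articles(url):
            articles.append({
                "title": item["title"],
                "url": item["url"],
                "date": item.get("date", ""),
                "description": item.get("description", ""),
            })
    # One section whose article list spans all fetched pages.
    return [(section_title, articles)]
```

Inside a recipe class, `parse_index()` could return the result of such a helper directly.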
04-27-2011, 05:43 PM #3
hiperlink
Thanks for your answer!

I'll try out the solution to the first question tomorrow at my dev desk.

But for the second question there's a small misunderstanding: the site I want to create a feed for does have a feed, but it's rather unusable: most of the time it contains duplicate items, and of its only 10 articles usually at least 4 are duplicates, e.g.:
  • article 1
  • article 2
  • article 1
  • article 3
  • article 4
  • article 3
  • article 1
...
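Should the feed be used after all, repeats like the ones listed above can be dropped while keeping the feed order. A minimal plain-Python sketch; `drop_duplicates` is a hypothetical helper, and for real feed entries one would pass a `key` such as the article URL, on the assumption that identical URLs mean identical articles.

```python
def drop_duplicates(items, key=lambda item: item):
    """Return items with repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:  # first time this key shows up
            seen.add(k)
            unique.append(item)
    return unique
```

Applied to the list above, it keeps `article 1`, `article 2`, `article 3`, `article 4` in that order.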
So I don't want to use it (yes, I know I could filter out the duplicates). The site has sections (e.g. Sport, Technology; it's a local newspaper site).
Each section page lists all the articles in that section, like:

<h1>article 1</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 2</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 3</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 4</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 5</h1>
<p>2011-04-27</p>
(snippet)
<link to next page>

And on the next page:
<h1>article 6</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 7</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 8</h1>
<p>2011-04-27</p>
(snippet)
<h1>article 9</h1>
<p>2011-04-26</p>
(snippet)
<h1>article 10</h1>
<p>2011-04-26</p>
(snippet)
<link to next page>
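A sketch of extracting the articles and the next-page link from one page shaped like the sample above, using only the standard library's `html.parser` so it runs outside calibre (a recipe would more naturally work on the BeautifulSoup object that `index_to_soup` returns). The markup it expects is an assumption, since the sample elides it: each title is `<h1><a href="...">`, the first `<p>` after a title holds the date, and the next-page link is `<a class="next" href="...">`. `SectionPageParser` is a hypothetical name; change the `"h1"` tag names to `"h2"` if the site really uses `h2` headings as in the first post.

```python
from html.parser import HTMLParser


class SectionPageParser(HTMLParser):
    """Collect articles and the "next page" link from one section page.

    Hypothetical markup, modelled on the sample in the thread:
        <h1><a href="/article-url">Title</a></h1>
        <p>2011-04-27</p>
        <p>(snippet)</p>
        <a class="next" href="/section?page=2">next page</a>
    """

    def __init__(self):
        super().__init__()
        self.articles = []    # dicts with 'title', 'url', 'date'
        self.next_url = None
        self._current = None  # article being built
        self._field = None    # 'title' or 'date' while inside that tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._current = {"title": "", "url": None, "date": None}
            self._field, self._text = "title", []
        elif tag == "a" and self._field == "title":
            self._current["url"] = attrs.get("href")
        elif tag == "a" and attrs.get("class") == "next":
            self.next_url = attrs.get("href")
        elif tag == "p" and self._current is not None:
            # The first <p> after a heading is taken to be the date.
            self._field, self._text = "date", []

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._field == "title":
            self._current["title"] = "".join(self._text).strip()
            self._field = None
        elif tag == "p" and self._field == "date":
            self._current["date"] = "".join(self._text).strip()
            self.articles.append(self._current)
            # Later <p> snippets are ignored until the next heading.
            self._current, self._field = None, None
```

Usage: create a parser, call `feed(html)` and `close()`, then read `articles` and `next_url`.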

What I want is to generate a custom feed of all of today's articles. For that I have to open the section's index page, then follow the "next page" link(s) for as long as I keep finding articles with today's date, adding each of them to the feed.

I can parse the index page and create the feed for it, but how do I get to the next page?
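The paging loop itself can be sketched independently of how a single page is parsed. `collect_todays_articles` and the injected `fetch_page(url)` callable (returning the page's articles plus the raw next-page link, or `None` on the last page) are hypothetical; in a recipe, `fetch_page` would wrap `index_to_soup(url)`. The stop condition assumes the section lists articles newest-first with ISO `YYYY-MM-DD` dates, as in the sample, so plain string comparison orders them correctly.

```python
from urllib.parse import urljoin


def collect_todays_articles(first_url, fetch_page, today, max_pages=20):
    """Follow "next page" links and keep the articles dated `today`.

    fetch_page(url) is a hypothetical callable returning (articles, next_url),
    where each article is a dict with a 'date' string such as '2011-04-27'
    and next_url is None on the last page.
    """
    todays = []
    url = first_url
    for _ in range(max_pages):  # hard cap in case the pagination loops
        articles, next_url = fetch_page(url)
        todays.extend(a for a in articles if a["date"] == today)
        # Newest-first listing: once an older article appears,
        # later pages cannot contain anything from today.
        if any(a["date"] < today for a in articles) or not next_url:
            break
        url = urljoin(url, next_url)  # next links may be relative
    return todays
```

Today's date would typically come from `datetime.date.today().isoformat()`; `max_pages` only guards against a misbehaving "next" link.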

Thanks in advance!
04-28-2011, 08:55 AM #4
Starson17
Quote:
Originally Posted by hiperlink
I can parse the index page and create the feed for it, but how do I get to the next page?
Use parse_index on the first section page, then grab the 2nd and 3rd pages with index_to_soup to build the complete feed.
04-28-2011, 10:21 AM #5
hiperlink
Thank you, I'm experimenting with that now!

For the record, reusing the already downloaded HTML files was pretty easy:

Code:
ebook-convert some.recipe somedir/ && for format in mobi epub pdf ; do ebook-convert somedir/index.html "some.${format}" ; done
Anyway, thanks again!
04-28-2011, 10:49 AM #6
Starson17
Quote:
Originally Posted by hiperlink
For the record, reusing the already downloaded HTML files was pretty easy
Be aware that a recipe-created ebook of mobi format is not necessarily the same as a recipe-created ebook of epub format that has been converted to mobi format. When the device is set to Kindle, the recipe system makes some changes to the ebook that aren't made in a straight conversion.
All times are GMT -4.