Old 10-21-2010, 08:44 AM   #1
oecherprinte
Zealot
oecherprinte began at the beginning.
 
Posts: 115
Karma: 20
Join Date: Jul 2010
Device: Kindle3 3G, Kindle Paperwhite 2
Downloading Webpages without RSS

Hi,

I would like to download several hierarchically ordered
web pages. But there is no RSS used here. I would just
like to follow the links from the top page and create articles
for the sub-pages.

Is there an easy way to do this with recipes? They seem to be pretty powerful. Is there a tutorial for creating ebooks
with recipes from web sites without RSS?

Thanks,

Jens
Old 10-21-2010, 09:23 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by oecherprinte View Post
Is there a tutorial for creating ebooks
with recipes from web sites without RSS?
You can use wget or HTTrack to pull a web site, then just save the html. No recipe needed. Or you can do it with a recipe using parse_index.

Here are some links:
https://www.mobileread.com/forums/sho...postcount=1878

These two are the best for parse_index:
http://calibre-ebook.com/user_manual...ownloaded-html
And:
http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced
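
For readers who want to see the shape of the parse_index approach, here is a minimal sketch, not a tested recipe. It assumes a hypothetical top page at `http://example.com/docs/` and treats every link beneath that URL as one article. The helper `articles_from_index`, the class name and the section title "Pages" are made up for illustration. `BasicNewsRecipe`, `parse_index` and `index_to_soup` are the calibre names, and the fallback class exists only so the helper can be tried outside calibre.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: a stand-in so the helper below can be tried on its own.
    class BasicNewsRecipe(object):
        pass


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def articles_from_index(html, base_url):
    """Turn the links on an index page into parse_index-style article dicts."""
    collector = _LinkCollector()
    collector.feed(html)
    articles = []
    for href, text in collector.links:
        url = urljoin(base_url, href)
        # Keep only sub-pages below the index; skip links that leave it.
        if url.startswith(base_url) and url != base_url:
            articles.append({'title': text or url, 'url': url,
                             'date': '', 'description': ''})
    return articles


class SiteWithoutRSS(BasicNewsRecipe):
    title = 'Site without RSS (example)'
    INDEX = 'http://example.com/docs/'  # hypothetical top page

    def parse_index(self):
        # Fetch the top page and return one section holding all sub-pages.
        raw = self.index_to_soup(self.INDEX, raw=True)
        if isinstance(raw, bytes):
            raw = raw.decode('utf-8', 'replace')
        return [('Pages', articles_from_index(raw, self.INDEX))]
```

The return value of parse_index is a list of (section title, list of article dicts) pairs, so a hierarchical site could map each second-level page to its own section instead of the single "Pages" section used here.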
Old 10-21-2010, 09:42 AM   #3
oecherprinte
Thanks for the answer. I have a question regarding the recursive html download. I am familiar with wget. So when I have the site downloaded, do I convert it with calibre or Mobipocket Creator? Will it keep the links between the files and put it all into a single document?

The calibre recipe would of course enable me to cut away undesired parts of the webpage and get print versions ...

Thanks,

Jens
Old 10-21-2010, 10:06 AM   #4
Starson17
Quote:
Originally Posted by oecherprinte View Post
when I have the site downloaded, do I convert it with calibre or Mobipocket Creator?
I'd just drag the downloaded index.html file into Calibre, and let it make the html ebook, then convert as needed.

Quote:
Will it keep the links between the files and put it all into a single document?
Yes

Quote:
The calibre recipe would of course enable me to cut away undesired parts of the webpage and get print versions ...
Correct. The recipe would give more control - it depends on your source as to which works better/easier. I don't write many recipes for one-off website scrapes, but it will certainly work.
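
As a rough illustration of the extra control a recipe gives, the sketch below trims pages and switches to a print view. The element ids and classes (`content`, `nav`, `footer`) and the `?print=1` URL scheme are assumptions about a hypothetical site, and `to_print_url` is a made-up helper. `keep_only_tags`, `remove_tags` and `print_version` are the calibre recipe hooks. The fallback class is only there so the snippet can be tried outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    class BasicNewsRecipe(object):  # stand-in outside calibre
        pass


def to_print_url(url):
    """Map an article URL to its print view (hypothetical '?print=1' scheme)."""
    separator = '&' if '?' in url else '?'
    return url + separator + 'print=1'


class TrimmedSite(BasicNewsRecipe):
    title = 'Trimmed site (example)'
    # Keep only the main article container (assumed id) ...
    keep_only_tags = [dict(name='div', attrs={'id': 'content'})]
    # ... and drop navigation and footer blocks (assumed classes).
    remove_tags = [
        dict(name='div', attrs={'class': 'nav'}),
        dict(name='div', attrs={'class': 'footer'}),
    ]

    def print_version(self, url):
        # calibre calls this with each article URL before downloading it.
        return to_print_url(url)
```

The real selectors have to be read off the target site's html, for example with Firebug.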
Old 10-21-2010, 10:27 AM   #5
oecherprinte
Quote:
Originally Posted by Starson17 View Post
I'd just drag the downloaded index.html file into Calibre, and let it make the html ebook, then convert as needed.
How does Calibre know which links within index.html to follow?
Old 10-21-2010, 10:44 AM   #6
Starson17
Quote:
Originally Posted by oecherprinte View Post
How does Calibre know which links within index.html to follow?
Relative links are followed, externals are not. Your scraper software (wget, etc.) controls downloading and converting links to relative links.
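
To make the relative/external distinction concrete, here is a small sketch that sorts hrefs the way the answer describes. It is not calibre's actual code; `is_relative` and `split_links` are illustrative names, and the rule used (no scheme and no host means relative) is an assumption about what "relative" covers.

```python
from urllib.parse import urlsplit


def is_relative(href):
    """True when the link has neither a scheme nor a host part."""
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc


def split_links(hrefs):
    """Separate links into (relative, external) lists, preserving order."""
    local = [h for h in hrefs if is_relative(h)]
    external = [h for h in hrefs if not is_relative(h)]
    return local, external


links = ['chapter1.html', '../img/cover.png', 'http://example.com/about',
         '//cdn.example.com/site.css', 'mailto:someone@example.com']
local, external = split_links(links)
print(local)
print(external)
```

This is why the scraper's link conversion matters: a page that still points at absolute URLs after download would end up in the second list.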
Old 10-22-2010, 07:44 AM   #7
oecherprinte
I found that Mobipocket Creator does a much better job than Calibre at converting complete websites to the mobi format.
Old 10-22-2010, 10:58 AM   #8
Starson17
Quote:
Originally Posted by oecherprinte View Post
I found that Mobipocket Creator does a much better job than Calibre at converting complete websites to the mobi format.
What does it do better? More specifically, what do you start with: the local downloaded html from a scraper (wget, HTTrack), a recipe, a link? What settings did you use for converting your starting format?
Old 10-22-2010, 04:27 PM   #9
oecherprinte
Sorry, I was a bit too brief:

I downloaded the site using HTTrack. Then I used calibre to read the index.html file. Afterwards I downloaded the index.html to the Kindle. The result did not look nice and the Kindle somehow got stuck when reading the generated book, i.e. it went back to the main menu. So something went wrong. However, I can't see where in calibre I have any control over the conversion.

In Mobipocket Creator I can add all html files by dragging them into the tool. I can set html tags for creating a table of contents, and the conversion to prc format is pretty quick. The results looked much better than in calibre and did not crash.
Old 10-22-2010, 04:46 PM   #10
Starson17
Quote:
Originally Posted by oecherprinte View Post
I downloaded the site using HTTrack. Then I used calibre to read the index.html file. Afterwards I downloaded the index.html to the Kindle.
I'm surprised you didn't do a conversion from html to something else.

Quote:
The result did not look nice and the Kindle somehow got stuck when reading the generated book, i.e. it went back to the main menu. So something went wrong. However, I can't see where in calibre I have any control over the conversion.
You have a great deal of control during conversion in Calibre. You can use header/footer removal to eliminate things. You can control the TOC with XPath, control breaks, etc. If you go to EPUB, you can always use Sigil to clean things up. Regardless, the first control you have is the control from the download/scraper that builds the index.html. One of the reasons I like wget is the control it gives me over the download.
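
As one possible illustration of that download control, the sketch below assembles a wget command line for mirroring part of a site. The start URL and the depth of 2 are placeholders, and `wget_mirror_command` is a made-up helper. The flags themselves are standard wget options, but which ones a given site needs is a judgment call.

```python
import shlex


def wget_mirror_command(url, depth=2):
    """Build an argument list for a shallow, self-contained wget mirror."""
    return [
        'wget',
        '--recursive',           # follow links from the start page
        '--level=%d' % depth,    # limit how deep the recursion goes
        '--no-parent',           # stay below the start directory
        '--page-requisites',     # also fetch images and stylesheets
        '--convert-links',       # rewrite links so they work locally
        '--adjust-extension',    # save pages with an .html suffix
        url,
    ]


cmd = wget_mirror_command('http://example.com/docs/')  # hypothetical site
print(' '.join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to perform the download. `--convert-links` is the option that turns the site's links into the relative links Calibre will follow.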

Quote:
In Mobipocket Creator I can add all html files by dragging them into the tool. I can set html tags for creating a table of contents and the conversion to prc format is pretty quick. The results looked much better than in calibre and did not crash.
OK. I'm not trying to recommend one or the other - whatever works best/easiest is what you should use, but I suspect it's mostly dependent on the starting html, and that's mostly dependent on the format of the site you are scraping and the tool you use to do the scrape.

I'm glad it worked for you.
Old 10-23-2010, 04:26 AM   #11
oecherprinte
Apparently, I did not grasp the power of calibre. Is there a good tutorial for:

- converting html to mobi
- converting pdf to mobi

with emphasis on how to create a decent toc and how to influence the ordering of html subpages within a site? Currently, I download a site using wget or HTTrack or convert pdf to html. Then I look into the html code with Firebug and try to identify chapter and section borders and insert tags manually (automatic recursive replace) into the html code. Then I generate a prc with Mobipocket Creator, using the manually inserted tags to create a toc. Then I go to calibre and convert the prc into mobi to be able to navigate through chapters using the 4-way button. Here, again, I have to enter the html chapter tags so I do not lose the toc hierarchy.

Quite an elaborate process. I would be happy about any suggestions for speeding it up. I suspect calibre could handle this efficiently, if I were just able to use it properly.

Thanks,

Jens
Old 10-23-2010, 09:52 AM   #12
Starson17
Quote:
Originally Posted by oecherprinte View Post
Apparently, I did not grasp the power of calibre. Is there a good tutorial for:

- converting html to mobi
- converting pdf to mobi
I don't use mobi formats, but many here do. The user guide is where I'd point you. Then ask questions here.

Quote:
with emphasis on how to create a decent toc
This is in the Preferences | Conversion | Common Options | TOC area. XPath is used to build the TOC. There's a wizard to help.

Quote:
and how to influence the ordering of html subpages within a site?
This is controlled by the site or your scraper, not Calibre.

Quote:
Currently, I download a site using wget or HTTrack or convert pdf to html. Then I look into the html code with Firebug and try to identify chapter and section borders and insert tags manually (automatic recursive replace) into the html code.
XPath in Calibre will find existing tags, whatever they are, and use them for TOC creation, assuming there are tags at the right spots.
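
To show the idea of building a TOC from tags that are already in the document, here is a small standalone sketch using Python's standard library rather than Calibre itself. It assumes well-formed XHTML in which chapters are `h1` elements and sections are `h2` elements. In Calibre's TOC settings the corresponding expressions would look like `//h:h1` and `//h:h2`, where `h:` stands for the XHTML namespace. `build_toc` is an illustrative name, not a Calibre function.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def build_toc(xhtml):
    """Return [(chapter title, [section titles])] from h1/h2 headings."""
    root = ET.fromstring(xhtml)
    h1 = '{%s}h1' % XHTML_NS
    h2 = '{%s}h2' % XHTML_NS
    toc = []
    # iter() walks the tree in document order, so headings stay in sequence.
    for el in root.iter():
        text = ''.join(el.itertext()).strip()
        if el.tag == h1:
            toc.append((text, []))
        elif el.tag == h2 and toc:
            # Attach each section to the most recent chapter.
            toc[-1][1].append(text)
    return toc


doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       '<h1>One</h1><p>text</p><h2>One A</h2>'
       '<h1>Two</h1><h2>Two <i>A</i></h2>'
       '</body></html>')
for chapter, sections in build_toc(doc):
    print(chapter, sections)
```

If the scraped pages mark chapters with some other element, for example a `div` with a particular class, the same approach applies with a different expression, which is why manually inserted tags may not be needed.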


Quote:
Then I generate a prc with Mobipocket Creator, using the manually inserted tags to create a toc. Then I go to calibre and convert the prc into mobi to be able to navigate through chapters using the 4-way button. Here, again, I have to enter the html chapter tags so I do not lose the toc hierarchy.

Quite an elaborate process. I would be happy about any suggestions for speeding it up. I suspect calibre could handle this efficiently, if I were just able to use it properly.
Perhaps someone else can make suggestions. I know there are some oddities about Kindle and TOC, but I'm not familiar with them.
Old 10-23-2010, 01:03 PM   #13
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 45,600
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
http://calibre-ebook.com/user_manual...le-of-contents