Old 09-26-2010, 09:11 PM   #1
JPD
Member
JPD began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
web page image incorrectly appears at top of conversion

I'm trying to convert multiple nested pages from a web site into an ebook. It's a scientific site, and the pages are image-rich with GIFs. I open a page, e.g., http://www.chemguide.co.uk/analysis/...ation.html#top , in Firefox, save it to my desktop (Firefox saves the images locally in a matching folder), and drag the saved HTML file into Calibre. The file appears in Calibre as a zip, and I convert it to MOBI without making any changes to the metadata. When I view the converted file, regardless of which page from this site I'm working on, one of the images from the page appears at position 1.0, above the page heading where the book should actually start. That image also still shows correctly where it's supposed to. It's almost as if the image is being inserted at the beginning as a book cover, although to the right it shows the generic book image.

I don't know if this is related, but when I quit calibre I get this error message:

IOError: [Errno 2] No such file or directory: '/var/folders/3g/3g++kTeeHJmwGtYBJz9CQk+++TI/-Tmp-/calibre_0.7.20_tmp_gFSqaR/ipc_result_1_7_q_9c8r.pickle'

ERROR: ERROR: Unhandled exception: <b>IOError</b>:[Errno 2] No such file or directory: '/var/folders/3g/3g++kTeeHJmwGtYBJz9CQk+++TI/-Tmp-/calibre_0.7.20_tmp_gFSqaR/ipc_result_1_7_q_9c8r.pickle'

Traceback (most recent call last):
File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
return run_entry_point()
File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
return getattr(pmod, func)()
File "site-packages/calibre/utils/ipc/worker.py", line 101, in main

I'm using a PPC Apple iBook running Mac OS 10.5.8, FF 3.6.10, and Calibre 0.7.20.

Any help that can be offered would be appreciated.
Old 09-27-2010, 12:41 AM   #2
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
That's most likely happening because you're not also using the original site's CSS. Without the original CSS, an HTML page will often look nothing like the original formatting on the web.

Anyway, just saving the HTML is generally not the best way to convert a site to an ebook. You should read up on creating recipes, and then create a recipe for that site, extracting only the data that's pertinent to the ebook and leaving the rest of the interactive cruft out.
Old 09-27-2010, 11:13 PM   #3
JPD
Member
JPD began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
CSS

The site is actually very clean, no ads, doesn't even look bad if I put it in pdf except pdfs on Kindle can't take advantage of most of its features, hence my wanting to convert to mobi. It's a static site, the content does not change, so creating an ebook of it would be a one time thing, so while I initially went into it thinking recipes were the way to go (I originally posted in recipes), it's since become apparent to me that's not the best approach here. In the recipe forum it was determined the stray image problem was because MOBI doesn't support floating images, so calibre puts them where they appear in the source document markup. I don't know what that means or how to address it.

If you can steer me to any site where I might be able to learn the needed skills it would be appreciated.
Old 09-28-2010, 08:14 AM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by JPD
The site is actually very clean, no ads, doesn't even look bad if I put it in pdf except pdfs on Kindle can't take advantage of most of its features, hence my wanting to convert to mobi. It's a static site, the content does not change, so creating an ebook of it would be a one time thing, so while I initially went into it thinking recipes were the way to go (I originally posted in recipes), it's since become apparent to me that's not the best approach here. In the recipe forum it was determined the stray image problem was because MOBI doesn't support floating images, so calibre puts them where they appear in the source document markup. I don't know what that means or how to address it.

If you can steer me to any site where I might be able to learn the needed skills it would be appreciated.
You are right that a recipe is probably not the best option for a one-off conversion. I'd suggest that you edit the HTML directly (Notepad++ is a good option, but there are lots of HTML editors), or convert the zip/HTML to EPUB and use Sigil. As Kovid said, this is simply a problem with placement of the image: Calibre does not handle floating images, and instead puts the image where its code appears in the HTML file. Just open up the HTML or EPUB and cut the code for the image, then paste it where you want it to appear. If you post the file here, or the relevant parts, someone can help you identify the part to be cut and moved.
Old 09-28-2010, 08:48 AM   #5
Manichean
Wizard
Manichean is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I know this is considered by most to be a no-go, but did you try just taking the PDF file you said you have and just converting that?
Old 09-28-2010, 09:14 AM   #6
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
There don't appear to be any floating images in the linked article, so I'm wondering if there is actually a bug in Calibre somewhere.

Anyway, here is a foolproof but manual way to do this:
  1. Install the Firebug add-on for Firefox
  2. View the desired page in Firefox
  3. Open Firebug by clicking the little bug icon in the lower right corner of Firefox
  4. Click the 'Inspect' icon in Firebug; it's the second icon from the left in Firebug's toolbar, the one with an arrow and a box
  5. Once that's clicked, hover over the different elements of the document until you find the element that covers all the text; you'll know you have the right one when the highlighted element is the one that says <table....xxxx.....>
  6. Now, in the lower half of the window where Firebug is showing the DOM, right-click on the table element and click 'Copy innerHTML'
  7. Paste that into Notepad or some other text editor
  8. Save that as a new HTML file, then load it in Firefox
  9. Save the simplified version of the doc in Firefox as a complete web page
  10. Load the new HTML file into Calibre
  11. Convert

You'll get output that looks like what I've attached.

The recipe framework can be used to do all of this manual work automatically, as I mentioned before. It's definitely a bit more work than a standard recipe because there isn't any RSS feed, but it's doable. You could also create your own list of articles to use as the 'feed'.
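
Short of writing a full recipe, the copy-and-paste part of the manual steps (5 through 8) could probably be scripted. What follows is only a minimal sketch in present-day Python 3 using the standard library; it is not something Calibre or Firebug provides. It assumes the content sits in the first outermost <table> of the saved page (on other pages you may need to pick the table differently), and the names OuterTableFinder and extract_table_inner_html are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class OuterTableFinder(HTMLParser):
    """Records where the contents of the first outermost <table> begin and end."""

    def __init__(self, source: str) -> None:
        super().__init__(convert_charrefs=False)
        # HTMLParser reports positions as (line, column) and counts "\n" as
        # the line break, so precompute the absolute offset of each line.
        self.line_starts = []
        offset = 0
        for line in source.split("\n"):
            self.line_starts.append(offset)
            offset += len(line) + 1
        self.depth = 0
        self.inner_start: Optional[int] = None
        self.inner_end: Optional[int] = None

    def _offset(self) -> int:
        # Absolute offset of the tag currently being handled.
        lineno, column = self.getpos()
        return self.line_starts[lineno - 1] + column

    def handle_starttag(self, tag, attrs):
        if tag != "table" or self.inner_end is not None:
            return
        if self.depth == 0:
            # The contents start right after the opening tag's own text.
            self.inner_start = self._offset() + len(self.get_starttag_text())
        self.depth += 1

    def handle_endtag(self, tag):
        if tag != "table" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0 and self.inner_end is None:
            # The contents end where the matching closing tag begins.
            self.inner_end = self._offset()


def extract_table_inner_html(source: str) -> Optional[str]:
    """Return the markup inside the first outermost <table>, or None."""
    finder = OuterTableFinder(source)
    finder.feed(source)
    finder.close()
    if finder.inner_start is None or finder.inner_end is None:
        return None
    return source[finder.inner_start:finder.inner_end]
```

The returned string plays the role of the pasted innerHTML in step 7. If it is written to a new .html file in the same folder as the saved page, the relative image paths should keep working when the file is added to Calibre.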
Attached Files
File Type: epub Electromagnetic Radiation - Jim Clark.epub (117.8 KB, 211 views)
Old 09-28-2010, 11:53 AM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by ldolse
There don't appear to be any floating images in the linked article, so I'm wondering if there is actually a bug in Calibre somewhere.
I tested his page by just saving the page/HTML out of Firefox, dragging the index.html into Calibre and viewing it. That worked correctly. I assume the EPUB conversion would have been fine, too. His problem was the conversion to MOBI. I tried that conversion and confirmed the problem. Kovid posted that floating images weren't supported in the MOBI conversion. I didn't go any further, but if there aren't any floating images, maybe it's a bug in the MOBI conversion code?
Old 09-28-2010, 11:59 AM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That site uses tables for layout. Use the linearize tables option. Table support in MOBI is terrible.
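
For anyone converting from the command line instead of the GUI, the same option is available in Calibre's ebook-convert tool as --linearize-tables. Below is a minimal Python 3 sketch that wraps that call. The helper names build_convert_command and convert and the file names in the comment are placeholders, and it assumes ebook-convert is on the PATH.

```python
import shutil
import subprocess
from typing import List


def build_convert_command(src: str, dst: str, linearize_tables: bool = True) -> List[str]:
    """Build the ebook-convert argument list; the output format follows dst's extension."""
    command = ["ebook-convert", src, dst]
    if linearize_tables:
        # Ask the converter to flatten layout tables into sequential content.
        command.append("--linearize-tables")
    return command


def convert(src: str, dst: str) -> None:
    """Run the conversion, failing loudly if the tool is missing or exits with an error."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert was not found on the PATH")
    subprocess.run(build_convert_command(src, dst), check=True)


# Example call with hypothetical file names:
# convert("saved_page.zip", "saved_page.mobi")
```

Keeping the argument list in its own function makes it easy to inspect the exact command before anything is run.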
All times are GMT -4.