MobileRead Forums

multi-page HTML with images to ePub or LRF (https://www.mobileread.com/forums/showthread.php?t=50365)

Nvidiot 07-06-2009 07:39 PM

multi-page HTML with images to ePub or LRF
 
I'm trying to convert a multi-page HTML book (http://www.hq.nasa.gov/office/pao/Hi.../contents.html) to something I can read on my PRS-700. I've tried copying and pasting the text into an RTF file, and then using Calibre to convert to LRF or ePub. This works; however, the images disappear. The same thing happens when I just toss the RTF file on my reader. When I open the RTF file with MS Word (2007), the images are there and visible.

Any tips?

Nate the great 07-06-2009 08:01 PM

I don't think RTF support on the Sony Reader includes images. The best quick way to do it is copy it into calibre and convert to either LRF or Epub.

Nvidiot 07-06-2009 08:11 PM

Quote:

Originally Posted by Nate the great (Post 514111)
I don't think RTF support on the Sony Reader includes images. The best quick way to do it is copy it into calibre and convert to either LRF or Epub.

The problem is, when I import the 'contents.html' into calibre, it thinks that that file is everything, obviously not what I want. When I import the RTF that I made with MS Word (with the pictures) and then convert to LRF or epub it converts but again misses the images.

Nate the great 07-06-2009 08:27 PM

Well shoot. Is it possible to open the RTF in MSWord? You could save it as DOC, and then use calibre to convert that (I think).

Thanks for pointing this out. I spidered it, and I will place it on top of my TBC pile. If I get a chance, I'll throw up a Q&D conversion tonight.

Nvidiot 07-06-2009 08:33 PM

Quote:

Originally Posted by Nate the great (Post 514126)
Well shoot. Is it possible to open the RTF in MSWord? You could save it as DOC, and then use calibre to convert that (I think).

Thanks for pointing this out. I spidered it, and I will place it on top of my TBC pile. If I get a chance, I'll throw up a Q&D conversion tonight.

Doc2lrf is not supported in Calibre (at least not in 0.5.14) :(

I'd be VERY happy with a Q&D conversion (especially if you tell me how you did it). I don't care about non-working links to footnotes etc.; if they are at the end of a chapter I can find 'em easily enough, and the chapters are short anyway.

I tried copying & pasting the text to the Atlantis editor and using its epub export option. That does seem to work better; however, I'll have to copy & paste the images one at a time. Selecting all of the html and pasting it in will not put the images in. Also, the right side of the images is cut off on the reader. At least it's progress :)

wallcraft 07-06-2009 08:44 PM

Try importing the RTF into OpenOffice and exporting it as an ODT file. This should be readable (with images, I think) by Calibre. Another possibility is to save as "web page filtered" from Word.

Nvidiot 07-06-2009 08:55 PM

Fixing the links to point to local pages (wget -k) did the trick. Calibre correctly read in all the html files and made a decent LRF out of it. The only problem I have is that for some reason it put some chapters in front of others when they should not be. Not sure what's going on with that; I opened the 'contents.html', which has all the chapters/pages linked in the proper order. :help:
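
For readers wondering what "fixing the links to point to local pages" amounts to, here is a rough Python sketch of the idea. It is not how wget implements -k; it only illustrates rewriting on-site absolute links into relative ones so the pages work from disk. The function name `localize_links`, the `site_root` parameter, and the example.org URLs are made up for illustration.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Matches href="..." and src="..." attributes (single or double quoted).
ATTR = re.compile(
    r"""(?P<attr>\b(?:href|src))\s*=\s*(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)

def localize_links(html, page_url, site_root):
    """Rewrite links that live under site_root as paths relative to page_url.

    Hypothetical helper: off-site links are left exactly as they were.
    """
    page_dir = posixpath.dirname(urlparse(page_url).path)

    def fix(match):
        attr, quote, url = match.group("attr"), match.group("q"), match.group("url")
        absolute = urljoin(page_url, url)
        if not absolute.startswith(site_root):
            return match.group(0)  # not part of the book: keep the original link
        target = urlparse(absolute)
        relative = posixpath.relpath(target.path, start=page_dir)
        if target.fragment:
            relative += "#" + target.fragment  # keep footnote-style anchors
        return attr + "=" + quote + relative + quote

    return ATTR.sub(fix, html)
```

Run over every downloaded page, this leaves a set of files whose links resolve inside the local folder, which is the state a converter needs in order to follow them.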

Nate the great 07-06-2009 10:29 PM

Here is the Q&D edition in Epub and Mobipocket.

I haven't done anything to the formatting, and I make no claims about the quality because the original html is horrible. But I will say that the links _should_ work correctly, the files _should_ be in the correct order, and all the important images _should_ have been included.

Enjoy.


EDIT: Having looked at the ebooks I must say that they're a lot better than I expected.

SECOND EDIT: I moved the files to the book upload section so others can find them.

Epub:
https://www.mobileread.com/forums/showthread.php?t=50384

Mobi:
https://www.mobileread.com/forums/showthread.php?t=50385

Nvidiot 07-06-2009 10:44 PM

Awesome! :thanks::iloveyou:

If you could tell me how you did it I can do it myself next time around :)

Nate the great 07-06-2009 11:02 PM

1. Downloaded the set of pages with WinHTTrack.
2. Started a new ebook project in Mobipocket Creator, and carefully added the files a few at a time to make sure they were in the correct order.
3. Failed to build the ebook several times so I could identify and delete the bad files created in the download step. (Don't worry, they were created by the download program and weren't source content.)
4. Built the Mobipocket ebook. Saved the ebook project.
5. Used html2epub.exe with the ebook project files to make the Epub version.


Total time invested: about an hour

nrapallo 07-07-2009 01:54 PM

Quote:

Originally Posted by Nate the great (Post 514216)
1. Downloaded the set of pages with WinHTTrack.

This is absolutely the RIGHT tool for building ebooks from webpages; much easier when the webpages stay on the same domain and go "downwards" from there. Did you realize there was a "cover.html" that would have been the best place to start the spidering instead of the "contents.html"? I spidered it last night and it took all of 6 minutes. The ensuing ebook conversion to .imp took several hours more (see below). :rolleyes:
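
As an illustration of the "same domain, going downwards" rule, here is a small Python sketch of the test a spider could apply to each link it finds. This is not WinHTTrack's actual filter logic; the function name `is_downward` and the example.org URLs are assumptions for the example.

```python
from urllib.parse import urljoin, urlparse

def is_downward(start_url, candidate_url):
    """True if candidate_url is on the same host as start_url and sits in
    the start page's directory or somewhere below it (hypothetical helper)."""
    start = urlparse(start_url)
    candidate = urlparse(urljoin(start_url, candidate_url))
    # Directory of the start page, e.g. "/history/book/" for ".../book/cover.html".
    start_dir = start.path.rsplit("/", 1)[0] + "/"
    return (
        candidate.scheme in ("http", "https")
        and candidate.netloc.lower() == start.netloc.lower()
        and candidate.path.startswith(start_dir)
    )
```

Starting from a page like cover.html, a crawler would only queue links for which this returns True, which keeps it inside the book and away from the rest of the site.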

Quote:

2. Started a new ebook project in Mobipocket Creator, and carefully added the files a few at a time to make sure they were in the correct order.
I replicated the ordering of the .html files from the TOC in "contents.html" and used that as my starting point for the .opf.
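
A minimal Python sketch of that idea, assuming the contents page links to each chapter file in reading order: collect the local .html links in the order they appear, then print manifest and spine entries to paste into an .opf. The names `reading_order` and `opf_fragment`, the `pageN` ids, and the media type are illustrative choices, not taken from any particular tool.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag
from xml.sax.saxutils import quoteattr

class LinkOrder(HTMLParser):
    """Collects local .html/.htm link targets in document order, once each."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        path = urldefrag(href).url  # drop any "#anchor" part
        is_page = path.lower().endswith((".html", ".htm"))
        if is_page and "://" not in path and path not in self.files:
            self.files.append(path)

def reading_order(contents_html):
    parser = LinkOrder()
    parser.feed(contents_html)
    return parser.files

def opf_fragment(files):
    """Build <manifest> and <spine> text; the spine order is the reading order."""
    manifest, spine = ["<manifest>"], ["<spine>"]
    for number, name in enumerate(files, start=1):
        item_id = f"page{number}"
        # Media type is an assumption; adjust it for the target format.
        manifest.append(
            f'  <item id="{item_id}" href={quoteattr(name)} '
            'media-type="application/xhtml+xml"/>'
        )
        spine.append(f'  <itemref idref="{item_id}"/>')
    manifest.append("</manifest>")
    spine.append("</spine>")
    return "\n".join(manifest + spine)
```

Because the spine is generated from the link order rather than from a directory listing, chapters should come out in the sequence the contents page gives them.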

Quote:

3. Failed to build the ebook several times so I could identify and delete the bad files created in the download step. (Don't worry, they were created by the download program and weren't source content.)
:thumbsup: This is the ONLY way, through several unsuccessful trials, to get things right. This takes MOST of the time to convert webpages to ebooks! :(

Quote:

4. Built the Mobipocket ebook. Saved the ebook project.
5. Used html2epub.exe with the ebook project files to make the Epub version.
After getting the .prc version, I used Mobi2IMP to convert it to .imp formats, but the eBook Publisher is a lot more picky and sensitive to badly coded html, so I had to "fix" a lot more problems, i.e.
  • ill-formed/corrupt images,
  • <h1> tags in the <head> section and BEFORE the <body> tag,
  • non-existent links due to typos,
  • non-existent images for previous, next and index links,
  • missing image retrieved from an old website copy using the Wayback Machine at archive.org
  • many minor fixes to make the resulting .html look more presentable...
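
Some of the problems in that list can be found automatically before a build is attempted. The Python sketch below reports links and images that point at files which do not exist, and counts <h1> tags that appear before <body>. It assumes all pages and images sit in one flat folder; `audit_page` and `existing_files` are hypothetical names, and this is not what eBook Publisher or any of the tools above do internally.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse

class PageAudit(HTMLParser):
    """Records link/image targets and <h1> tags seen before <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.refs = []
        self.h1_before_body = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.in_body = True
        elif tag == "h1" and not self.in_body:
            self.h1_before_body += 1
        if tag == "a":
            ref = attrs.get("href")
        elif tag == "img":
            ref = attrs.get("src")
        else:
            ref = None
        if ref:
            self.refs.append(ref)

def audit_page(html, existing_files):
    """Return missing local targets and the count of misplaced <h1> tags.

    existing_files is the set of file names present in the book folder.
    """
    audit = PageAudit()
    audit.feed(html)
    missing = []
    for ref in audit.refs:
        path = urldefrag(ref).url
        if not path or urlparse(path).scheme:
            continue  # same-page anchor or external URL: nothing to check
        if path not in existing_files:
            missing.append(path)
    return {"missing": missing, "h1_before_body": audit.h1_before_body}
```

Corrupt images and cosmetic fixes still need a human eye, but a report like this narrows down which pages to open first.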

Quote:

Total time invested: about an hour
Total time invested: almost 3 hours

Uploading the .imp formats, which differ slightly from your (.prc) version. Check here.

I can upload my .prc/.epub versions as well, if you would like.

Nate the great 07-07-2009 04:14 PM

Quote:

Originally Posted by nrapallo (Post 514716)
  • ill-formed/corrupt images,
  • <h1> tags in the <head> section and BEFORE the <body> tag, yep
  • non-existent links due to typos, yep
  • non-existent images for previous, next and index links,
  • missing image retrieved from an old website copy using WayBackMachine at archive.org
  • many minor fixes to make the resulting .html look more presentable...still working on this

Those 3 link images are there; they're just linked to in an odd way. Also, can you let me have a copy of the missing image from file ch22-6.html?

I wish I'd known about the cover but it's okay. I like the one I made.

nrapallo 07-07-2009 09:26 PM

4 Attachment(s)
Quote:

Originally Posted by Nate the great (Post 514858)
Those 3 link images are there; they're just linked to in an odd way. Also, can you let me have a copy of the missing image from file ch22-6.html?

I wish I'd known about the cover but it's okay. I like the one I made.

I changed the way those three links referenced their images to make them better to use.

The missing image was m493b.gif and is attached. There were two corrupt images that I could fix (attached as well), the others were corrupt from when the website was originally set up, as far as I can tell.
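
One cheap way to spot likely-corrupt images in a downloaded set is to check each file's first and last bytes. The Python sketch below relies only on the fact that GIF files start with "GIF87a" or "GIF89a" and end with a 0x3B trailer byte, and that JPEG files start with FF D8 and end with FF D9. It is a heuristic with made-up function names: a file can pass and still be damaged inside, and a valid file with extra trailing bytes would be flagged.

```python
from pathlib import Path

def looks_intact(data):
    """True/False for GIF and JPEG bytes; None if the format is not recognised
    (for example an HTML error page saved under an image name)."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return data.endswith(b"\x3b")      # GIF trailer byte
    if data[:2] == b"\xff\xd8":
        return data.endswith(b"\xff\xd9")  # JPEG end-of-image marker
    return None

def suspicious_images(folder):
    """Yield image files under folder that fail the quick check above."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".gif", ".jpg", ".jpeg"}:
            if looks_intact(path.read_bytes()) is not True:
                yield path
```

Anything this flags is worth opening in an image viewer; truncated downloads in particular tend to show up as a missing end marker.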

BTW, here's a snapshot of the cover page I used (basically their cover.html).

Nate the great 07-07-2009 10:18 PM

Thank you for the images.

BTW, I sent the 2 files with all the link errors to the contact email listed. I also sent a list of the errors I found, and mentioned that I was making an ebook. This afternoon I received a response.

The History Division at NASA is planning to convert all of their documents to ebooks. They wanted to know about the tools I use and my work process. I wrote a fairly lengthy email.

And yes, I did direct them here.

nrapallo 07-07-2009 11:20 PM

Quote:

Originally Posted by Nate the great (Post 515212)
Thank you for the images.

BTW, I sent the 2 files with all the link errors to the contact email listed. I also sent a list of the errors I found, and mentioned that I was making an ebook. This afternoon I received a response.

The History Division at NASA is planning to convert all of their documents to ebooks. They wanted to know about the tools I use and my work process. I wrote a fairly lengthy email.

And yes, I did direct them here.

:cool: Great news... who would have thought that a recreational hour or so would have resulted in a productive skillset valuable to others! :2thumbsup:


All times are GMT -4.
