multi-page HTML with images to ePub or LRF


Nvidiot
07-06-2009, 07:39 PM
I'm trying to convert a multi-page HTML book (http://www.hq.nasa.gov/office/pao/History/SP-4204/contents.html) to something I can read on my PRS-700. I've tried copying and pasting the text into an RTF file, and then using Calibre to convert it to LRF or ePub. This works, but the images disappear. The same thing happens when I just put the RTF file on my reader. When I open the RTF file with MS Word (2007), the images are there and visible.

Any tips?

Nate the great
07-06-2009, 08:01 PM
I don't think RTF support on the Sony Reader includes images. The best quick way to do it is to copy it into calibre and convert to either LRF or Epub.

Nvidiot
07-06-2009, 08:11 PM
The problem is, when I import the 'contents.html' into calibre, it thinks that that file is everything, obviously not what I want. When I import the RTF that I made with MS Word (with the pictures) and then convert to LRF or epub it converts but again misses the images.

Nate the great
07-06-2009, 08:27 PM
Well shoot. Is it possible to open the RTF in MSWord? You could save it as DOC, and then use calibre to convert that (I think).

Thanks for pointing this out. I spidered it, and I will place it on top of my TBC pile. If I get a chance, I'll throw up a Q&D conversion tonight.

Nvidiot
07-06-2009, 08:33 PM
Doc2lrf is not supported in Calibre (at least not in 0.5.14) :(

I'd be VERY happy with a Q&D conversion (especially if you tell me how you did it). I don't care about non-working links to footnotes etc, if they are at the end of a chapter I can find 'm easily enough, the chapters are short anyway.

I tried copying & pasting the text into the Atlantis editor and using its epub export option. That does seem to work better, but I'll have to copy & paste the images one at a time. Selecting all of the html and pasting it in will not bring the images along. Also, the right side of the images is cut off on the reader. At least it's progress :)

wallcraft
07-06-2009, 08:44 PM
Try importing the RTF into OpenOffice and exporting it as an ODT file. This should be readable (with images I think) by Calibre. Another possibility is to save as "web page filtered" from Word.

Nvidiot
07-06-2009, 08:55 PM
Fixing the links to point to local pages (wget -k) did the trick. Calibre correctly read in all the html files and made a decent LRF out of them. The only problem is that for some reason it put some chapters in front of others when it should not have. Not sure what's going on with that: I opened 'contents.html', which links all the chapters/pages in the proper order. :help:
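For anyone hitting the same ordering problem, one way to check what order 'contents.html' actually implies is to list its local links in document order. The following is only a rough sketch: `LinkOrderParser` and `reading_order` are made-up names, and it says nothing about how Calibre itself decides the order.

```python
from html.parser import HTMLParser


class LinkOrderParser(HTMLParser):
    """Collect local .html/.htm link targets in the order they appear."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Drop any #fragment so footnote anchors do not create duplicates.
        page = href.split("#", 1)[0]
        # Skip pure fragments and links that leave the local copy.
        if not page or "://" in page:
            return
        if not page.lower().endswith((".html", ".htm")):
            return
        # Keep only the first mention of each page.
        if page not in self.links:
            self.links.append(page)


def reading_order(contents_html):
    """Return the local pages linked from a table of contents, in order."""
    parser = LinkOrderParser()
    parser.feed(contents_html)
    return parser.links
```

Feeding it the text of the locally saved 'contents.html' gives a list that can be compared against the chapter order in the converted book.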

Nate the great
07-06-2009, 10:29 PM
Here is the Q&D edition in Epub and Mobipocket.

I haven't done anything to the formatting, and I make no claims about the quality because the original html is horrible. But I will say that the links _should_ work correctly, the files _should_ be in the correct order, and all the important images _should_ have been included.

Enjoy.


EDIT: Having looked at the ebooks I must say that they're a lot better than I expected.

SECOND EDIT: I moved the files to the book upload section so others can find them.

Epub:
http://www.mobileread.com/forums/showthread.php?t=50384

Mobi:
http://www.mobileread.com/forums/showthread.php?t=50385

Nvidiot
07-06-2009, 10:44 PM
Awesome! :thanks::iloveyou:

If you could tell me how you did it I can do it myself next time around :)

Nate the great
07-06-2009, 11:02 PM
1. Downloaded the set of pages with WinHTTrack.
2. Started a new ebook project in Mobipocket Creator, and carefully added the files a few at a time to make sure they were in the correct order.
3. Failed to build the ebook several times so I could identify and delete the bad files created in the download step. (Don't worry, they were created by the download program and weren't source content.)
4. Built the Mobipocket ebook. Saved the ebook project.
5. Used html2epub.exe with the ebook project files to make the Epub version.


Total time invested: about an hour

nrapallo
07-07-2009, 01:54 PM
1. Downloaded the set of pages with WinHTTrack.

This is absolutely the RIGHT tool for building ebooks from webpages; much easier when the webpages stay on the same domain and go "downwards" from there. Did you realize there was a "cover.html" that would have been the best place to start the spidering instead of the "contents.html"? I spidered it last night and it took all of 6 minutes. The ensuing ebook conversion to .imp took several hours more (see below). :rolleyes:

2. Started a new ebook project in Mobipocket Creator, and carefully added the files a few at a time to make sure they were in the correct order.

I replicated the .html files ordering in TOC within the "contents.html" and used that as my starting point for the .opf.
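A minimal sketch of that idea, assuming you already have the page names in reading order: it writes a bare-bones OPF whose spine follows the list. The function name `build_opf`, the item ids, the default identifier and the "text/html" media type are all assumptions for illustration; a real project file would also list the images and, for Epub, an NCX.

```python
from xml.sax.saxutils import escape, quoteattr


def build_opf(title, pages, identifier="moonport-sketch"):
    """Return a minimal OPF 2.0 document whose spine follows `pages`."""
    manifest = []
    spine = []
    for n, href in enumerate(pages, start=1):
        item_id = f"page{n:03d}"  # hypothetical id scheme
        manifest.append(
            f'    <item id="{item_id}" href={quoteattr(href)} media-type="text/html"/>'
        )
        # The spine is what fixes the reading order.
        spine.append(f'    <itemref idref="{item_id}"/>')
    return "\n".join([
        '<?xml version="1.0" encoding="utf-8"?>',
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">',
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">',
        f'    <dc:title>{escape(title)}</dc:title>',
        '    <dc:language>en</dc:language>',
        f'    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>',
        '  </metadata>',
        '  <manifest>',
        *manifest,
        '  </manifest>',
        '  <spine>',
        *spine,
        '  </spine>',
        '</package>',
    ])
```

Something like `build_opf("Moonport", ["cover.html", "contents.html", ...])` then gives a starting point that can be edited by hand before building.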

3. Failed to build the ebook several times so I could identify and delete the bad files created in the download step. (Don't worry, they were created by the download program and weren't source content.)

:thumbsup: This is the ONLY way, through several unsuccessful trials, to get things right. This takes MOST of the time to convert webpages to ebooks! :(

4. Built the Mobipocket ebook. Saved the ebook project.
5. Used html2epub.exe with the ebook project files to make the Epub version.

After getting the .prc version, I used Mobi2IMP to convert it to .imp formats, but the eBook Publisher is a lot more picky and sensitive to badly coded html, so I had to "fix" a lot more problems, i.e.

ill-formed/corrupt images,
<h1> tags in the <head> section and BEFORE the <body> tag,
non-existent links due to typos,
non-existent images for previous, next and index links,
missing image retrieved from an old website copy using WayBackMachine at archive.org
many minor fixes to make the resulting .html look more presentable...
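Some of the problems in this list (links and images that point at files that are not there) can be found before the build fails. Below is a rough sketch of such a check over a folder of downloaded pages. `RefCollector` and `missing_targets` are hypothetical names, it only looks at `href` and `src` attributes, and it will not catch corrupt images or misplaced tags.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class RefCollector(HTMLParser):
    """Collect every href/src value found in a page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append(value)


def missing_targets(folder):
    """Return (page name, reference) pairs whose local target does not exist."""
    folder = Path(folder)
    problems = []
    for page in sorted(folder.rglob("*.htm*")):
        collector = RefCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for ref in collector.refs:
            parts = urlsplit(ref)
            # Skip external links (http:, mailto:, ...) and pure #fragments.
            if parts.scheme or parts.netloc or not parts.path:
                continue
            # Resolve the reference relative to the page that contains it.
            target = page.parent / unquote(parts.path)
            if not target.exists():
                problems.append((page.name, ref))
    return problems
```

Note that on Windows the existence check ignores upper/lower case, so a link that only differs in case from the real file name would pass here and could still fail on a reader.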


Total time invested: about an hour

Total time invested: almost 3 hours

Uploading the .imp formats, which differ slightly from your (.prc) version. Check here (http://www.mobileread.com/forums/showthread.php?t=50426).

I can upload my .prc/.epub versions if you would like as well?

Nate the great
07-07-2009, 04:14 PM
ill-formed/corrupt images,
<h1> tags in the <head> section and BEFORE the <body> tag, yep
non-existent links due to typos, yep
non-existent images for previous, next and index links,
missing image retrieved from an old website copy using WayBackMachine at archive.org
many minor fixes to make the resulting .html look more presentable...still working on this




Those 3 link images are there; they're just linked to in an odd way. Also, can you let me have a copy of the missing image from file ch22-6.html?

I wish I'd known about the cover but it's okay. I like the one I made.

nrapallo
07-07-2009, 09:26 PM
I changed the way those three links referenced their images to make them better to use.

The missing image was m493b.gif and is attached. There were two corrupt images that I could fix (attached as well), the others were corrupt from when the website was originally set up, as far as I can tell.

BTW, here's a snapshot of the cover page I used (basically their cover.html).

Nate the great
07-07-2009, 10:18 PM
Thank you for the images.

BTW, I sent the 2 files with all the link errors to the contact email listed. I also sent a list of the errors I found, and mentioned that I was making an ebook. This afternoon I received a response.

The History Division at NASA is planning to convert all of their documents to ebooks. They wanted to know about the tools I use and my work process. I wrote a fairly lengthy email.

And yes, I did direct them here.

nrapallo
07-07-2009, 11:20 PM
:cool: Great news... who would have thought that a recreational hour or so would have resulted in a productive skillset valuable to others! :2thumbsup

Nate the great
07-07-2009, 11:34 PM
I've spent more than an hour on this project, when you add the time I invested in the later versions.

Nvidiot
07-08-2009, 12:18 AM
Wow.
1. Awesome info on how to do it, next book I can probably do myself (I have to tinker with it and am currently on a laptop without a mouse. :bookworm: Drag & drop with a touchpad is not my thing)
2. Awesome support on this forum, by not only telling me how to do it but by doing it for me. :thanks:
3. Awesome that NASA is not ignoring old publications and actually intends to make ebooks out of them.

I feel all warm & fuzzy inside :2thumbsup

Nate the great
07-08-2009, 01:26 AM
Thank you.

You know, Nick wasn't kidding before when he said we do this for fun. It's certainly true for me. You've fallen in with some rather odd people.

Also, it wasn't until you mentioned a touchpad that I realized I use 2 and 3 fingers at one time on mine. Don't ask me how; I honestly can't figure it out.

nrapallo
07-08-2009, 11:50 AM
Uploaded two additional ebook formats for Moonport, i.e.

- .lrf originally requested (http://www.mobileread.com/forums/showthread.php?t=50465)

- .pdf formatted for Sony PRS (http://www.mobileread.com/forums/showthread.php?t=50485)

Yeah, this IS for fun! ;)

DaleDe
07-13-2009, 08:20 PM
The problem is, when I import the 'contents.html' into calibre, it thinks that that file is everything, obviously not what I want. When I import the RTF that I made with MS Word (with the pictures) and then convert to LRF or epub it converts but again misses the images.

Save to HTML from Word. Place all the files, html and images in a zip file. Calibre can work with that.

Dale
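The zip step can be done by hand, but a small script is a possible alternative when there are many pages. This is just a sketch, assuming the HTML files and images sit together under one folder. `zip_book` is a made-up name, and the paths in the usage note are placeholders.

```python
import zipfile
from pathlib import Path


def zip_book(folder, archive):
    """Pack every file under `folder` into `archive`, keeping relative paths."""
    folder = Path(folder)
    archive = Path(archive)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and the archive itself if it lives in the folder.
            if path.is_file() and path.resolve() != archive.resolve():
                # Store paths relative to the folder so image links still resolve.
                zf.write(path, path.relative_to(folder).as_posix())
    return archive
```

For example, `zip_book("moonport", "moonport.zip")` would produce a single file to hand to Calibre.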