#1
Member
Posts: 14
Karma: 10
Join Date: Feb 2010
Device: none
Help converting file from HTML>EPub. File is divided in several pages I want to merge
Hello.
The problem is as follows: I have a book in PDF, and I converted it from PDF to HTML using pdftotext (in this case, pdftohtml). The HTML files look good: the conversion kept most of the PDF's formatting, and even the indentation is intact. The problem is that pdftohtml split the book into 239 HTML files, one file per page. I lightly edited the HTML files to delete the page number at the bottom of each, then converted them to EPUB and later to MOBI, all with calibre.

When I read the file on my Kindle, I noticed that the device respects the way the text is split across the HTML files. For example, if page3.html has 5 lines, the Kindle shows those lines and nothing else. When you move on to page4.html, it shows only the lines in that file; it doesn't join the lines of page 3 with those of page 4, even if they belong to the same chapter.

I thought about opening every HTML file and merging them all into one big DOC file while correcting the strange page breaks, but I can't find a way to make Word or anything similar preserve the book's indentation, and that's my problem.

Just to be clear, I want to find a way to remove all the page breaks (manually if necessary) while keeping the formatting as clean as possible, especially the indentation, which is my biggest problem. Is there a way to copy and paste text while keeping the original indentation? If I could do that, I could merge the text of all 239 pages and then create a new ebook file. Is there a program or a way to do this?
#2
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Sure. Open the Editor (shortcut key is "T") and use the "Merge Selected Text Files" right-click option in the Files Browser.
#3
Member
Posts: 14
Karma: 10
Join Date: Feb 2010
Device: none
Quote:
http://i.imgur.com/plxGDgQ.png
Any ideas? I learned some HTML back in the day, but I never got around to learning CSS; I just hit a wall there.
#4
Well trained by Cats
Posts: 30,876
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Code:
line-height: <ANY VALUE UNDER 1.2>;

There are reasons to use something other than Normal. They are 'special cases', like where you have mixed sizes in a block and you want to tighten up to match the normal blocks.

CSS is really simple once you wrap your head around the idea that it is just a way to move the stuff you coded inline to one remote place, ONCE. class="aname" just says 'use the values defined in the CSS section .aname'.

Right-click on any of the text and choose Inspect, then find the line of interest (same line numbers as in Code view). The current inherited CSS properties are on the right. You can even make a temporary change or disable a property there.
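To illustrate the class idea described above, here is a minimal sketch. The class name aname comes from the post; the property values are only placeholders and are not taken from the book's actual stylesheet.

```css
/* Inline version, repeated on every paragraph in the HTML:
     <p style="text-indent: 1.5em; line-height: 1.2;">Some text</p>

   Stylesheet version: define the values once under a class name,
   then write <p class="aname">Some text</p> in the HTML instead. */
.aname {
  text-indent: 1.5em;  /* placeholder value: first-line indent */
  line-height: 1.2;    /* placeholder value */
}
```

Changing a value in the .aname rule then changes every paragraph that carries class="aname", which is the "ONCE" part.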
#5
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
But that isn't because they share the same stylesheet. It is because somewhere there is a style that tells the page content to use absolute positions (which is extremely unwise for basically this reason). Unfortunately, the book is a PDF conversion, so it is natural to expect it to need touching up...

You should take a look at the styles as theducks says (but I don't think it is line-height, or the unmerged pages would look the same).

As theducks says, CSS is pretty basic: all it does is move the style="" attribute into its own section or file. I assume you have used HTML styles? The W3Schools tutorial might come in useful: http://www.w3schools.com/css/css_howto.asp
Or Pablo's quick and dirty EPub_Tutorial ...

If you hit a snag, remember you can paste your CSS here using the [CODE][/CODE] tags. (Rendered visible through the now-recursive power of [NOPARSE][/NOPARSE].)

Last edited by eschwartz; 02-12-2016 at 12:50 PM.
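As a rough sketch of what to look for: a style that pins text to fixed page coordinates usually resembles the commented-out rule below. The selector .pageline and all the values are invented for illustration; the names and numbers in the converted files will differ, and the same declarations may sit inside a style="" attribute instead of a stylesheet.

```css
/* The kind of rule to hunt for:

   .pageline {
     position: absolute;
     top: 312px;
     left: 108px;
   }
*/

/* A flowing replacement: drop the coordinates and keep the
   indentation with a margin, so the text is no longer tied
   to a fixed spot on the original page. */
.pageline {
  position: static;   /* the default; text flows normally */
  margin-left: 2em;   /* illustrative indent, adjust to taste */
}
```

Whether a given reader honours the replacement may vary, so it is worth checking the result in the Editor's preview before converting again.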
#6
Member
Posts: 14
Karma: 10
Join Date: Feb 2010
Device: none
I found the book in EPUB format, and I did some editing until it looked the way I wanted.
I will still take a look at all this CSS stuff as soon as I have some spare time. Thank you all for your help.
#7
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
You're welcome.