Old 02-11-2016, 06:58 PM   #1
leito360
Member
Posts: 11
Karma: 10
Join Date: Feb 2010
Device: none
Help converting file from HTML>EPub. File is divided in several pages I want to merge

Hello.

The problem is as follows:

I have a PDF book, which I converted from PDF to HTML using the pdftotext family of tools (in this case, pdftohtml).
The HTML files look good: most of the PDF's formatting was preserved, and even the indentation is intact.
The problem is that pdftohtml split the book into 239 HTML files... one file per page.

I lightly edited the HTML files to delete the page number at the bottom of each, then converted them to EPUB and later to MOBI, all with calibre. When I read the file on my Kindle, I noticed that the device kept the page layout of the HTML files. For example, if page3.html has 5 lines, the Kindle shows those lines and nothing else; when you move on to page4.html, it shows only the lines in that file. It doesn't merge the lines in page3 with the ones in page4, even if they are from the same chapter.

I thought about opening every HTML file and merging them into a single big DOC file while correcting all the strange page breaks, but I can't find a way to make Word or something similar preserve all the indentation the book has, and that's my problem.

Just to be clear, I want to find a way to remove all the page breaks (manually if necessary) while keeping the formatting as clean as possible, especially the indentation, which is my biggest problem.
Is there a way to copy and paste text while keeping the original indentation? If I could do that, I would be able to merge the text of all 239 pages and then create a new ebook file.

Is there a program or way to do this?
Old 02-11-2016, 09:09 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Sure. Open the Editor (shortcut key is "T") and use the "Merge Selected Text Files" right-click option in the Files Browser.
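For anyone who would rather script the merge outside the Editor, here is a minimal sketch in Python. It is not what calibre does internally; it simply assumes the files are named like page3.html and page4.html (as in the first post), sorts them by page number, and concatenates the contents of each `<body>`. The function names and the `page*.html` pattern are illustrative assumptions.

```python
import re
from pathlib import Path

# Matches everything between <body ...> and </body>.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def page_number(path):
    """Sort key: the first run of digits in the file name (page12.html -> 12)."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else 0


def extract_body(html):
    """Return the markup inside <body>, or the whole text if there is no body tag."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def merge_pages(folder, pattern="page*.html"):
    """Concatenate the body of every matching file, in numeric page order."""
    pages = sorted(Path(folder).glob(pattern), key=page_number)
    bodies = [
        extract_body(p.read_text(encoding="utf-8", errors="replace"))
        for p in pages
    ]
    head = '<html><head><meta charset="utf-8"/></head><body>\n'
    return head + "\n".join(bodies) + "\n</body></html>\n"
```

Something like `Path("merged.html").write_text(merge_pages("book_pages"), encoding="utf-8")` would write the result, where both names are placeholders. The sketch drops each page's `<head>`, so any shared stylesheet link has to be added back by hand, and it does nothing about how the content is styled or positioned.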
Old 02-12-2016, 03:38 AM   #3
leito360
Quote:
Originally Posted by eschwartz View Post
Sure. Open the Editor (shortcut key is "T") and use the "Merge Selected Text Files" right-click option in the Files Browser.
There's a small issue... since the pages are all separate and share the same style sheet, they also share the same position, so when I merge them they overlay each other like this:

http://i.imgur.com/plxGDgQ.png

Any ideas?

I learned some HTML back in the day, but I never got to learn CSS, so I just hit a wall there.
Old 02-12-2016, 09:59 AM   #4
theducks
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
In the stylesheet, look for:
Code:
line-height: <ANY VALUE UNDER 1.2>;
1.2 is 'normal' spacing, 1.1 is close spacing, 1 will overlap, and bigger is ... bigger spacing.
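If the stylesheet is long, the search can be scripted. The following is a rough Python sketch that lists every unitless `line-height` value below a threshold (1.2 by default, matching the rule of thumb above). The function name is made up for illustration, and values with units such as `12px` or `110%` are deliberately ignored because they cannot be compared against 1.2 directly.

```python
import re

# A unitless number after "line-height:", followed by ";", "}", whitespace, or end of line.
LINE_HEIGHT_RE = re.compile(
    r"line-height\s*:\s*([0-9]*\.?[0-9]+)\s*(?=[;}\s]|$)", re.IGNORECASE
)


def tight_line_heights(css, threshold=1.2):
    """Return (line_number, value) for each unitless line-height below threshold."""
    hits = []
    for number, line in enumerate(css.splitlines(), start=1):
        for match in LINE_HEIGHT_RE.finditer(line):
            value = float(match.group(1))
            if value < threshold:
                hits.append((number, value))
    return hits
```

The line numbers it reports are positions within the CSS text passed in, so they line up with the editor's Code view only when the whole stylesheet file is passed.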

There are reasons to use something other than normal. They are 'special cases', like where you have mixed sizes in a block and you want to tighten up to match the normal blocks.

CSS is really simple once you wrap your head around the fact that it is just a way to move the styling you would have coded inline into one place, ONCE.
class="aname" just says 'use the values defined in the CSS section .aname'.
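To make the "inline once, then reuse by class" idea concrete, here is a small Python sketch that performs that move mechanically: every distinct `style="..."` value becomes one class, and a matching stylesheet is generated. The class names (`s1`, `s2`, ...) and the function name are invented for the example, and it assumes the elements do not already carry a `class` attribute.

```python
import re

# A style attribute with a double-quoted value, including its leading space.
STYLE_ATTR_RE = re.compile(r'\sstyle="([^"]*)"')


def inline_to_classes(html, prefix="s"):
    """Replace each distinct style="..." value with a class; return (html, css)."""
    class_for = {}  # style text -> generated class name

    def swap(match):
        style = match.group(1).strip()
        # Reuse the class if this exact style was seen before, else create one.
        name = class_for.setdefault(style, f"{prefix}{len(class_for) + 1}")
        return f' class="{name}"'

    new_html = STYLE_ATTR_RE.sub(swap, html)
    css = "\n".join(f".{name} {{ {style} }}" for style, name in class_for.items())
    return new_html, css
```

Two paragraphs that both say `style="text-indent: 2em"` end up as `class="s1"`, with a single `.s1 { text-indent: 2em }` rule defining the value for both.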

Look at the Property Inspector tool built into the editor preview.
Right-click on any of the text, choose Inspect, and find the line of interest (same line numbers as in Code view). The current inherited CSS properties are on the right. You can even make a temporary change or disable a property there.
Old 02-12-2016, 12:48 PM   #5
eschwartz
Yikes, that looks horrible!

But that isn't because they share the same stylesheet.
It is because somewhere there is a style that tells the page content to use absolute positions (which is extremely unwise for basically this reason).

Unfortunately the book is a PDF conversion, so it is natural to expect it to need touching up...
You should take a look at the styles as theducks says (but I don't think it is line-height, or the unmerged pages would look the same).
As theducks says, CSS is pretty basic: all it does is move the style="" attribute into its own section or file. I assume you have used HTML styles?
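
If the absolute positions turn out to live in inline style="" attributes (an assumption; they could just as well be in the stylesheet), a rough Python sketch like the following would strip them so the text flows normally again. The function names are hypothetical, and the naive split on ";" would trip over values that themselves contain a semicolon.

```python
import re

# Declarations that pin an element to a fixed spot on the page.
POSITION_PROPS = {"position", "top", "left", "right", "bottom"}

STYLE_ATTR_RE = re.compile(r'style="([^"]*)"')


def strip_positioning(declarations):
    """Drop position/top/left/right/bottom from a 'prop: value; ...' string."""
    kept = []
    for decl in declarations.split(";"):
        prop, sep, value = decl.partition(":")
        # Skip empty fragments and anything that is a positioning property.
        if sep and prop.strip().lower() not in POSITION_PROPS:
            kept.append(f"{prop.strip()}: {value.strip()}")
    return "; ".join(kept)


def unpin_html(html):
    """Rewrite every style="..." attribute without its positioning declarations."""
    return STYLE_ATTR_RE.sub(
        lambda m: f'style="{strip_positioning(m.group(1))}"', html
    )
```

One caveat: if the converter expressed the indentation as `left` offsets, removing them also removes the indentation, which would then have to be rebuilt with something like `text-indent` or `margin-left`.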

The W3Schools tutorial might come in useful: http://www.w3schools.com/css/css_howto.asp
Or Pablo's quick and dirty EPub_Tutorial


...


If you hit a snag, remember you can paste your CSS here using the [CODE][/CODE] tags.

(Rendered visible through the now-recursive power of [NOPARSE][/NOPARSE].)

Last edited by eschwartz; 02-12-2016 at 12:50 PM.
Old 02-19-2016, 07:41 AM   #6
leito360
I found the book in EPUB format, and I did some editing until it looked the way I wanted.
I will still take a look at all this CSS stuff as soon as I have some spare time.

Thank you all for your help
Old 02-19-2016, 12:31 PM   #7
eschwartz
You're welcome.