Old 07-24-2014, 01:23 PM   #1
xanguera
Member
Posts: 10
Karma: 10
Join Date: Jul 2014
Device: ipad3
split docx into multiple xhtml files

Hi,
I use ebook-convert. I have managed to split a docx into separate xhtml files wherever it detects chapters and at page breaks. I would also like to be able to split at the end of each docx page, so that each page of the docx (as seen in my word processor) becomes an individual xhtml file.
Is this possible? I have seen the option --page-breaks-before, but I have no clue how to use it.

Thanks,

X.
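For reference, a minimal sketch in Python (assuming calibre's `ebook-convert` is on the PATH) of how `--page-breaks-before` is passed. The option takes an XPath expression that calibre evaluates against the intermediate XHTML it builds from the input, not against the DOCX XML, and EPUB output normally starts a new file at every page break. The heading expression is what I understand calibre's default to be; `build_convert_command` and `convert` are hypothetical helper names, not part of calibre.

```python
import shlex
import subprocess

# What I understand to be calibre's default for --page-breaks-before:
# insert a page break before every <h1> and <h2>.
HEADINGS_XPATH = "//*[name()='h1' or name()='h2']"


def build_convert_command(src, dst, page_break_xpath=HEADINGS_XPATH):
    """Return the argument list for one ebook-convert run (hypothetical helper).

    The XPath is matched against calibre's intermediate XHTML, not the DOCX XML.
    """
    return ["ebook-convert", src, dst, "--page-breaks-before", page_break_xpath]


def convert(src, dst, page_break_xpath=HEADINGS_XPATH, dry_run=True):
    cmd = build_convert_command(src, dst, page_break_xpath)
    if dry_run:
        # Print a shell-quoted version of the command instead of running it.
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example: also break before every <h3>.
    convert("book.docx", "book.epub", "//*[name()='h1' or name()='h2' or name()='h3']")
```

The EPUB output option `--dont-split-on-page-breaks` turns the per-break file splitting off if it is ever unwanted.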
Old 07-24-2014, 04:58 PM   #2
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
That's easy when you use calibre's book editor. I don't know ebook-convert.
In the editor, you can split the ebook anywhere you like, using the preview screen.

Normally, the editor will automatically detect where to split the files while importing a docx file, but it needs indicators such as a new chapter. Still, it's easy enough to do it manually when the auto-detection is not to your liking.
Old 07-24-2014, 06:00 PM   #3
xanguera
Member
Posts: 10
Karma: 10
Join Date: Jul 2014
Device: ipad3
Thanks, JLius, for your answer. Unfortunately, I am trying to make this as automatic as possible, so using the editor is not an option for me.
Old 07-24-2014, 09:28 PM   #4
dickloraine
Guru
Posts: 631
Karma: 7544080
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
As far as I know, there is no easy automatic solution. Maybe you could take a look at the raw XML of the docx. With luck, automatic page breaks are marked in there and not just computed on the fly. Then you could insert some unique text before each one and replace that text in the epub with page breaks.

But, if the question is allowed, why would you do that? You lose all the good things that reflow and epub offer. You could just save your docx as a pdf. Then you have your pages, and you would not lose much, because a page-per-file epub would be quite similar to a pdf anyway.
Old 07-25-2014, 05:31 AM   #5
JLius
Village idiot
Posts: 157
Karma: 519566
Join Date: Mar 2014
Location: Belgium
Device: sony PRS T-1
Yes, I don't understand why xanguera would want that either. The end of each docx page is quite arbitrary; I think the epub would look like quite a mess, with new pages starting mid-sentence etc.
Old 07-25-2014, 05:45 AM   #6
BetterRed
null operator (he/him)
Posts: 20,576
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@JLius - if it was some sort of picture book, or a how-to book with one instruction per page (e.g. recipes in a cookbook), then that might be a reason to want each page in a separate file; think pages in a lever arch binder. Reusing pages in different books could be another reason.

BR
Old 07-25-2014, 08:58 AM   #7
dickloraine
Guru
Posts: 631
Karma: 7544080
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
Quote:
Originally Posted by BetterRed
@JLius - if it was some sort of picture book, or a how-to book with one instruction per page (e.g. recipes in a cookbook), then that might be a reason to want each page in a separate file; think pages in a lever arch binder.

BR
But then you would have manual page breaks (at least you should, unless the author just pressed Enter until a new page started - the horror). Manual page breaks should be easy to handle; you can even search and replace them in Word. Difficulties only arise with automatic page breaks.
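Because manual and automatic breaks are stored differently inside a .docx, a quick count shows which kind a given file relies on. A minimal Python sketch: in `word/document.xml` a manual page break is a `<w:br w:type="page"/>` element, while a break that Word only computed during layout is recorded as `<w:lastRenderedPageBreak/>`. The regular expressions assume Word's usual `w:` prefix and double-quoted attributes; the function names are hypothetical.

```python
import re
import zipfile

# Manual page break: <w:br w:type="page"/> (plain <w:br/> is a line break,
# w:type="column" a column break; neither is counted).
MANUAL_BREAK = re.compile(r'<w:br\b[^>]*\bw:type="page"[^>]*/>')
# Position where Word placed a page break the last time it laid out the document.
RENDERED_BREAK = re.compile(r'<w:lastRenderedPageBreak\s*/>')


def read_document_xml(docx_path):
    """Return the text of word/document.xml from a .docx (which is a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as z:
        return z.read("word/document.xml").decode("utf-8")


def count_page_breaks(document_xml):
    """Count manual and rendered page breaks in document.xml text."""
    return {
        "manual": len(MANUAL_BREAK.findall(document_xml)),
        "rendered": len(RENDERED_BREAK.findall(document_xml)),
    }


if __name__ == "__main__":
    import sys

    # Usage: python count_breaks.py book1.docx book2.docx ...
    for path in sys.argv[1:]:
        print(path, count_page_breaks(read_document_xml(path)))
```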
Old 07-25-2014, 09:54 AM   #8
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
The Word docx itself does not have any code telling it to end a page -- that is a purely transitory arrangement caused by the movement of your scroll wheel.

If any of your docx pages actually end, it is because you inserted a manual break -- In which case I think calibre already splits on it???
Old 07-25-2014, 10:41 AM   #9
dickloraine
Guru
Posts: 631
Karma: 7544080
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
Okay, out of curiosity I looked into the docx source XML. Automatic page breaks are recorded there.
<w:lastRenderedPageBreak/>
is the tag used. To get to the source, you have to unzip the docx and go to the "word" folder inside it. There should be a document.xml. You could open it with something like Notepad++ and do a search and replace to insert some unique text before each page break, and then do a search and replace for this text in your epub. Be aware that the page breaks in this modified docx will of course fall in different positions (the inserted text shifts the layout), but that is unimportant for the epub. Just be sure to work with a copy, not the original file.

A lot of hassle for something whose point I still don't really see, but it is possible after all.
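The search-and-replace step described above can be scripted so that no file has to be opened by hand. A minimal Python sketch, assuming Word writes the tag exactly as `<w:lastRenderedPageBreak/>` and that the marker text (the hypothetical `@@PAGEBREAK@@`) never occurs in the book. It writes a copy of the docx, putting the marker in its own `<w:t>` element just before each rendered break inside the same run, and leaves the original file untouched. It can only mark breaks Word actually recorded, so it inherits whatever gaps that tag has.

```python
import zipfile
from xml.sax.saxutils import escape

RENDERED_BREAK = "<w:lastRenderedPageBreak/>"
MARKER = "@@PAGEBREAK@@"  # hypothetical; choose text that never occurs in the book


def mark_rendered_breaks(src_docx, dst_docx, marker=MARKER):
    """Write a copy of src_docx to dst_docx with marker text before each rendered break.

    Returns the number of breaks marked. Every other archive member is copied unchanged.
    """
    if src_docx == dst_docx:
        raise ValueError("work on a copy: dst_docx must differ from src_docx")
    # A run (<w:r>) may contain several <w:t> and break children in sequence,
    # so the marker gets its own <w:t> placed right before the break element.
    replacement = "<w:t>" + escape(marker) + "</w:t>" + RENDERED_BREAK
    count = 0
    with zipfile.ZipFile(src_docx) as zin, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                count = xml.count(RENDERED_BREAK)
                data = xml.replace(RENDERED_BREAK, replacement).encode("utf-8")
            # Reusing the original ZipInfo keeps member names and order intact.
            zout.writestr(item, data)
    return count
```

The marked copy is then what gets converted, and the marker text is what you would search for in the epub.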
Old 07-25-2014, 12:21 PM   #10
BetterRed
null operator (he/him)
Posts: 20,576
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by dickloraine
But then you would have manual page breaks (at least you should, unless the author just pressed Enter until a new page started - the horror). Manual page breaks should be easy to handle; you can even search and replace them in Word. Difficulties only arise with automatic page breaks.
My bad - I did not realise that calibre offers an option in EPUB Output to create a separate xhtml file at a page break. I thought it was just on chapters in Structure Detection.

The default seems to be a new file at each page break, which is what the OP wants, but that would apply to manual page breaks, not to automatic page breaks inferred from the paper size, margins, fonts etc.

BR
Old 07-25-2014, 03:20 PM   #11
faltradl
Guru
Posts: 602
Karma: 1712372
Join Date: Feb 2013
Location: germany
Device: PocketBook Touch
Why are you thinking about this? The OP isn't in the discussion any more.
Old 07-25-2014, 09:05 PM   #12
dickloraine
Guru
Posts: 631
Karma: 7544080
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
Quote:
Originally Posted by faltradl
Why are you thinking about this? The OP isn't in the discussion any more.
Why do you think about why someone else shouldn't think about something?

More seriously: this topic is only a day old. Not everyone checks a topic every few hours. And maybe (okay, the chance is slim, but it's there) somebody else will search for this some day. Now there is an answer.
Old 08-01-2014, 06:14 AM   #13
xanguera
Member
Posts: 10
Karma: 10
Join Date: Jul 2014
Device: ipad3
Hi. I am back
To answer @faltradl's question on why I want to do this: I need to create a fixed-layout epub from a long text document. Splitting the docx (taking advantage of the pagination given by the word processor) seemed like a good option.

Thanks, @dickloraine, for finding that I could use the <w:lastRenderedPageBreak/> tag in the docx to split the pages. Still, as mentioned above, in some cases this tag is not inserted (e.g. when the user moves to a new page by pressing Enter).

Is there any more robust way you can think of to transfer the pagination information automatically (really, I just need to know the places where the page changes)? Note that automation is a hard requirement, as I do not want to have to open every one of my docx files just to insert page breaks manually.

Similarly, would calibre be able to find and process tags like <w:lastRenderedPageBreak/> through XPath expressions (so that I do not need to edit the docx file externally)?

Thanks.
Old 08-01-2014, 06:35 AM   #14
dickloraine
Guru
Posts: 631
Karma: 7544080
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
No, that tag is always in the docx, even if the user pressed Enter, so it is robust. Unfortunately, you have to do it in the docx, since calibre applies its XPath expressions to the intermediate HTML it creates (at least I think it does).
But I would advise against a fixed layout unless there are really important reasons for it. And even then, why not just a pdf? Or do fixed-layout epubs have an advantage over a pdf? As far as I know, fixed-layout epubs are epub 3, which nearly no reader supports, while most support pdf.

Last edited by dickloraine; 08-01-2014 at 06:38 AM.
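Since calibre's XPath options only see the intermediate XHTML, a marker inserted into each DOCX beforehand (the unique-text approach described earlier in the thread) should be matchable with `--page-breaks-before`. A hedged Python sketch, not verified against calibre's DOCX input, that builds one `ebook-convert` command per `.docx` in a folder so nothing has to be opened by hand; the marker string and function names are hypothetical. Two caveats: the break goes before the whole `<p>` containing the marker, so a paragraph that straddled two Word pages starts the new file, and the marker text itself stays in the output and still has to be removed afterwards.

```python
import shlex
from pathlib import Path

MARKER = "@@PAGEBREAK@@"  # hypothetical marker already inserted into each DOCX


def marker_xpath(marker=MARKER):
    """XPath matching every <p> whose text (including child spans) contains the marker.

    name()='p' sidesteps the need for an XHTML namespace prefix.
    """
    if "'" in marker:
        raise ValueError("marker must not contain an apostrophe")
    return "//*[name()='p' and contains(., '%s')]" % marker


def batch_commands(folder, marker=MARKER):
    """One ebook-convert command per *.docx in folder, sorted by file name."""
    xpath = marker_xpath(marker)
    return [
        ["ebook-convert", str(src), str(src.with_suffix(".epub")),
         "--page-breaks-before", xpath]
        for src in sorted(Path(folder).glob("*.docx"))
    ]


if __name__ == "__main__":
    # Dry run: print the commands; subprocess.run(cmd, check=True) would execute one.
    for cmd in batch_commands("books"):
        print(shlex.join(cmd))
```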
Old 08-01-2014, 07:09 AM   #15
xanguera
Member
Posts: 10
Karma: 10
Join Date: Jul 2014
Device: ipad3
Well, I have tested it myself, and it is not so robust. Whenever a paragraph ends just before the end of a page and the next paragraph starts on the next page, there is no rendering information available.
As for your question: I am working with media overlays (read-aloud ebooks), which iBooks only supports when the epub is in fixed-layout format. I hope Apple will change this at some point, but for now I need to abide by it.

Tags
docx, epub3, split





All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.