Old 07-23-2014, 07:22 PM   #1
xanguera
Device: ipad3
conversion from docx to epub seems to break my paragraphs

Hi,
I have been using ebook-convert to go from a docx file to an epub file.
Looking at the resulting XHTML in the epub, I see that each paragraph is split into one <p></p> element per line (see the example below). This is a problem because each line then behaves as an independent paragraph when I change the font size in my epub reader.
I tried saving the docx file with LibreOffice (Ubuntu) and Word (OS X), but the behavior is the same.
Optimally, I would like each paragraph to be contained in a single <p></p> element so that the text reflows properly when the font size changes.
Is this a problem with the docx format, or is there a workaround? I thought of applying a regexp to the XHTML to merge consecutive <p> elements, but it is not that simple, because in some cases the lines in the original document are meant to stay on separate lines (e.g. in poetry).

An example of what my xhtml looks like is:

<p class="calibre1">In the bosom of one of those spacious coves which indent the eastern</p>
<p class="calibre1">shore of the Hudson, at that broad expansion of the river denominated</p>
<p class="calibre1">by the ancient Dutch navigators the Tappan Zee, and where they always</p>
<p class="calibre1">prudently shortened sail and implored the protection of St. Nicholas</p>
<p class="calibre1">when they crossed, there lies a small market town or rural port, which</p>
<p class="calibre1">by some is called Greensburgh, but which is more generally and properly</p>
<p class="calibre1">known by the name of Tarry Town. This name was given, we are told, in</p>
<p class="calibre1">former days, by the good housewives of the adjacent country, from the</p>
<p class="calibre1">inveterate propensity of their husbands to linger about the village</p>
<p class="calibre1">tavern on market days. Be that as it may, I do not vouch for the fact,</p>
<p class="calibre1">but merely advert to it, for the sake of being precise and authentic.</p>
<p class="calibre1">Not far from this village, perhaps about two miles, there is a little</p>
<p class="calibre1">valley or rather lap of land among high hills, which is one of the</p>
<p class="calibre1">quietest places in the whole world. A small brook glides through it,</p>
<p class="calibre1">with just murmur enough to lull one to repose; and the occasional</p>
<p class="calibre1">whistle of a quail or tapping of a woodpecker is almost the only sound</p>
<p class="calibre1">that ever breaks in upon the uniform tranquillity.</p>
<p class="block">*</p>
<p class="calibre1">I recollect that, when a stripling, my first exploit in</p>
<p class="calibre1">squirrel-shooting was in a grove of tall walnut-trees that shades one</p>
<p class="calibre1">side of the valley. I had wandered into it at noontime, when all nature</p>
<p class="calibre1">is peculiarly quiet, and was startled by the roar of my own gun, as it</p>
<p class="calibre1">broke the Sabbath stillness around and was prolonged and reverberated</p>
<p class="calibre1">by the angry echoes. If ever I should wish for a retreat whither I might</p>
<p class="calibre1">steal from the world and its distractions, and dream quietly away the</p>
<p class="calibre1">remnant of a troubled life, I know of none more promising than this</p>
<p class="calibre1">little valley.</p>
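
For what it's worth, the regexp idea can be sketched in Python as below. This is only a rough heuristic, not a reliable fix: it assumes each <p class="calibre1"> element sits on its own line, as in the sample above, and it treats a line as "wrapped" when its text is at least min_len characters long. The function name, the class name check and the threshold of 55 are my own assumptions, tuned by eye to this sample; short verse lines are left alone, but long ones would still be merged by mistake.

```python
import re

# Matches one whole line of the form <p class="calibre1">text</p>
P_LINE = re.compile(r'<p class="calibre1">(.*?)</p>')


def merge_wrapped_paragraphs(xhtml: str, min_len: int = 55) -> str:
    """Merge runs of consecutive calibre1 paragraphs that look hard-wrapped.

    A paragraph whose text is shorter than min_len is assumed to be the
    last line of the original paragraph, so the run ends there.
    """
    out = []
    buffer = []  # text fragments of the paragraph being rebuilt

    def flush():
        if buffer:
            out.append('<p class="calibre1">' + " ".join(buffer) + "</p>")
            buffer.clear()

    for line in xhtml.splitlines():
        match = P_LINE.fullmatch(line.strip())
        if match is None:
            # Any other markup (e.g. <p class="block">) ends the current run
            flush()
            out.append(line)
            continue
        text = match.group(1)
        buffer.append(text)
        if len(text) < min_len:
            flush()

    flush()
    return "\n".join(out)
```

The output would need to be checked by hand, since a paragraph whose last line happens to be long gets merged into the following one.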

Last edited by xanguera; 07-23-2014 at 07:51 PM.
Old 07-23-2014, 07:54 PM   #2
xanguera
Hi,
I have found the answer to my own question.
It turns out that my problem was earlier in my pipeline than I thought. I start from a TXT file that I convert to docx before attempting docx->epub with calibre. Neither Word nor LibreOffice handled the txt->docx conversion well: both ended up with a break after every line of the text. calibre then, of course, preserved these breaks by emitting a separate <p> element for each line.
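
One way to avoid the problem at the source is to unwrap the TXT file before it goes into Word or LibreOffice. The sketch below assumes that paragraphs in the TXT file are separated by at least one blank line and that every other line break is just hard wrapping; the function name is made up for illustration. It would also flatten poetry, so verse sections would need to be handled separately.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping blank-line paragraph breaks."""
    # Normalize Windows and old Mac line endings to "\n"
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank (or whitespace-only) lines separate paragraphs
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = [
        " ".join(line.strip() for line in block.splitlines() if line.strip())
        for block in blocks
    ]
    # Emit one line per paragraph, with a blank line between paragraphs
    return "\n\n".join(paragraphs)
```

For example, `unwrap_paragraphs(open("book.txt", encoding="utf-8").read())` would return the text with one line per paragraph (the file name and encoding here are placeholders).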
Old 07-24-2014, 12:28 AM   #3
eschwartz
calibre (and other tools) can convert the TXT file directly, treating it as Markdown, which assumes that a paragraph may span multiple lines and that paragraphs are separated by blank lines.
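
As a sketch of what that could look like when scripted, the snippet below builds and runs an ebook-convert command that goes straight from TXT to EPUB. It assumes that the TXT input plugin accepts `--formatting-type markdown` (worth confirming with `ebook-convert book.txt book.epub -h` on your calibre version); the function names and file names are placeholders.

```python
import shutil
import subprocess


def build_convert_command(txt_path: str, epub_path: str) -> list[str]:
    """Return the argv for a TXT -> EPUB conversion that reads the TXT as Markdown."""
    # Assumption: --formatting-type is a TXT input option of ebook-convert
    return ["ebook-convert", txt_path, epub_path, "--formatting-type", "markdown"]


def convert(txt_path: str, epub_path: str) -> None:
    """Run the conversion, failing early if calibre's CLI is not on PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    subprocess.run(build_convert_command(txt_path, epub_path), check=True)
```

This skips the docx step entirely, so the hard line breaks never get a chance to turn into paragraph breaks.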

Tags
calibre, docx, ebook-convert, epub

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
DOCX conversion of images to EPUB/Mobi: preserve aspect ratio? tbrosz Conversion 3 03-31-2014 01:56 PM
docx to epub conversion working from command-line but not from PHP velsankar ePub 1 03-24-2014 09:36 PM
Conversion of Endnotes .docx to .epub profjones Conversion 1 11-01-2013 08:05 AM
Docx to Epub conversion error with 1.5 dapjukebox Calibre 6 10-03-2013 08:18 AM
Horizontal lines in DOCX to EPUB conversion. StevieP Conversion 13 07-05-2013 03:14 AM


All times are GMT -4. The time now is 08:46 AM.


MobileRead.com is a privately owned, operated and funded community.