07-02-2013, 10:13 PM   #3
PeterT
Grand Sorcerer
Posts: 13,566
Karma: 79436716
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Quote:
Originally Posted by Fallingwater
I'm converting a few epubs to txt with the intention of viewing them on a device that doesn't support any other format.

Problem is, Calibre removes all italics and bold, which makes it a lot harder to understand certain books.

I'm aware the txt format doesn't support anything other than plain text, so I've taken to converting to html instead and removing all the html tags using the search-replace feature until I'm left with a file composed only of text and the bold and italic tags. At that point I swap <i> and </i> with the slash character and <b> and </b> with two asterisks.

The net result:
"He said what?! How rude!"
Would be converted to:
"He said **what**?! How /rude/!".

This works, but the procedure to do the conversion is painfully slow, painstaking and prone to mistakes that can cause screwups in parts of the text I can't immediately see.

I'm looking for some form of automatic conversion that'll do all this from an epub without having to disassemble the html.
Have you tried changing the conversion options for TXT output? Try setting Formatting to either Markdown or Textile.

Markdown is described at http://en.wikipedia.org/wiki/Markdown and Textile at http://en.wikipedia.org/wiki/Textile_(markup_language)
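
If you'd rather do this from the command line than from the conversion dialog, something along these lines should work. This is only a rough sketch: it assumes calibre's ebook-convert tool is on your PATH and uses its --txt-output-formatting option, and the helper names and file names are made up for the example.

```python
import subprocess

# Formatting choices offered for TXT output.
FORMATS = ("plain", "markdown", "textile")


def build_convert_command(src, dst, formatting="markdown"):
    """Build the ebook-convert argument list for a TXT conversion.

    Hypothetical helper for illustration; src and dst are file paths.
    """
    if formatting not in FORMATS:
        raise ValueError(f"unsupported formatting: {formatting}")
    return ["ebook-convert", src, dst, "--txt-output-formatting", formatting]


def convert(src, dst, formatting="markdown"):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(src, dst, formatting), check=True)


if __name__ == "__main__":
    # Show the command that would be run, without actually running it.
    print(" ".join(build_convert_command("book.epub", "book.txt")))
```

Calling convert("book.epub", "book.txt", "textile") would produce Textile instead of Markdown.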

As an example, I converted a test document that Kovid uses to show off the DOCX conversion.

Converting to Markdown gave me:
Quote:
# Text Formatting

## Inline formatting

Here, we demonstrate various types of inline text formatting and the use of embedded fonts.

Here is some **bold, ***italic, ****bold-italic, ***underlined and struck out text. Then, we have a superscript and a subscript. Now we see some red, green and blue text. Some text with a yellow highlight. Some text in a box. Some text in inverse video.

A paragraph with styled text: *subtle emphasis *followed by **strong text **and ***intense emphasis***. This paragraph uses document wide styles for styling rather than inline text properties as demonstrated in the previous paragraph — calibre can handle both with equal ease.
and converting to Textile gave me:
Quote:
h1=. Text Formatting

h2. Inline formatting

p. Here, we demonstrate various types of inline text formatting and the use of embedded fonts.

p. Here is some [*bold, *][_italic, [*bold-italic, *]_][+underlined +]and [-struck out -] text. Then, we have a super[^script^] and a sub[~script~]. Now we see some red, green and blue text. Some text with a yellow highlight. Some text in a box. Some text in inverse video.

p. A paragraph with styled text: [_subtle emphasis _]followed by [*strong text *]and _*intense emphasis*_. This paragraph uses document wide styles for styling rather than inline text properties as demonstrated in the previous paragraph — calibre can handle both with equal ease.
So, as you can see, differences in formatting do come through in the plain text.
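
If you still want slashes for italics, as in your own scheme, the Markdown output could be post-processed with a small script. The following is just a sketch of one way to do it, not something calibre provides: the function name is made up, and it only rewrites text wrapped in single asterisks on one line, leaving \*\* and \*\*\* runs alone.

```python
import re

# A lone "*" (no "*" directly before or after it), some text, then another
# lone "*". Runs of two or three asterisks never match, so bold and
# bold-italic markers are left untouched.
_ITALIC = re.compile(r"(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)")


def italics_to_slashes(text):
    """Rewrite Markdown *italic* spans as /italic/ (hypothetical helper)."""
    return _ITALIC.sub(r"/\1/", text)


if __name__ == "__main__":
    print(italics_to_slashes("He said **what**?! How *rude*!"))
    # prints: He said **what**?! How /rude/!
```

A literal asterisk elsewhere in the book could confuse a regex like this, so it is worth checking the result by eye.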