Old 07-02-2013, 10:28 PM   #1
Fallingwater
Enthusiast
Posts: 30
Karma: 4132
Join Date: Jun 2011
Device: Bookeen Cybook Opus
Epub -> txt with italic/bold characters

I'm converting a few epubs to txt with the intention of viewing them on a device that doesn't support any other format.

Problem is, Calibre removes all italics and bold, which makes it a lot harder to understand certain books.

I'm aware the txt format doesn't support anything other than plain text, so I've taken to converting to html instead and removing all the html tags using the search-replace feature until I'm left with a file composed only of text and the bold and italic tags. At that point I swap <i> and </i> with the slash character and <b> and </b> with two asterisks.

The net result:
"He said what?! How rude!"
Would be converted to:
"He said **what**?! How /rude/!".

This works, but the procedure to do the conversion is painfully slow, painstaking and prone to mistakes that can cause screwups in parts of the text I can't immediately see.

I'm looking for some form of automatic conversion that'll do all this from an epub without having to disassemble the html.
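
The tag swap described above can be sketched in a few lines of Python. This is only a rough illustration, not a calibre feature: the function name `tags_to_markers` is made up, handling `<em>` and `<strong>` alongside `<i>` and `<b>` is an added assumption, and it only helps when the HTML really uses plain tags for emphasis.

```python
import re
from html import unescape

# Marker choices follow the post: slashes for italic, double asterisks for bold.
# Treating <em>/<strong> like <i>/<b> is an assumption added for this sketch.
MARKERS = {"i": "/", "em": "/", "b": "**", "strong": "**"}


def tags_to_markers(html: str) -> str:
    """Swap simple italic/bold tags for plain-text markers and drop all other tags."""

    def swap(match: re.Match) -> str:
        name = match.group(1).lower()
        # Opening and closing tags get the same marker; unknown tags vanish.
        return MARKERS.get(name, "")

    text = re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", swap, html)
    # Turn entities such as &amp; back into plain characters.
    return unescape(text)


print(tags_to_markers('<p>"He said <b>what</b>?! How <i>rude</i>!"</p>'))
# prints: "He said **what**?! How /rude/!"
```

A regular expression like this does not understand nesting or CSS classes, so it is no better than the manual search-and-replace when the emphasis lives in styled spans.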
Old 07-02-2013, 10:45 PM   #2
theducks
Grand Sorcerer
Posts: 15,059
Karma: 5939999
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
I have a feeling that you have the wrong extension set.

There is no bold or italics in a 'Text' file, just alphanumeric characters and punctuation.

What you might have is a Rich Text (RTF) document. If you see bold in the editor, you might change the file type to RTF and see how calibre fares.

Your second example is marked-up text (notice that it still follows the rule above); calibre interprets the marks as it proceeds.

Last edited by theducks; 07-03-2013 at 09:57 AM. Reason: I was confused
Old 07-02-2013, 11:13 PM   #3
PeterT
Taking a break; Fed up
Posts: 7,183
Karma: 45264785
Join Date: Nov 2007
Location: Toronto
Device: Wife: Touch, Arc, Vox Me: Nexus 7, Glo
Quote:
Originally Posted by Fallingwater View Post
I'm converting a few epubs to txt with the intention of viewing them on a device that doesn't support any other format.

Problem is, Calibre removes all italics and bold, which makes it a lot harder to understand certain books.

I'm aware the txt format doesn't support anything other than plain text, so I've taken to converting to html instead and removing all the html tags using the search-replace feature until I'm left with a file composed only of text and the bold and italic tags. At that point I swap <i> and </i> with the slash character and <b> and </b> with two asterisks.

The net result:
"He said what?! How rude!"
Would be converted to:
"He said **what**?! How /rude/!".

This works, but the procedure to do the conversion is painfully slow, painstaking and prone to mistakes that can cause screwups in parts of the text I can't immediately see.

I'm looking for some form of automatic conversion that'll do all this from an epub without having to disassemble the html.
Have you tried changing the conversion options for TXT output? Try setting Formatting to either Markdown or Textile.

Markdown is described at http://en.wikipedia.org/wiki/Markdown and Textile at http://en.wikipedia.org/wiki/Textile_(markup_language)

As an example, I converted the test document Kovid uses to show off the DOCX conversion.

Converting to Markdown gave me
Quote:
# Text Formatting

## Inline formatting

Here, we demonstrate various types of inline text formatting and the use of embedded fonts.

Here is some **bold, ***italic, ****bold-italic, ***underlined and struck out text. Then, we have a superscript and a subscript. Now we see some red, green and blue text. Some text with a yellow highlight. Some text in a box. Some text in inverse video.

A paragraph with styled text: *subtle emphasis *followed by **strong text **and ***intense emphasis***. This paragraph uses document wide styles for styling rather than inline text properties as demonstrated in the previous paragraph — calibre can handle both with equal ease.
and converting to Textile gave me
Quote:
h1=. Text Formatting

h2. Inline formatting

p. Here, we demonstrate various types of inline text formatting and the use of embedded fonts.

p. Here is some [*bold, *][_italic, [*bold-italic, *]_][+underlined +]and [-struck out -] text. Then, we have a super[^script^] and a sub[~script~]. Now we see some red, green and blue text. Some text with a yellow highlight. Some text in a box. Some text in inverse video.

p. A paragraph with styled text: [_subtle emphasis _]followed by [*strong text *]and _*intense emphasis*_. This paragraph uses document wide styles for styling rather than inline text properties as demonstrated in the previous paragraph — calibre can handle both with equal ease.
So as you can see, differences in formatting do come through in plain text.
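
For anyone who prefers the command line, the same setting can be sketched with calibre's `ebook-convert` tool. Treat this as a minimal sketch: the option name `--txt-output-formatting` and its values are my assumption about the command-line counterpart of the Formatting setting mentioned above, and the file names are placeholders. Running `ebook-convert book.epub book.txt -h` should list the options your calibre version actually accepts.

```python
import shlex


def txt_convert_command(epub_path: str, txt_path: str, formatting: str = "markdown") -> list[str]:
    """Build an ebook-convert command line for EPUB -> TXT with a chosen formatting mode."""
    # Assumed values for the TXT output "Formatting" setting.
    if formatting not in {"plain", "markdown", "textile"}:
        raise ValueError(f"unknown formatting mode: {formatting}")
    return [
        "ebook-convert",
        epub_path,
        txt_path,
        f"--txt-output-formatting={formatting}",  # assumed option name, verify with -h
    ]


cmd = txt_convert_command("book.epub", "book.txt", "textile")
print(shlex.join(cmd))
# To actually run it (needs calibre installed): subprocess.run(cmd, check=True)
```

Building the argument list separately makes it easy to loop over several books before handing each command to `subprocess.run`.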
Old 07-03-2013, 01:04 AM   #4
DoctorOhh
US Navy, Retired
Posts: 8,885
Karma: 12755553
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by theducks View Post
I have a feeling that you have the wrong extension set.
The OP is not the one confused.

He has an ePub that he is converting to txt to use on a device that only handles text. He still wants some indicators in the text so he can tell bold and italics since this emphasis is often quite necessary to understand what the author is conveying.

Quote:
Originally Posted by PeterT View Post
Have you tried changing the conversion options for TXT output? Try setting Formatting to either Markdown or Textile.
...~~~...
So as you can see, differences in formatting do come through in plain text.
Excellent explanation. This should suit the OP's needs.
Old 07-03-2013, 09:57 AM   #5
theducks
Quote:
Originally Posted by DoctorOhh View Post
The OP is not the one confused.

He has an ePub that he is converting to txt to use on a device that only handles text. He still wants some indicators in the text so he can tell bold and italics since this emphasis is often quite necessary to understand what the author is conveying.



Excellent explanation. This should suit the OP's needs.
Right you are
Old 07-03-2013, 10:54 AM   #6
Fallingwater
Quote:
Originally Posted by PeterT View Post
Have you tried changing the conversion options for TXT output? Try setting Formatting to either Markdown or Textile.
Excellent! This is exactly what I needed. Many thanks!
Old 07-04-2013, 10:29 PM   #7
BillSmithBooks
Padawan Learner
Posts: 243
Karma: 1085815
Join Date: May 2009
Location: www.OutlawGalaxy.com, Foothills of NY's Adirondack mountains
Device: My PC...using Puppy Linux (FBReader, Calibre, Kindle Cloud Reader,
Another way to do this (without using Calibre) might be to convert Epub to HTML, then go into the HTML code and convert <b>bold</b> to *bold* and <i>italic</i> to _italic_ (the tags might also be <strong> for bold and <em> for italic).

Then go back into the regular browser, highlight all of the text, and copy and paste it into a plain-text editor.

Don't know if this helps but I think it would be a pretty foolproof way to do things.
Old 07-07-2013, 04:47 PM   #8
Fallingwater
Quote:
Originally Posted by BillSmithBooks View Post
Another way to do this (without using Calibre) might be to convert Epub to HTML, then go into the HTML code and convert <b>bold</b> to *bold* and <i>italic</i> to _italic_ (the tags might also be <strong> for bold and <em> for italic).

Then go back into the regular browser, highlight all of the text, and copy and paste it into a plain-text editor.

Don't know if this helps but I think it would be a pretty foolproof way to do things.
It isn't that easy. The HTML conversion doesn't just use tags like that; it generates and refers to a style.css file, so every change in formatting is wrapped in one or more span tags. Stripping them out one by one feels like peeling the layers off an onion, and it isn't always doable either: how do you tell the </span> end tags of the various span types apart? If I remove all </span> instances, I have no way of knowing where the italic text ends.

I've manually converted one book like that because its markup was slightly clearer and I eventually got it right, but the one I wanted to convert next was essentially unfeasible.

I dunno, it might be possible to do something with regular expressions, but honestly life's too short.
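
For what it's worth, the end-tag problem is the sort of thing a small parser handles better than regular expressions: keep a stack of the spans currently open, and every </span> closes whichever span was opened last. Below is a minimal sketch using Python's standard `html.parser`. The class names `c1`, `c2` and `c3` are hypothetical; in practice you would have to look in style.css to see which classes set italic or bold.

```python
from html.parser import HTMLParser


class SpanMarkerConverter(HTMLParser):
    """Replace styled <span> elements with text markers, matching each </span> via a stack."""

    def __init__(self, class_markers):
        super().__init__(convert_charrefs=True)
        self.class_markers = class_markers  # e.g. {"c1": "**"}; class names are hypothetical
        self.stack = []  # one marker (possibly empty) per span that is still open
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return  # all other tags are simply dropped
        classes = (dict(attrs).get("class") or "").split()
        marker = "".join(self.class_markers.get(c, "") for c in classes)
        self.stack.append(marker)
        self.out.append(marker)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            # The most recently opened span is the one being closed.
            # Reversing the marker closes combined markers in the right order.
            self.out.append(self.stack.pop()[::-1])

    def handle_data(self, data):
        self.out.append(data)


def convert(html, class_markers):
    parser = SpanMarkerConverter(class_markers)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<p>He said <span class="c1">what</span>?! How '
          '<span class="c2"><span class="c3">rude</span></span>!</p>')
print(convert(sample, {"c1": "**", "c3": "/"}))
# prints: He said **what**?! How /rude/!
```

Spans whose class has no marker push an empty string, so their end tags are consumed without leaving anything in the text.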
Old 07-10-2013, 10:01 PM   #9
user_none
Sigil & calibre developer
Posts: 2,465
Karma: 986493
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by Fallingwater View Post
It isn't that easy. The HTML conversion doesn't just use tags like that; it generates and refers to a style.css file...
HTMLZ output has an option to convert styles to tags (How to handle CSS = tag). A lot of styling will be lost using this option because only a very small subset of styles can be represented as tags. However, in this sort of situation it's not that big a deal.
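
A rough sketch of how this could be scripted, under two assumptions that should be checked against your calibre version: that the command-line form of "How to handle CSS = tag" is `--htmlz-css-type=tag`, and that the resulting .htmlz file is a zip archive whose main document is named `index.html`. The helper names and file names are made up for illustration.

```python
import zipfile


def htmlz_command(epub_path, htmlz_path):
    """Build an ebook-convert command line for EPUB -> HTMLZ with styles turned into tags."""
    # Assumed option name for "How to handle CSS = tag"; verify with ebook-convert -h.
    return ["ebook-convert", epub_path, htmlz_path, "--htmlz-css-type=tag"]


def read_htmlz_body(htmlz_path, member="index.html"):
    """Return the main HTML document from an HTMLZ archive as text.

    The default member name is an assumption about the archive layout.
    """
    with zipfile.ZipFile(htmlz_path) as archive:
        return archive.read(member).decode("utf-8")


print(" ".join(htmlz_command("book.epub", "book.htmlz")))
# After running the conversion: html = read_htmlz_body("book.htmlz")
```

With the styles reduced to plain bold and italic tags, a simple tag-for-marker substitution becomes workable again.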