08-19-2010, 05:43 PM   #9
Wintersdark
Device: iPhone 3G
Quote:
Originally Posted by kovidgoyal
Doing that will mean he'll lose all character formatting (italics, bold, etc). IIRC the TXT output plugin doesn't preserve those.
This isn't a problem. While character formatting is nice, having readable text at all is nicer.

You're right, it's not every .lit -> .epub conversion. I'd (incorrectly) assumed it was, because several books I checked all suffered from the same problem. Further investigation shows that's not the case, which is good news!

I tried converting to text and back, but the way the text is formatted, I basically get each paragraph followed by a pair of CR/LFs, so converting directly back to epub doesn't help.

However, as it's not every book, I'm just addressing it on a case-by-case basis with Notepad++ as I go. If I were still running Linux, I'd mass-convert them all to text and figure out how to script applying the regex replace to them, but I have no idea how to go about that in Windows.

Unfortunately, Notepad++ won't let you use \r\n in regular expressions (who knows why), but you *can* use an "extended" search to replace all the CRLF pairs with a unique identifier (I used QQQQ), then replace all .QQQQ and "QQQQ with \r\n\r\n, and finally replace all remaining QQQQs with spaces. It's sort of a pain in the ass to have to do it one book at a time, but at least it works.
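
For anyone who would rather script it, the same three steps can be sketched in Python. This is only a rough sketch, not a tested tool: the function name reflow and the marker parameter are made up for illustration, and it assumes the period or quote mark in front of the placeholder is meant to be kept when the paragraph break is put back.

```python
def reflow(text: str, marker: str = "QQQQ") -> str:
    """Rejoin hard-wrapped lines using the placeholder trick.

    Assumes `marker` never occurs in the book text itself and that the
    text uses CRLF line endings.
    """
    # Step 1: swap every CRLF pair for the placeholder.
    text = text.replace("\r\n", marker)
    # Step 2: a placeholder right after a period or a double quote is
    # treated as the end of a paragraph and becomes a blank line.
    # (Assumption: the period/quote itself is kept.)
    for closer in (".", '"'):
        text = text.replace(closer + marker, closer + "\r\n\r\n")
    # Step 3: every placeholder that is left was a mid-paragraph line
    # break, so it becomes a single space.
    return text.replace(marker, " ")


sample = 'It was a dark\r\nand stormy night.\r\n"Hello," she\r\nsaid.\r\n'
print(repr(reflow(sample)))
```

The same caveat applies as with the manual method: any line that happens to end in a period or quote mid-paragraph (after "Mr." for example) will be split into its own paragraph.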

If anyone knows a better tool for doing this on Windows (one that can macro the operations, apply a regex directly, or better yet be applied in bulk) with a minimum of hassle for someone not used to dealing with these things, I'd love to hear about it. But even if not, this does work.
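
One possible way to do the bulk part on Windows is a short Python script that applies a text-fixing function to every .txt file in a folder. The sketch below is an assumption-laden example rather than a recommendation: convert_folder is a made-up name, the files are assumed to be UTF-8, and it overwrites files in place, so it should only be run on copies.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(folder: str,
                   transform: Callable[[str], str],
                   pattern: str = "*.txt") -> List[Path]:
    """Apply `transform` to each matching file; return the files changed."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        # newline="" stops Python from translating CRLF pairs, so the
        # transform sees the file exactly as it is stored on disk.
        with open(path, "r", encoding="utf-8", newline="") as fh:
            original = fh.read()
        updated = transform(original)
        if updated != original:
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(updated)
            changed.append(path)
    return changed


# Example call (hypothetical folder; the lambda stands in for whatever
# cleanup function is wanted):
# convert_folder(r"C:\books\txt", lambda s: s.replace("\r\n", " "))
```

Any function that takes the file contents as a string and returns the cleaned-up string can be passed as transform, including one built on re.sub for a regex replace.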


As a feature request for Calibre I'd definitely like to see, for this and other formatting issues, the ability to apply a regex directly in the conversion options (or some such easily accessible place). It would really help people cleaning up poor source material when converting to their ereader format of choice.