MobileRead Forums - View Single Post - Is there any way to remove space between paragraphs?

thrawn_aj · 11-02-2010, 10:44 PM

Quote:

Originally Posted by kabloooie

I have text and lit files that always come out with spaces between paragraphs instead of indentations.

There's a regular expression way to do it but I don't know how advanced the regex system in Calibre is or even if it can be co-opted by the user to edit the actual contents of the file.

In notepad++ (or any text editor that supports regex, with minor syntax mods) for instance, I would convert all linefeeds (\r\n usually) to some obscure character string that doesn't appear in your file (say, ###) using the extended mode search and replace. Note: if you have a multiline regex tool (I'm too lazy to use mine, and npp is just too convenient in other ways) you could search for the double linefeeds directly and replace them with paragraph breaks and indents.

Then, using its native regex, search for something like ######([^#]+)###### (since there will be 2 linefeeds between paragraphs - and you don't want that) and replace it with ###\t\1###. Then back to extended mode and replace all ### with \r\n.
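For anyone who would rather script this than click through an editor, here is a rough Python sketch of the same idea. The function names and the sample text are made up for illustration; nothing here is Calibre or notepad++ functionality. One caveat: a pattern shaped like ######([^#]+)###### swallows the closing ###### of each match, so it looks like every other paragraph would miss its indent. The sketch matches only the doubled marker instead, which should avoid that.

```python
import re

MARKER = "###"  # hypothetical placeholder string, as in the post


def via_marker(text: str, marker: str = MARKER) -> str:
    """Emulate the three search-and-replace passes with a placeholder."""
    # Conservative check: refuse if any marker character is already in
    # the text, so restored linefeeds cannot land in the wrong place.
    if any(ch in text for ch in set(marker)):
        raise ValueError("marker characters occur in the text; pick another")
    # Pass 1 (extended mode): every \r\n becomes the marker.
    flat = text.replace("\r\n", marker)
    # Pass 2 (regex mode): two or more markers in a row are a paragraph
    # gap; keep a single marker and add a tab indent after it.
    doubled = "(?:" + re.escape(marker) + "){2,}"
    flat = re.sub(doubled, lambda m: marker + "\t", flat)
    # Pass 3 (extended mode): turn the markers back into linefeeds.
    return flat.replace(marker, "\r\n")


def direct(text: str) -> str:
    """Multiline variant: collapse runs of linefeeds in a single pass."""
    # Keep the first linefeed of the run (\1) and follow it with a tab.
    return re.sub(r"(\r?\n)(?:\r?\n)+", r"\1\t", text)


if __name__ == "__main__":
    sample = "Para one.\r\n\r\nPara two.\r\n\r\nPara three.\r\n"
    print(repr(via_marker(sample)))
    print(repr(direct(sample)))
```

The direct version is the one to use if your tool can match across lines; the marker version only exists to mirror the editor workflow. Neither one indents the very first paragraph, since nothing precedes it.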

This is probably overkill for what you're asking but I think it's useful for other (similar) functions like wrapping <p> tags around paragraphs and other html manipulations. Cleaned up a bunch of OCR'd stuff last weekend using notepad++.
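As an illustration of the <p> wrapping case, here is a minimal Python sketch. It assumes paragraphs are separated by one or more blank lines and that single linefeeds inside a paragraph are just hard wraps; the function name is hypothetical.

```python
import html
import re


def wrap_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    # Split wherever two or more linefeeds occur in a row.
    blocks = re.split(r"(?:\r?\n){2,}", text.strip())
    wrapped = []
    for block in blocks:
        # Join hard-wrapped lines and squeeze runs of whitespace.
        body = " ".join(block.split())
        if body:
            # Escape &, < and > so the output stays valid HTML.
            wrapped.append("<p>" + html.escape(body, quote=False) + "</p>")
    return "\n".join(wrapped)


if __name__ == "__main__":
    print(wrap_paragraphs("First line\nwrapped.\n\nTom & Jerry"))
```

OCR'd text often needs more than this (hyphenated line ends, stray page numbers), so treat it as a starting point only.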

By the way, I've noticed that the result is always more WYSIWYG if you focus your attention on a simply coded (clean, that is) html file and then use that as the master format for converting to anything else (adding an html file to a book record saves it as zip). TOC creation and chapter creation are also much more transparent this way.