MobileRead Forums - View Single Post - How can I fix it when every line is a paragraph?

Tex2002ans · 12-23-2014, 09:06 AM

First, you want to get rid of the useless spaces before the closing "</p>".

Regex #1:

Search: \s+</p>
Replace: </p>

Explanation: What this will do is look for "one or more spaces" + "</p>", and replace it with just "</p>".

Example:

Code:

<p>This is a sample line </p>

Code:

<p>This is a sample line</p>
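If you would rather try this outside your editor first, here is a minimal Python sketch of Regex #1. The function name strip_trailing_spaces is made up for illustration, and it assumes the whole chapter is loaded as one string. Note that \s matches any whitespace (tabs and newlines too), not only the space character.

```python
import re

def strip_trailing_spaces(html: str) -> str:
    # Hypothetical helper: collapse any run of whitespace that sits
    # directly before a closing </p> tag.
    return re.sub(r"\s+</p>", "</p>", html)

print(strip_trailing_spaces("<p>This is a sample line </p>"))
# prints: <p>This is a sample line</p>
```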

Regex #2:

Search: -</p>\s+<p>
Replace: (leave empty)

Explanation: What this will do is remove a hyphen at the very end of the "paragraph", and combine that paragraph with the next line.

Side Note: I use the above regex on a one-by-one, case-by-case basis, because many "soft hyphens" in the PDF aren't actually a part of the word.

Example:

Code:

<p>Blah blah blah govern-</p>
<p>ment.</p>

Code:

<p>Blah blah blah government.</p>
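Here is a rough Python sketch of Regex #2, assuming the search pattern is -</p>\s+<p> with an empty replacement. Both function names are hypothetical. The second helper only lists the word halves that would be joined, which is one way to do the case-by-case review described above before replacing anything.

```python
import re

# A hyphen at the end of one paragraph, followed by the start of the next.
HYPHEN_BREAK = re.compile(r"-</p>\s+<p>")

def join_hyphenated(html: str) -> str:
    # Drop the hyphen and the paragraph break, gluing the word back together.
    return HYPHEN_BREAK.sub("", html)

def list_candidates(html: str) -> list[tuple[str, str]]:
    # Return (left half, right half) for every split word, for manual review.
    return re.findall(r"(\w+)-</p>\s+<p>(\w+)", html)

sample = "<p>Blah blah blah govern-</p>\n<p>ment.</p>"
print(list_candidates(sample))   # [('govern', 'ment')]
print(join_hyphenated(sample))   # <p>Blah blah blah government.</p>
```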

Regex #2 (Variant):

Search: -</p>\s+<p>
Replace: -

Note: I don't use this one, although if there are TONS of hyphens at the end of each line, it might be best to do it this way, and take care of the hyphen situation on your own at a later step. I personally prefer to use the Spell Check Tool, and search for a single hyphen by itself: '-'. This will give you a list of every single word with a hyphen in it. Then I can check for + fix mistakes there much more quickly.

Example:

Code:

<p>Blah blah blah govern-</p>
<p>ment.</p>

Code:

<p>Blah blah blah govern-ment.</p>
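A small Python sketch of the variant, again assuming the pattern -</p>\s+<p>, this time replaced with a single hyphen. The second helper is only a crude stand-in for the spell-check list mentioned above: it pulls out every word that contains a hyphen so the list can be reviewed afterwards. Both names are invented for this example.

```python
import re

def join_keep_hyphen(html: str) -> str:
    # Merge the two paragraphs but leave the hyphen in place for later review.
    return re.sub(r"-</p>\s+<p>", "-", html)

def hyphenated_words(html: str) -> list[str]:
    # Collect every word-hyphen-word sequence, sorted and de-duplicated.
    return sorted(set(re.findall(r"\w+-\w+", html)))

sample = "<p>Blah blah blah govern-</p>\n<p>ment.</p>"
merged = join_keep_hyphen(sample)
print(merged)                    # <p>Blah blah blah govern-ment.</p>
print(hyphenated_words(merged))  # ['govern-ment']
```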

Regex #3:

Search: ([^>”\?\!\.])</p>\s+<p>
Replace: \1

Explanation: What this Regex will do is search for a paragraph whose last character is NOT a "greater than sign", "right double quote", "question mark", "exclamation point", or "period". It will then combine that paragraph with the next one.

Note: There is a space after the "\1".

Example:

Code:

<p>Susie said</p>
<p>that she was going to jump over a tree.</p>
<p>She also said,</p>
<p>that this was just a sample.</p>

Code:

<p>Susie said that she was going to jump over a tree.</p>
<p>She also said, that this was just a sample.</p>
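For anyone scripting this, a minimal Python sketch of Regex #3 follows, assuming the search pattern is ([^>”\?\!\.])</p>\s+<p> and the replacement is \1 followed by one space. The function name is hypothetical. The captured character is the last character of the first paragraph, so it has to be put back, and the space keeps the two halves from running together.

```python
import re

# Last character of the paragraph is anything except > ” ? ! .
# followed by a paragraph break.
BROKEN_PARA = re.compile(r"([^>”\?\!\.])</p>\s+<p>")

def merge_broken_paragraphs(html: str) -> str:
    # Put the captured character back, then a single space instead of the break.
    return BROKEN_PARA.sub(r"\1 ", html)

sample = (
    "<p>Susie said</p>\n"
    "<p>that she was going to jump over a tree.</p>\n"
    "<p>She also said,</p>\n"
    "<p>that this was just a sample.</p>"
)
print(merge_broken_paragraphs(sample))
# <p>Susie said that she was going to jump over a tree.</p>
# <p>She also said, that this was just a sample.</p>
```

Paragraphs that really do end without terminal punctuation (headings, list items, lines of verse) would also get merged, so it is worth eyeballing the result.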

Last edited by Tex2002ans; 12-23-2014 at 09:23 AM.