#1 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
How can I fix it when every line is a paragraph?
I have a book that for some reason has every line, regardless of punctuation, marked up as its own paragraph. (Please see image.)
Is there a way, using the editor, to remove all of the paragraph HTML tags at once? I'm practically reading the book as I try to correct it (not an enjoyable experience), and it seems all of the books in this particular group were formatted the same way.
#2 | |
null operator (he/him)
Posts: 21,725
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Psst - it's better to start your own thread rather than tack onto someone else's. Then, if they are different problems, the different answers don't get confused. BR
#4 |
null operator (he/him)
|
Someone can probably give you a couple of regexes to fix the broken lines. I'm more comfortable fixing things like that without the HTML markup, so I'd convert to formatted text, use an editor like Notepad++ (or the one I just found called Bowpad: Scintilla wrapped in a pretty ribbon), and then convert that back to EPUB. BR
#7 |
Series Addict
|
First of all - "mufti" is a new term for me. I know them as "civvies".
Second of all, the "Umm.." was because I have no idea what that means (regex), but maybe I can figure it out, especially since I wouldn't know what to ask from here anyway. I was able to find "cleaner" versions of two of the books (maybe three, I haven't checked the last one yet), but one of them has the same issue as my previous version. And I honestly don't feel like cleaning that mess up manually... It's too much.
#8 |
Member
Posts: 23
Karma: 10
Join Date: Apr 2014
Location: Paris
Device: ipad 2, Ubuntu
|
First of all: remove the
Code:
</p> <p class="calibre2">
Code:
</p>\n+<p class="calibre2">(?=[a-z])
Last edited by dmonasse; 12-23-2014 at 08:20 AM.
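The lookahead pattern in the post above can be tried outside the editor. This is only a minimal sketch in Python: the sample text and the function name are hypothetical, and replacing the match with a single space is an assumption, because the replacement text from the post did not survive.

```python
import re

# Pattern from the post: a paragraph break that is followed by a lowercase letter.
# The lookahead (?=[a-z]) checks the next character without consuming it.
BREAK_BEFORE_LOWERCASE = re.compile(r'</p>\n+<p class="calibre2">(?=[a-z])')


def join_lowercase_continuations(html: str) -> str:
    """Merge a paragraph into the previous one when it starts in lowercase.

    Assumption: the break is replaced by one space (not stated in the post).
    """
    return BREAK_BEFORE_LOWERCASE.sub(" ", html)


# Hypothetical sample using the "calibre2" class named in the post.
sample = (
    '<p class="calibre2">Susie said</p>\n'
    '<p class="calibre2">that she was going to jump.</p>\n'
    '<p class="calibre2">She left.</p>'
)

print(join_lowercase_continuations(sample))
```

Only the first break is joined here, since the third paragraph starts with an uppercase letter and the lookahead rejects it.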
#9 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
First, you want to get rid of the useless spaces before the closing "</p>".

Regex #1:
Search: \s+</p>
Replace: </p>
Explanation: This looks for "one or more spaces" + "</p>" and replaces it with just "</p>".
Example:
Code:
<p>This is a sample line </p>
Code:
<p>This is a sample line</p>

Regex #2:
Search: -</p>\s+<p>
Replace: (leave empty)
Explanation: This removes a hyphen at the very end of a "paragraph" and combines that paragraph with the next line.
Side Note: I use the above regex on a one-by-one, case-by-case basis, because many "soft hyphens" in the PDF aren't actually a part of the word.
Example:
Code:
<p>Blah blah blah govern-</p> <p>ment.</p>
Code:
<p>Blah blah blah government.</p>

Regex #2 (alternative that keeps the hyphen):
Search: -</p>\s+<p>
Replace: -
Note: I don't use this one, although if there are TONS of hyphens at the end of each line, it might be best to do it this way and take care of the hyphen situation on your own at a later step. I personally prefer to use the Spell Check Tool and search for a single hyphen by itself: '-'. This gives you a list of every single word with a hyphen in it. Then I can check for and fix mistakes there much more quickly.
Example:
Code:
<p>Blah blah blah govern-</p> <p>ment.</p>
Code:
<p>Blah blah blah govern-ment.</p>

Regex #3:
Search: ([^>”\?\!\.])</p>\s+<p>
Replace: \1 
Explanation: This searches for a paragraph that DOES NOT end in a "greater than sign", "right double quote", "question mark", "exclamation point", or "period". It then combines it with the next paragraph.
Note: There is a space after the "\1".
Example:
Code:
<p>Susie said</p> <p>that she was going to jump over a tree.</p> <p>She also said,</p> <p>that this was just a sample.</p>
Code:
<p>Susie said that she was going to jump over a tree.</p> <p>She also said, that this was just a sample.</p>

Last edited by Tex2002ans; 12-23-2014 at 09:23 AM.
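For anyone who wants to try the three searches from this post on a copy of the text before touching the book, here is a rough sketch in Python. The function name and the sample string are made up for illustration. The patterns are copied from the post, and the editor's own regex engine may behave slightly differently from Python's `re` module.

```python
import re


def join_broken_paragraphs(html: str) -> str:
    """Apply the three search/replace steps from the post, in order."""
    # Regex #1: drop whitespace sitting right before a closing </p>.
    html = re.sub(r"\s+</p>", "</p>", html)
    # Regex #2: remove an end-of-line hyphen and join with the next paragraph.
    html = re.sub(r"-</p>\s+<p>", "", html)
    # Regex #3: join paragraphs that do not end in > ” ? ! or .
    # The replacement is the captured character followed by one space.
    html = re.sub(r"([^>”\?\!\.])</p>\s+<p>", r"\1 ", html)
    return html


# Hypothetical sample with a trailing space, a hyphen break and plain breaks.
sample = (
    "<p>Susie said </p>\n"
    "<p>that the govern-</p>\n"
    "<p>ment was going to jump over a tree.</p>\n"
    "<p>She also said,</p>\n"
    "<p>that this was just a sample.</p>"
)

print(join_broken_paragraphs(sample))
```

As the side note in the post warns, step #2 removes every line-end hyphen blindly, so genuinely hyphenated words would be glued together. On a real book it may be safer to skip that line and review the hyphens by hand.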
#10 |
Series Addict
|
Based on what I've seen as I've tried to clean up, I would need Regex #3. However, when I add those search and replace terms I get:
Code:
Searching done: Replaced 0 occurrences of ([^>”\?\!\.])</p>\s+<p>
Regex #1 ran without a problem, and I didn't change anything in the code (I just copied/pasted).
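One possible cause, which the thread does not confirm: if the book's paragraphs carry a class attribute (an earlier reply mentions `class="calibre2"`), then the literal `<p>` in Regex #3 never matches. The sketch below compares the original pattern with a variant that accepts attributes. The sample text and the `<p\b[^>]*>` variant are assumptions for illustration, not something posted in the thread.

```python
import re

# Regex #3 exactly as posted: it expects a bare <p> with no attributes.
ORIGINAL = r"([^>”\?\!\.])</p>\s+<p>"
# Hypothetical variant: <p\b[^>]*> also matches <p class="...">.
TOLERANT = r"([^>”\?\!\.])</p>\s+<p\b[^>]*>"

# Hypothetical sample where every paragraph has a class attribute.
sample = (
    '<p class="calibre2">Susie said</p>\n'
    '<p class="calibre2">that she left.</p>'
)


def count_matches(pattern: str, text: str) -> int:
    """Return how many times the pattern occurs in the text."""
    return len(re.findall(pattern, text))


print(count_matches(ORIGINAL, sample))  # no match: the tag is not a bare <p>
print(count_matches(TOLERANT, sample))
print(re.sub(TOLERANT, r"\1 ", sample))
```

Checking one broken paragraph in the editor's code view would show whether the opening tags really have attributes before changing the search.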
#11 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Each regex should be run, in order.
#13 |
Ex-Helpdesk Junkie
|
#1 cleans up the ebook for #2 to run, and #2 does the same for #3.
If hyphens aren't a problem, you can probably skip #2. The worst it can do is nothing.
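A small illustration of why the order matters, sketched in Python with a made-up sample. If Regex #1 has not run, a trailing space before "</p>" counts as the "not a period" character in Regex #3, so a finished sentence gets merged into the next paragraph.

```python
import re

REGEX_1 = r"\s+</p>"
REGEX_3 = r"([^>”\?\!\.])</p>\s+<p>"

# Hypothetical sample: the first paragraph ends in a period plus a stray space.
text = "<p>The end. </p>\n<p>A new paragraph.</p>"

# Regex #3 alone: the stray space is captured, and the paragraphs are merged.
without_cleanup = re.sub(REGEX_3, r"\1 ", text)

# Regex #1 first, then #3: the period now sits right before </p>, so no merge.
cleaned = re.sub(REGEX_1, "</p>", text)
with_cleanup = re.sub(REGEX_3, r"\1 ", cleaned)

print(without_cleanup)
print(with_cleanup)
```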
#14 |
Ex-Helpdesk Junkie
|
Ooh, The Enchanted Forest Chronicles! I have all the pbooks, pity they were never released digitally.