#1
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2024
Device: Kindle
Removing the paragraph tags if a paragraph starts with lower case
Hi everyone. I have an old book a friend gave me, and the paragraphs are all messed up. I'm trying to clean it up, and it would be amazing if there were a function like "merge with the previous paragraph if the current paragraph starts with a lowercase word". Is there something like this?
Otherwise, is there a way to edit directly in the preview without going into the HTML editor? It's easier to remove the space between paragraphs than the tags... Thank you for your help.
#2
Groupie
Posts: 167
Karma: 1497966
Join Date: Jul 2021
Device: N/A
Code:
Find : </p>\s*<p>(\p{Ll}.*?</p>)
Replace : \x20\1
Mode : Regex with "dot all" and "case sensitive" checked
(\x20 is a space)
Variant that also allows one optional whitespace character before the lowercase letter:
Code:
</p>\s*<p>(\s?\p{Ll}.*?</p>)
Last edited by lomkiri; 11-02-2024 at 04:28 PM.
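For anyone who wants to try the Find/Replace above outside the editor, here is a minimal Python sketch, not a definitive implementation. Assumptions: Python's built-in `re` module has no `\p{Ll}`, so `[a-z]` stands in as an ASCII-only approximation (accented lowercase letters are not covered); `re.DOTALL` plays the role of "dot all"; and the function name `merge_lowercase_paragraphs` is made up for illustration.

```python
import re

# ASCII-only stand-in for \p{Ll} (assumption: the built-in re module
# does not support Unicode property classes).
LOWER = "[a-z]"

# Same shape as the Find expression above, including the optional
# leading whitespace of the second variant. Matching is case sensitive
# by default in Python.
FIND = re.compile(r"</p>\s*<p>(\s?" + LOWER + r".*?</p>)", re.DOTALL)


def merge_lowercase_paragraphs(html: str) -> str:
    """Join a paragraph to the previous one when it starts in lower case."""
    # r" \1" is the Python spelling of the "\x20\1" replacement.
    return FIND.sub(r" \1", html)


sample = "<p>It was a dark</p>\n<p>and stormy night.</p>\n<p>Rain fell.</p>"
print(merge_lowercase_paragraphs(sample))
```

The second paragraph is folded into the first, and the one starting with a capital letter is left alone.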
#3
Perfectionist
Posts: 72
Karma: 12802
Join Date: Apr 2014
Device: none
#4
Groupie
Quote:
This one will capture a <p> with classes:
Code:
</p>\s*<p[^>]*>(\p{Ll}.*?</p>)
Last edited by lomkiri; 11-02-2024 at 01:27 PM.
#5
Groupie
I realized that when several consecutive paragraphs all begin with a lowercase letter, my regex only captures every other one: the match consumes the closing </p>, so the pointer stops after it and the regex can't target the very next paragraph. It only finds the one after that, leaving one unchanged. You would then have to run several passes to catch all of them in the sequence (not a big deal, but inelegant).
This is easily resolved if we don't capture the last </p> but use a positive lookahead (for </p>) instead, so the pointer stops before the </p> and the regex is ready to capture the next paragraph if it is a candidate. With this regex, all paragraphs are targeted in the first pass:
Code:
</p>\s*<p[^>]*>(\p{Ll}.*?)(?=</p>)
Or, allowing one optional whitespace character before the lowercase letter:
Code:
</p>\s*<p[^>]*>(\s?\p{Ll}.*?)(?=</p>)
(In the Replace field, \x20 is a space)
Last edited by lomkiri; 11-02-2024 at 04:26 PM.
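The difference between the two versions is easy to see on a run of three fragments. This is only a sketch under stated assumptions: it uses Python's built-in `re` module, `[a-z]` is an ASCII-only stand-in for `\p{Ll}` (which `re` does not support), and the names `CONSUMING` and `LOOKAHEAD` are just labels chosen for this example.

```python
import re

# [a-z] is an ASCII-only stand-in for \p{Ll} (assumption: the built-in
# re module has no Unicode property classes).
CONSUMING = re.compile(r"</p>\s*<p[^>]*>([a-z].*?</p>)", re.DOTALL)
LOOKAHEAD = re.compile(r"</p>\s*<p[^>]*>([a-z].*?)(?=</p>)", re.DOTALL)

sample = (
    "<p>The sentence was</p>\n"
    '<p class="x">broken into</p>\n'
    "<p>three pieces.</p>\n"
    "<p>Next paragraph.</p>"
)

# One pass with each pattern; r" \1" corresponds to "\x20\1".
one_pass_consuming = CONSUMING.sub(r" \1", sample)
one_pass_lookahead = LOOKAHEAD.sub(r" \1", sample)

print(one_pass_consuming)  # "three pieces." is still a separate paragraph
print(one_pass_lookahead)  # all three fragments are joined
```

Running the consuming pattern a second time on its own output gives the same text the lookahead pattern produces in one pass, which matches the "several passes" remark above.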
#6
Perfectionist
Thanks, lomkiri!
#7
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Hey, welcome to MobileRead!
Quote:
and in this thread, specifically, I showed the exact 3 regexes I've been using for 12+ years:
- - -
If you want a more "GUI-friendly way" of doing things (and you're more familiar with Word or LibreOffice), I described a lot of this stuff in:
Of course, using Sigil or Calibre and fixing it directly in the code is the best and SUPER quick (if you use those 3 regexes, that'll take care of 99% of cases in a single shot!). Trying to accomplish the same thing in Word/LibreOffice is clunky and limiting, and would take a lot more work.
Last edited by Tex2002ans; 11-07-2024 at 12:19 PM.