#1 |
Member
Posts: 19
Karma: 10
Join Date: Apr 2015
Device: Kobo Aura H2O 2ed
|
Can you suggest a regular expression for truncating paragraphs longer than 1000 characters? It doesn't matter whether blank spaces are included in the count or not.
Thanks!
#2 |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
If the text is wrapped in paragraph tags, the following quick & dirty regex should work:
Search for:
Code:
<p>(.{999}).*?</p>
Replace with:
Code:
<p>\1…</p>
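For anyone who wants to run the same search and replace outside the editor, here is a minimal Python sketch using the standard `re` module. The function name `truncate_paragraphs` and the `limit` parameter are made up for illustration, and it assumes each paragraph sits on its own line inside bare <p>...</p> tags (no attributes), just as the regex above does.

```python
import re

def truncate_paragraphs(html: str, limit: int = 999) -> str:
    """Cut every <p>...</p> longer than `limit` characters and append an ellipsis."""
    # Same pattern as above: keep the first `limit` characters, lazily skip the rest.
    # Without re.DOTALL the dot does not cross newlines, so a match stays on one line.
    pattern = re.compile(r"<p>(.{%d}).*?</p>" % limit)
    return pattern.sub(r"<p>\1…</p>", html)

# Toy input: one short paragraph, one long one, each on its own line.
sample = "<p>" + "a" * 5 + "</p>\n<p>" + "b" * 30 + "</p>"
print(truncate_paragraphs(sample, limit=10))
```

Since it is quick & dirty, check the result: if two paragraphs share a line, the character window can run across the </p><p> boundary.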
#3 | |
Wizard
Posts: 1,633
Karma: 8566337
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
|
Find: (.{999})([^ ]+)(\s)(.)
Replace: \1\2</p>\n\n<p>\4
The above will find any string of 1000 characters plus a few more, up to the next space (because I suppose you won't want to split the paragraph in the middle of a word). But first you must select the text where you want to do the S&R (otherwise the regex will also match in the header of the .xhtml file), or else deselect the "wrap" option, place the cursor after the body tag, and do the S&R in the current file.
EDIT: If you want to find any string of 1000 characters plus a few more, up to a ". " (which would mean the end of a sentence), then use the following:
Find: (.{999})([^ ]+)(\.\s)(.)
Replace: \1\2.</p>\n\n<p>\4
Last edited by RbnJrg; 02-03-2020 at 09:34 AM.
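The two find/replace pairs above can also be tried in Python's `re` module. This is only a sketch: the function name `split_long_paragraphs` and its `limit` and `sentence_end` parameters are invented for the example, and the toy input uses a limit of 20 so the effect is easy to see.

```python
import re

def split_long_paragraphs(text: str, limit: int = 999, sentence_end: bool = False) -> str:
    """Insert a paragraph break after roughly `limit` characters, never mid-word."""
    if sentence_end:
        # Variant from the EDIT: break only after a word that ends in ". "
        pattern = re.compile(r"(.{%d})([^ ]+)(\.\s)(.)" % limit)
        replacement = r"\1\2.</p>\n\n<p>\4"
    else:
        # First variant: break at the first whitespace after the current word
        pattern = re.compile(r"(.{%d})([^ ]+)(\s)(.)" % limit)
        replacement = r"\1\2</p>\n\n<p>\4"
    return pattern.sub(replacement, text)

# Toy input: a single paragraph of ten short words.
body = "<p>" + " ".join(["word"] * 10) + "</p>"
print(split_long_paragraphs(body, limit=20))
```

As noted above, feed it only the body text, since the pattern will just as happily match in the file header.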
|
#4 | |
Member
Posts: 19
Karma: 10
Join Date: Apr 2015
Device: Kobo Aura H2O 2ed
|
Thank you very much.
|
#5 |
Resident Curmudgeon
Posts: 79,012
Karma: 144284074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Why do you want to break up long paragraphs? That spoils the book. Makes it look like a bad PDF conversion.
|
#6 |
Wizard
Posts: 1,633
Karma: 8566337
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
|
|
#7 |
Grand Sorcerer
Posts: 28,344
Karma: 203719646
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
#8 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
I used something similar to find very long quotations within paragraphs.
The author didn't use many blockquotes, so I looked for an opening quote + X many characters (400-800+) until a closing quote:
Search: “([^”<]{800,})”
Replace: </p> <blockquote><p>\1</p></blockquote> <p>
Then I was able to easily replace:
Code:
<p>Paragraph with “a super duper [...] long quotation” in the middle.</p>
with:
Code:
<p>Paragraph with</p> <blockquote><p>a super duper [...] long quotation</p></blockquote> <p>in the middle.</p>
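The same search and replace, sketched in Python for anyone scripting their clean-up. The function name `long_quotes_to_blockquotes` and the `min_len` parameter are hypothetical; the pattern and replacement are the ones given above, so the output keeps whatever spaces surrounded the quotation.

```python
import re

def long_quotes_to_blockquotes(html: str, min_len: int = 800) -> str:
    """Pull long curly-quoted passages out of their paragraph into a blockquote."""
    # An opening quote, at least `min_len` characters that are neither a closing
    # quote nor the start of a tag, then the closing quote.
    pattern = re.compile(r"“([^”<]{%d,})”" % min_len)
    return pattern.sub(r"</p> <blockquote><p>\1</p></blockquote> <p>", html)

# Toy input with a lowered threshold so the quotation qualifies as "long".
sample = "<p>Paragraph with “" + "q" * 20 + "” in the middle.</p>"
print(long_quotes_to_blockquotes(sample, min_len=10))
```

Excluding "<" from the character class keeps the match from swallowing any tags, so quotations that contain inline markup or span several paragraphs are left alone.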
#9 |
mostly an observer
Posts: 1,518
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
Soon after I began publishing through Amazon's DTP/KDP in November 2007, I realized that paragraphs in an ebook had to be much shorter than those in a print edition. So I began to use my right-hand pinkie much more industriously, ensuring that the paragraphs occurred at least once on every digital "page". Pippo won't have to use regex on me!
|
#10 |
Not Quite Dead
Posts: 195
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
|
@RbnJrg
Very nice regex. A keeper. I tested it in Calibre with 300 chars instead of 999 and the result looked much nicer than I had expected. I expected all the paragraphs to appear too similar in length, which is not cool. However, (esp. with "Dot All") the number of screen lines per paragraph varied in a nice way, though I am not sure why.
I encounter lots of books (history and the sciences) by learned people who do not believe in paragraphs, among their many e-book formatting crimes. Your regex will be added to my clean-up stack where breaking up a mass of text has benefits even tho some para breaks may not be precisely correct in terms of conventions.
Last edited by Brett Merkey; 02-03-2020 at 07:30 PM.
#11 | |
Wizard
Posts: 1,633
Karma: 8566337
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
|
Thank you, glad that the code was able to help you too.
The place where the previous split was made also affects the output. Suppose that, after a split, the number of characters left before the next </p> is 150. Then the next paragraph will be split at 150 characters (more or less) instead of 300. That is why the paragraphs are not all the same length.
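A small Python sketch of that effect, assuming "Dot All" is on so that the character count carries over from the end of one paragraph into the next. The helper names and the toy limit of 20 are made up for illustration.

```python
import re

def split_long_paragraphs(text: str, limit: int) -> str:
    # Same find/replace as earlier in the thread, with "Dot All" switched on,
    # so the counted characters may run across an existing </p> ... <p>.
    pattern = re.compile(r"(.{%d})([^ ]+)(\s)(.)" % limit, re.DOTALL)
    return pattern.sub(r"\1\2</p>\n\n<p>\4", text)

def paragraph_lengths(html: str) -> list:
    # Length of the text inside each <p>...</p>, in document order.
    return [len(p) for p in re.findall(r"<p>(.*?)</p>", html, re.DOTALL)]

# Two identical paragraphs of nine short words each.
para = "<p>" + " ".join(["word"] * 9) + "</p>"
text = para + "\n" + para

print(paragraph_lengths(text))                                   # before splitting
print(paragraph_lengths(split_long_paragraphs(text, limit=20)))  # after splitting
```

Although both input paragraphs are identical, the pieces come out in different lengths, because each count starts where the previous split ended rather than at the start of a paragraph.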
|