06-29-2015, 01:55 AM | #1 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
</p> at beginning of paragraph - how do I change?
Greetings,
I have a file I'm using as a test bed to learn regex. It is a badly converted PDF-to-epub. The original file had the actual body of the text under the misc folder within the epub. I didn't know how to fix that, so I converted it to HTML and then back to epub. That seems to have fixed the problem: the text body is now where it should be. However, the code for the paragraphs is:
Code:
<p class="calibre2"></p>The paragraph goes in here.
I have learned a lot about how to tweak things using regex on this file, but this is somewhat beyond me. Appreciate the assistance.
Update: I found out I could use .* and some variations to achieve a result. Got all the end tags where they belong now.
Last edited by Chris_Snow; 06-29-2015 at 02:18 AM. |
06-29-2015, 02:24 AM | #2 |
Unicycle Daredevil
Posts: 13,926
Karma: 185041098
Join Date: Jan 2011
Location: Planet of the Pudding Brains
Device: Aura HD (R.I.P. After six years the USB socket died.) tolino shine 3
|
The experts will certainly have more interesting solutions, but I just tested this and it works:
Search:
Code:
</p>(.*?) <(.*?)>
Replace:
Code:
\1</p> <\2> |
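For anyone who wants to try this Search/Replace pair outside the editor, here is a minimal Python sketch. It assumes the markup is available as a string and that a space separates each paragraph's text from the next tag, which is what the ` <` in the pattern requires. Python's `re` module accepts both expressions as written, including the `\1`/`\2` back-references; the sample text is hypothetical.

```python
import re

# Search/Replace pair from the post, unchanged.
FIND = r"</p>(.*?) <(.*?)>"
REPLACE = r"\1</p> <\2>"

# Hypothetical sample: two mangled paragraphs on one line, separated by a space.
src = ('<p class="calibre2"></p>The paragraph goes in here. '
       '<p class="calibre2"></p>Next one.')

# Group 1 is the text after the stray </p>; group 2 is the inside of the
# next tag, so the </p> is moved to just before that tag.
fixed = re.sub(FIND, REPLACE, src)
print(fixed)
```

Because the pattern needs a following ` <tag>`, the last paragraph on a line keeps its leading `</p>` and has to be fixed separately.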
06-29-2015, 02:57 AM | #3 | |
Banned
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
|
Quote:
My approach would be to delete all </p> and let Tidy do the rest. |
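This approach relies on the cleanup pass closing each paragraph again before the next `<p>`. The Python sketch below does both steps; the second step is a regex stand-in for the cleanup tool, not what Tidy actually runs, and it assumes the input is just a run of paragraphs (no `<body>` or other block elements mixed in). The function name `strip_and_reclose` is hypothetical.

```python
import re

def strip_and_reclose(html: str) -> str:
    # Step 1, as suggested in the post: delete every </p>.
    bare = html.replace("</p>", "")
    # Step 2 (stand-in for the cleanup tool, an assumption): close each
    # <p> or <p ...> just before the next <p> tag or at the end of input.
    # Trailing whitespace (group 3) is kept outside the new </p>.
    return re.sub(
        r"(<p(?: [^>]*)?>)(.*?)(\s*)(?=<p[ >]|\Z)",
        r"\1\2</p>\3",
        bare,
        flags=re.S,
    )

src = ('<p class="calibre2"></p>The paragraph goes in here.\n'
       '<p class="calibre2"></p>Is this one a question?')
print(strip_and_reclose(src))
```

Already well-formed paragraphs come out the same, since their `</p>` is removed and then put back in the same place.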
|
06-29-2015, 06:03 AM | #4 |
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
|
06-29-2015, 07:02 AM | #5 |
Grand Sorcerer
Posts: 27,590
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
06-29-2015, 10:25 AM | #6 |
Banned
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
|
|
06-29-2015, 10:29 AM | #7 |
Wizard
Posts: 1,090
Karma: 6058305
Join Date: Sep 2010
Location: UK
Device: Kindle Paperwhite
|
I don't claim to be great at regular expressions, but I think this should work, and preserve any class/style attributes in the paragraph:
Find:
Code:
(<p[^>]+>)<\/p>(.*)
Replace:
Code:
\1\2</p> |
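The same Find/Replace pair as a short Python sketch, run on a hypothetical two-line sample. Because `.` does not match a newline by default, the greedy `(.*)` runs only to the end of the current line, so the approach assumes one paragraph per line.

```python
import re

# Find/Replace pair from the post; group 1 keeps the opening tag with its
# attributes, group 2 is the rest of the line.
FIND = r"(<p[^>]+>)<\/p>(.*)"
REPLACE = r"\1\2</p>"

src = ('<p class="calibre2"></p>The paragraph goes in here.\n'
       '<p class="calibre2"></p>Does this one end with a question mark?')
fixed = re.sub(FIND, REPLACE, src)
print(fixed)
```

If two mangled paragraphs ever share a line, `(.*)` swallows the second one as well, so it is worth checking the line layout first.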
06-29-2015, 10:37 AM | #8 |
Banned
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
|
I would use * instead of + to catch tags without attributes.
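A quick Python check of the difference, on a hypothetical paragraph whose opening tag has no attributes: `[^>]+` needs at least one character between `<p` and `>`, so it cannot match a bare `<p>`, while `[^>]*` can.

```python
import re

src = "<p></p>A paragraph with a bare opening tag."

# Original pattern: [^>]+ requires at least one character after "<p".
with_plus = re.sub(r"(<p[^>]+>)<\/p>(.*)", r"\1\2</p>", src)
# Suggested change: [^>]* also accepts an empty attribute list.
with_star = re.sub(r"(<p[^>]*>)<\/p>(.*)", r"\1\2</p>", src)

print(with_plus)  # unchanged
print(with_star)
```

The trade-off is that `[^>]*` would also accept tags such as `<pre>`, though only when an `</p>` follows immediately.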
|
06-29-2015, 01:57 PM | #9 |
Wizard
Posts: 1,090
Karma: 6058305
Join Date: Sep 2010
Location: UK
Device: Kindle Paperwhite
|
Good point. I admit I didn't have time to test it much.
|
06-29-2015, 04:09 PM | #10 |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
Thx very muchly for all the pointers. You are right that my small regex didn't pick up all the paragraph instances (endings with question marks, etc.), but surprisingly there were very few, and I figured out how to modify the regex to pick up a question mark. I seem to be able to sort out small changes, but I have a lot of trouble getting one regex to pick up everything.
I'll trial the regexes here and see what the results are. Thx again. |
06-29-2015, 04:12 PM | #11 | |
Zealot
Posts: 148
Karma: 8170
Join Date: Jul 2013
Device: kobo glo
|
Quote:
Update: Yep... found that it does (well, at least in small doses). Last edited by Chris_Snow; 06-29-2015 at 08:49 PM. |
|
06-30-2015, 12:55 AM | #12 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Code:
(<p(?: [^>]+)?>)</p>((?:(?!</?p>).)+)
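The post gives only the Find expression, so this Python sketch assumes a replacement of `\1\2</p>` (opening tag, then the text, then the moved end tag). The `(?: [^>]+)?` part accepts `<p>` or `<p` followed by a space and attributes, which keeps tags such as `<pre>` out of the match; the `(?!</?p>)` lookahead stops the captured text before the next bare `<p>` or `</p>`. The sample lines are hypothetical.

```python
import re

FIND = r"(<p(?: [^>]+)?>)</p>((?:(?!</?p>).)+)"
# Assumed replacement: the post does not give one.
REPLACE = r"\1\2</p>"

src = ('<p class="calibre2"></p>Is this one a question?\n'
       '<p></p>Plain tag, no attributes.\n'
       '<pre></pre>Left alone.')
fixed = re.sub(FIND, REPLACE, src)
print(fixed)
```

The lookahead only recognises bare `<p>` and `</p>`, so with attributed tags it still depends on `.` stopping at the end of each line; two such paragraphs on one line would not be split correctly.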
|
|
06-30-2015, 11:47 PM | #13 | |
Bookmaker & Cat Slave
Posts: 11,463
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Just sayin'! All these years, and RegexBuddy is still my closest, well... buddy. Hitch |
|
06-30-2015, 11:49 PM | #14 |
Well trained by Cats
Posts: 29,936
Karma: 55705602
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
07-01-2015, 01:48 AM | #15 | |
Wizard
Posts: 1,090
Karma: 6058305
Join Date: Sep 2010
Location: UK
Device: Kindle Paperwhite
|
Quote:
|
|
|