03-20-2013, 07:21 AM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Feb 2013
Device: kobo touch
|
Regex Problem / Line that doesn't end with .</p>
Hello,
I am trying to use a regex in Sigil to match a string that doesn't end with .</p> and have spent an hour trying and googling without success. Thanks, Martin |
03-20-2013, 02:00 PM | #2 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
|
03-20-2013, 08:31 PM | #3 |
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Sigil 0.7 has a saved search, 'Join paragraphs', that looks for paragraphs ending with a letter or comma (not a period).
It does not find the other, sometimes valid, cases. For those, you need to craft your search carefully and step through the matches one by one, either skipping (Find) or joining (Replace+Find). Never use Replace All for those cases. |
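For the broader "does not end with a period" case, a negative lookbehind is one way to write the find. The sketch below uses Python's `re` module only so the pattern can be tried outside Sigil; the sample text and the helper name `lines_without_final_period` are made up for illustration. Sigil's search is PCRE-based and should accept the same `(?<!\.)` syntax, but treat that as an assumption and check it on a copy of your book.

```python
import re

# Hypothetical sample; only the <p>...</p> shape comes from this thread.
sample = (
    "<p>blah some words </p>\n"
    "<p>next words and so on.</p>\n"
    "<p>a line ending in a comma,</p>\n"
)

# Negative lookbehind: a </p> that is NOT directly preceded by a period.
no_period = re.compile(r"(?<!\.)</p>")

def lines_without_final_period(text):
    """Return the lines whose closing </p> is not directly preceded by '.'."""
    return [line for line in text.splitlines() if no_period.search(line)]

for line in lines_without_final_period(sample):
    print(line)
```

This flags every paragraph that ends with something other than a period, including ones that legitimately end with ? or ! or a closing quote, so it is a pattern for stepping through matches, not for Replace All.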
03-24-2013, 02:29 PM | #4 |
Junior Member
Posts: 4
Karma: 10
Join Date: Feb 2013
Device: kobo touch
|
Sorry for the late reply!
I am using Sigil 0.6.2 because Sigil 0.7.1 starts a few times and then refuses to start. (Nothing happens when I launch it. I found a lot of material for Linux but nothing for Windows.) Maybe someone knows what the problem is.
Back to the topic: I have paragraphs like
<p>blah some words </p>
<p>next words and so on.</p>
and I want
<p>blah some words next words and so on.</p>
Regards, and thanks for the help |
03-24-2013, 04:01 PM | #5 | |
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Code:
(?sm)([a-z,]) </p>\s+<p>
Do note: your example has a trailing space. It sits outside the capture group, so the match discards it and the replacement has to insert a space again. This S&R deliberately does not cover a trailing hyphen or em dash; you need to review each of those as a join candidate yourself. |
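A quick way to check the find pattern against the example from post #4 is to run it through Python's `re` module. This is only a sketch: the replacement string `\1 ` (the captured character followed by a space) is an assumption, since the post only says that the replace inserts the space.

```python
import re

# Find pattern as posted above.
FIND = r"(?sm)([a-z,]) </p>\s+<p>"
# ASSUMED replacement: the captured character plus one space.
REPLACE = r"\1 "

sample = "<p>blah some words </p>\n<p>next words and so on.</p>"

joined = re.sub(FIND, REPLACE, sample)
print(joined)
```

As written, the pattern requires a literal space before </p>. Paragraphs without that trailing space are not matched; making the space optional (` ?`) is one way to cover both.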
|
03-24-2013, 05:27 PM | #6 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
I answered this exact situation last month in this topic (with the two regular expressions I use to clean this up):
https://www.mobileread.com/forums/sho...89#post2446589
Last edited by Tex2002ans; 03-24-2013 at 07:13 PM. |
03-25-2013, 06:38 PM | #7 |
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
|
For me, the regex solution was not really comfortable enough. As I usually start cleaning from plain text, I thought I'd do something really nice and wrote this small piece:
Code:
/*
stripLF.c - remove false line breaks

compile with: cc stripLF.c -o stripLF

usage:
stripLF -h
stripLF [-C][-p][-H] < file.in > file.out
cat file.in | stripLF [-C][-p][-H] >file.out

options:
-h: print help text and exit
-C: line break before [C]apitals is legitimate too (for poetry)
-p: change line breaks into </p> LF <p>
-H: change text into bare html page

removes all carriage returns
removes all line breaks which are not
  preceded by . _ ! ? * ' " ] > or another line break
  or followed by a capital letter: option -C
removes multiple spaces

(c)varlog 2013
LICENSE: FREE FOR ALL
*/
#include <stdio.h>

#define LF 0x0A
#define CR 0x0D
#define SPACE 0x20
#define SINGLE_QUOTE 0x27
#define DOPPEL_QUOTE 0x22
#define VERSION 1.02

void usage(){
    printf("\n**********************************************************\n");
    printf("stripLF: remove false line breaks \n");
    printf("usage: \n");
    printf("stripLF [-h] \n");
    printf("stripLF [-C] [-p] [-H] < file.in > file.out \n");
    printf("cat file.in |stripLF [-C] [-p] [-H] > file.out \n");
    printf("options:\n");
    printf("-h: print this help text and exit\n");
    printf("-C: line break before [C]apitals is legitimate too\n");
    printf("-p: change line breaks into </p> LF <p>\n");
    printf("-H: add <html><body>......</body></html> tokens, implies -p \n");
    printf("v %.2f 2013 (c) varlog\n",VERSION);
    printf("***********************************************************\n");
}

int main(int argc, char **argv)
{
    int ch,pch=LF,nch=0;   /* current, previous and next character */
    int i;
    int Cflag=0;
    int Hflag=0;
    int pflag=0;
    int eflag=0;           /* set by -h: skip processing */

    if(argc>1){
        for(i=1 ;i<argc; i++){
            if(argv[i][0]=='-') {
                switch (argv[i][1]){
                case 'C':  //capitals
                    Cflag=1;
                    break;
                case 'p':  // LF --> </p><p>
                    pflag=1;
                    break;
                case 'h':  // help
                    usage();
                    eflag=1;
                    break;
                case 'H':  //-->html
                    Hflag=1;
                    pflag=1;
                    break;
                default:
                    break;
                }
            }
        }
    }

    if(Hflag) printf("<html>\n<body>\n");
    if(pflag) printf("<p>");

    while(!eflag) {
        ch = getchar();
        if(ch==EOF) break;
        if(ch==SPACE && pch==SPACE) {
            ; //remove space if more than one by ignoring it
        }else{
            if(ch!=LF && ch!=CR) {
                putchar(ch); //just next letter
            }else {
                if(ch==CR){
                    ch=pch; //remove CR by ignoring it
                }else {
                    while((nch=getchar())==SPACE); //get next char ignoring SPACE
                    if(nch==EOF) {
                        putchar(ch);
                        break;
                    }
                    if( // it is line break!
                        pch==']'||
                        pch=='>'||
                        pch=='*'||
                        pch=='_'||
                        pch=='.'||
                        pch=='!'||
                        pch=='?'||
                        pch==SINGLE_QUOTE||
                        pch==DOPPEL_QUOTE||
                        pch==LF||
                        nch==LF||
                        (Cflag==1 && nch>=0x40 && nch<=0x5A) //capitals and @
                    ) {
                        if(pflag) printf("</p>");
                        putchar(ch);
                        if(pflag) printf("<p>");
                        putchar(nch);
                        ch=nch;
                    } else { //phony line break
                        putchar(SPACE); //change LF into space
                        putchar(nch);
                        ch=nch;
                    }
                }
            }
            pch=ch;
        }
    } //end while

    if(pflag) printf("</p>");
    if(Hflag) printf("\n</body>\n</html>");
    return 0;
} |
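For anyone who wants to try the rule without compiling, here is a rough line-based sketch of the same heuristic in Python. It is not a port of stripLF.c: the function name `strip_false_breaks` is made up, the -C, -p and -H options are left out, and spaces are handled by lines instead of character by character, so edge cases may differ from the C program.

```python
import re

# Characters after which stripLF.c treats a line break as legitimate.
KEEP_AFTER = set(".!?*_'\"]>")

def strip_false_breaks(text):
    """Join lines unless the break follows a sentence end or borders a blank line."""
    # Drop carriage returns, split into lines, ignore leading spaces on each line.
    lines = [ln.lstrip(" ") for ln in text.replace("\r", "").split("\n")]
    out = lines[0]
    for prev, nxt in zip(lines, lines[1:]):
        real_break = (
            prev == ""                  # previous character was a line break
            or prev[-1] in KEEP_AFTER   # line ends with . ! ? * _ ' " ] or >
            or nxt == ""                # next character is a line break
        )
        out += ("\n" if real_break else " ") + nxt
    # Collapse runs of spaces into one.
    return re.sub(r" {2,}", " ", out)

sample = (
    "This is a line\n"
    "broken in the middle.\n"
    "A new paragraph starts\r\n"
    "here.\n"
    "\n"
    "After a blank line"
)
print(strip_false_breaks(sample))
```

The first two lines of the sample are joined because the first one does not end with one of the listed characters; the break after "middle." is kept.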
|