Old 03-20-2013, 07:21 AM   #1
mcam77
Junior Member
Posts: 4
Karma: 10
Join Date: Feb 2013
Device: kobo touch
Regex Problem / Line that doesn't end with .</p>

Hello,

I am trying to match a string (with regex in Sigil) that doesn't end with .</p>

I tried for an hour, including googling, without success.

Thanks

Martin
Old 03-20-2013, 02:00 PM   #2
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by mcam77 View Post
I am trying to match a string (with regex in Sigil) that doesn't end with .</p>
It would be helpful if you could give examples of sentences you are trying to match.
Old 03-20-2013, 08:31 PM   #3
theducks
Well trained by Cats
Posts: 29,775
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Sigil 0.7 has a saved search, "Join paragraphs", that looks for paragraphs that end with a letter or comma (not a period).
It does not find all the other, sometimes valid, cases. For those, you need to carefully craft your own search and step through the matches one by one, using Find to skip a match or Replace+Find to apply it. Never use Replace All with those.
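
To make the "ends with a letter or comma" rule concrete, here is a minimal C sketch of that test applied to a single line. It is only an illustration, not Sigil's actual saved search; the function name is_join_candidate is made up for this example, and it assumes each paragraph sits on its own line and ends with a plain </p>.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not part of Sigil): returns 1 when `line` is a
   paragraph whose text, ignoring spaces before the closing </p>,
   ends with a letter or a comma; returns 0 otherwise. */
int is_join_candidate(const char *line)
{
    static const char closing[] = "</p>";
    size_t close_len = sizeof closing - 1;
    size_t len = strlen(line);

    /* ignore trailing whitespace such as the newline */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    /* the line must end with </p> */
    if (len < close_len ||
        strncmp(line + len - close_len, closing, close_len) != 0)
        return 0;
    len -= close_len;

    /* skip spaces between the text and </p> */
    while (len > 0 && line[len - 1] == ' ')
        len--;
    if (len == 0)
        return 0;

    unsigned char last = (unsigned char)line[len - 1];
    return isalpha(last) || last == ',';
}
```

A paragraph that ends with a period, a quote, or a closing tag is reported as 0 here, which is exactly the group you still have to step through by hand.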
Old 03-24-2013, 02:29 PM   #4
mcam77
Sorry for the late reply!

I am using Sigil 0.6.2 because Sigil 0.7.1 started a few times and then refused to start. (When I launch it, nothing happens. I found a lot of material for Linux but nothing for Windows.) Maybe someone knows where the problem is.

Back to the topic.

I have paragraphs like

<p>blah some words </p>

<p>next words and so on.</p>

and I want

<p>blah some words next words and so on.</p>

Regards, and thanks for the help.
Old 03-24-2013, 04:01 PM   #5
theducks
Quote:
Originally Posted by mcam77 View Post
Sorry for the late reply!

I am using Sigil 0.6.2 because Sigil 0.7.1 started a few times and then refused to start. (When I launch it, nothing happens. I found a lot of material for Linux but nothing for Windows.) Maybe someone knows where the problem is.

Back to the topic.

I have paragraphs like

<p>blah some words </p>

<p>next words and so on.</p>

and I want

<p>blah some words next words and so on.</p>

Regards, and thanks for the help.
Code:
(?sm)([a-z,]) </p>\s+<p>
replace: \1<space here>

Do note: your example has a trailing space before </p>. The search matches that space outside the capture group, so it is discarded, and the replace puts a single space back after the captured character.
This S&R deliberately does not include a trailing hyphen or em dash; you need to review each of those as join candidates.
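
For anyone who wants to see what that search and replace does step by step, here is a minimal C sketch that hand-codes the same match: a lowercase letter or comma, one space, </p>, one or more whitespace characters, then <p>, replaced by the captured character plus one space. The function name join_paragraphs is made up for this illustration, and it assumes plain ASCII text and an output buffer the caller has sized correctly.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper mirroring the search ([a-z,]) </p>\s+<p> with the
   replacement "\1 " (captured character, then one space).
   `out` must have room for at least strlen(in) + 1 bytes. */
void join_paragraphs(const char *in, char *out)
{
    while (*in) {
        const char *p = in;
        int matched = 0;

        /* ([a-z,]) : a lowercase ASCII letter or a comma */
        if ((*p >= 'a' && *p <= 'z') || *p == ',') {
            p++;
            /* literal space followed by </p> */
            if (strncmp(p, " </p>", 5) == 0) {
                const char *ws = p + 5;
                const char *q = ws;
                while (*q && isspace((unsigned char)*q))
                    q++;
                /* \s+ needs at least one whitespace character before <p> */
                if (q > ws && strncmp(q, "<p>", 3) == 0) {
                    *out++ = *in;   /* \1 */
                    *out++ = ' ';   /* the single inserted space */
                    in = q + 3;     /* continue after <p> */
                    matched = 1;
                }
            }
        }
        if (!matched)
            *out++ = *in++;         /* copy everything else unchanged */
    }
    *out = '\0';
}
```

Like the regex, this leaves paragraphs that end with a period, hyphen, or dash alone, and it also skips paragraphs with no space before </p>.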
Old 03-24-2013, 05:27 PM   #6
Tex2002ans
I answered this same exact situation last month in this topic (with the two regular expressions I use to clean this):

https://www.mobileread.com/forums/sho...89#post2446589

Last edited by Tex2002ans; 03-24-2013 at 07:13 PM.
Old 03-25-2013, 06:38 PM   #7
varlog
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
For me, the regex solution was not really comfortable enough. As I usually start cleaning from plain text, I thought I'd do something really nice and wrote this small piece:
Code:
/*
 stripLF.c - remove false line breaks 
 compile with: cc stripLF.c -o stripLF
 usage:
          stripLF -h 
          stripLF [-C][-p][-H] < file.in > file.out
          cat file.in | stripLF [-C][-p][-H] >file.out
 options:
    -h: print help text and exit
    -C: line break before [C]apitals is legitimate too (for poetry)
    -p: change line breaks into </p> LF <p>  
    -H: change text into bare html page       
    
 removes all carriage returns
 removes all line breaks which are not 
 preceded by . _ ! ? * ' " ] > or another line break,
 or followed by another line break, or (with option -C) by a capital letter
 removes multiple spaces  
   (c)varlog 2013
   LICENSE: FREE FOR ALL
   
*/

#include <stdio.h> 
#define LF 0x0A
#define CR 0x0D
#define SPACE 0x20
#define SINGLE_QUOTE 0x27
#define DOPPEL_QUOTE 0x22

#define VERSION 1.02

void usage(){
 printf("\n**********************************************************\n");
 printf("stripLF: remove false line breaks \n");
 printf("usage: \n");
 printf("stripLF [-h] \n");
 printf("stripLF [-C] [-p] [-H] < file.in > file.out \n");
 printf("cat file.in |stripLF [-C] [-p] [-H] > file.out \n");
 printf("options:\n");
 printf("-h: print this help text and exit\n");
 printf("-C: line break before [C]apitals is legitimate too\n");
 printf("-p: change line breaks into </p> LF <p>\n");
 printf("-H: add <html><body>......</body></html> tokens, implies -p \n");
 printf("v %.2f 2013 (c) varlog\n",VERSION);
 printf("***********************************************************\n");
}

int main(int argc, char **argv)
{

int ch,pch=LF,nch=0;
int i;
int Cflag=0;
int Hflag=0;
int pflag=0;
int eflag=0;
 
 if(argc>1){
  for(i=1 ;i<argc; i++){
   if(argv[i][0]=='-') {
   
    switch (argv[i][1]){
     case 'C':
      Cflag=1; //capitals
      break;
     case 'p':   // LF --> </p><p>
      pflag=1;
      break;
     case 'h':   // help: print usage and exit without reading input
      usage();
      return 0;
     case 'H':  //-->html
      Hflag=1;
      pflag=1;
      break;
     default:
      break;
    }   
   }  
  }
 }
  if(Hflag) printf("<html>\n<body>\n");
  if(pflag) printf("<p>");
      while(!eflag)
  {
   ch = getchar();
   if(ch==EOF) break;
   if(ch==SPACE && pch==SPACE) { 
    ; //remove space if more than one by ignoring it
   }else{
    if(ch!=LF && ch!=CR) {
    putchar(ch);  //just next letter
    }else {   
     if(ch==CR){ 
      ch=pch; //remove CR by ignoring it 
     }else {      
      while((nch=getchar())==SPACE || nch==CR); //get next char, ignoring SPACE and CR
      if(nch==EOF)
      {
       putchar(ch);
       break;      
      }        
      if(  // it is line break!
       pch==']'||
       pch=='>'||
       pch=='*'||
       pch=='_'||
       pch=='.'||
       pch=='!'||
       pch=='?'||
       pch==SINGLE_QUOTE|| 
       pch==DOPPEL_QUOTE||
       pch==LF||
       nch==LF||
       (Cflag==1 && nch>=0x40 && nch<=0x5A)  //capitals and @      
       )  { 
        if(pflag) printf("</p>");     
        putchar(ch);
        if(pflag) printf("<p>");
        putchar(nch);
        ch=nch;
       }
        else {  //phony line break
         putchar(SPACE); //change LF into space                
         putchar(nch);
         ch=nch;
         }
       }      
     }
     pch=ch;  
   }
  } //end while
  if(pflag) printf("</p>");
  if(Hflag) printf("\n</body>\n</html>");
  return 0;
}
Of course it is not really nice and needs manual correction in the end, but it works for me.
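
The heart of the program is the test that decides whether a line feed is a real break or a phony one. As a rough sketch, that test can be pulled out into a small function so it can be tried on its own. The name is_real_break and its parameters are made up for this illustration; the character set and the 0x40..0x5A range are copied from the listing above.

```c
#define LF 0x0A

/* Hypothetical helper: returns 1 if a line feed between `prev` and `next`
   should be kept as a real break, 0 if it should become a space.
   `capitals` mirrors the -C option of stripLF. */
int is_real_break(int prev, int next, int capitals)
{
    /* characters that legitimately end a line, or an empty line before */
    switch (prev) {
    case ']': case '>': case '*': case '_':
    case '.': case '!': case '?':
    case '\'': case '"':
    case LF:
        return 1;
    default:
        break;
    }
    /* an empty line follows */
    if (next == LF)
        return 1;
    /* 0x40..0x5A is '@' plus 'A'..'Z', as in the original test */
    return capitals && next >= 0x40 && next <= 0x5A;
}
```

For example, is_real_break('d', 'N', 0) gives 0 (the break is joined), while the same call with capitals set to 1 gives 1, which is the behaviour -C is meant to provide for poetry.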