Old 11-02-2010, 02:14 PM   #1
crutledge
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
I need help!!!

I'm on the verge of gibbering. Since all of the really smart folks hang out here, I know someone has the answer. The book I'm working on has 72 blocks similar to the following:

Code:
<div class="stanza">
  <p>“‘Tell me, my old friend, tell me why</p>
  <p>You sit and softly laugh by yourself.’</p>
  <p>‘It is because I am repeating to myself,</p>
  <p>Write! write</p>
  <p>Of the valiant strength,</p>
  <p>The calm, brave bearing</p>
  <p>Of the sons of the sea.’”</p>
</div>
I need to get rid of the <p>...</p> tags and replace with the following:

Code:
<div class="stanza">
  “‘Tell me, my old friend, tell me why<br />
  You sit and softly laugh by yourself.’<br />
  ‘It is because I am repeating to myself,<br />
  Write! write<br />
  Of the valiant strength,<br />
  The calm, brave bearing<br />
  Of the sons of the sea.’”<br />
</div>
I've been through every manual and web reference I can find and nothing I've tried has worked. Nothing comes close to finding a series of items between two markers. I know that anything can be done with regexes if you are only smart enough or can frame the right question to Google.
Old 11-02-2010, 02:31 PM   #2
Jellby
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I would actually prefer the former markup, as it lets you control how each line wraps on narrow screens (typically, I'd have some largish negative indent; ideally, the wrapped part would be right-aligned).

I see you have spaces before the <p>. Is that the case in every stanza, and only inside stanzas? If so, you could try something like:

search: "^ <p>(.*)</p>"
replace: " \1<br/>"

Otherwise it gets complicated. I'd probably proceed stepwise: search for "<div class="stanza">, whatever, </p>\n<p>, whatever, </div>", where "whatever" stands for "any character, including newlines, multiple times, as few as possible", and repeat as needed.
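
For anyone who wants to try the line-anchored search outside the editor, here is a minimal Python sketch of it. It only works under the assumption discussed above: stanza lines are indented and no other <p> line is. The function name, the sample text, and the use of `[ \t]+` (one or more spaces or tabs) in place of a single literal space are illustrative choices, not part of the original suggestion.

```python
import re

# A whole line that is indented and consists of a single <p>...</p>.
# Assumption: only stanza lines are indented like this.
# The lookahead tolerates trailing blanks and Windows (\r\n) line endings.
INDENTED_P = re.compile(r'^([ \t]+)<p>(.*)</p>(?=[ \t]*\r?$)', re.MULTILINE)


def indented_p_to_br(text):
    """Turn every indented <p>line</p> into 'line<br />', keeping the indent."""
    return INDENTED_P.sub(r'\1\2<br />', text)


sample = (
    '<p>do not change me</p>\n'
    '<div class="stanza">\n'
    '  <p>Write! write</p>\n'
    '  <p>Of the valiant strength,</p>\n'
    '</div>\n'
)
print(indented_p_to_br(sample))
```

If any paragraph outside a stanza is also indented, this will convert it too, so it is worth checking the file for such lines first.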
Old 11-02-2010, 02:46 PM   #3
Fabe
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
I would do it in two steps.
FIND </p>
REPLACE WITH <br />

And Then

FIND <p>
REPLACE WITH [blank]

This seems too easy, what am I missing? - Fabe
Old 11-02-2010, 03:36 PM   #4
crutledge
Quote:
Originally Posted by Jellby View Post
I would actually prefer the former markup, as it lets you control how each line wraps on narrow screens (typically, I'd have some largish negative indent; ideally, the wrapped part would be right-aligned).

I see you have spaces before the <p>. Is that the case in every stanza, and only inside stanzas? If so, you could try something like:

search: "^ <p>(.*)</p>"
replace: " \1<br/>"

Otherwise it gets complicated. I'd probably proceed stepwise: search for "<div class="stanza">, whatever, </p>\n<p>, whatever, </div>", where "whatever" stands for "any character, including newlines, multiple times, as few as possible", and repeat as needed.
Please remember that there are many <p>...</p> that are not part of the stanza. What you describe would change every <p>...</p> in the file. The changes must be limited to those <p>...</p> between <div class="stanza"> and </div>
Old 11-02-2010, 04:47 PM   #5
Perkin
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
Here's how I'd do it.
Place the cursor at the END of the file.

Match case / Minimal / Regex / Direction Up
Search for
Code:
<div class="stanza">(.+)<p>(.+)</p>
Replace with
Code:
<div class="stanza">\1\2<br />
Then repeat 'replace' as needed for each line of the current stanza. After replacing the last line, do a 'find next' to move on to the next stanza.
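
The same "replace one line, then repeat" idea can be sketched in Python. This is only an approximation under stated assumptions: it does not reproduce the editor's Direction Up option, and it adds a guard (a "tempered" dot that refuses to run past `</div>`) that is not in the search above, so that an already finished stanza cannot match into the text after it. The names `STANZA_P` and `convert_stanzas` are made up for the example.

```python
import re

# One <p>...</p> that still sits inside a stanza. (?:(?!</div>).)*? is
# "as few characters as possible, but never past a closing </div>".
STANZA_P = re.compile(
    r'(<div class="stanza">(?:(?!</div>).)*?)<p>((?:(?!</div>).)*?)</p>',
    re.DOTALL,
)


def convert_stanzas(text):
    """Repeat the single-line replacement until no stanza has a <p> left."""
    while True:
        # Each pass converts at most one line per stanza.
        text, count = STANZA_P.subn(r'\1\2<br />', text)
        if count == 0:
            return text


sample = (
    '<p>keep</p>\n'
    '<div class="stanza">\n'
    '  <p>one</p>\n'
    '  <p>two</p>\n'
    '</div>\n'
    '<p>keep too</p>\n'
)
print(convert_stanzas(sample))
```

Because every pass removes at least one <p> or stops, the loop always terminates. It assumes stanzas do not contain nested <div> elements.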
Old 11-02-2010, 04:52 PM   #6
Fabe
I'd go with my original Find/Replace settings, but go a line at a time with the mouse on REPLACE and a finger on the Enter key for Find Next.

Or I'd become a potato farmer. - Fabe

Last edited by Fabe; 11-02-2010 at 06:57 PM.
Old 11-02-2010, 05:02 PM   #8
KevinH
Sigil Developer
Posts: 8,439
Karma: 5703082
Join Date: Nov 2009
Device: many
Hi,

If you are okay running Python, you could try something like the following on each file in the epub that has stanzas in it (after unzipping it, of course):

Code:
#!/usr/bin/env python3
import sys
import os


def main(argv=sys.argv):
    if len(argv) != 3:
        print("syntax is:  python fixme.py INPUTFILE OUTPUTFILE")
        return 1
    infile = argv[1]
    outfile = argv[2]
    if not os.path.exists(infile):
        print("input file was not found")
        return 1

    # Assumes the file is UTF-8 (the usual encoding for EPUB XHTML).
    # newline='' keeps the file's own line endings untouched.
    with open(infile, 'r', encoding='utf-8', newline='') as f:
        lines = f.read().splitlines(keepends=True)

    instanza = False
    res = []
    for line in lines:
        # A stanza starts at its opening div ...
        if '<div class="stanza">' in line:
            instanza = True
        if instanza:
            line = line.replace('<p>', '')
            line = line.replace('</p>', '<br />')
            # ... and ends at the first closing div after it.
            if '</div>' in line:
                instanza = False
        res.append(line)

    with open(outfile, 'w', encoding='utf-8', newline='') as of:
        of.write(''.join(res))
    return 0


if __name__ == '__main__':
    sys.exit(main())

My test.html is:
Code:
<html>
<body>
<p>do not change me</p>
<div class="stanza">
  <p>“‘Tell me, my old friend, tell me why</p>
  <p>You sit and softly laugh by yourself.’</p>
  <p>‘It is because I am repeating to myself,</p>
  <p>Write! write</p>
  <p>Of the valiant strength,</p>
  <p>The calm, brave bearing</p>
  <p>Of the sons of the sea.’”</p>
</div>
<p>do not change me either</p>
</body>
</html>
And running:

python fixme.py test.html test_fixed.html

gives the following for test_fixed.html

Code:
<html>
<body>
<p>do not change me</p>
<div class="stanza">
  “‘Tell me, my old friend, tell me why<br />
  You sit and softly laugh by yourself.’<br />
  ‘It is because I am repeating to myself,<br />
  Write! write<br />
  Of the valiant strength,<br />
  The calm, brave bearing<br />
  Of the sons of the sea.’”<br />
</div>
<p>do not change me either</p>
</body>
</html>


This is of course only a simple test and the pasting of it here may cause problems if it messes up spacing and things, but a similar approach can be used for almost any mass change you want.

If you are desperate enough to want to give it a try, PM me with your e-mail and I will send you the Python file.

If you are Mac OS X, Linux or Unix based, you can use 'sed' or awk to do this, or almost any simple scripting language like Python (above), Perl or PHP, etc.

KevinH

Last edited by KevinH; 11-02-2010 at 05:16 PM. Reason: add test example and output
Old 11-02-2010, 08:02 PM   #9
crutledge
My head is sore from banging against the wall!!!

Thanks to all for the ideas which made me think harder and develop a solution. Hearing other ideas is a stimulant.

I finally used GREP. Its file sectioning feature (which I just discovered) let me logically section the file into the regions within the stanza tags. Once this was done, and the lines I was interested in were isolated from the rest of the code, a regex find and replace on the lines in the sections worked like a champ. Trying to do 75 stanzas X 5 to 10 lines per stanza manually would be mind-numbing.

I need to dig deeper into this GREP.

BTW, the file looks great.
Old 11-02-2010, 11:51 PM   #10
st_albert
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
So, how did you use grep to solve this problem?

And what OS?

ETA: my textmod fu is perl >> sed > awk, but that's because I'm too frikkin lazy to learn much at my age. But occasionally I do get curious. Now's your chance!

Last edited by st_albert; 11-02-2010 at 11:57 PM.
Old 11-03-2010, 05:29 AM   #11
Jellby
Quote:
Originally Posted by crutledge View Post
Please remember that there are many <p>...</p> that are not part of the stanza. What you describe would change every <p>...</p> in the file. The changes must be limited to those <p>...</p> between <div class="stanza"> and </div>
Actually, the first solution only replaces <p>...</p> which are preceded by spaces.

The second was intended to find only <p>...</p> inside stanzas, but was flawed. The "whatever" would have to include the restriction "does not contain '</div>'". Since that restriction is often hard to express in a regexp, I sometimes use this trick:

1. Replace "<div class="stanza">...</div>" with "¬...|" (¬ and | being characters not used anywhere else in the file).
2. Find and replace "¬...<p>...</p>...|", writing each "..." as "[^|]*", i.e., any run of characters other than |. This keeps the match inside the ¬...| block.
3. Convert ¬ and | back into <div class="stanza"> and </div>.
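
The three steps can be sketched in Python as follows. This is a rough illustration, not a tested recipe for any particular editor: the function name is invented, the repeat-until-done loop in step 2 is one possible way to handle several lines per stanza, and the check at the top enforces the assumption that ¬ and | really are unused in the file.

```python
import re

OPEN, CLOSE = '¬', '|'   # placeholders; must not occur anywhere in the text


def convert_with_placeholders(text):
    if OPEN in text or CLOSE in text:
        raise ValueError('placeholder characters already occur in the text')
    # Step 1: swap each stanza's tags for single-character markers.
    text = re.sub(r'<div class="stanza">(.*?)</div>',
                  OPEN + r'\1' + CLOSE, text, flags=re.DOTALL)
    # Step 2: convert one <p>...</p> per block per pass. [^|]* cannot leave
    # the block because it cannot match the closing marker.
    pattern = re.compile(rf'({OPEN}[^{CLOSE}]*?)<p>([^{CLOSE}]*?)</p>')
    count = 1
    while count:
        text, count = pattern.subn(r'\1\2<br />', text)
    # Step 3: restore the real tags.
    return text.replace(OPEN, '<div class="stanza">').replace(CLOSE, '</div>')


sample = (
    '<p>keep</p>\n'
    '<div class="stanza">\n'
    '  <p>one</p>\n'
    '  <p>two</p>\n'
    '</div>\n'
    '<p>keep too</p>\n'
)
print(convert_with_placeholders(sample))
```

The single-character markers are what make the negated class `[^|]` possible; a multi-character end tag like </div> cannot be excluded that simply. Stanzas containing nested <div> elements would need more care.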
Old 11-03-2010, 06:01 AM   #12
crutledge
Quote:
Originally Posted by st_albert View Post
So, how did you use grep to solve this problem?

And what OS?

ETA: my textmod fu is perl >> sed > awk, but that's because I'm too frikkin lazy to learn much at my age. But occasionally I do get curious. Now's your chance!
I am not a GREP expert. I use PowerGREP from JGsoft. I selected "FILE SECTIONING" and entered the following in the "SECTION SEARCH" pane.

Code:
<div class="stanza">.*?</div>
When I selected "PREVIEW" all of the lines within each stanza were displayed. Black Magic.

Below the "SECTION SEARCH" pane is a SEARCH AND REPLACE pane which allows modification of the data found.

So simple. Clever people, these JGsoft folks. I've used their tools for years. I just have to be smart enough to use them. But they do have a great forum setup for help and information with a large following, so I can ask my dumb questions.
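
For readers without PowerGREP, the same two-level idea (find the sections first, then search and replace only inside them) can be approximated in Python with a replacement function. This is an analogue under assumptions, not a description of how PowerGREP works internally; the names `STANZA`, `fix_section` and `convert_sections` are invented for the example.

```python
import re

# "Section search": one whole stanza block, as short as possible.
# re.DOTALL lets the dot cross line breaks.
STANZA = re.compile(r'<div class="stanza">.*?</div>', re.DOTALL)


def fix_section(match):
    """Search and replace applied only to the text of one matched section."""
    section = match.group(0)
    return re.sub(r'<p>(.*?)</p>', r'\1<br />', section, flags=re.DOTALL)


def convert_sections(text):
    """Run fix_section on every stanza; text outside stanzas is untouched."""
    return STANZA.sub(fix_section, text)


sample = (
    '<p>keep</p>\n'
    '<div class="stanza">\n'
    '  <p>one</p>\n'
    '  <p>two</p>\n'
    '</div>\n'
    '<p>keep too</p>\n'
)
print(convert_sections(sample))
```

Since the inner replacement only ever sees the text of one stanza, it can be a plain <p>...</p> search with no guards at all. Stanzas with nested <div> elements would be cut short at the first </div>.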