#1 |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
I need help!!!
I'm on the verge of gibbering. Since all of the really smart folks hang out here, I know someone has the answer. The book I'm working on has 72 blocks similar to the following:
Code:
    <div class="stanza">
      <p>“‘Tell me, my old friend, tell me why</p>
      <p>You sit and softly laugh by yourself.’</p>
      <p>‘It is because I am repeating to myself,</p>
      <p>Write! write</p>
      <p>Of the valiant strength,</p>
      <p>The calm, brave bearing</p>
      <p>Of the sons of the sea.’”</p>
    </div>
I need to convert each of them to this:
Code:
    <div class="stanza">
      “‘Tell me, my old friend, tell me why<br />
      You sit and softly laugh by yourself.’<br />
      ‘It is because I am repeating to myself,<br />
      Write! write<br />
      Of the valiant strength,<br />
      The calm, brave bearing<br />
      Of the sons of the sea.’”<br />
    </div>
#2 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I would actually prefer the former markup, as it lets you control how each line wraps on narrow screens (typically, I'd use a largish negative indent; ideally, the wrapped part would be right-aligned).
I see you have spaces before the <p>. Is that the case in every stanza, and only inside stanzas? If so, you could try something like:
search: "^ <p>(.*)</p>"
replace: " \1<br/>"
Otherwise it gets complicated. I'd probably proceed stepwise: search for "<div class="stanza">, whatever, </p>\n<p>, whatever, </div>", where "whatever" stands for "any character, including newlines, multiple times, as few as possible", and repeat as needed.
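The first search-and-replace above can be sketched in Python. This is only a minimal sketch, assuming (as the post asks) that indented <p> lines occur only inside stanzas. The function name `convert_indented_paragraphs` and the sample text are made up for illustration, and `[ \t]+` stands in for whatever indentation the real file uses.

```python
import re

# Assumption: only stanza lines start with whitespace followed by <p>.
# MULTILINE makes ^ match at the start of every line, not just the file.
INDENTED_P = re.compile(r'^([ \t]+)<p>(.*)</p>', re.MULTILINE)


def convert_indented_paragraphs(html):
    """Turn each indented <p>...</p> line into 'text<br/>', keeping the indent."""
    return INDENTED_P.sub(r'\1\2<br/>', html)


# Hypothetical sample: one ordinary paragraph plus a two-line stanza.
sample = (
    '<p>do not change me</p>\n'
    '<div class="stanza">\n'
    '  <p>Write! write</p>\n'
    '  <p>Of the valiant strength,</p>\n'
    '</div>\n'
)
print(convert_indented_paragraphs(sample))
```

The unindented paragraph is left alone only because it has no leading whitespace; if ordinary paragraphs in the book are indented too, this pattern would rewrite them as well.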
#3 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
I would do it in two steps.
FIND </p>
REPLACE WITH <br />
And then
FIND <p>
REPLACE WITH [blank]
This seems too easy; what am I missing?
- Fabe
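A quick way to check the two-step replace is to run it over a small sample. This is a hedged sketch: the function name `blanket_replace` and the sample text are assumptions for illustration, and it applies both steps to the whole text with no restriction to stanzas.

```python
def blanket_replace(html):
    """Apply the two find/replace steps to the entire text."""
    html = html.replace('</p>', '<br />')  # step 1: closing tags become breaks
    return html.replace('<p>', '')         # step 2: opening tags are dropped


# Hypothetical sample: an ordinary paragraph followed by a one-line stanza.
sample = (
    '<p>An ordinary paragraph.</p>\n'
    '<div class="stanza">\n'
    '<p>Write! write</p>\n'
    '</div>\n'
)
result = blanket_replace(sample)
print(result)
```

The ordinary paragraph loses its <p> tags along with the stanza line, so a blanket replace is only safe when it is limited to the stanza blocks or stepped through one match at a time.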
#4 | |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Quote:
|
|
#5 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Here's how I'd do it.
Place the cursor at the END of the file, then set: Match case / Minimal / Regex / Direction Up.
Search for
Code:
    <div class="stanza">(.+)<p>(.+)</p>
Replace with
Code:
    <div class="stanza">\1\2<br />
#6 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
I'd go with my original Find/Replace settings, but go a line at a time with the mouse on REPLACE and a finger on the Enter key for Find Next.
Or I'd become a potato farmer.
- Fabe
Last edited by Fabe; 11-02-2010 at 06:57 PM.
#7 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
????????
Last edited by Fabe; 11-02-2010 at 06:56 PM. Reason: Duplicate entry! |
#8 |
Sigil Developer
Posts: 8,439
Karma: 5703082
Join Date: Nov 2009
Device: many
|
Hi,
If you are okay running Python, you could try something like the following on each file in the epub that has stanzas in it (after unzipping it, of course):
Code:
    #!/usr/bin/env python3
    import os
    import sys


    def main(argv=sys.argv):
        if len(argv) != 3:
            print("syntax is: python fixme.py INPUTFILE OUTPUTFILE")
            return 1
        infile = argv[1]
        outfile = argv[2]
        if not os.path.exists(infile):
            print("input file was not found")
            return 1
        # Read as text (assumes UTF-8); newline='' keeps the original line endings.
        with open(infile, 'r', encoding='utf-8', newline='') as f:
            lines = f.readlines()
        instanza = False
        res = []
        for line in lines:
            # The opening stanza div switches the replacements on.
            if '<div class="stanza">' in line:
                instanza = True
            if instanza:
                line = line.replace('<p>', '')
                line = line.replace('</p>', '<br />')
            # The closing div switches them off again.
            if '</div>' in line:
                instanza = False
            res.append(line)
        with open(outfile, 'w', encoding='utf-8', newline='') as of:
            of.write(''.join(res))
        return 0


    if __name__ == '__main__':
        sys.exit(main())
My test.html is:
Code:
    <html>
    <body>
    <p>do not change me</p>
    <div class="stanza">
      <p>“‘Tell me, my old friend, tell me why</p>
      <p>You sit and softly laugh by yourself.’</p>
      <p>‘It is because I am repeating to myself,</p>
      <p>Write! write</p>
      <p>Of the valiant strength,</p>
      <p>The calm, brave bearing</p>
      <p>Of the sons of the sea.’”</p>
    </div>
    <p>do not change me either</p>
    </body>
    </html>
Running "python fixme.py test.html test_fixed.html" gives the following for test_fixed.html:
Code:
    <html>
    <body>
    <p>do not change me</p>
    <div class="stanza">
      “‘Tell me, my old friend, tell me why<br />
      You sit and softly laugh by yourself.’<br />
      ‘It is because I am repeating to myself,<br />
      Write! write<br />
      Of the valiant strength,<br />
      The calm, brave bearing<br />
      Of the sons of the sea.’”<br />
    </div>
    <p>do not change me either</p>
    </body>
    </html>
This is of course only a simple test, and pasting it here may cause problems if it messes up spacing and things, but a similar approach can be used for almost any mass change you want. If you are desperate enough to want to give it a try, PM me with your e-mail and I will send you the Python file.
If you are on Mac OS X, Linux, or Unix, you can use sed or awk to do this, or almost any simple scripting language like Python (above), Perl, or PHP.
KevinH
Last edited by KevinH; 11-02-2010 at 05:16 PM. Reason: add test example and output
#9 |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
My head is sore from banging against the wall!!!
Thanks to all for the ideas which made me think harder and develop a solution. Hearing other ideas is a stimulant.
I finally used GREP. This allowed a logical sectioning of the file (which I just discovered) within the stanza tags. Once this was done, and the lines I was interested in were isolated from the rest of the code, a regex find and replace on the lines in the sections worked like a champ. Trying to do 75 stanzas x 5 to 10 lines per stanza manually would be mind-numbing. I need to dig deeper into this GREP. BTW, the file looks great.
#10 |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
so, how did you use grep to solve this problem?
and what OS?
ETA: my textmod fu is perl >> sed > awk, but that's because I'm too frikkin lazy to learn much at my age. But occasionally I do get curious. now's your chance!
Last edited by st_albert; 11-02-2010 at 11:57 PM.
#11 | |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
The second was intended to find only <p>...</p> inside stanzas, but was flawed. In the "whatever" you'd have to include the restriction "does not contain '</div>'". Since that restriction is often hard to express in regexp, I sometimes use this trick:
1. Replace "<div class="stanza">...</div>" with "¬...|" (¬ and | are characters unused in the rest of the file).
2. Find and replace "¬...<p>...</p>...|", where each "..." is "[^|]" repeated, i.e., any character but |; this keeps the matching inside the ¬...| block.
3. Convert ¬ and | back into <div class="stanza"> and </div>.
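The three steps of the marker trick can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a tested tool for the actual book: the function name `convert_with_markers` and the sample text are hypothetical, the stanza tag is assumed to be spelled exactly `<div class="stanza">`, and ¬ and | are assumed to be absent from the file (the code refuses to run otherwise).

```python
import re

OPEN_MARK = '¬'   # assumed not to occur anywhere in the file
CLOSE_MARK = '|'  # likewise


def convert_with_markers(html):
    """Convert <p>...</p> to ...<br /> inside stanza divs only."""
    if OPEN_MARK in html or CLOSE_MARK in html:
        raise ValueError('marker character already present in the text')
    # Step 1: swap the stanza tags for the single-character markers.
    text = re.sub(r'<div class="stanza">(.*?)</div>',
                  lambda m: OPEN_MARK + m.group(1) + CLOSE_MARK,
                  html, flags=re.DOTALL)
    # Step 2: [^|] can never cross the closing marker, so each match stays
    # inside one block. Each pass fixes one <p> per stanza; repeat until done.
    one_p = re.compile(r'(¬[^|]*?)<p>([^|]*?)</p>')
    changed = 1
    while changed:
        text, changed = one_p.subn(r'\1\2<br />', text)
    # Step 3: turn the markers back into the original tags.
    text = text.replace(OPEN_MARK, '<div class="stanza">')
    return text.replace(CLOSE_MARK, '</div>')


# Hypothetical sample with paragraphs before and after the stanza.
sample = (
    '<p>keep</p>\n'
    '<div class="stanza">\n'
    '<p>a</p>\n'
    '<p>b</p>\n'
    '</div>\n'
    '<p>keep too</p>\n'
)
print(convert_with_markers(sample))
```

The negated character class does the work that "does not contain '</div>'" would otherwise have to do, which is why the markers need to be single characters.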
|
#12 | |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Quote:
Code:
    <div class="stanza">.*?</div>
Below the "SECTION SEARCH" pane is a SEARCH AND REPLACE pane, which allows modification of the data found. So simple.
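The same section-then-replace idea can be sketched in Python, for anyone without that editor. This is a minimal sketch, not the tool's own behavior: it reuses the section pattern from the post, while the names `STANZA`, `fix_stanza`, and `convert_stanzas` and the sample text are assumptions made for illustration.

```python
import re

# The section pattern from the post; DOTALL lets . span line breaks,
# and .*? stops at the first </div> after each stanza opening.
STANZA = re.compile(r'<div class="stanza">.*?</div>', re.DOTALL)


def fix_stanza(match):
    """Rewrite one matched stanza section and return the new text."""
    block = match.group(0)
    block = block.replace('<p>', '')
    return block.replace('</p>', '<br />')


def convert_stanzas(html):
    """Apply the replacements inside stanza sections only."""
    return STANZA.sub(fix_stanza, html)


# Hypothetical sample with paragraphs outside the stanza.
sample = (
    '<p>do not change me</p>\n'
    '<div class="stanza">\n'
    '<p>Write! write</p>\n'
    '<p>Of the valiant strength,</p>\n'
    '</div>\n'
    '<p>do not change me either</p>\n'
)
print(convert_stanzas(sample))
```

Passing a function to `sub` plays the role of the second pane: the pattern selects the sections, and the function decides what changes inside each one.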
|