Old 01-20-2011, 05:53 AM   #1
schuster
Zealot
Posts: 116
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
regex replace???

Hi all,

I've run into a new regex problem while converting my books.

Any suggestions?
____________________________________
der neuen Rechtschreibung. <br>
<br>
4 <br>
<hr>
<A name=5></a><i>Vorwort </i><br>
<br>
Katzen haben in meinem
____________________________________
The result is:
der neuen Rechtschreibung. Vorwort Katzen haben in meinem Leben

But I want it to look like this:
____________________________________
der neuen Rechtschreibung.



Vorwort

Katzen haben in meinem Leben
____________________________________


Is this possible?

I have tried many expressions, but none of them does what I want.

How can I replace <i> and </i> so that a newline follows each of them?

Or simply insert newlines at the tags?

Thanks for the help,

olaf
Old 01-20-2011, 06:01 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
It's relatively unclear to me what you want. Do you want to totally remove all markup from that snippet? Or are the results you posted results without the markup, as they would be rendered in the reader?
Also, without knowing what regexes you tried, it's hard to know what to suggest.
Old 01-20-2011, 06:18 AM   #3
schuster
Hi Manichean,
the results shown are without any regex applied.
This is simply the output of the EPUB conversion.

I want to insert a "newline" at every <i> and </i> tag.

In the original, the word "Vorwort" stands alone on its own line; after conversion it is run into the following line.

I hope that explains it (sorry about my bad English, school was a long time ago).

olaf
Old 01-20-2011, 06:32 AM   #4
Manichean
Oh, alright, I understand. However, in your conversion results, the number "4" is missing, which I presume to be a page number removed by a regex, right?
What source format are you converting from and what format are you converting to?
Old 01-20-2011, 06:39 AM   #5
schuster
No,
the number is not the problem.

I want to insert a newline after each <i> tag so that the tagged word ends up on a separate line.
Removing the page numbers is not the issue yet.

Source: PDF
Target: EPUB
Old 01-20-2011, 06:45 AM   #6
Manichean
I don't know if that is possible. The problem is, I believe, that the source document, judging from the snippet you posted, doesn't seem to be marked up using paragraph tags, but rather "dumb" linebreaks. You could try enabling the preprocessing facility (found in the structure detection part of the conversion settings) and see if Calibre fixes the markup to include paragraph tags. However, I cannot say if that will necessarily occur for every italics tag.
Another, probably better, solution that comes to mind would be to use Sigil. Assuming that the italics tags are preserved, as they should be, you could do a search and replace in Sigil on the XHTML and add linebreaks (or paragraph tags) before and/or after the italics as you like.
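To make the search-and-replace idea concrete, here is a rough sketch in Python using the snippet from the first post. It is not Sigil's or calibre's own code: the helper name italics_to_paragraphs and the choice to wrap each italic run in a <p> element are assumptions made purely for illustration. The same pattern and replacement could, in principle, be typed into any regex-capable search and replace dialog.

```python
import re

# Sample markup copied from the first post in this thread.
snippet = (
    "der neuen Rechtschreibung. <br>\n"
    "<br>\n"
    "4 <br>\n"
    "<hr>\n"
    "<A name=5></a><i>Vorwort </i><br>\n"
    "<br>\n"
    "Katzen haben in meinem\n"
)

# Match one italic run; non-greedy so neighbouring runs stay separate.
ITALIC = re.compile(r"<i>(.*?)</i>", re.IGNORECASE | re.DOTALL)


def italics_to_paragraphs(html: str) -> str:
    """Hypothetical helper: wrap every <i>...</i> run in its own <p> element."""
    return ITALIC.sub(
        lambda m: "<p><i>" + m.group(1).strip() + "</i></p>", html
    )


result = italics_to_paragraphs(snippet)
print(result)
```

Wrapping the run in a paragraph (rather than only adding whitespace) is what should force "Vorwort" onto its own line in a reader, since block-level elements start a new line when rendered.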
Old 01-20-2011, 10:44 AM   #7
user_none
Sigil & calibre developer
Posts: 2,436
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Wait for the next release. The header and footer regexes are replaced with true regex search and replace. You will be able to specify <i> to be replaced with <i>\n.
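As a small illustration of what that substitution does to the text, here is a sketch using Python's re module. It assumes the conversion search and replace behaves like Python's re.sub, which is an assumption on my part rather than something stated in this thread.

```python
import re

# The italic line from the snippet in the first post.
line = "<A name=5></a><i>Vorwort </i><br>"

# In a Python replacement template, \n is converted to a newline character.
replaced = re.sub(r"<i>", r"<i>\n", line)

# repr() makes the inserted newline visible as \n.
print(repr(replaced))
```

Keep in mind that a bare newline in HTML source is ordinary whitespace and is normally collapsed when the page is rendered, so if the goal is a visible break in the reader, a replacement such as <br> or a paragraph tag is probably what is needed.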
Old 01-21-2011, 08:20 AM   #8
schuster
Thanks for the info, user_none.
I hope the next release comes soon so that I can edit my books quickly.

Do you have a date for the release?

olaf
Old 01-21-2011, 08:22 AM   #9
user_none
There is never a firm date for a release. However, they typically happen once a week.
Old 01-21-2011, 08:25 AM   #10
schuster
I'll wait patiently
Old 01-26-2011, 08:54 AM   #11
Wolfgan
Avid reader
Posts: 19
Karma: 10
Join Date: Feb 2009
Location: Argentina
Device: Kindle 3 wifi
Quote:
Originally Posted by user_none
Wait for the next release. The header and footer regexes are replaced with true regex search and replace. You will be able to specify <i> to be replaced with <i>\n.
Thanks user_none for the confirmation. I upgraded calibre recently and was looking around for the header/footer removal options :-)
Wolf.
Old 01-29-2011, 08:43 AM   #12
Wolfgan
Quote:
Originally Posted by user_none
There is never a firm date for a release. However, they typically happen once a week.
Just one quick question, as I downloaded 0.7.43 but saw no changes regarding headers and footers. Just to know what to expect: are you envisioning a couple of specific regex boxes targeting header & footer removal (like the previous XPath options), or just including some regex on the standard "search & replace" tab?
Thanks, Wolf.
Old 01-29-2011, 08:51 AM   #13
Manichean
The header/footer removal was dropped completely. It is now a generic search & replace feature in the conversion options. If I remember correctly, the change happened in 0.7.42, hence there is no mention of it in the changelog for 0.7.43.
Old 01-29-2011, 08:52 AM   #14
user_none
The header and footer boxes were just regexes that, when matched, would remove the matched content. The default regex for those options didn't work in 99% of cases.

They have been replaced with the search and replace. Put the regex in the regex field and leave the replace field blank if you want it to delete the content.
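For example, deleting matched content with an empty replacement could look like the sketch below. It uses Python's re module on the snippet from the first post; the pattern for the page number line is a hypothetical one written for that snippet only and would need adjusting for other books.

```python
import re

# Part of the sample markup from the first post: "4" is a page number.
snippet = (
    "der neuen Rechtschreibung. <br>\n"
    "<br>\n"
    "4 <br>\n"
    "<hr>\n"
    "<A name=5></a><i>Vorwort </i><br>\n"
)

# Hypothetical pattern: a line that starts with digits and <br>,
# followed by an <hr> and any trailing whitespace.
PAGE_NUMBER = re.compile(r"^\d+\s*<br>\s*<hr>\s*", re.MULTILINE | re.IGNORECASE)

# An empty replacement string deletes whatever the pattern matched.
cleaned = PAGE_NUMBER.sub("", snippet)
print(cleaned)
```

Anchoring the pattern to the start of a line keeps it from touching digits elsewhere, such as the 5 in the anchor name.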
Old 01-29-2011, 09:02 AM   #15
Wolfgan
Thanks Manichean, user_none for the confirmation. I'll go & craft my regex removal then. Thanks! Wolf