Old 11-19-2012, 04:43 PM   #1
ianc
Junior Member
Posts: 8
Karma: 472024
Join Date: Nov 2012
Device: Samsung Galaxy S3
Capitalize first word in sentence with search and replace?

Hi Folks,

I've converted a .PDF file to .epub and was able to remove the headers and footers with only a little difficulty.

I notice, however, that after the conversion, a lot of the capitalization at the beginning of sentences has been lost (unrelated to headers and footers), which is rather annoying.

It occurred to me to use a regex to locate lower case chars at the start of sentences. Initially, I could think of two cases:

1) First character in the sentence after a paragraph break. Can locate with "\.<br>\s+[a-z]"

2) First character in the sentence in the middle of a paragraph, assuming the previous sentence ends with a period and is followed by one space. Can locate with "\. [a-z]".

My question is, what should I use in the replacement text box to cause Calibre to substitute the upper case char for that which was found by the original search regex?

At first I just tried "\.<br>\s+[A-Z]" and "\. [A-Z]", but the replacement just took those literal text strings and wrote them into the book, so that, for example, every sentence beginning with a lower case character in the middle of a paragraph now begins with "\. [A-Z]" rather than the correct letter.
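
For reference, the substitution described above can be sketched outside of Calibre as a small standalone Python script. This is only an illustration, not Calibre's own search and replace: the function names are made up for the example, and it assumes the book text is available as a plain string. The two patterns are the ones from the post, with the letter captured in a group so that a replacement function can upper-case it.

```python
import re

# The two search patterns from the post, split into groups:
# group 1 is the punctuation and whitespace, group 2 is the lower-case letter.
AFTER_BREAK = re.compile(r"(\.<br>\s+)([a-z])")
MID_PARAGRAPH = re.compile(r"(\. )([a-z])")


def _upper_second_group(match):
    # Keep group 1 unchanged and upper-case the captured letter.
    return match.group(1) + match.group(2).upper()


def capitalize_sentence_starts(text):
    # Hypothetical helper: apply both patterns in turn.
    text = AFTER_BREAK.sub(_upper_second_group, text)
    return MID_PARAGRAPH.sub(_upper_second_group, text)


sample = "It rained. the road flooded.<br>\n we stayed home."
print(capitalize_sentence_starts(sample))
```

Note that the second pattern is quite broad: it would also capitalize the word after an abbreviation such as "e.g. the", so the result still needs a manual check.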

Thanks for any help,

ianc
Old 11-20-2012, 07:12 AM   #2
lof
Member
Posts: 10
Karma: 30
Join Date: Nov 2012
Device: Nook Color
I am not a script wizard, so I can't directly write the perfect solution, but it seems you would need to use this unless you repeat the operation for every letter in the [A-Z] range.
Another trick would be to convert the text to an .xls file, use the capitalize function, then convert back to a text file, though you'd have to think about preserving the paragraphs and chapters in the process.
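
The "repeat the operation for every letter" idea can be sketched as follows. This is a rough illustration under stated assumptions, not a Calibre feature: it uses plain literal replacement (no regex), covers only the mid-paragraph case (period, one space, lower-case letter), and the helper names are invented for the example.

```python
import string


def literal_pairs():
    # One (search, replace) pair per lower-case letter, e.g. (". a", ". A").
    return [(". " + c, ". " + c.upper()) for c in string.ascii_lowercase]


def capitalize_by_iteration(text):
    # Run 26 literal replacements, one per letter.
    for search, replace in literal_pairs():
        text = text.replace(search, replace)
    return text


print(len(literal_pairs()))
print(capitalize_by_iteration("one. two. three."))
```

The same 26 pairs could be entered by hand into any search and replace box that only supports literal replacement text, which is tedious but needs no regex support at all.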
Old 11-20-2012, 09:16 AM   #3
Gunnerp245
Gadget Freak
Posts: 1,033
Karma: 1025474
Join Date: Nov 2007
Location: US
Device: Sony 700; Entourage Edge, Kindle 3, Pocket Edge
Use Sigil. calibre does not have the full set of PCRE functions.
Old 11-20-2012, 11:11 AM   #4
ianc
Thanks guys. Yes, I eventually did download Sigil and take a look, and was able to do it there. Sigil is a bit more inconvenient to use, or at least it seems so to me, but then I haven't put any time into learning it yet.

It would seem then that Calibre's search and replace function cannot use back references to a group in the initial search regexp?
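
As a point of comparison, here is how back references behave in Python's standard `re` module. Whether Calibre's search and replace box behaves the same way is an assumption, not something confirmed in this thread. In `re`, a replacement string can refer to a captured group with `\1`, `\2` and so on, but the replacement syntax has no escape that changes the case of a group, so this sketch only marks the captured letter instead of upper-casing it.

```python
import re

text = "end of one. start of two"

# \1 and \2 in the replacement re-insert the captured groups unchanged.
# Here the lower-case letter is wrapped in brackets to show it was captured.
marked = re.sub(r"(\. )([a-z])", r"\1[\2]", text)
print(marked)
```

Changing the case of the captured letter in plain Python needs a replacement function rather than a replacement string.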

Thanks again,

ianc
Old 11-20-2012, 12:49 PM   #5
kiwidude
calibre/Sigil Developer
Posts: 4,089
Karma: 1211092
Join Date: Oct 2010
Location: London, UK
Device: Kindle 3 3G, iPad 2, iPad 3
Install the "Open With" plugin, assign "Open with Sigil" to a keyboard shortcut, and you will be way ahead of anything you can do in calibre when it comes to search and replace, at least for working with EPUB. I assign Alt+E in my case, and it is second nature to do any editing that way in Sigil, be it CSS tweaks, find/replace operations, TOC manipulations, etc.

Personally, my toes curl every time I see one of these threads and the sort of hoops people are jumping through to try to use the calibre S&R. It certainly has a purpose if you are not working with EPUBs (such as stripping PDF headers/footers as part of a conversion to EPUB), but I would never use it for anything outside of that. I would always recommend that someone convert to EPUB (if not already), do their editing using either Sigil or Tweak ePub/HTML editor, and then convert to their target format if EPUB isn't the end game.
Old 11-20-2012, 01:23 PM   #6
ianc
OK, thanks again for the help guys, looks like Sigil it is...

ianc