11-19-2012, 04:43 PM   #1
ianc
Member
Posts: 12
Karma: 472024
Join Date: Nov 2012
Device: Samsung Galaxy S3
Capitalize first word in sentence with search and replace?

Hi Folks,

I've converted a .PDF file to .epub and was able to remove the headers and footers with only a little difficulty.

I notice, however, that after the conversion, a lot of the capitalization at the beginning of sentences has been lost (unrelated to headers and footers), which is rather annoying.

It occurred to me to use a regex to locate lowercase characters at the start of sentences. Initially, I could think of two cases:

1) The first character of a sentence after a paragraph break. These can be located with "\.<br>\s+[a-z]".

2) The first character of a sentence in the middle of a paragraph, assuming the previous sentence ends with a period followed by one space. These can be located with "\. [a-z]".
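For illustration, here is a minimal Python sketch (not calibre's S&R box) that applies the two search patterns above, with the lowercase letter wrapped in a capture group so a later replacement step could refer to it. The function name `find_lowercase_starts` and the sample text are hypothetical, and it assumes paragraph breaks appear as a literal `<br>` as in case 1.

```python
import re

# Case 1: a period, a <br>, whitespace, then a lowercase letter (captured).
AFTER_BREAK = re.compile(r"\.<br>\s+([a-z])")
# Case 2: a period, exactly one space, then a lowercase letter (captured).
MID_PARAGRAPH = re.compile(r"\. ([a-z])")

def find_lowercase_starts(html: str) -> list[tuple[int, str]]:
    """Return (offset, letter) for every lowercase sentence start either pattern finds."""
    hits = []
    for pattern in (AFTER_BREAK, MID_PARAGRAPH):
        for m in pattern.finditer(html):
            # m.start(1) is the offset of the captured letter itself.
            hits.append((m.start(1), m.group(1)))
    return sorted(hits)

sample = "The end.<br>\n  then more. and again. Fine."
print(find_lowercase_starts(sample))
```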

My question is: what should I put in the replacement text box so that calibre replaces the lowercase letter found by the search regex with its uppercase equivalent?

At first I just tried "\.<br>\s+[A-Z]" and "\. [A-Z]", but calibre wrote those strings into the book literally, so that, for example, every mid-paragraph sentence that began with a lowercase letter now begins with "\. [A-Z]" instead of the correct letter.
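The underlying issue is that a plain replacement string is inserted more or less literally, apart from group references, so it cannot say "the same letter, but uppercase". As a sketch of what is being asked for, Python's `re.sub` (used here only as a stand-in; this is not a calibre setting) accepts a function instead of a string, and that function can upper-case the captured letter. `capitalize_sentence_starts` is a hypothetical name, and the combined pattern simply joins the two searches from the question with `|`.

```python
import re

# Group 1: the text before the letter (either ".<br>" + whitespace, or ". ").
# Group 2: the lowercase letter that should be capitalized.
PATTERN = re.compile(r"(\.<br>\s+|\. )([a-z])")

def capitalize_sentence_starts(html: str) -> str:
    # A replacement *string* can only copy groups back verbatim; a replacement
    # *function* receives each match object and can change the letter's case.
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).upper(), html)

print(capitalize_sentence_starts("The end.<br>\n  then more. and again."))
```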

Thanks for any help,

ianc
11-20-2012, 07:12 AM   #2
lof
Member
Posts: 11
Karma: 30
Join Date: Nov 2012
Device: Nook Color
I am not a script wizard, so I can't write the perfect solution myself, but it seems you would need to use this unless you repeat the operation once for every letter in the [A-Z] range.
Another trick would be to convert the text to an XLS file, use the spreadsheet's capitalize function, then convert it back to a text file, though you'd have to think about preserving the paragraphs and chapters in the process.
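The "repeat it for the whole [A-Z] range" workaround can be sketched in Python as 26 ordinary search/replace passes, each with a literal uppercase letter in the replacement and a back-reference that keeps the punctuation before it. This assumes only back-reference support (written `\g<1>` in Python's syntax), not case conversion; the function name `capitalize_by_letter_passes` is hypothetical.

```python
import re
import string

def capitalize_by_letter_passes(html: str) -> str:
    # One pass per letter: search for "<sentence break><lowercase letter>" and
    # replace it with "<same break><that letter in uppercase>".
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        # Group 1 preserves the ".<br> + whitespace" or ". " exactly as matched.
        html = re.sub(r"(\.<br>\s+|\. )" + lower, r"\g<1>" + upper, html)
    return html

print(capitalize_by_letter_passes("The end.<br>\n  then more. and again. zebra"))
```

The cost is 26 separate operations instead of one, which is why a replacement syntax with case conversion is the more convenient route.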
11-20-2012, 09:16 AM   #3
Gunnerp245
Gadget Freak
Posts: 1,169
Karma: 1043832
Join Date: Nov 2007
Location: US
Device: EE, Note 8
Use Sigil. calibre does not have the full set of PCRE functions.
11-20-2012, 11:11 AM   #4
ianc
Member
Posts: 12
Karma: 472024
Join Date: Nov 2012
Device: Samsung Galaxy S3
Thanks guys. Yes, I eventually downloaded Sigil, took a look, and was able to do it there. Sigil is a bit less convenient to use, or at least it seems so to me, but then I haven't put any time into learning it yet.

It would seem then that Calibre's search and replace function cannot use back references to a group in the initial search regexp?
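On the back-reference question: the missing piece is arguably case conversion rather than back-references as such. In Python's `re` module, for example, a back-reference re-inserts exactly what its group matched, so it can move or wrap the letter but not change its case. This is a hedged illustration only; the thread does not confirm which replacement syntax calibre's S&R accepted at the time, and the sample sentence is made up.

```python
import re

text = "It ended. then it began."

# Back-references do work: \1 re-inserts ". " and \2 re-inserts the letter...
wrapped = re.sub(r"(\. )([a-z])", r"\1[\2]", text)
# ...but verbatim, so the captured "t" stays lowercase inside the brackets.
print(wrapped)
```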

Thanks again,

ianc
11-20-2012, 12:49 PM   #5
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Install the "Open With" plugin, assign "Open with Sigil" to a keyboard shortcut, and you will be way ahead of anything you can do in calibre when it comes to search and replace, at least for working with EPUB. In my case I assign Alt+E, and doing any editing in Sigil that way, be it CSS tweaks, find/replace operations, TOC manipulations etc., is second nature.

Personally, my toes curl every time I see one of these threads and the hoops people jump through trying to use the calibre S&R. It certainly has a purpose if you are not working with EPUBs (such as stripping PDF headers/footers as part of a conversion to EPUB), but I would never ever use it for anything outside of that. I would always recommend converting to EPUB (if the book isn't already), doing the editing in either Sigil or the Tweak ePub/HTML editor, and then converting to the target format if EPUB isn't the end game.
11-20-2012, 01:23 PM   #6
ianc
Member
Posts: 12
Karma: 472024
Join Date: Nov 2012
Device: Samsung Galaxy S3
OK, thanks again for the help guys, looks like Sigil it is...

ianc
10-21-2013, 04:47 PM   #7
Hoods7070
Groupie
Posts: 159
Karma: 24430
Join Date: Mar 2012
Location: Australia
Device: Nexus 7"
FWIW, capitalizing the first word of a sentence in Sigil is quite straightforward. For the example below, assume I have defined two CSS classes: "indentoff" (or whatever you call it) for the first paragraph following a scene break, and "caps" to transform a string to capital letters. So:

Find:
<p class="indentoff">(.*?)[space]

will match the text between the opening "indentoff" paragraph tag and the first space (i.e. the first word) and capture it as \1. Note: [space] here represents a typed space; it is not part of the regex!

Replace:
<p class="indentoff"><span class="caps">\1</span>[space]
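Hoods7070's Find/Replace pair can be checked outside Sigil with a short Python sketch. It assumes the `[space]` placeholder is a single literal space and uses a made-up sample paragraph; `\1` in the replacement puts the captured first word back inside the `caps` span.

```python
import re

# The Find pattern, with "[space]" written as a real space. The lazy (.*?)
# stops at the first space, so group 1 is the paragraph's first word.
FIND = r'<p class="indentoff">(.*?) '
# The Replace text: \1 re-inserts the captured word inside the "caps" span.
REPLACE = r'<p class="indentoff"><span class="caps">\1</span> '

html = '<p class="indentoff">once upon a time</p>'
result = re.sub(FIND, REPLACE, html)
print(result)
```

The post does not show the `caps` rule itself. If it uses `text-transform: uppercase`, the whole first word is displayed in capitals; `text-transform: capitalize` would change only its first letter. Either way CSS changes how the word is rendered, not the underlying text.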
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
search and replace - drops blanks in replace ? | cybmole | Conversion | 10 | 03-13-2011 03:07 AM
Help with Word - Find & Replace | Big Kev | Workshop | 3 | 09-21-2010 06:51 PM


All times are GMT -4.