Old 09-04-2011, 02:45 AM   #1
organized_chaos
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
Question Need to change words surrounded by _ to italics

I have a book I purchased from Amazon but there is one problem with it.

Every word that should be in italics is surrounded by _underscores_. I want to edit this book and change the underscores to italics but I'm not sure how. I removed the DRM from the book and converted it to HTMLZ format. My plan was to replace each _ with the HTML tags for italics. The only problem is that the opening underscore needs to become <i> and the closing one needs to become </i>. I can replace all underscores with <i> or with </i> using the Find and Replace option... but I can't alternate between the two like I need to.

How can I accomplish my goal without manually replacing everything one at a time?
Old 09-04-2011, 06:10 AM   #2
Billi
Wizard
Posts: 3,388
Karma: 14190103
Join Date: Jun 2009
Location: Berlin
Device: Cybook, iRex, PB, Onyx
I don't think you can avoid a little manual work, but you can probably limit the number of cases you have to change by hand.
- The opening underscore is in most cases preceded by a space (unless it opens a paragraph), so you should be able to search for "space+_".
- The closing underscore is in most cases followed by a space, a comma, a period or a question/exclamation mark; make use of this.
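
A minimal sketch of this context-based idea in Python, assuming the text has been loaded into a string. The sample sentence and the variable names are made up for illustration, and the set of punctuation treated as "closing" context is an assumption that a real book may need extended.

```python
import re

# Made-up sample text, not taken from the book.
text = "She was _really_ sure. _Never_, he thought, _never again!_"

# Opening underscore: at the start of the text or after whitespace,
# and directly followed by a non-space character.
opening = re.compile(r"(?<!\S)_(?=\S)")

# Closing underscore: directly after a non-space character and followed
# by whitespace, common punctuation, or the end of the text.
closing = re.compile(r"(?<=\S)_(?=[\s,.;:?!]|$)")

step1 = opening.sub("<i>", text)
step2 = closing.sub("</i>", step1)
print(step2)

# Underscores that fit neither context are left for manual checking.
remaining = step2.count("_")
print(remaining)
```

Anything counted in `remaining` is an underscore that matched neither rule and would still have to be fixed by hand.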
Old 09-04-2011, 06:33 AM   #3
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 73,804
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
Unless you really, really want this book, ask Amazon for a refund and move on to something formatted properly.

If you do really, really want the book, still complain bitterly to Amazon about the poor formatting, but the way to fix it is like this.

Use a text editor that understands regular expressions. Search for something like:

_([a-z ]+?)_

and replace with

<i>\1</i>

And then check for any underscores that haven't been replaced. The first regular expression says "search for an underscore and then the smallest number of letters and spaces possible until another underscore, and remember the text between the underscores". The second says "replace what was found by <i> followed by the first bit of remembered text, followed by </i>". Note that [a-z] matches only lowercase letters unless the search is set to ignore case.
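
For readers who would rather script this than use a text editor, here is a minimal sketch of the same search and replace with Python's `re` module. The sample sentence is made up, and `re.IGNORECASE` is an assumption added so that capital letters also match.

```python
import re

# Made-up sample text, not taken from the book.
text = "He said it was _very_ important, and _quite urgent_ too."

# Underscore, the shortest possible run of letters and spaces, underscore.
# The parentheses remember the text between the underscores.
pattern = re.compile(r"_([a-z ]+?)_", re.IGNORECASE)

# \1 in the replacement stands for the remembered text.
converted = pattern.sub(r"<i>\1</i>", text)
print(converted)

# Positions of any underscores that were not part of a matched pair.
leftovers = [m.start() for m in re.finditer("_", converted)]
print(leftovers)
```

An empty `leftovers` list means every underscore was consumed by a match; any positions it does report are the ones to check by hand.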


Quote:
Originally Posted by organized_chaos View Post
Every word that should be in italics is surrounded by _underscores_. I want to edit this book and change the underscores to italics but I'm not sure how.
[...]
How can I accomplish my goal without manually replacing everything one at a time?
Old 09-05-2011, 02:36 AM   #4
organized_chaos
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
Quote:
Originally Posted by pdurrant View Post
If you do really, really want the book, still complain bitterly to Amazon about the poor formatting, but the way to fix it is like this.
Complaint has been registered with Amazon.

Quote:
Use a text editor that understands regular expressions. Search for something like:

_([a-z ]+?)_

and replace with

<i>\1</i>

And then check for any underscores that haven't been replaced. The first regular expression says "search for an underscore and then the smallest number of letters and spaces possible until another underscore, and remember the text between the underscores". The second says "replace what was found by <i> followed by the first bit of remembered text, followed by </i>"
This works for single words, but how can I make it work with entire sentences? I'm sorry, I've never needed to edit documents like this before.

For example:
Quote:
_That's South America down there_, he decided, after rejecting the notion that it might be Africa. They had pretty much the same shape, and it was so hard to remember what Earth's continents looked like when there were so many other worlds. _But that's South America. And so that's North America just above it. The place where I was born._
Should be this:
Quote:
<i>That's South America down there</i>, he decided, after rejecting the notion that it might be Africa. They had pretty much the same shape, and it was so hard to remember what Earth's continents looked like when there were so many other worlds. <i>But that's South America. And so that's North America just above it. The place where I was born.</i>

Last edited by organized_chaos; 09-05-2011 at 02:38 AM.
Old 09-05-2011, 03:57 AM   #5
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 73,804
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
Quote:
Originally Posted by organized_chaos View Post
This works for single words, but how can I make it work with entire sentences? I'm sorry, I've never needed to edit documents like this before.
_([a-z ]+?)_

will work for words and spaces. If there might be punctuation involved, you could also add in those characters (putting the hyphen last so that it is taken literally rather than as a range), e.g.

_([a-z ,.'";:-]+?)_

or you could go for the slightly more chancy approach of

_([^_]+)_

which searches for an underscore, as many characters as possible that aren't an underscore, and then another underscore.
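
A short Python sketch comparing the three patterns on a shortened version of the passage quoted earlier in the thread. The dictionary labels are made up for illustration, `re.IGNORECASE` is an assumption so that capitals match, and the hyphen is written last inside the brackets so that it is read literally instead of forming a range.

```python
import re

# Shortened from the passage quoted in the thread.
sample = ("_That's South America down there_, he decided. "
          "_But that's South America. The place where I was born._")

patterns = {
    "letters and spaces": r"_([a-z ]+?)_",
    "with punctuation": r"""_([a-z ,.'";:-]+?)_""",
    "anything but underscore": r"_([^_]+)_",
}

results = {}
for label, pat in patterns.items():
    # Apply each pattern to the same sample and keep the result.
    results[label] = re.sub(pat, r"<i>\1</i>", sample, flags=re.IGNORECASE)
    print(f"{label}: {results[label]}")
```

On this sample the first pattern changes nothing, because the apostrophes and full stops are not in its character set, while the other two convert both italic runs.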
Old 09-05-2011, 04:18 AM   #6
organized_chaos
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
_([a-z ,.'";:-]+?)_ worked perfectly for me. I didn't know I could just add additional things to that (like punctuation). If there were numbers in the book would I need to add something like 0-9?

_([a-z ,0-9.'";:-]+?)_
Old 09-05-2011, 04:25 AM   #7
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 73,804
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
Quote:
Originally Posted by organized_chaos View Post
_([a-z ,.'";:-]+?)_ worked perfectly for me. I didn't know I could just add additional things to that (like punctuation). If there were numbers in the book would I need to add something like 0-9?

_([a-z ,0-9.'";:-]+?)_
Exactly so. The [] gives a set of characters that can match. The + after the [] says that the set can match as many characters in a row as possible, but that it must match at least one character. I suppose, strictly speaking, the ? is unnecessary, since the match can't include underscores.
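
The point about the ? can be checked with a small Python sketch. The sample text is made up; the last two patterns use `.` in place of a character set purely to show the contrast and are not taken from the thread.

```python
import re

# Made-up sample text with two separate italic runs.
text = "a _first_ and a _second_ phrase"

# When the set cannot match an underscore, lazy (+?) and greedy (+)
# matching must both stop at the next underscore.
lazy = re.sub(r"_([^_]+?)_", r"<i>\1</i>", text)
greedy = re.sub(r"_([^_]+)_", r"<i>\1</i>", text)

# With ".", which does match underscores, greedy matching runs on to
# the last underscore on the line, so here the ? does matter.
dot_greedy = re.sub(r"_(.+)_", r"<i>\1</i>", text)
dot_lazy = re.sub(r"_(.+?)_", r"<i>\1</i>", text)

print(lazy)
print(greedy)
print(dot_greedy)
print(dot_lazy)
```

Only `dot_greedy` differs: it wraps everything from the first underscore to the last in a single pair of tags.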
Old 09-14-2011, 03:05 PM   #8
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Since you converted to HTMLZ I'm assuming you're using calibre. Convert it again with heuristic processing and the italicize common cases option enabled.
Old 09-17-2011, 05:25 PM   #9
BillSmithBooks
Padawan Learner
Posts: 243
Karma: 1085815
Join Date: May 2009
Location: www.OutlawGalaxy.com, Foothills of NY's Adirondack mountains
Device: My PC...using Puppy Linux (FBReader, Calibre, Kindle Cloud Reader,
I think this will work for you...

This is what I do when formatting my plain text books to simple HTML.

1) Search for ^p_ and (space)_ and change them to ^p<i> and (space)<i> -- this finds all the beginning italics and the italics that start at the beginning of a paragraph.

2) Then just do a search for any remaining _ and change to </i>

This should work just fine as long as the author didn't do any duplicate _ _ marks. Or, rather than just do an auto-change, you could search for each change one at a time to make sure it is the right type of change.
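
The two steps above can be sketched in Python as follows. This assumes each paragraph sits on its own line, so that the start of a line stands in for the ^p paragraph mark; the sample text is made up and deliberately includes one italic word inside parentheses.

```python
import re

# Made-up sample: two paragraphs, one italic word inside parentheses.
text = "_Run_, he thought.\nShe said _no_ and (_maybe_) later."

# Step 1: an underscore at the start of a line or after a space becomes <i>.
# Group 1 keeps the space (or the empty start-of-line match) in place.
step1 = re.sub(r"(^| )_", r"\1<i>", text, flags=re.MULTILINE)

# Step 2: every remaining underscore becomes </i>.
step2 = step1.replace("_", "</i>")
print(step2)
```

In this sample the word in parentheses ends up with two closing tags, because its opening underscore follows a parenthesis rather than a space. That is the kind of case the one-at-a-time check is meant to catch.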
Old 09-18-2011, 03:35 AM   #10
Jellby
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by BillSmithBooks View Post
1) Search for ^p_ and (space)_ and change them to ^p<i> and (space)<i> -- this finds all the beginning italics and the italics that start at the beginning of a paragraph.
...and misses italics that start right after a dash, an opening quote, an apostrophe or a parenthesis, or in the middle of a word (which occurs sometimes).
Old 09-23-2011, 01:38 AM   #11
organized_chaos
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
Just FYI, pdurrant's suggestion solved my problem perfectly.