#1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
|
I have a book I purchased from Amazon, but there is one problem with it.
Every word that should be in italics is surrounded by _underscores_. I want to edit the book and change the underscores to italics, but I'm not sure how. I removed the DRM from the book and converted it to HTMLZ format. My plan was to replace each _ with the HTML tag for italics. The catch is that the first underscore of each pair needs to become <i> and the second needs to become </i>. I can replace all underscores with <i> or with </i> using Find and Replace, but I can't do half and half like I need to. How can I accomplish this without manually replacing them one at a time?
#2 |
Wizard
Posts: 3,388
Karma: 14190103
Join Date: Jun 2009
Location: Berlin
Device: Cybook, iRex, PB, Onyx
|
I don't think you can avoid a little manual work, but you can limit the number of cases you have to change by hand.
- The opening underscore usually has a blank in front of it, unless it starts a paragraph, so you should be able to search for "blank+_".
- The closing underscore is usually followed by a blank, a comma, a period, or a question/exclamation mark. Make use of this.
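The two rules above can be sketched in Python as a pair of context-sensitive replacements. This is only an illustration, not something from the thread: the function name `context_replace` and the sample sentence are made up, and the exact set of "closing" punctuation is an assumption taken from the list above.

```python
import re

def context_replace(text: str) -> str:
    """Convert _word_ markers to <i>word</i> using the surrounding context."""
    # Opening underscore: at the start of the text or after whitespace,
    # and directly followed by a non-space character.
    text = re.sub(r'(?<!\S)_(?=\S)', '<i>', text)
    # Closing underscore: directly after a non-space character and followed
    # by whitespace, one of , . ? ! or the end of the text.
    text = re.sub(r'(?<=\S)_(?=[\s,.?!]|$)', '</i>', text)
    return text

sample = "It was _really_ late. _Why_, she asked, was he _here_?"
converted = context_replace(sample)
print(converted)
# Anything still containing "_" needs a manual look.
print("underscores left:", converted.count('_'))
```

Underscores that do not fit either rule (for example one right after an opening quote) are left alone, so the remaining count shows how much hand editing is left.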
#3 |
The Grand Mouse 高貴的老鼠
Posts: 73,804
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
|
Unless you really, really want this book, ask Amazon for a refund and move on to something formatted properly.
If you do really, really want the book, still complain bitterly to Amazon about the poor formatting, but the way to fix it is like this. Use a text editor that understands regular expressions. Search for something like:
_([a-z ]+?)_
and replace with
<i>\1</i>
And then check for any underscores that haven't been replaced.
The first regular expression says "search for an underscore and then the smallest number of letters and spaces possible until another underscore, and remember the text between the underscores". The second says "replace what was found with <i> followed by the first bit of remembered text, followed by </i>".
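For anyone who prefers a script to a text editor, here is a minimal Python sketch of the same search and replace. The `italicize` name and the sample text are hypothetical. The `re.IGNORECASE` flag is an assumption added here so that capital letters match too; many editors search case-insensitively by default, but the pattern as written only lists lowercase letters.

```python
import re

# Same pattern as in the post; IGNORECASE is an added assumption.
PATTERN = re.compile(r'_([a-z ]+?)_', re.IGNORECASE)

def italicize(text: str) -> str:
    """Replace _letters and spaces_ with <i>letters and spaces</i>."""
    return PATTERN.sub(r'<i>\1</i>', text)

sample = "She read _War and Peace_ twice, then _Emma_."
result = italicize(sample)
print(result)
# Check for underscores the pattern did not handle.
print("underscores left:", result.count('_'))
```

Text containing anything other than letters and spaces between the underscores, such as an apostrophe, is left untouched and shows up in the leftover count.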
#4 |
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
|
Quote:
![]() Quote:
![]() For example: Quote:
Quote:
Last edited by organized_chaos; 09-05-2011 at 02:38 AM. |
#5 |
The Grand Mouse 高貴的老鼠
Posts: 73,804
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
|
Quote:
will work for words and spaces. If there might be punctuation involved, you could also add in those characters, e.g.
_([a-z ,.'"-;:]+?)_
or you could go for the slightly more chancy approach of
_([^_]+)_
which searches for an underscore, as many characters as possible that aren't an underscore, and then another underscore.
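A short Python sketch of the "anything but an underscore" pattern, showing both why it is convenient and why it is a little chancy. The function name `italicize_any` and both sample strings are made up for illustration.

```python
import re

def italicize_any(text: str) -> str:
    """Wrap whatever sits between two underscores in <i>...</i>."""
    return re.sub(r'_([^_]+)_', r'<i>\1</i>', text)

# Punctuation between the markers is handled without listing it.
print(italicize_any("He said _no, never!_ and left."))

# A stray underscore makes the pattern pair up the wrong markers.
print(italicize_any("file_name is _important_ here"))
```

In the second call the underscore inside "file_name" is treated as an opening marker, so the italics land on the wrong words and one underscore is left over. Note also that `[^_]` matches line breaks, so an unpaired underscore can pull in text from the following paragraphs.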
|
#6 |
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
|
_([a-z ,.'"-;:]+?)_ worked perfectly for me. I didn't know I could just add additional things to that (like punctuation). If there were numbers in the book would I need to add something like 0-9?
_([a-z ,0-9.'"-;:]+?)_ |
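One detail worth checking before adding 0-9: inside square brackets, a hyphen between two characters defines a range, so `"-;` in the pattern above is read as "every character from `"` to `;`". In ASCII that range already contains the digits, the hyphen, and several other symbols. The Python sketch below shows this; the names `ORIGINAL`, `EXPLICIT`, and `covered` are made up, and moving the hyphen to the end of the brackets is a suggested variant, not something from the thread.

```python
import re

# Pattern as posted: "-; is interpreted as a character range.
ORIGINAL = re.compile(r"""_([a-z ,.'"-;:]+?)_""")
# Variant with a literal hyphen (placed last) and explicit digits.
EXPLICIT = re.compile(r"""_([a-z ,0-9.'";:-]+?)_""")

# Every character covered by the accidental "-; range.
covered = [chr(c) for c in range(ord('"'), ord(';') + 1)]
print(''.join(covered))

# Digits already match with the original pattern.
print(ORIGINAL.sub(r'<i>\1</i>', "in _1984_ he wrote"))
# The explicit variant matches only what is actually listed.
print(EXPLICIT.sub(r'<i>\1</i>', "a _well-known, 2nd_ edition"))
```

So adding 0-9 does no harm, but the posted pattern matches digits either way. Putting the hyphen last makes the set mean exactly what it appears to say.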
#7 |
The Grand Mouse 高貴的老鼠
Posts: 73,804
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
|
Exactly so. The [] give a set of characters that can match. The + after the [] says to match as many of those characters as possible, and at least one. I suppose, strictly speaking, the ? is unnecessary, since the match can't include underscores.
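The point about the ? can be checked with a small Python comparison. The sample text and variable names are made up. The last two lines use `.` instead of a bracketed set, which is an added contrast to show the case where the ? does matter.

```python
import re

text = "_one_ and _two_"

# The set excludes underscores, so lazy and greedy stop at the same place.
lazy = re.sub(r'_([a-z ]+?)_', r'<i>\1</i>', text)
greedy = re.sub(r'_([a-z ]+)_', r'<i>\1</i>', text)
print(lazy == greedy, lazy)

# With "." (any character) the underscore itself can match, so greedy
# matching runs to the last underscore on the line.
dot_greedy = re.sub(r'_(.+)_', r'<i>\1</i>', text)
dot_lazy = re.sub(r'_(.+?)_', r'<i>\1</i>', text)
print(dot_greedy)
print(dot_lazy)
```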
|
#8 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Since you converted to HTMLZ I'm assuming you're using calibre. Convert it again with heuristic processing and the italicize common cases option enabled.
|
#9 |
Padawan Learner
Posts: 243
Karma: 1085815
Join Date: May 2009
Location: www.OutlawGalaxy.com, Foothills of NY's Adirondack mountains
Device: My PC...using Puppy Linux (FBReader, Calibre, Kindle Cloud Reader,
|
I think this will work for you...
This is what I do when formatting my plain text books to simple HTML.
1) Search for ^p_ and (space)_ and change them to ^p<i> and (space)<i>. This catches the italics that start at the beginning of a paragraph and those that start after a space.
2) Then search for any remaining _ and change it to </i>.
This should work just fine as long as the author didn't use any duplicate _ _ marks. Or, rather than doing an automatic change, you could step through the replacements one at a time to make sure each is the right type of change.
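The two steps can be sketched in Python roughly as follows. This is an approximation under stated assumptions: `^p` is a word-processor code for a paragraph mark, and the sketch treats "start of a line" as its stand-in. The function name `two_pass` and the sample text are made up.

```python
import re

def two_pass(text: str) -> str:
    """Two-step replacement: openers first, then everything else closes."""
    # Step 1: an underscore at the start of a line or after a space opens italics.
    text = re.sub(r'(?m)(^| )_', r'\1<i>', text)
    # Step 2: every remaining underscore closes italics.
    return text.replace('_', '</i>')

sample = "_First_ line\nand a _second_ one"
print(two_pass(sample))
```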
#10 |
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
...and it misses italics that start right after a dash, an opening quote, an apostrophe, or a parenthesis, or in the middle of a word (which occurs sometimes)...
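To make those misses concrete, here is a small Python sketch that runs a space-or-line-start replacement on a few such cases. The `two_pass` function is a hypothetical stand-in for the two-step search and replace described in the previous post, and the test strings are invented.

```python
import re

def two_pass(text: str) -> str:
    """Openers only after a space or at line start; all other _ become </i>."""
    text = re.sub(r'(?m)(^| )_', r'\1<i>', text)
    return text.replace('_', '</i>')

cases = ['"_Never_," she said.', "(_sic_)", "un_believ_able", "well-_known_"]
for case in cases:
    converted = two_pass(case)
    # Opening and closing tags should be equal in number if all went well.
    balanced = converted.count('<i>') == converted.count('</i>')
    print(f"{case!r} -> {converted!r} balanced={balanced}")
```

Every case above ends up with two closing tags and no opening tag, so counting `<i>` against `</i>` after the conversion is a cheap way to spot that something went wrong.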
|
#11 |
Junior Member
Posts: 9
Karma: 10
Join Date: Apr 2010
Device: iPhone, Sony PSP, Android, Kindle
|
Just FYI, pdurrant's suggestion solved my problem perfectly.
|