02-26-2016, 05:05 PM | #1 |
Member
Posts: 15
Karma: 10
Join Date: Dec 2014
Device: kindle touch WP63GW
|
Question how to eliminate unwanted space in Sigil
I have an ebook originally in DOC format. I opened it in Microsoft Word and saved it as a PDF. Then I used ABBYY FineReader 12 to convert it to EPUB. In some places (usually around pictures or the author sidebar), unwanted spaces appeared. When I opened the EPUB in Sigil, I couldn't simply delete the unwanted space; I had to do a lot of copying, pasting and retyping to get rid of it. My question is: is there an easy way to delete the unwanted spaces?
The EPUB is here (search for "rolling" to find the part): http://files.videohelp.com/u/61125/Travel2.epub
The first screenshot shows the unwanted space in Kindle Previewer, and the second shows it in Sigil. I can't just put the cursor there and delete the space. [Two screenshots removed by the moderator for exceeding the Posting Guidelines on size.] Last edited by Dr. Drib; 03-20-2016 at 06:59 AM. |
02-26-2016, 05:25 PM | #2 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Stop converting DOCX to PDF and then OCRing to EPUB. You are deliberately shooting yourself in the foot (and cutting off your nose for good measure).
Convert DOCX to EPUB. End of story. |
02-26-2016, 05:40 PM | #3 |
Member
Posts: 15
Karma: 10
Join Date: Dec 2014
Device: kindle touch WP63GW
|
I tried that first (converting DOC to EPUB by several methods, including Calibre and online converters), but around the pictures and author sidebars the results were completely unacceptable.
Examples of sidebar/pics: Last edited by Dr. Drib; 03-20-2016 at 07:37 AM. Reason: Two images overlap - Exceeds Posting Guidelines for size. |
02-26-2016, 05:51 PM | #4 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Is this Fixed-Format?
Regardless, choose your problem: horrifically mangled OCR spew, or improperly converted sidebars. Either one requires fixing by hand. Alternatively, Kindle Previewer might be able to convert directly from DOCX to a dual-MOBI and preserve everything, and KindleUnpack can extract an EPUB from that, which *might* do the trick (or a trick, anyway). |
02-26-2016, 07:02 PM | #5 |
null operator (he/him)
Posts: 20,575
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@jimdays - have you tried
BR |
02-27-2016, 12:01 AM | #6 |
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Your original issue looked like a split (broken) paragraph.
I fix those in Code View with a regex that removes the trailing paragraph ending (usually </p>, but it may include closing spans, </i>, </b> ... or it might use a div). I tend to do it in 3 passes (note that some patterns include SELECTED punctuation marks):
<lower to lower pass>
Find: ([a-z,"])</p>\s+([a-z])
Replace: \1 \2
(after a few test replaces, I use Replace All)
<lower to upper pass>
Find: ([a-z,"])</p>\s+([A-Z"])
Replace: \1 \2
(I step through this one and confirm each replace)
<dehyphen pass>
Find: ([a-z]-)</p>\s+([a-z])
Replace: \1\2
(no space in this replace; also a step-through pass)
And you might want to do an emdash pass as well. |
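A minimal Python sketch of the three passes above, run on a made-up XHTML fragment (the sample text is hypothetical). One assumption is added to the patterns as posted: in typical Sigil XHTML the continuation line opens with its own <p> or <p class="...">, so each pattern here also consumes <p[^>]*> after \s+. The \1 \2 and \1\2 replacements carry over unchanged to Python's re.sub. Unlike in Sigil, nothing is confirmed match by match here, so the lower-to-upper pass runs unattended.

```python
import re

# Hypothetical sample: three paragraphs, each split across two <p> elements.
SAMPLE = """<p>The rolling hills were,</p>
<p>as always, green.</p>
<p>We drove past a sign for</p>
<p>Charleston and kept going.</p>
<p>It was a well-</p>
<p>known route.</p>"""

# Assumption: the continuation paragraph opens with <p> or <p class="...">,
# so every pattern also swallows that opening tag.
NEXT_P = r"</p>\s+<p[^>]*>"


def lower_to_lower(html: str) -> str:
    # Pass 1: lowercase letter, comma or quote, then a lowercase continuation.
    return re.sub(r'([a-z,"])' + NEXT_P + r"([a-z])", r"\1 \2", html)


def lower_to_upper(html: str) -> str:
    # Pass 2: same ending, but the continuation starts with a capital or quote.
    # In Sigil this is the pass to step through and confirm one at a time.
    return re.sub(r'([a-z,"])' + NEXT_P + r'([A-Z"])', r"\1 \2", html)


def dehyphen(html: str) -> str:
    # Pass 3: hyphen at the break; join with no space, keeping the hyphen.
    return re.sub(r"([a-z]-)" + NEXT_P + r"([a-z])", r"\1\2", html)


result = dehyphen(lower_to_upper(lower_to_lower(SAMPLE)))
print(result)
```

The passes run in the same order as in the post; a paragraph that legitimately ends without punctuation would still be caught by pass 2, which is why that one is worth confirming by hand.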
02-27-2016, 08:35 AM | #7 | |
A Hairy Wizard
Posts: 3,095
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
I am nowhere close to an expert on regex... but couldn't you do the third pass like this, to catch the hyphen and the emdash in the same pass?
<dehyphen pass>
Find: ([a-z])([-—])</p>\s+([a-z])
Replace: \1\3
The way I read yours, it includes the "-" in the first group and would thus put it back in the replacement. |
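To check the reading above, here is a small Python sketch (the sample text is hypothetical, and the <p[^>]*> after \s+ is again an assumption about how the next paragraph opens). With \1\3 the dash captured in group 2 is dropped, for the hyphen and the emdash alike; writing \1\2\3 instead would put it back.

```python
import re

EM_DASH = "\u2014"

# Hypothetical sample: one break after a hyphen, one after an em dash.
SAMPLE = ("<p>a well-</p>\n<p>known road, and then" + EM_DASH +
          "</p>\n<p>silence.</p>")

# The combined pattern from the post, plus an assumed opening <p ...> tag
# for the next paragraph. The leading "-" inside the class is literal.
PATTERN = r"([a-z])([-" + EM_DASH + r"])</p>\s+<p[^>]*>([a-z])"

dropped = re.sub(PATTERN, r"\1\3", SAMPLE)   # group 2 (the dash) is discarded
kept = re.sub(PATTERN, r"\1\2\3", SAMPLE)    # group 2 is written back

print(dropped)  # <p>a wellknown road, and thensilence.</p>
print(kept)
```

So the choice of replacement decides whether a compound like "well-known" survives the join.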
|
02-29-2016, 02:34 AM | #8 |
Banned
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
|
Instead of listing the wrong characters at the end of a paragraph, it's better to list the characters that are allowed to end one and negate them, such as [^.,:!"] plus some other quote characters.
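A minimal Python sketch of the negated-class idea. The allowed endings start from the post's [^.,:!"]; adding "?" and a curly closing quote (U+201D) is an assumption, as is the opening <p ...> tag on the next paragraph. Any other single character right before </p> is treated as a broken ending, and the two paragraphs are joined with a space.

```python
import re

# Characters that may legitimately end a paragraph: the post's set,
# plus "?" and U+201D, which are assumptions added here.
ENDINGS = '.,:!?"\u201d'

# Any single character NOT in ENDINGS right before </p>, followed by the
# (assumed) opening <p ...> tag of the next paragraph.
BROKEN_END = re.compile(r"([^" + re.escape(ENDINGS) + r"])</p>\s+<p[^>]*>")


def join_broken(html: str) -> str:
    # Keep the last character and join the two paragraphs with one space.
    return BROKEN_END.sub(r"\1 ", html)


SAMPLE = '<p>He said "Fine."</p>\n<p>The rolling hills</p>\n<p>went on for miles.</p>'
print(join_broken(SAMPLE))
```

Because this joins regardless of how the next paragraph starts, it is probably safer to step through the matches than to replace them all at once. The class also matches the ">" of a closing inline tag such as </i>, so those cases need a look as well.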
|
03-03-2016, 08:43 PM | #9 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
You would have been FAR better off just going straight from Word. You will face this extra work one way or the other. It's not necessarily "less work" to do the ePUB from Word, but it's LESS CRUFT to clean up.
When you try to invent a process, as you have, you are not seeing the iceberg. By which I mean, you think you are seeing progress on the surface, but the ungodly mess that you're not seeing, under the hood, will come back and wreak havoc when you try to put this on any real device. You're spinning your wheels, chasing ephemera that isn't real. What you think you're seeing in Book View isn't what you'll see down the road.
OH, and by the way: using freebie online converters or calibre isn't helping you, either. Not the way you're using them.
BUT, I'll shut up now. As sure as shooting, you'll keep trying the methods that SEEM to produce the result you want. I'm really glad I'm not the one cleaning up the undersides of that file.
Hitch
Last edited by Dr. Drib; 03-20-2016 at 06:57 AM. |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Eliminate first paragraph tab after blank space? | nws | ePub | 8 | 04-10-2015 03:07 AM |
Sigil 0.4.1 : unwanted span added by Sigil | Bertrand | Sigil | 0 | 09-02-2011 05:28 AM |
Removing unwanted white space | JayLaFunk | Sigil | 4 | 03-19-2010 11:33 AM |
Unwanted space between paragraphs | superanima | Calibre | 3 | 10-14-2009 02:28 PM |
utility to eliminate unwanted line breaks in txt | profnachos | Workshop | 11 | 11-27-2007 06:24 PM |