MobileRead Forums > E-Book Formats > Workshop

Old 02-26-2016, 05:05 PM   #1
jimdays
Member
jimdays began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Dec 2014
Device: kindle touch WP63GW
Question how to eliminate unwanted space in Sigil

I have an ebook originally in doc format. I opened the ebook in Microsoft Word and saved it as PDF. Then I used Abbyy Finereader 12 to convert it to epub. In some cases (usually around pictures or author sidebars), unwanted spaces appeared. When I opened the epub in Sigil, I couldn't simply delete the unwanted space. I had to do a lot of copy/paste and typing to eliminate it. My question is: is there an easy way to delete the unwanted spaces?
The epub is here (search for "rolling" to find the part):
http://files.videohelp.com/u/61125/Travel2.epub
This first screenshot shows the unwanted space in Kindle Previewer.
The second screenshot shows the unwanted space in Sigil. I can't take the cursor and just delete the space.


[Image violates Posting Guideline for size - MODERATOR]


[Image violates Posting Guideline for size - MODERATOR]

Last edited by Dr. Drib; 03-20-2016 at 06:59 AM.
Old 02-26-2016, 05:25 PM   #2
eschwartz
Ex-Helpdesk Junkie
eschwartz ought to be getting tired of karma fortunes by now.
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Stop converting DOCX to PDF and then OCRing to EPUB. You are deliberately shooting yourself in the foot (and cutting off your nose for good measure).

Convert DOCX to EPUB.

End of story.
Old 02-26-2016, 05:40 PM   #3
jimdays
Member
jimdays began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Dec 2014
Device: kindle touch WP63GW
I initially tried that (convert doc to epub with several different methods, Calibre/online), but around the pictures and author sidebars, the results were completely unacceptable.
Examples of sidebar/pics:





Last edited by Dr. Drib; 03-20-2016 at 07:37 AM. Reason: Two images overlap - Exceeds Posting Guidelines for size.
Old 02-26-2016, 05:51 PM   #4
eschwartz
Ex-Helpdesk Junkie
eschwartz ought to be getting tired of karma fortunes by now.
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Is this Fixed-Format?

Regardless, choose your problem: horrifically mangled OCR spew, or improperly-converted sidebars.

Either one requires fixing by hand.

Alternatively, Kindle Previewer might be able to go directly from DOCX to dual-MOBI and preserve everything, and KindleUnpack can extract an EPUB from that, which *might* do the trick (or a trick, anyway).
eschwartz is offline   Reply With Quote
Old 02-26-2016, 07:02 PM   #5
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 20,575
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@jimdays - have you tried
  • converting a DOCX from Word (or Writer) with calibre
  • importing a DOCX into the calibre book editor
  • using Toxaris' e-Book Tools - a Word add-in
Any one of them should do a better job than exporting a PDF from Word and OCRing it. As eschwartz indicates - you'll have to do some editing, but not to get rid of the crud created by OCR scanning.

BR
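
For the first option in the list above, calibre ships a command-line converter, ebook-convert, which takes an input file and an output file and picks the formats from the extensions. A minimal sketch of driving it from Python follows. The helper names and the file name Travel2.docx are hypothetical, and the sketch assumes calibre is installed with ebook-convert on the PATH. It does nothing to fix sidebars or floating images; it only runs the plain conversion.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path: str) -> list[str]:
    """Build the ebook-convert command line: DOCX in, EPUB out, same base name."""
    src = Path(docx_path)
    return ["ebook-convert", str(src), str(src.with_suffix(".epub"))]


def convert_docx_to_epub(docx_path: str) -> None:
    """Run the conversion; assumes calibre's ebook-convert is on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    subprocess.run(build_convert_command(docx_path), check=True)


# Hypothetical usage:
# convert_docx_to_epub("Travel2.docx")
```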
Old 02-27-2016, 12:01 AM   #6
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Your original issue looked like a split (broken) paragraph.

I fix those in Code View by using a regex that removes the trailing paragraph ending (usually </p>, but it may include </span>, </i>, </b> ... or the paragraph might be a div).

I tend to do it in 3 passes (note that some patterns include SELECTED punctuation marks):
<lower to lower pass>
([a-z,"])</p>\s+([a-z])
\1 \2 (after a few test replacements, I use Replace All)

<lower to upper pass>
([a-z,"])</p>\s+([A-Z"])
\1 \2 (I step through on this one and confirm each replace)

<dehyphen pass>
([a-z]-)</p>\s+([a-z])
\1\2 (no space in this replacement; also a step-through pass)

And you might want to do an em-dash pass as well.
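
For trying the three passes outside Sigil, here is a rough Python sketch. It is an illustration only: the function name and sample strings are made up, and the patterns are extended with an assumed `<p[^>]*>` so they also consume the opening tag of the next paragraph, which the patterns as posted do not show. Unlike the stepped passes described above, it replaces every match without review, so the lower-to-upper pass in particular can join paragraphs that should stay separate.

```python
import re

# Assumption: a split paragraph ends with </p> and the continuation starts
# with an opening <p ...> tag (not part of the patterns as posted).
NEXT_P = r'</p>\s*<p[^>]*>'


def join_split_paragraphs(html: str) -> str:
    """Apply the three passes in order; every match is replaced unreviewed."""
    # Lower-to-lower pass: join with a single space.
    html = re.sub(r'([a-z,"])' + NEXT_P + r'([a-z])', r'\1 \2', html)
    # Lower-to-upper pass: join with a single space (riskier, review in practice).
    html = re.sub(r'([a-z,"])' + NEXT_P + r'([A-Z"])', r'\1 \2', html)
    # Dehyphen pass: join with no space; the hyphen is inside group 1, so it stays.
    html = re.sub(r'([a-z]-)' + NEXT_P + r'([a-z])', r'\1\2', html)
    return html


sample = '<p>The wheels kept</p>\n<p>rolling down the road.</p>'
fixed = join_split_paragraphs(sample)
# fixed == '<p>The wheels kept rolling down the road.</p>'
```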
Old 02-27-2016, 08:35 AM   #7
Turtle91
A Hairy Wizard
Turtle91 ought to be getting tired of karma fortunes by now.
Posts: 3,095
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Quote:
Originally Posted by theducks View Post
<dehyphen pass>
([a-z]-)</p>\s+([a-z])
\1\2 (no space in this replacement; also a step-through pass)

And you might want to do an em-dash pass as well.
Thanks Ducks!

I am nowhere close to an expert on regex... could you not do the third pass like this, to catch the hyphen and the em dash in the same pass?

<dehyphen pass>
([a-z])([-—])</p>\s+([a-z])
\1\3

The way I read yours, it includes the "-" in the first parenthesized group and would thus put it back in the replacement.
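
The difference between the two dehyphen patterns can be checked with a small Python sketch. The patterns and replacements are the ones posted above; the function names and sample strings are invented for the comparison, and the samples assume the continuation text follows </p> directly, as the posted patterns do. The em dash is written as \u2014 in the code.

```python
import re

# Hyphen inside group 1: the replacement \1\2 puts it back.
KEEP_HYPHEN = r'([a-z]-)</p>\s+([a-z])'
# Hyphen or em dash in its own group 2: the replacement \1\3 drops it.
DROP_DASH = r'([a-z])([-\u2014])</p>\s+([a-z])'


def dehyphen_keep(text: str) -> str:
    return re.sub(KEEP_HYPHEN, r'\1\2', text)


def dehyphen_drop(text: str) -> str:
    return re.sub(DROP_DASH, r'\1\3', text)


print(dehyphen_keep('over-</p>\nlooked'))       # over-looked
print(dehyphen_drop('over-</p>\nlooked'))       # overlooked
print(dehyphen_drop('wait\u2014</p>\nwhat'))    # waitwhat
```

So the reading is right: the first pattern keeps the hyphen and the second removes it. The last line shows a side effect worth watching for: dropping an em dash fuses the two words, which is probably not what you want when the dash is real punctuation.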
Old 02-29-2016, 02:34 AM   #8
rubeus
Banned
rubeus ought to be getting tired of karma fortunes by now.
 
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
Instead of declaring the wrong chars at the end of a paragraph, it's better to declare the allowed chars at the end of a paragraph and negate that set... such as [^.,:!"] plus some other quote chars.
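
A sketch of that negated-set idea in Python, under some assumptions: the first five allowed characters come from the class above, while '?' and the curly closing quote are additions for this example, as is excluding '>' and whitespace so a closing inline tag before </p> is not treated as a broken ending. The names are hypothetical, and the opening `<p ...>` tag of the next paragraph is assumed.

```python
import re

# Characters assumed to end a paragraph legitimately. '.', ',', ':', '!' and '"'
# are from the post; '?' and the curly closing quote are additions for this sketch.
ALLOWED_ENDINGS = '.,:!"?\u201d'

# Any other character right before </p> is treated as a broken paragraph end.
BROKEN_END = re.compile(
    r'([^' + re.escape(ALLOWED_ENDINGS) + r'>\s])</p>\s*<p[^>]*>'
)


def join_broken_paragraphs(html: str) -> str:
    """Join a paragraph to the next one when it does not end in an allowed char."""
    return BROKEN_END.sub(r'\1 ', html)


sample = '<p>The wheels kept</p>\n<p>rolling.</p>\n<p>Next paragraph.</p>'
fixed = join_broken_paragraphs(sample)
# fixed == '<p>The wheels kept rolling.</p>\n<p>Next paragraph.</p>'
```

This catches endings the letter-based passes miss (digits, for instance), but a paragraph broken at a hyphen gets joined with a space after the hyphen, so a separate dehyphen pass is still needed.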
Old 03-03-2016, 08:43 PM   #9
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by jimdays View Post
I initially tried that (convert doc to epub with several different methods, Calibre/online), but around the pictures and author sidebars, the results were completely unacceptable.
Examples of sidebar/pics:





I'll just cut this short. There is NO SUCH THING as an automagic conversion for a book that has wraparound articles, sidebars, floating images, etc. In trying to achieve this, you are causing yourself untold amounts of unnecessary work--all created in the false hope that you'll have LESS work. This is not the case.

You would have been FAR better off just going straight from Word. You will face this extra work one way or the other. It's not necessarily "less work" to do the ePUB from Word, but it's LESS CRUFT to clean up. When you try to invent a process, as you have, you are not seeing the iceberg. By which I mean, you think you are seeing progress, on the surface; but the ungodly mess that you're not seeing, under the hood, will come back and wreak havoc when you try to put this on any real device.

You're spinning your wheels, chasing ephemera that isn't real. What you think you're seeing, in BookView, isn't what you'll see down the road.

OH, and by the way: using freebie online converters or calibre isn't helping you, either. Not the way you're using them.

BUT, I'll shut up now. As sure as shooting, you'll keep trying the methods that SEEM to produce the result you want. I'm really glad I'm not the one cleaning up the undersides of that file.

Hitch

Last edited by Dr. Drib; 03-20-2016 at 06:57 AM.