MobileRead Forums > E-Book Software > Sigil

10-27-2010, 09:45 PM   #1
bfollowell
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
Help with horribly formatted epub?

I have an epub that someone did a totally horrible formatting job on. Rather than having each paragraph defined as, of all things, a paragraph, each chapter is basically one paragraph with line breaks and extra line spacing between what should be separate paragraphs. I've tried to clean it up a little by converting with Calibre but have had no luck at all. Since each chapter is really just one giant paragraph, there is no way to remove the extra spacing between the paragraph groups: as far as Calibre is concerned they aren't really paragraphs, so there's nothing to remove.

I'm almost afraid to ask, but is there an easy way to remove all these breaks and clean this up, or am I basically going to have to edit this whole 500-page book by hand?

I'd love to hear any advice anyone may have to offer.

Thanks.

- Byron Followell
10-27-2010, 10:11 PM   #2
theducks
Well trained by Cats
Posts: 29,754
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Kinda hard to come up with a precise solution just from your description.

Are you looking at the file in Code View?
Are these line breaks (<br />)? If so, you could try replacing each one with </p> <p>. Note that you are ending the previous paragraph and starting another.
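To see what that replacement does to the markup, here is a minimal Python sketch that works on a string rather than inside Sigil. It assumes the chapter's XHTML is already loaded as text; the function name `split_on_breaks` and the sample markup are hypothetical, made up for illustration.

```python
import re

# Matches the common spellings of a line break: <br>, <br/>, <br />.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_on_breaks(html: str) -> tuple[str, int]:
    """Turn every line break into a paragraph boundary.

    "</p>\n<p>" closes the paragraph the break sits in and opens a new one.
    Returns the new markup and the number of breaks replaced.
    """
    return BR.subn("</p>\n<p>", html)


# Hypothetical chapter where two "paragraphs" are glued by a double break.
sample = "<p>One.<br /><br />Two.</p>"
print(split_on_breaks(sample))
```

On this sample it reports 2 replacements, and the double break leaves an empty `<p></p>` between "One." and "Two.", which is the kind of leftover a second pass has to deal with.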

Also note that a Replace All may catch the few legitimate breaks in the overhaul.

On a second pass, you then need to decide what to do with the multiple empty paragraphs (<p></p>) that are left.
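That second pass could look like the following sketch, again in Python and assuming the markup is a string. The helper name `drop_empty_paragraphs` is hypothetical, and it simply deletes every empty paragraph (including ones holding only whitespace or `&nbsp;`); if some of them are deliberate scene breaks, you would want to replace those with something else instead of dropping them.

```python
import re

# An opening <p> tag (any attributes), then only whitespace, &nbsp; or &#160;,
# then </p>, plus any whitespace after it so no blank lines are left behind.
# The \b stops the pattern from matching tags such as <pre>.
EMPTY_P = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;|&#160;)*</p>\s*", re.IGNORECASE)


def drop_empty_paragraphs(html: str) -> tuple[str, int]:
    """Remove empty paragraphs; return the new markup and how many were removed."""
    return EMPTY_P.subn("", html)


print(drop_empty_paragraphs("<p>One.</p>\n<p></p>\n<p>Two.</p>"))
```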

Other than that, you might need to craft a regex that finds what is unique about those breaks.
For each successful cleaning pass, do a save. Abort (reload without saving) if you get undesired results.
Clean in small fixes and multiple passes instead of trying for "just one Replace".
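The "small fixes, multiple passes" approach can be sketched as a list of passes applied one after another, each reporting its match count so you can judge it before saving. This is a hypothetical Python illustration, not a Sigil feature: the two passes shown are assumptions, and you would swap in patterns that match the breaks in your own book.

```python
import re

# Each pass: (label, regex pattern, replacement). Illustrative assumptions only.
PASSES = [
    # One or more consecutive breaks become a single paragraph boundary.
    ("breaks -> paragraph boundaries", r"(?:<br\s*/?>\s*)+", "</p>\n<p>"),
    # Paragraphs left with nothing but whitespace are removed.
    ("drop empty paragraphs", r"<p\b[^>]*>\s*</p>\s*", ""),
]


def run_passes(html: str, passes) -> tuple[str, list[tuple[str, int]]]:
    """Apply passes in order, recording how many matches each one made.

    Returns the final markup plus a list of (label, count) pairs, so every
    pass can be checked before anything is written back to the book.
    """
    report = []
    for label, pattern, replacement in passes:
        html, count = re.subn(pattern, replacement, html)
        report.append((label, count))
    return html, report


result, report = run_passes("<p><br />A<br /><br />B</p>", PASSES)
for label, count in report:
    print(f"{label}: {count}")
print(result)
```

On that sample the first pass reports 2 matches and the second 1, leaving two clean paragraphs, A and B. Because `run_passes` returns a new string instead of touching a file, discarding a bad result is the same as reloading without saving.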


BTW:
Have you tried saving the file as plain text (no HTML; it looks as it would print) and then having Calibre convert it using whichever option fits best:
1) Treat multiple blank lines as paragraph markers.
2) Remove extra blank lines and treat indents as paragraph markers.
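The "multiple blank lines as paragraph markers" idea is easy to picture in code. The following Python sketch only illustrates that heuristic; it is not how Calibre implements it, the function name is hypothetical, and it covers option 1 only. Since it starts from plain text, any bold or italics are already gone.

```python
import html
import re


def blank_lines_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in <p>...</p>.

    One or more blank lines mark a paragraph boundary; single newlines
    inside a block are joined with spaces. Text is HTML-escaped.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    paras = []
    for block in blocks:
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            paras.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paras)


print(blank_lines_to_paragraphs("First line\ncontinues.\n\n\nSecond para."))
```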
10-27-2010, 10:17 PM   #3
bfollowell
Fanatic
Thanks for the advice. I'll try the replace slowly and see how it goes. It will still be a long, drawn-out process but not quite the same as taking care of each paragraph manually.

And no, I really don't want to convert to text, because then I'll lose what good formatting there is (bold, italics, etc.).

Thanks for your help.

We'll see if anyone comes up with something more.

- Byron
10-27-2010, 10:29 PM   #4
bfollowell
Fanatic
Well, your suggestion helped me a lot more than I thought it might when I saw it initially. Basically, I just went in, pulled up a typical chapter and did a search for:

</span><br />
<br />
<span

This is the common code between two typical paragraphs. I found 230 occurrences. I replaced them all with:

</p>
<p

I fixed the first and last paragraphs by hand, and that pretty much took care of the worst of it. There'll be some more tweaking and cleaning to get it just right, and I'm sure a few things may get messed up, but this takes care of the bulk of my problems a lot faster than I feared it would.
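Here is the same search-and-replace as a Python sketch. The original search was the literal text quoted above; writing it as a whitespace-tolerant regex is an assumption, and the function name and sample chapter are hypothetical.

```python
import re

# The span/break/break/span seam found between paragraphs, as a regex so that
# different whitespace or <br>/<br/>/<br /> spellings still match.
BETWEEN = re.compile(r"</span>\s*<br\s*/?>\s*<br\s*/?>\s*<span")


def replace_between_paragraphs(chapter: str) -> tuple[str, int]:
    """Replace each seam with a paragraph boundary; also return the count."""
    return BETWEEN.subn("</p>\n<p", chapter)


# Hypothetical chapter: one <p> holding three spans separated by double breaks.
sample = (
    '<p><span class="x">One.</span><br />\n<br />\n'
    '<span class="x">Two.</span><br />\n<br />\n'
    '<span class="x">Three.</span></p>'
)
fixed, count = replace_between_paragraphs(sample)
print(count)
print(fixed)
```

In this sample the middle paragraph comes out clean as `<p class="x">Two.</p>` (the span's attributes end up on the new `<p>`), while the first paragraph keeps an unclosed `<span class="x">` and the last a stray `</span>`. Those two leftovers are why the first and last paragraphs still needed fixing by hand.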

Thanks again for your suggestion.

- Byron
10-28-2010, 12:44 AM   #5
theducks
Well trained by Cats
Glad it worked out.
All times are GMT -4.
MobileRead.com is a privately owned, operated and funded community.