10-27-2010, 09:45 PM | #1 |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Help with horribly formatted epub?
I have an epub that someone did a totally horrible formatting job on. Rather than have each paragraph defined as, of all things, a paragraph, basically each chapter is a single paragraph with breaks and extra line spacing between what should be paragraphs. I've tried to clean it up a little by converting with Calibre but have had no luck at all. Since each chapter is really just one giant paragraph, there is no way to remove the extra spacing between the paragraph groups. They aren't really paragraphs as far as Calibre is concerned, so there's nothing to remove.
I'm almost afraid to ask, but is there an easy way to remove all these breaks and clean this up, or am I basically going to have to edit this whole 500-page book by hand? I'd love to hear any advice anyone may have to offer. Thanks. - Byron Followell |
10-27-2010, 10:11 PM | #2 | |
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Are you looking at the file in Code View? Are these line spaces <br /> tags? You could try replacing each one with </p> <p> (note that you are ending the previous paragraph and starting another). Also note that a Replace All may catch the few legitimate breaks in the overhaul.
On a second pass, you then need to decide what to do with the multiple empty paragraphs (<p></p>) that are left. Other than that, you might need to craft a regex that "finds the uniqueness" of those breaks.
For each successful "cleaning pass", do a save. Abort (reload without saving) if you get undesired results. Clean in little fixes and multiple passes instead of trying for "just one Replace".
BTW: Have you tried saving the file as text (no HTML, looks as it would print), then having Calibre convert it using whichever option fits best: 1) treat multiple blank lines as paragraph markers, or 2) remove extra blank lines and treat indents as paragraph markers? |
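For anyone who would rather script the passes than run them in an editor, here is a minimal Python sketch of the two-pass idea above. The function names are made up for illustration, and the patterns are assumptions about what the markup looks like (runs of two or more <br /> tags between paragraphs, single <br /> tags being legitimate breaks), so check them against the real file first and work on a copy.

```python
import re

# Assumption: a paragraph gap is a run of two or more <br> tags,
# possibly separated by whitespace. A lone <br /> is left untouched.
BREAK_RUN = re.compile(r"(?:\s*<br\s*/?>\s*){2,}", re.IGNORECASE)

# Assumption: an "empty" paragraph holds only whitespace or non-breaking spaces.
EMPTY_PARA = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;|&#160;)*</p>\s*", re.IGNORECASE)


def breaks_to_paragraphs(html: str) -> str:
    """Pass 1: end the previous paragraph and start a new one at each break run."""
    return BREAK_RUN.sub("</p>\n<p>", html)


def drop_empty_paragraphs(html: str) -> str:
    """Pass 2: remove the empty paragraphs left behind by pass 1."""
    return EMPTY_PARA.sub("", html)
```

Running each pass separately and saving in between keeps the "little fixes, multiple passes" approach: if a pass gives undesired results, discard that output and adjust the pattern.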
|
10-27-2010, 10:17 PM | #3 | |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Quote:
And no, I really don't want to try converting to text, because then I'll lose what good formatting was done (bold, italics, etc.). Thanks for your help. We'll see if anyone comes up with something more. - Byron |
|
10-27-2010, 10:29 PM | #4 |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Well, your suggestion helped me a lot more than I thought it might when I saw it initially. Basically, I just went in, pulled up a typical chapter and did a search for:
</span><br /> <br /> <span
This is the common code between two typical paragraphs. I found 230 occurrences. I replaced them all with:
</p> <p
I fixed the first and last paragraphs by hand and that pretty much took care of the worst of it. There'll be some more tweaking and cleaning to get it just right and I'm sure a few things may get messed up but this will take care of the bulk of my problems a lot faster than I was worried it might take me. Thanks again for your suggestion. - Byron |
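The same literal search and replace can be sketched in Python for anyone who wants to run it across many chapter files. This is only an illustration of the replacement described in the post above: the search and replace strings are the ones quoted there, while the function name and the sample chapter are hypothetical.

```python
from typing import Tuple

# Strings taken from the post above.
SEARCH = "</span><br /> <br /> <span"
REPLACE = "</p> <p"


def split_chapter(html: str) -> Tuple[str, int]:
    """Replace every paragraph gap and report how many were found."""
    hits = html.count(SEARCH)
    return html.replace(SEARCH, REPLACE), hits


# Hypothetical three-paragraph chapter in the "one giant paragraph" style.
sample = (
    '<p><span class="t">One.</span><br /> <br /> '
    '<span class="t">Two.</span><br /> <br /> '
    '<span class="t">Three.</span></p>'
)
fixed, count = split_chapter(sample)
```

In the result, the middle paragraph comes out as a well-formed <p class="t">...</p>, but the first one still opens with <span and the last one still closes with </span>, which is why those two need fixing by hand.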
10-28-2010, 12:44 AM | #5 | |
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
|
|
|