|  06-04-2020, 12:31 AM | #1 | 
| Bibliophagist            Posts: 47,971 Karma: 174315098 Join Date: Jul 2010 Location: Vancouver Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos | 
				
				Candidate for worse formatted epub
			 
			
			If anyone is interested in hanging Sigil on startup for several minutes and making the whole program crawl, check out the epub version of Radix by A. A. Attanasio which is currently a freebie from Arc Manor/Phoenix Pick.  Downloadable from Free Ebooks | Publisher's Pick. This abomination has a single 2MB text file with nothing to be seen but inline styles.
		 | 
|   |   | 
|  06-04-2020, 02:38 AM | #2 | 
| Klak            Posts: 174 Karma: 150374 Join Date: Sep 2011 Location: Belgrade, Serbia Device: many | 
			
			edit: The header names the "generator" as Aspose.Words, which is probably responsible for the strange code.
		 Last edited by najgori; 06-04-2020 at 10:42 AM. Reason: DiapDialer | 
|   |   | 
|  06-04-2020, 06:21 AM | #3 | |
| Resident Curmudgeon            Posts: 80,665 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | Quote: 
 | |
|   |   | 
|  06-04-2020, 08:52 AM | #4 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Thank you for pointing out this horrible test case. It is not just one big file, it is one super big line! There are NO line breaks anywhere, and Sigil is a line-based editor. It actually opens better in older versions (0.9.14) when mend on open is used, so that at least that huge line is split into multiple lines. Perhaps I could add a "bad epub" input plugin that does what old Sigil did (force at least some line breaks into the file after block-level tags). I might also be able to add some ability to collect inline styles into a separate CSS file. That is one messed up epub, and therefore a wonderful test case for improving Sigil. I will look into it. Last edited by KevinH; 06-04-2020 at 08:57 AM. | 
|   |   | 
|  06-04-2020, 08:56 AM | #5 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			To make clean-up less painful in Sigil, I did the following:
			1. Open Sigil and turn off Preview.
			2. Use Sigil to load the epub.
			3. Go make coffee while Sigil loads the one huge line :-)
			4. Immediately after it finally opens, do nothing except right-click in the window and select Mend.
			5. Go refill your coffee :-) Sigil should now be able to function a bit more rapidly, since the file is no longer one giant single line.
			6. Use Find and Replace to insert a Sigil Split Marker.
			Code: <hr class="sigil_split_marker" />
			7. Turn Preview back on.
			You are finally back to something still very horrible but workable.
			Last edited by KevinH; 06-04-2020 at 09:14 AM. | 
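Step 6 does not say where the markers go, so the following is only a rough sketch of one possibility, done outside Sigil in Python. It assumes chapters start at `<h1>` tags, which is a guess about this particular book; the function name `insert_split_markers` is hypothetical and not part of Sigil.

```python
import re

# The marker string is the one quoted in the post above.
SPLIT_MARKER = '<hr class="sigil_split_marker" />'


def insert_split_markers(xhtml):
    """Put a split marker on its own line before every opening <h1> tag.

    Assumption: each chapter begins with an <h1>. Adjust the pattern if the
    book marks chapters some other way.
    """
    return re.sub(
        r"<h1(?=[\s>])",                         # "<h1>" or "<h1 ...>", not "<h10"
        lambda m: SPLIT_MARKER + "\n" + m.group(0),
        xhtml,
    )


if __name__ == "__main__":
    sample = '<h1>One</h1><p>text</p><h1 id="c2">Two</h1><p>more</p>'
    print(insert_split_markers(sample))
```

A marker also lands before the first heading, so anything ahead of it (a title block, for example) would end up in a section of its own once the file is split.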
|   |   | 
|  06-04-2020, 09:08 AM | #6 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I'm hesitant to reprimand a long-standing member for what looks to me like a straight-up spam post, so I'll give you some time to explain the relevance of the link you posted to the ongoing discussion before I delete it. Are you suggesting the program you linked to was used to create the awful epub, or what?
		 Last edited by DiapDealer; 06-04-2020 at 09:20 AM. | 
|   |   | 
|  06-04-2020, 04:42 PM | #7 | 
| A Hairy Wizard            Posts: 3,394 Karma: 20212733 Join Date: Dec 2012 Location: Charleston, SC today Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire | 
			
			I used a non-Sigil technique to make it usable in Sigil: I used Notepad++ and replaced all the </p> tags with </p>\n to give it line breaks. It was fairly spry when opening in Sigil... then of course all the other steps Kevin mentioned. So perhaps that can be a simple file integrity check when Sigil opens: the number of characters divided by the number of lines MUST be below a certain threshold, otherwise additional line breaks (\n) are automatically added. That doesn't adversely affect the document at all... | 
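The check suggested in this post can be sketched roughly as follows, in Python rather than anything Sigil actually ships. The threshold of 2000 characters per line and all function names are assumptions made for illustration; the post only says "a certain threshold".

```python
import re


def average_line_length(text):
    """Number of characters divided by number of lines."""
    lines = text.splitlines() or [""]
    return len(text) / len(lines)


def add_breaks_after_paragraphs(text):
    """Append a newline to every </p> that is not already followed by one."""
    return re.sub(r"</p>(?!\r?\n)", "</p>\n", text)


def ensure_line_breaks(text, max_avg=2000.0):
    """Add line breaks only when the average line length exceeds max_avg.

    max_avg is an arbitrary illustrative value, not a number from the thread.
    """
    if average_line_length(text) > max_avg:
        return add_breaks_after_paragraphs(text)
    return text


if __name__ == "__main__":
    one_line = "<p>lorem ipsum</p>" * 5000
    fixed = ensure_line_breaks(one_line)
    print(len(one_line.splitlines()), "->", len(fixed.splitlines()))
```

A newline between paragraphs is normally just insignificant whitespace, so rendering should not change. The exception is content where whitespace is preserved, for example inside `<pre>` or under `white-space: pre`, so a real implementation would probably want to skip those regions.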
|   |   | 
|  06-04-2020, 05:11 PM | #8 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I've seen epubs created with Apple's Pages where the files were all one single line without breaks as well. Large swathes of code (often the entire body) with "white-space: pre" applied via CSS in some InDesign output also create a similar problem. | 
|   |   | 
|  06-04-2020, 05:25 PM | #9 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			We will have to add some code to ImportEpub to analyze each html file and either run mend on it to inject newlines after each block-level tag, or do the equivalent of what Turtle91 suggested. The new policy of trying not to touch each file on load until absolutely necessary has hurt us a bit here. Older Sigil would forcibly inject the newlines, since "mend" on open was effectively always done during the move to our standard directory structure. Thoughts? | 
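As a rough illustration of "inject newlines after each block-level tag", here is a small regex-based sketch in Python. It is not Sigil's Mend and not the ImportEpub code; the tag list and the name `inject_newlines` are assumptions chosen for the example, and a real implementation would more likely work from a parsed document than from a regex.

```python
import re

# Illustrative subset of block-level elements; extend as needed.
BLOCK_TAGS = (
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
    "li", "ul", "ol", "blockquote", "table", "tr", "section",
)

# A closing block-level tag that is not already followed by a line break.
_BLOCK_CLOSE = re.compile(
    r"(</(?:" + "|".join(BLOCK_TAGS) + r")\s*>)(?!\r?\n)",
    re.IGNORECASE,
)


def inject_newlines(xhtml):
    """Append a newline to each closing block-level tag that lacks one."""
    return _BLOCK_CLOSE.sub(r"\1\n", xhtml)


if __name__ == "__main__":
    print(inject_newlines("<div><p>a</p><p>b</p></div>"))
```

Because tags that already end a line are skipped, running the function on a file that is already well broken leaves it unchanged, which fits the goal of not touching files on load unless necessary.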
|   |   | 
|  06-04-2020, 05:51 PM | #10 | 
| Resident Curmudgeon            Posts: 80,665 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | 
			
			Both the code and formatting are just awful. I have this abomination and it's really bad. An example is 3em indents.
		 Last edited by JSWolf; 06-04-2020 at 05:58 PM. | 
|   |   | 
|  06-04-2020, 05:56 PM | #11 | |
| Resident Curmudgeon            Posts: 80,665 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | Quote: 
 | |
|   |   | 
|  06-05-2020, 04:12 AM | #12 | 
| Klak            Posts: 174 Karma: 150374 Join Date: Sep 2011 Location: Belgrade, Serbia Device: many | |
|   |   | 
|  06-05-2020, 12:52 PM | #13 | |
| Bibliophagist            Posts: 47,971 Karma: 174315098 Join Date: Jul 2010 Location: Vancouver Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos | Quote: 
 Basically, I did what Turtle91 did and used Notepad++ to add the line breaks (though I did use </p>\r\n), restructured to Sigil Norm, and used the RemoveInLineStyles plugin to move the styles to a stylesheet. Then some time spent cleaning up that stylesheet, muttering about people who use absolute values and their probable destination, dubious ancestry, repugnant morals, etc. Last edited by DNSB; 06-05-2020 at 01:03 PM. | |
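For readers wondering what "move the styles to a stylesheet" amounts to, here is a minimal Python sketch of the idea. It is not the RemoveInLineStyles plugin's code. It assumes double-quoted `style` attributes and elements that have no existing `class` attribute; the function name and the `s1`, `s2`, ... class names are made up for illustration.

```python
import re

# Assumption: style attributes are always written style="..." with double quotes.
_STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')


def extract_inline_styles(xhtml, prefix="s"):
    """Replace each style="..." with class="sN" and return (xhtml, css).

    Identical declarations share one class. Elements that already carry a
    class attribute are NOT handled here and would end up with two.
    """
    classes = {}  # declaration text -> generated class name

    def repl(match):
        decl = match.group(1).strip()
        name = classes.setdefault(decl, prefix + str(len(classes) + 1))
        return ' class="' + name + '"'

    new_xhtml = _STYLE_ATTR.sub(repl, xhtml)
    css = "\n".join(
        "." + name + " { " + decl + " }" for decl, name in classes.items()
    )
    return new_xhtml, css


if __name__ == "__main__":
    page = '<p style="margin:0">a</p><span style="color:red">b</span>'
    body, sheet = extract_inline_styles(page)
    print(body)
    print(sheet)
```

The generated stylesheet still contains whatever absolute values the original used, so the manual clean-up described in the post would remain necessary.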
|   |   | 
|  06-05-2020, 01:50 PM | #14 | 
| Zealot            Posts: 136 Karma: 432377 Join Date: Nov 2010 Location: USA Device: Kindle PW 10thGen, Kobo Clara HD | 
			
			I don't remember the order I did things in, but with calibre I split the file on <H1> tags and ran beautify all files. Then I ran check book and it complained that some of the files were too large, so I manually split them where there were "***" in the files until I got tired of doing them and it stopped complaining about the error... It does open faster now. I should read the book, but I am tired of looking at it for now. I will get back to it later. | 
|   |   | 
|  06-08-2020, 11:58 PM | #15 | |
| Guru            Posts: 681 Karma: 929286 Join Date: Apr 2014 Device: PW-3, iPad, Android phone | 
			
			The publisher advertises http://www.arcmanor.com/ Quote: 
 They publish some interesting and diverse books, but the layout and artwork are at best enthusiastic amateur. It's very hard for small publishers, if they aren't a ripoff vanity press or reformatting public domain text they've scraped from Gutenberg or Internet Archive. Or genre-of-the-month stuff like zombies/gay werewolves/LitRPG/survivalist gun porn. So still give them kudos for services to literature. But do your typesetting elsewhere. Last edited by AlanHK; 06-09-2020 at 12:04 AM. | |
|   |   | 