#1 |
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Candidate for worst formatted epub
If anyone is interested in hanging Sigil on startup for several minutes and making the whole program crawl, check out the epub version of Radix by A. A. Attanasio which is currently a freebie from Arc Manor/Phoenix Pick. Downloadable from Free Ebooks | Publisher's Pick. This abomination has a single 2MB text file with nothing to be seen but inline styles.
|
#2 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
edit: The header names the "generator" as Aspose.Words, which is probably responsible for the strange code.
Last edited by najgori; 06-04-2020 at 10:42 AM. Reason: DiapDialer |
#3 | |
Resident Curmudgeon
Posts: 79,743
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#4 |
Sigil Developer
Posts: 8,760
Karma: 5706256
Join Date: Nov 2009
Device: many
|
Thank you for pointing out this horrible test case. It is not just one big file, it is one super big line!!! There are NO linebreaks anywhere. And Sigil is a line-based editor.
It actually opens better in older versions (0.9.14) when mend on open is used, so that at least that huge line is split into multiple lines. Perhaps I could add a "bad epub" input plugin that does what old Sigil did (force at least some linebreaks into the file after block-level tags). And I might also be able to add some ability to collect inline styles into a separate css file. That is one messed up epub, and therefore a wonderful test case for improving Sigil. I will look into it.
Last edited by KevinH; 06-04-2020 at 08:57 AM. |
#5 |
Sigil Developer
Posts: 8,760
Karma: 5706256
Join Date: Nov 2009
Device: many
|
To make clean-up less painful in Sigil, I did the following:
1. Open Sigil, turn off Preview
2. Use Sigil to load the epub
3. Go make coffee while Sigil loads the one huge line :-)
4. Immediately after it finally opens, do nothing except: right-click in the window and select Mend
5. Go refill your coffee :-)
Sigil should now be able to function a bit more rapidly now that the file is not just one giant single line.
6. Use Find and Replace to insert a Sigil Split Marker
Code:
<hr class="sigil_split_marker" />
7. Turn Preview back on
You are finally back to something still very horrible but workable.
Last edited by KevinH; 06-04-2020 at 09:14 AM. |
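For anyone who would rather script step 6 than use Find and Replace, here is a minimal sketch. The steps above do not say where the markers should go, so placing one before every `<h1>` after the first is purely an assumption for illustration, and `add_split_markers` is a hypothetical helper, not part of Sigil.

```python
import re

# The marker string from step 6 above.
SPLIT_MARKER = '<hr class="sigil_split_marker" />'


def add_split_markers(xhtml):
    """Insert a split marker before every <h1> except the first.

    Splitting at <h1> is an assumption for illustration; choose whatever
    tag or pattern marks chapter boundaries in the book at hand.
    """
    seen = 0

    def repl(match):
        nonlocal seen
        seen += 1
        # Leave the first heading alone so the text before it stays with it.
        if seen == 1:
            return match.group(0)
        return SPLIT_MARKER + "\n" + match.group(0)

    # \b keeps the pattern from matching longer tag names that start with "h1".
    return re.sub(r"<h1\b", repl, xhtml, flags=re.IGNORECASE)


if __name__ == "__main__":
    sample = '<h1>One</h1><p>text</p><h1 class="c">Two</h1>'
    print(add_split_markers(sample))
```

The same idea works as a regex Find and Replace inside Sigil; the script form is only handy if the file is still too slow to edit interactively.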
#6 |
Grand Sorcerer
Posts: 28,569
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'm hesitant to reprimand a long-standing member for what looks to me like a straight-up spam post, so I'll give you some time to explain the relevance of the link you posted to the ongoing discussion before I delete it. Are you suggesting the program you linked to was used to create the awful epub, or what?
Last edited by DiapDealer; 06-04-2020 at 09:20 AM. |
#7 |
A Hairy Wizard
Posts: 3,347
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
I used a non-Sigil technique to make it usable in Sigil... I used Notepad++ and replaced all the </p> tags with </p>\n to give it line breaks. After that it was fairly spry when opening in Sigil....then of course all the other steps Kevin mentioned.
So, perhaps that can be a simple file integrity check when Sigil opens..... number of characters divided by number of lines MUST be below a certain threshold otherwise additional line breaks (\n) are automatically added. That doesn't adversely affect the document at all... |
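A rough sketch of that integrity check, in case it helps the discussion. The threshold value and both function names are made up for illustration; nothing here is taken from Sigil's code.

```python
# Hypothetical limit on average characters per line; the right value
# would need experimenting with real files.
MAX_AVG_LINE_LENGTH = 2000


def average_line_length(text):
    """Number of characters divided by number of lines."""
    lines = text.splitlines() or [""]  # treat an empty file as one empty line
    return len(text) / len(lines)


def ensure_line_breaks(text, threshold=MAX_AVG_LINE_LENGTH):
    """Add a newline after each </p> only when the file looks like one giant line."""
    if average_line_length(text) <= threshold:
        return text  # already has reasonable line lengths, leave it untouched
    return text.replace("</p>", "</p>\n")


if __name__ == "__main__":
    one_big_line = "<p>a</p>" * 1000
    fixed = ensure_line_breaks(one_big_line)
    print(average_line_length(one_big_line), average_line_length(fixed))
```

Running it a second time changes nothing, since the average drops below the threshold after the first pass. One caveat: the extra newlines are only harmless where whitespace between blocks is collapsed, so a file that applies "white-space: pre" to a container would need a closer look.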
#8 |
Grand Sorcerer
Posts: 28,569
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I've seen epubs created with Apple's Pages where the files were all one single line without breaks as well.
Large swathes of code (often the entire body) with "white-space: pre" applied via css in some InDesign output also create a similar problem. |
#9 |
Sigil Developer
Posts: 8,760
Karma: 5706256
Join Date: Nov 2009
Device: many
|
We will have to add some code on ImportEpub to analyze each html file and either run mend on it to inject newlines after each block-level tag, or do the equivalent of what Turtle91 suggested. The new policy of trying not to touch each file on load until absolutely necessary has hurt us a bit here. Older Sigil would forcibly inject the newlines since "mend" on open was effectively always done during the move to our standard directory structure.
Thoughts? |
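To make the "inject newlines after block-level tags" option concrete, here is a minimal regex sketch. It is not Sigil's Mend, which does much more than this; the tag list is an assumed subset of block-level elements and `inject_newlines` is a hypothetical name.

```python
import re

# Assumed subset of block-level elements; extend as needed.
BLOCK_TAGS = (
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "li", "blockquote", "table", "tr", "section",
)

# A closing block-level tag that is not already followed by a line break.
BLOCK_CLOSE = re.compile(
    r"(</(?:%s)\s*>)(?!\r?\n)" % "|".join(BLOCK_TAGS),
    re.IGNORECASE,
)


def inject_newlines(xhtml):
    """Add a newline after each closing block-level tag that lacks one."""
    return BLOCK_CLOSE.sub(r"\1\n", xhtml)


if __name__ == "__main__":
    print(inject_newlines("<div><p>a</p><p>b</p></div>"))
```

The lookahead makes the function safe to run on every file at import time: files that already have sensible line breaks come back unchanged, so it would not conflict much with the policy of not touching files unnecessarily.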
#10 |
Resident Curmudgeon
Posts: 79,743
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Both the code and formatting are just awful. I have this abomination and it's really bad. An example is 3em indents.
Last edited by JSWolf; 06-04-2020 at 05:58 PM. |
#11 | |
Resident Curmudgeon
Posts: 79,743
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#12 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
|
#13 | |
Bibliophagist
Posts: 46,168
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
Basically, I did what Turtle91 did and used Notepad++ to add the line breaks (though I did use </p>\r\n), restructured to Sigil Norm and used the RemoveInLineStyles plugin to move the styles to a stylesheet. I then spent some time cleaning up that stylesheet, muttering about people who use absolute values and their probable destination, dubious ancestry, repugnant morals, etc.
Last edited by DNSB; 06-05-2020 at 01:03 PM. |
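For readers curious what moving inline styles to a stylesheet involves, here is a minimal sketch of the general idea. It is not how the RemoveInLineStyles plugin works internally (that is not described in this thread); `extract_inline_styles` and the generated class names are hypothetical, and the sketch assumes double-quoted style attributes on elements that have no class attribute already.

```python
import re

# A double-quoted style attribute together with its leading whitespace.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')


def extract_inline_styles(xhtml, prefix="s"):
    """Replace each style attribute with a generated class.

    Returns (new_xhtml, css_text). Identical declaration strings share
    one class. Assumes no element already carries a class attribute.
    """
    classes = {}  # declaration text -> generated class name

    def repl(match):
        declarations = match.group(1).strip()
        if not declarations:
            return ""  # drop empty style attributes entirely
        name = classes.setdefault(declarations, f"{prefix}{len(classes) + 1}")
        return f' class="{name}"'

    new_xhtml = STYLE_ATTR.sub(repl, xhtml)
    css = "".join(f".{name} {{ {decl} }}\n" for decl, name in classes.items())
    return new_xhtml, css


if __name__ == "__main__":
    body, css = extract_inline_styles(
        '<p style="text-indent:3em">a</p><p style="text-indent:3em">b</p>'
    )
    print(body)
    print(css)
```

The resulting stylesheet still needs the manual clean-up described above: the generated rules carry over whatever absolute values the original inline styles used.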
|
#14 |
Zealot
Posts: 136
Karma: 432377
Join Date: Nov 2010
Location: USA
Device: Kindle PW 10thGen, Kobo Clara HD
|
I don't remember the order I did things, but with calibre I split the file on <H1> tags and ran beautify all files.
Then I ran check book and it complained that some of the files were too large, so I manually split them where there were "***" in the files until I got tired of doing them and it stopped complaining about the error... It does open faster now. I should read the book, but I am tired of looking at it for now. I will get back to it later. |
#15 | |
Guru
Posts: 681
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
|
The publisher advertises http://www.arcmanor.com/
Quote:
They publish some interesting and diverse books, but the layout and artwork are at best enthusiastic amateur. It's very hard for small publishers, if they aren't a ripoff vanity press or reformatting public domain text they've scraped from Gutenberg or Internet Archive. Or genre-of-the-month stuff like zombies/gay werewolves/LitRPG/survivalist gun porn. So still give them kudos for services to literature. But do your typesetting elsewhere.
Last edited by AlanHK; 06-09-2020 at 12:04 AM. |
|