#1 |
|
Bibliophagist
Posts: 48,476
Karma: 174510106
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Issue opening HTML file
I'd picked up an ebook (well, 3/4 of an ebook) from Baen and went to open its HTML files in Sigil. Sadly, Sigil refused to open them and gave an error message suggesting that I change the Clean Source preference to Pretty Print Tidy or HTML Tidy and reload the file. See attached image. While neither Pretty Print Gumbo nor Google Gumbo-Parser worked to allow opening the file, the message might be changed to reflect those two options.
#2 |
|
Sigil Developer
Posts: 9,083
Karma: 6379190
Join Date: Nov 2009
Device: many
|
Hi,
We will update that error message. But you turned on Clean on Open, and still nothing could open that .htm file? Wow! Was it encrypted? Google's Gumbo parser has been able to read/parse literally billions of pages on the web. I would be interested in knowing why that page could not be parsed.
KevinH
#3 |
|
Grand Sorcerer
Posts: 28,913
Karma: 207182180
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'd be willing to purchase this book from Baen in order to troubleshoot, if you (@DNSB) could tell me the exact format of the 3/4 version of this book you purchased.
EDIT: Actually, never mind. It doesn't appear that the 3/4 version of the book is available any more.
Last edited by DiapDealer; 12-11-2015 at 09:52 AM.
#4 |
|
Sigil Developer
Posts: 9,083
Karma: 6379190
Join Date: Nov 2009
Device: many
|
FYI: That error message has now been fixed in Sigil master and the fix will appear in the next release.
#5 |
|
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You can try @jackie_w's utility, "ScrambleEbook: Getting help with copyrighted books troubleshooting".
Hopefully that will give you a copyright-free EPUB that can replicate the problem.
EDIT: Oh wait, unpacked HTML?
Last edited by eschwartz; 12-11-2015 at 12:33 PM.
#6 |
|
Sigil Developer
Posts: 9,083
Karma: 6379190
Join Date: Nov 2009
Device: many
|
Hi,
I found a sample chapter of that book on the Baen website, tested its HTML for well-formedness with Gumbo, and received the following:
--------
line: 2 col: 1 type 40
@1:1: This is not a legal doctype.
@2:1: This is not a legal doctype.
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" "http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">
So all it is complaining about is the DOCTYPE not being an epub2 DOCTYPE. If you turn on Clean On Open in Sigil preferences, it will still happily load and parse that file.
The error message in your image cites the exact same line, so your problem is probably the same non-standard doctype. Simply turn on cleaning and it should load just fine into CodeView. You can then modify it if need be; otherwise leave it alone.
KevinH
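For anyone who wants to check a batch of loose HTML files for this before opening them, here is a rough Python sketch. It is not Sigil's or Gumbo's code. The function names are made up for illustration, and it assumes that the doctype an epub2 file normally carries is the XHTML 1.1 one (a bare `<!DOCTYPE html>` is also accepted). It only recognises `PUBLIC` doctypes.

```python
import re

# Assumption: epub2 content files normally declare the XHTML 1.1 public identifier.
XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"

# Matches <!DOCTYPE html> with an optional PUBLIC identifier and system URL.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>',
    re.IGNORECASE,
)


def find_doctype(source):
    """Return (line, col, public_id) for the first doctype, or None if absent.

    line and col are 1-based; public_id is None for a bare <!DOCTYPE html>.
    """
    match = DOCTYPE_RE.search(source)
    if match is None:
        return None
    line = source.count("\n", 0, match.start()) + 1
    col = match.start() - (source.rfind("\n", 0, match.start()) + 1) + 1
    return line, col, match.group(1)


def is_expected_doctype(public_id):
    """True for a bare doctype or the XHTML 1.1 public identifier."""
    return public_id is None or public_id == XHTML11_PUBLIC_ID


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" '
        '"http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">\n'
        "<html></html>"
    )
    line, col, public_id = find_doctype(sample)
    if not is_expected_doctype(public_id):
        print(f"line: {line} col: {col} unexpected doctype: {public_id}")
```

On the Baen sample above this flags the OEB 1.2 doctype at line 2, column 1, which matches the position Gumbo reported.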
#7 |
|
Sigil Developer
Posts: 9,083
Karma: 6379190
Join Date: Nov 2009
Device: many
|
Ah! I see the issue. Gumbo detects the bad doctype but allows it to pass through unchanged, which just triggers the issue again. I will have to modify the gumbo code to not pass through known bad doctypes. No doctype at all would be better.
If you cut and paste the html with this doctype into Sigil, Preview and BookView will barf until it is fixed. I will look into fixing this.
KevinH
#8 |
|
Sigil Developer
Posts: 9,083
Karma: 6379190
Join Date: Nov 2009
Device: many
|
Hi,
This is now fixed in master. Bad doctypes will no longer survive a gumbo parse/serialize sequence, which means that Clean On Open with Gumbo will allow this book to load. Thanks for the bug report!
KevinH
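Until a release with that fix is out, the same idea can be approximated by hand before importing the files. The Python sketch below is only an illustration and not the actual change made in Sigil's gumbo code: `drop_unexpected_doctype` is a hypothetical helper, and it assumes that the only `PUBLIC` doctype worth keeping is the XHTML 1.1 one. Doctypes without a `PUBLIC` identifier are left untouched, and a doctype with an internal subset (containing `>`) is not handled.

```python
import re

# Assumption: the XHTML 1.1 public identifier is the one to keep.
XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"

# A doctype declaration plus any whitespace that follows it.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^>]*>\s*", re.IGNORECASE)
PUBLIC_ID_RE = re.compile(r'PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def drop_unexpected_doctype(source):
    """Remove the first doctype if its PUBLIC identifier is not XHTML 1.1."""

    def keep_or_drop(match):
        declaration = match.group(0)
        public = PUBLIC_ID_RE.search(declaration)
        if public is None or public.group(1) == XHTML11_PUBLIC_ID:
            return declaration  # bare or expected doctype: keep as-is
        return ""  # unexpected doctype: no doctype at all is the safer result

    return DOCTYPE_RE.sub(keep_or_drop, source, count=1)


if __name__ == "__main__":
    sample = (
        '<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" '
        '"http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">\n'
        "<html><body/></html>"
    )
    print(drop_unexpected_doctype(sample))
```

Dropping the declaration rather than rewriting it follows the "no doctype at all would be better" remark earlier in the thread.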