#1 |
Bibliophagist
Posts: 44,951
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Issue opening HTML file
I'd picked up an ebook (well, 3/4 of an ebook) from Baen and went to open its HTML files in Sigil. Sadly, Sigil refused to open them and gave an error message suggesting that I change the Clean Source preference to Pretty Print Tidy or HTML Tidy and reload the file. See attached image. Neither Pretty Print Gumbo nor Google Gumbo-Parser allowed the file to open, but the message might still be changed to reflect those two options.
#2 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Hi,
We will update that error message. But you turned on Clean on Open, and still nothing could open that .htm file? Wow! Was it encrypted? Google's Gumbo parser has been able to read/parse literally billions of pages on the web. I would be interested in knowing why that page could not be parsed. KevinH |
#3 |
Grand Sorcerer
Posts: 28,373
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'd be willing to purchase this book from Baen in order to troubleshoot, if you (@DNSB) could tell me the exact format of the 3/4 version of this book you purchased.
EDIT: actually, never mind. It doesn't appear that the 3/4 version of the book is available any more. Last edited by DiapDealer; 12-11-2015 at 08:52 AM. |
#4 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
FYI: That error message has now been fixed in Sigil master and the fix will appear in the next release.
#5 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You can try @jackie_w's troubleshooting utility, ScrambleEbook: Getting help with copyrighted books.
Hopefully that will give you a copyright-free EPUB that can replicate the problem. EDIT: Oh wait, unpacked HTML? Last edited by eschwartz; 12-11-2015 at 11:33 AM. |
#6 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Hi,
I found a sample chapter of that book on the Baen website, and when I tested its HTML for well-formedness with Gumbo I received the following:
--------
line: 2 col: 1 type 40
@1:1: This is not a legal doctype.
@2:1: This is not a legal doctype.
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" "http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">
So all it is complaining about is the DOCTYPE not being an epub2 DOCTYPE. If you turn on Clean On Open in Sigil preferences, it will still happily load and parse that file. The error message in your image cites the exact same line, so your problem is probably the same non-standard doctype. Simply turn on Cleaning and it should load just fine into CodeView. You can then modify it if need be; otherwise leave it alone. KevinH |
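For anyone who wants to check a file for this kind of doctype before loading it, here is a rough Python sketch. It is not what Sigil or Gumbo does internally; the function names and the single "accepted" public identifier (XHTML 1.1) are assumptions made for illustration, and the sample file content is made up apart from the doctype quoted above.

```python
import re

# Matches a doctype declaration and captures everything between "<!DOCTYPE" and ">".
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)
PUBLIC_ID_RE = re.compile(r'PUBLIC\s+"([^"]*)"', re.IGNORECASE)

# Assumption: only this public identifier counts as an epub2-style doctype here.
EPUB2_PUBLIC_IDS = {"-//W3C//DTD XHTML 1.1//EN"}


def find_doctype(html):
    """Return (line, col, public_id) for the first doctype, or None if there is none."""
    m = DOCTYPE_RE.search(html)
    if m is None:
        return None
    # 1-based line and column of the "<" that starts the doctype.
    line = html.count("\n", 0, m.start()) + 1
    col = m.start() - (html.rfind("\n", 0, m.start()) + 1) + 1
    pid = PUBLIC_ID_RE.search(m.group(1))
    return (line, col, pid.group(1) if pid else "")


def is_epub2_doctype(html):
    """True only when a doctype is present and its public id is in the accepted set."""
    found = find_doctype(html)
    return found is not None and found[2] in EPUB2_PUBLIC_IDS


# Hypothetical sample built around the OEB 1.2 doctype from the sample chapter.
sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" '
    '"http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">\n'
    "<html><body><p>text</p></body></html>"
)
print(find_doctype(sample))
print(is_epub2_doctype(sample))
```

With this made-up sample the doctype is reported at line 2, column 1, and it is not treated as an epub2 doctype.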
#7 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Ah! I see the issue. Gumbo detects the bad doctype but allows it to pass through unchanged, which just triggers the issue again. I will have to modify the Gumbo code to not pass through known bad doctypes. No doctype at all would be better.
If you cut and paste the HTML with this doctype into Sigil, Preview and BookView will barf until it is fixed. I will look into fixing this. KevinH |
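Until a fixed release is available, the "no doctype at all would be better" idea can be approximated by hand before importing the file. The following Python sketch is only an illustration of that idea, not the actual change to the Gumbo code; the function name and the list of doctypes that are kept (a bare `<!DOCTYPE html>` and XHTML 1.1) are assumptions.

```python
import re

# Assumption: public identifiers that are left in place; anything else is dropped.
KNOWN_GOOD_PUBLIC_IDS = ("-//W3C//DTD XHTML 1.1//EN",)

# A doctype declaration plus any whitespace that follows it.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^>]*>\s*", re.IGNORECASE)
BARE_DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\s*>\s*", re.IGNORECASE)


def drop_bad_doctype(html):
    """Remove the first doctype unless it is bare or uses a known good public id."""

    def keep_or_drop(match):
        text = match.group(0)
        if BARE_DOCTYPE_RE.fullmatch(text):
            return text
        if any(pid in text for pid in KNOWN_GOOD_PUBLIC_IDS):
            return text
        return ""  # unknown doctype: emit nothing rather than pass it through

    return DOCTYPE_RE.sub(keep_or_drop, html, count=1)


bad = (
    '<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN" '
    '"http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">\n'
    "<html><body><p>text</p></body></html>"
)
print(drop_bad_doctype(bad))
```

The rest of the markup is left untouched, so the result can still be cleaned or edited in Sigil afterwards.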
#8 |
Sigil Developer
Posts: 8,493
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Hi,
This is now fixed in master. Bad doctypes will no longer survive a Gumbo parse/serialize sequence, which means that Clean On Open with Gumbo will allow this book to load. Thanks for the bug report! KevinH |
#9 |
Bibliophagist
Posts: 44,951
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|