#976 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Assuming GBS anchors are automated trash, otherwise known as "artifacts", then they can be safely gotten rid of and it is appropriate to do so. And everyone else knows this already. Please don't muddy the waters. |
|
#977 |
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Whether the anchors are trash or valuable, it is always nice to have the option to get rid of them.
|
#978 | |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
1. All GBS.*.* anchors look alike.
2. Only some of those anchors are used in pagemapping. The rest appear to be garbage except (possibly) in a Google ebook reader app.
3. The GBS-related pagemap appears to be garbage, with the same caveat.
4. My GBS pagemap data comes from a very small sample, from which I am reluctant to generalize.

Thus, it is difficult to separate the pagemapped GBS anchors from the rest, and that may not even be desirable - so, at present, the posted beta routine does not give the option to preserve any GBS anchors. (Well, aside from the "don't use the routine at all" option.) That is why I'd like to get more feedback on that point before sanctifying this version as a proper release.

I'm fine with removing Google app-specific bloat, but not actually useful code. |
|
#979 |
Resident Curmudgeon
Posts: 79,763
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I'm away for two weeks, but when I get back, I'll download all of my Google ePubs and test whatever the latest beta is.
|
#980 |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
I'm in no particular hurry; I have several projects on my plate right now. I'm particularly interested in these cases:
- Books with non-GBS pagemaps.
- GBS pagemaps that conform to physical page counts in some way.
- Books with multiple pagemaps, GBS or otherwise, if that's even possible.
- Any case where the GBS anchors or pagemap is actually useful, such that removing them is a Bad Idea.
- Any kind of false positive, where something is affected that should not be.
- Weird "unpretty" results, especially in EPUB3 books (since they have new tags). |
#981 | |
Resident Curmudgeon
Posts: 79,763
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#982 | |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
I wish I could think of a way to automatically detect and discard pointlessly-nested DIV elements. Several Baen books show this bug, in which the first chapter's contents are enclosed in one DIV, the next is wrapped in two DIVs, and by the end of the book, you've got thirty or more extraneous DIVs nested around the same kind of content. Removing them does no harm, and may improve performance in some situations. |
|
#983 | |
Grand Sorcerer
Posts: 6,252
Karma: 16544692
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
Quote:
|
|
#984 |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Oh, I can get rid of them easily in the editor - two regex operations take care of things nicely - but that's the thing. I have to manually identify the problem and check for breakage afterward. Details, details...
See, in this particular class of cases, the nesting spans the entire BODY element, and the DIVs have no attached classes. It's literally a matter of stripping out opening DIV tags that come immediately after the opening BODY tag, and doing likewise for the closing ones. Simple and painless, but not universally applicable.
Last edited by Rev. Bob; 07-11-2015 at 09:59 PM. |
#985 |
Guru
Posts: 977
Karma: 2209358
Join Date: Nov 2011
Location: London, UK
Device: Kobo Aura, Kobo Aura ONE, PocketBook InkPad Color 3
|
I'm testing the plugin in a clean library with copies of the original Google Play books with stripped DRM. I am going through them systematically to see what the plugin is doing.
I found 2 books with a pageList and a GBS pagemap, each using its own set of anchors (id="page-.*" and id="GBS.*"), and in both cases the new plugin correctly removed the GBS pagemap and anchors, but left the pageList intact. So far so good :-)
However, I'm not sure what multiple pagemaps look like, so I don't know how to check for them. |
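A quick way to repeat this kind of before/after check on a single content file is to count the anchors by ID prefix. This is only a sketch: the helper name `count_anchor_ids` and the sample IDs are invented for illustration, and it assumes the two anchor families really are distinguishable by the `GBS` and `page-` prefixes quoted above.

```python
import re
from collections import Counter

# An id="..." attribute, but not xml:id or data-id style attributes.
ID_ATTR = re.compile(r'(?<![\w:-])id="([^"]*)"')


def count_anchor_ids(xhtml):
    """Count id values that look like GBS anchors or pageList anchors."""
    counts = Counter()
    for value in ID_ATTR.findall(xhtml):
        if value.startswith("GBS"):
            counts["gbs"] += 1
        elif value.startswith("page-"):
            counts["page"] += 1
    return counts


# Made-up sample markup; real Google Play files will look different.
before = '<p><a id="GBS.0001.01"/><span id="page-1"/>Text</p>'
after = '<p><span id="page-1"/>Text</p>'

before_counts = count_anchor_ids(before)
after_counts = count_anchor_ids(after)
```

If the plugin behaved as reported, the "gbs" count drops to zero after the run while the "page" count stays the same.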
#986 | |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
With multiple pagemaps, my main concern is what happens when a book that already has a pagemap gets picked up by Google Play. Does the old one get junked in favor of the new, or is it still there in some form? If it's junked, that's a bad thing, but it's on Google; nothing I can do about it. If it's still there, though, it ought to be restored if possible... |
|
#987 |
Guru
Posts: 977
Karma: 2209358
Join Date: Nov 2011
Location: London, UK
Device: Kobo Aura, Kobo Aura ONE, PocketBook InkPad Color 3
|
Unfortunately I've just found a book that the test plugin has slightly mangled, producing invalid HTML in two files.
I'm attaching a zip containing the original files, which look like they are nearly XHTML except I thought empty XHTML elements had a start tag ending with " />". These just end "/>", i.e. without the space. But if you're using a real XML parser that shouldn't matter.
In both cases the plugin leaves an open <div> element at the end of the text. Sigil complains, and the FlightCrew validator complains.
On the plus side, at least the bogus GBS anchors are gone. |
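Both points are easy to confirm with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for "a real XML parser"; it says nothing about how the plugin itself reads the files, and the helper names `shape` and `is_well_formed` are made up for this example.

```python
import xml.etree.ElementTree as ET


def shape(element):
    """Reduce an element tree to plain tuples so two parses can be compared."""
    return (element.tag, dict(element.attrib), element.text, element.tail,
            [shape(child) for child in element])


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# "<br/>" and "<br />" produce the same tree.
compact = ET.fromstring('<p>one<br/>two</p>')
spaced = ET.fromstring('<p>one<br />two</p>')
same_tree = shape(compact) == shape(spaced)

# A <div> left open at the end of the text is what validators object to.
closed = is_well_formed('<body><div><p>text</p></div></body>')
unclosed = is_well_formed('<body><div><p>text</p></body>')
```

The space before the slash is a compatibility habit for old HTML browsers, not an XML requirement, so the originals are fine on that count; the unclosed <div> is the actual error.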
#988 |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
I have a couple of suspicions about what's going on, but could you post an "after" ZIP of those two files as well, so I can see exactly what the routine's doing to them?
Meanwhile, as a workaround, you might try running the book through the "unpretty" routine first. I have a hunch that the <div></div><div/> line is causing some of the trouble, and passing the book through "unpretty" should split that up so it can be handled correctly. It's a hack, but it may get the job done until I'm back on my feet and able to look at it in depth. |
#989 |
Guru
Posts: 977
Karma: 2209358
Join Date: Nov 2011
Location: London, UK
Device: Kobo Aura, Kobo Aura ONE, PocketBook InkPad Color 3
|
Yes, your hunch was right. If I do a modify run to "depretty" the book before a second modify run to remove the Kobo/Google gunk, the output files are valid.
(If I do a single modify run to depretty *and* remove the Kobo/Google gunk, I still get mangling.)
Attached are the two files from the original mangled run. |
#990 |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
I think I see the problem, and it should be an easy fix. Not to get too detailed, but I believe the current code uses one line to look for three patterns, with the middle one mandatory and the outer ones optional. The trouble is, the outer ones are supposed to be optional together, not separately - and in these cases, one of the outer patterns (the closing /DIV) gets detected without the other (the opening DIV). The code then removes the anchor and closing tag, thus creating a mismatch.
If that's correct, I just need to adjust the code to make it two lines instead of one: one with the anchor enclosed by the DIV tags (not optional), followed by one with just the anchor. Since neither new pattern allows for a mismatch, the problem should be solved. I should get a chance to look into that tomorrow (er, later today) sometime - thanks for the useful bug report! |
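To make the suspected failure concrete, here is a small Python sketch of the two approaches. It is a guess at the shape of the problem, not the plugin's actual code: the anchor pattern, the sample markup, and the function names `remove_buggy` and `remove_fixed` are all invented for illustration.

```python
import re

# Hypothetical anchor shape; the real GBS anchors may be written differently.
ANCHOR = r'<a id="GBS\.[^"]*"\s*/>'

# One pattern, with each outer tag optional on its own (the suspected bug).
BUGGY = re.compile(r'(?:<div>)?' + ANCHOR + r'(?:</div>)?')

# Two patterns: the fully wrapped anchor first, then the bare anchor.
WRAPPED = re.compile(r'<div>' + ANCHOR + r'</div>')
BARE = re.compile(ANCHOR)


def remove_buggy(text):
    """Single pass; can eat a closing </div> whose opening tag is elsewhere."""
    return BUGGY.sub("", text)


def remove_fixed(text):
    """Two passes; neither pattern can match half of a DIV pair."""
    return BARE.sub("", WRAPPED.sub("", text))


# The anchor sits at the end of a DIV that also holds real content.
trailing = '<div><p>text</p><a id="GBS.0001.01"/></div>'
# The anchor is the only thing inside its DIV.
wrapped = '<p>a</p><div><a id="GBS.0002.01"/></div><p>b</p>'

buggy_result = remove_buggy(trailing)   # the closing </div> is lost
fixed_result = remove_fixed(trailing)   # the DIV pair stays matched
```

On the `wrapped` sample both versions give the same output; only the `trailing` case, where the closing tag is found without its opening partner, shows the mismatch described above.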