#841
Resident Curmudgeon
Posts: 80,034
Karma: 147977995
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
The book I noticed the counting errors with:

Old method: 169,541
ICU method: 168,183
Word 2016: 168,187

For Word, I used Calibre to convert the book to text and loaded the text version into Word to get the count. So yes, the ICU method is a lot more accurate.
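Taking the Word 2016 count as the reference, as the post does, the gap for each method is easy to put a number on. This is only a quick arithmetic check of the three figures above; the function name is chosen here for illustration and is not plugin code.

```python
def deviation(count: int, reference: int) -> tuple[int, float]:
    """Return the absolute and percentage difference from a reference count."""
    diff = count - reference
    return diff, 100 * diff / reference


# Counts reported above for the same book; Word 2016 is the reference.
WORD_2016 = 168_187
for name, count in {"Old method": 169_541, "ICU method": 168_183}.items():
    diff, pct = deviation(count, WORD_2016)
    print(f"{name}: {diff:+d} words ({pct:+.3f}%)")
```

This prints a gap of +1354 words (+0.805%) for the old method and -4 words (-0.002%) for the ICU method.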
#842
null operator (he/him)
Posts: 21,826
Karma: 30277270
Join Date: Mar 2012
Location: Sydney Australia
Device: none
I thought the existence of a space adjacent to an ellipsis was germane to why the ellipsis is there.

From memory, CMS puts a space before and after the ellipsis to indicate missing words; to indicate an unfinished sentence, it uses no space before and a period after. MLA puts [] around the ellipsis to indicate missing words; IMO that looks better if words are omitted from the start of a sentence. But as Jefferson said: "On matters of style swim with the current, on matters of principle stand like a rock."

BR

Last edited by BetterRed; 01-09-2016 at 02:22 AM.
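Why the spacing can matter to a word counter, as a minimal sketch: a counter that simply splits on whitespace treats an unspaced ellipsis as glue between two words, and a spaced one as extra "words". Counting runs of word characters instead is unaffected by either. Both functions are illustrative assumptions, not the old or the ICU method used by Count Pages.

```python
import re


def whitespace_count(text: str) -> int:
    # Naive: every whitespace-separated token counts as a word.
    return len(text.split())


def letter_run_count(text: str) -> int:
    # Count runs of word characters; "...", ". . ." and "…" never count as words
    # and never join two words. (Limitation: "don't" would count as two.)
    return len(re.findall(r"\w+", text))


samples = [
    "He paused . . . then left.",  # spaced dots
    "He paused...then left.",      # no spaces around the ellipsis
    "He paused … then left.",      # ellipsis character with spaces
]
for s in samples:
    print(f"whitespace={whitespace_count(s)} letter-runs={letter_run_count(s)}: {s!r}")
```

The whitespace split gives 7, 3 and 5 for the three samples, while the letter-run count gives 4 each time.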
#843
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
@davidfor,

Thanks for the beta. I ran a test against some German EPUB files in different genres, using a DOCX conversion of each as the reference. Here are the results.

1. Federal Agency for Civic Education document (link: http://www.bpb.de/system/files/datei/APuZ_2015-52.epub)
   - Count Pages, word count (old method): 30,349
   - Count Pages, word count (new method): 30,539
   - MS Word 2013, words: 29,864
2. Dale Brown: Außer Kontrolle (thriller), ISBN 9783641111038
   - Count Pages, word count (old method): 109,218
   - Count Pages, word count (new method): 106,689
   - MS Word 2013, words: 107,033
3. Umberto Eco: Die große Zukunft des Buches (non-fiction), ISBN 9783446236165
   - Count Pages, word count (old method): 69,745
   - Count Pages, word count (new method): 68,944
   - MS Word 2013, words: 68,916
4. Helena Marten: Die Kaffeemeisterin (historical fiction), ISBN 9783641059606
   - Count Pages, word count (old method): 149,472
   - Count Pages, word count (new method): 148,460
   - MS Word 2013, words: 148,883
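Treating the MS Word 2013 figure as the reference, which is what the DOCX conversion was for, a quick comparison shows which Count Pages method lands nearer for each book. This is plain arithmetic on the numbers listed; the short titles and the helper name are chosen here for illustration.

```python
# Word counts from the four test books: (old method, new method, MS Word 2013).
RESULTS = {
    "APuZ 2015-52 (bpb)": (30_349, 30_539, 29_864),
    "Außer Kontrolle": (109_218, 106_689, 107_033),
    "Die große Zukunft des Buches": (69_745, 68_944, 68_916),
    "Die Kaffeemeisterin": (149_472, 148_460, 148_883),
}


def closer_method(old: int, new: int, reference: int) -> str:
    """Say which count is nearer the reference ("tie" if equally near)."""
    old_gap, new_gap = abs(old - reference), abs(new - reference)
    if old_gap == new_gap:
        return "tie"
    return "old" if old_gap < new_gap else "new"


for title, (old, new, word) in RESULTS.items():
    print(f"{title}: old off by {abs(old - word)}, "
          f"new off by {abs(new - word)}, {closer_method(old, new, word)} is closer")
```

With Word 2013 as the yardstick, the new method comes out closer for three of the four books (off by 344, 28 and 423 words), while the old method is closer for the first one (off by 485 versus 675).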
#844
Serpent Rider
Posts: 1,123
Karma: 10219804
Join Date: Jun 2009
Device: Sony 350; Nook STR; Oasis
These past couple of pages are why I so infrequently post. Why be a complete bother to the creator of the plugin AND annoy others who use it? And all to no purpose...
#845
Well trained by Cats
Posts: 31,135
Karma: 60406498
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
If I were getting paid 'by the word', I could see the need for 'the most accurate precision'. (Please, please, do not get 'Weights and Measures' involved. None of the current methods would be acceptable.)

But I'm not. I find ADE page estimates (I use an RMSDK device) meet my needs for book size, just like 'heft' did in the bookstores of days past.
#846
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
#847
null operator (he/him)
Posts: 21,826
Karma: 30277270
Join Date: Mar 2012
Location: Sydney Australia
Device: none
#848
Connoisseur
Posts: 86
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
Bother the creator? The person who seems to be in charge of the plugin now said that he/she was bored and decided to look into it. It might as well have been ignored, but it wasn't. Annoy users? Why are people annoyed by this? The benefit is marginal, but this is nothing more than looking for a way of making the count more "accurate". I don't see the need to hate on this or consider it annoying in any way. I don't think this makes the user a "pedant"; these kinds of marginal changes are made every day in all kinds of projects, and they improve them in the long term.
#849
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
In the end, JSWolf is right to say there is something wrong with the word counts; that is reality. On the other hand, there is no simple solution, as the different samples show (and users who work with this information know this very well; for technical documentation, for example, the gap is much bigger). This discussion is a good one, as it puts facts on the table for users and developers who like to think outside the box to find the optimum solution for their requirements. That is exactly why calibre is what it is, right?

Best regards,
DivingDuck

Last edited by Divingduck; 01-10-2016 at 04:49 AM.
#850
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Others have commented, but personally, I see the last few pages as a healthy discussion about the plugin. Someone reported a possible problem, and the discussion has been about how useful it is to fix the bug. I like seeing this level of involvement as it helps to work out the best solution.
#851
null operator (he/him)
Posts: 21,826
Karma: 30277270
Join Date: Mar 2012
Location: Sydney Australia
Device: none
The existing algorithm is NOT a bug; putting a Whitworth nut on a metric bolt is a bug. But that JSWolf regards all opinions other than his own as bugs is a proven fact.

It's an issue of which algorithm to use: the one that has stood its users in good stead for nigh on 5 years, or one that has only become available in recent times.

There would be no discussion, from me at least, if the proposal were to add an option to use either the existing or the ICU algorithm when computing word counts. IMO, adding an option would be in the 'spirit' of the original developer, who usually (always?) protected 'legacy' features. If at all possible, existing installs would set the option to use the legacy algorithm, and new installs would default to the ICU algorithm.

Support forums are riddled with complaints about Apple, MS, Google etc. blithely clobbering or discontinuing existing features. Less so with IBM: if you're so minded, you can definitely run IEBGENER and probably DISOSS or PROFS on your shiny new z/OS system.

Facetiously, one might suggest an option to include the components of hyphenated words in the word count if they are present in designated dictionaries. Thus the compound word 'so-called' would likely be counted as two words, whereas 'topsy-turvy' would likely be counted as one. But realistically one wouldn't, would one?

===============

An unrelated feature I'd like to see in Count Pages is an option to use the format file with the latest file-system modification date as the basis for counting. In my workflow that would avoid in-flight conversions to EPUB, because in 99% of cases I convert from a non-EPUB format to EPUB immediately prior to running Count Pages.

NB: EPUB is not even close to the top of my preferred input format list, although it is my designated output format. I rarely need to convert from EPUB; when I do, it's unlikely I would then run Count Pages. I would typically attach the output format file to an email, send it, and then remove the format from the library.

BR
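The "latest format" option suggested in this post amounts to picking whichever of a book's format files was modified most recently. A minimal sketch under that reading: `newest_format` is a hypothetical helper, not part of Count Pages or calibre's API, and it looks only at files on disk, not at calibre's library database.

```python
import os
import tempfile
from pathlib import Path


def newest_format(paths: list[Path]) -> Path:
    """Return the existing file with the latest modification time.

    Hypothetical helper: it only sketches the proposed selection rule.
    """
    existing = [p for p in paths if p.is_file()]
    if not existing:
        raise FileNotFoundError("none of the given format files exist")
    # st_mtime is the file-system modification time in seconds since the epoch.
    return max(existing, key=lambda p: p.stat().st_mtime)


# Usage: a MOBI file that was later converted to EPUB, so the EPUB is newer.
with tempfile.TemporaryDirectory() as folder:
    mobi = Path(folder, "book.mobi")
    epub = Path(folder, "book.epub")
    mobi.write_text("placeholder")
    epub.write_text("placeholder")
    os.utime(mobi, (1_000, 1_000))  # (access time, modification time)
    os.utime(epub, (2_000, 2_000))
    print(newest_format([mobi, epub]).name)  # book.epub
```

Files that are missing are skipped rather than raising, so a format listed for a book but absent on disk does not block the choice.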
#852
Grand Sorcerer
Posts: 6,111
Karma: 34000001
Join Date: Mar 2008
Device: KPW1, KA1
For what it is worth, soft hyphenation with the Hyphenate This plugin messes up Count Pages (or the word count functions it uses): it counts a huge number of extra pages and words if you run it after running Hyphenate This.
Therefore, Hyphenate This is the last plugin I run, and I only run it on the AZW3 format, which is the one I use for the Kindle Paperwhite 1.
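One plausible mechanism, offered as an assumption rather than a diagnosis of the plugin: soft hyphenation puts U+00AD SOFT HYPHEN characters inside words (whether written as the character or as the `&shy;` entity, which parses to it), and a counter that treats any non-word character as a boundary will split each hyphenated word into pieces. Removing the soft hyphens before counting avoids that. The regex counter below is illustrative, not Count Pages' code.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible break hint inserted by soft hyphenation


def count_words(text: str) -> int:
    # Illustrative counter: every run of word characters is one word.
    # U+00AD is not a word character, so it acts as a boundary here.
    return len(re.findall(r"\w+", text))


def count_words_ignoring_soft_hyphens(text: str) -> int:
    # Drop soft hyphens first so each hyphenated word is whole again.
    return count_words(text.replace(SOFT_HYPHEN, ""))


sample = "Un\u00adbe\u00adliev\u00ada\u00adbly long hy\u00adphen\u00adated words"
print(count_words(sample), count_words_ignoring_soft_hyphens(sample))  # 10 4
```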
#853
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Saying that it has served users for five years is a problem. The code for both methods is in calibre. Are you sure that the existing method hasn't changed in five years? Are you sure it won't change in the future? I'm a little surprised that when Kovid implemented the ICU method, he didn't remove the old method. Sure, he would have left the interface, but that would have just pointed to the new code.

And as for changes to the algorithm: if it had been implemented completely inside Count Pages, and the issue was that, for example, the ellipsis character was not in the list of word delimiter characters, I would have had no hesitation in adding it. Would you expect an option to keep the old behaviour in that case?
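To illustrate the kind of fix described here: if a counter splits text on an explicit list of delimiter characters, a character missing from that list (in this sketch the single-character ellipsis "…", U+2026) silently glues the words on either side together. The delimiter lists and `make_counter` are hypothetical, not the old calibre algorithm.

```python
import re


def make_counter(delimiters: str):
    """Build a word counter that splits on whitespace plus the given delimiters."""
    pattern = re.compile(r"[\s" + re.escape(delimiters) + "]+")

    def count(text: str) -> int:
        # Empty strings appear when text starts or ends with a delimiter; skip them.
        return len([token for token in pattern.split(text) if token])

    return count


old_style = make_counter(".,;:!?")   # hypothetical list without "…"
fixed = make_counter(".,;:!?…")      # same list with the ellipsis character added

text = "one…two three…four five…six"
print(old_style(text), fixed(text))  # 3 6
```

The sample mirrors the kind of case reported later in the thread (six words counted as three), although the text of that report is not reproduced here.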
#854
creator of calibre
Posts: 45,449
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That is because the old code is only used by conversion heuristics, and conversion heuristics is a part of the conversion pipeline I don't maintain. I dislike making changes in other people's code unless there is some compelling reason to do so.
#855
Resident Curmudgeon
Posts: 80,034
Karma: 147977995
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
What I posted are indeed true bugs. Those six words were counted as three words when they should be six words. This is not an opinion. It's a fact. If you don't like bug reports, just ignore them.
Tags
count, count pages, page count, pages, plugin