08-09-2018, 10:27 PM   #74
sealbeater
Banned
sealbeater ought to be getting tired of karma fortunes by now.
 
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
Quote:
Originally Posted by davidfor
Yes, it was a pretty stupid thing to say considering how I knew you would answer. Basically everything you said was a condescending little quip or something to avoid an actual answer. But maybe I didn't make things clear enough.
I'm sorry, did you try it? Do you know what you are talking about? Are you basing your responses to me on what you *know* is actual reality or only what you *think* reality is?

Quote:
Originally Posted by davidfor
You did comment on reading comprehension. And I can make the same comment. You seem to think that I am saying these scripts don't work. I have never said that. I questioned the word "perfect". And that is based on my understanding of calibre and how it fetches metadata.
So, not based on any observable repeatable tests you have conducted yourself.

Quote:
Originally Posted by davidfor
The simple fact is that it is not perfect. And the only way to get close to perfect is to already have nearly perfect metadata. You need the correct ISBN, or a title and author that are correct.
Yes, I think that is understood by all. You aren't giving anyone any new revelations, just as you didn't when you said I would be IP banned for hammering websites.

Do you really think someone could have 2 million books...import half a million into calibre...decide that this method was a better way, blow out a 1.5 TB calibre dir and start over, and not have some clue as to what they were doing?


Perhaps more clue than you. That damn character flaw of mine again.

Quote:
Originally Posted by davidfor
Any spelling mistakes in them and the likelihood of getting the wrong book, or no book, is high. Throw a colon in the title and again, the likelihood of an error is high. Especially if the first part is a series name.
So boring....so very very very tedious and boring...like arguing with a child.

Quote:
Originally Posted by davidfor
All that means is that you have to do some preprocessing to get the correct results. Your implication has always been that you just run the script against a directory and there are no errors. But you finally seem to be admitting that there are, and that you have to fix them some other way. After all, why would you have a problem directory?
The only preprocessing I do is the following (and I say this for others' edification, not your own):

Binary deduplication.
File name sanitizing (by which I mean file renaming, i.e., removing all spaces and lower-casing file names).
Moving all files into the root of the working dir, then removing all subdirectories.

I make a temp dir, move the files into it one at a time and run the script on each. (I find it's better to do it this way than to deal with filesystem limits when handling lots of small files.)

This is all automated of course.
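The preprocessing steps above could be automated roughly as follows. This is a minimal sketch, not the actual script: using SHA-256 as the "binary deduplication" test, the helper names (`preprocess`, `sanitize_name`, `process_one_at_a_time`) and the `run_script` hook standing in for the identification script are all assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def sha256_of(path: Path) -> str:
    """Hash a file's bytes in 1 MiB chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def sanitize_name(name: str) -> str:
    """Remove all spaces and lower-case the name, as described in the post."""
    return name.replace(" ", "").lower()


def preprocess(root: Path) -> List[Path]:
    """Binary-deduplicate, sanitize names, flatten into root, then drop subdirectories."""
    seen = set()
    kept: List[Path] = []
    # Take a snapshot first: files are moved and deleted while we loop.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = sha256_of(path)
        if digest in seen:
            path.unlink()  # byte-for-byte duplicate of a file already kept
            continue
        seen.add(digest)
        base = Path(sanitize_name(path.name))
        target = root / base.name
        n = 1
        # Different books can sanitize to the same name; add a counter instead of overwriting.
        while target.exists() and target != path:
            target = root / f"{base.stem}_{n}{base.suffix}"
            n += 1
        if target != path:
            path.rename(target)
        kept.append(target)
    # Every kept file now lives directly in root, so the subdirectories can go.
    for d in [p for p in root.iterdir() if p.is_dir()]:
        shutil.rmtree(d)
    return kept


def process_one_at_a_time(root: Path, run_script: Callable[[Path], None]) -> None:
    """Stage each file alone in a fresh temp dir and hand it to run_script (hypothetical hook)."""
    for path in sorted(p for p in root.iterdir() if p.is_file()):
        with tempfile.TemporaryDirectory() as tmp:
            staged = Path(shutil.move(str(path), tmp))
            run_script(staged)
            # If the script left the file in place, return it to root rather than lose it.
            if staged.exists():
                shutil.move(str(staged), str(root / staged.name))
```

Staging one file per temp dir keeps each run small, which matches the reasoning given above about filesystem limits with lots of small files.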

Quote:
Originally Posted by davidfor
You dismissed my statement about the file called "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub" saying that I should try it and see. I don't need to try it, I can read code, I know how it works and I can see its limitations.
Oh dear. I dismissed your statement about the file called "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub" because I, unlike you, know that the file name doesn't matter to the script. It could be called "7aLzN3dNqlUcqYVv1e+WyWXGfn+e0IuTA1CwJO/pm0NOxkkdc+i5gqdpB9zA.epub" and it would be identified correctly. I know this because I have tried it for myself, and because of that, I know you haven't. Therefore, one of us knows what he is talking about and the other doesn't. I wonder who...?
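Why a file name can be irrelevant: a tool that identifies a book from what is inside it never needs to read the name. The post doesn't show how the script does this, so the following is only a minimal sketch, assuming the book is an EPUB and that identification starts from an ISBN printed in its OPF metadata or pages. The function names are hypothetical; the checksum rules are the standard ISBN-10 and ISBN-13 ones.

```python
import re
import zipfile
from typing import List

# 10 or 13 characters, optionally split by hyphens/spaces; an ISBN-10 may end in X.
CANDIDATE = re.compile(r"\b(?:\d[-\s]?){9,12}[\dXx]\b", re.ASCII)


def valid_isbn10(s: str) -> bool:
    """ISBN-10 check: weights 10..1, total must be divisible by 11 (X means 10)."""
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])
    return total % 11 == 0


def valid_isbn13(s: str) -> bool:
    """ISBN-13 check: alternating weights 1 and 3 over the first 12 digits."""
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s[:12]))
    return (10 - total % 10) % 10 == int(s[12])


def isbns_in_text(text: str) -> List[str]:
    """Return checksum-valid ISBNs in order of first appearance, separators removed."""
    found: List[str] = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[-\s]", "", m.group()).upper()
        if (valid_isbn13(digits) or valid_isbn10(digits)) and digits not in found:
            found.append(digits)
    return found


def isbns_in_epub(path) -> List[str]:
    """Scan the EPUB's OPF and (X)HTML entries; the EPUB's own file name is never read."""
    found: List[str] = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".opf", ".xhtml", ".html", ".htm")):
                text = zf.read(name).decode("utf-8", errors="ignore")
                text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping
                for isbn in isbns_in_text(text):
                    if isbn not in found:
                        found.append(isbn)
    return found
```

The checksum filter is what keeps page numbers and other digit runs from being mistaken for ISBNs; a book with no ISBN anywhere inside simply returns an empty list, which is the hard case discussed next.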

Quote:
Originally Posted by davidfor
But what I wanted to know was how you handled these. Your statements imply that you don't do any manual processing but rely completely on the scripts. And that isn't possible with the file names that are coming from the shops. There is nothing in that file name that tells you what the book is, and the book doesn't have an ISBN inside it. So, how do you handle this?
If the book doesn't have an ISBN, that's going to be tricky. Let's give it a proper test. Since you seem incapable of testing this and finding out for yourself (a sad position to be in, depending on others to discover truth for you), why don't you send me the book? I'll run the script on it and we'll all find out. I, too, am curious now.


EDIT: The script would probably resort to OCR'ing at that point and make a best guess... This would probably mean it would end up in the "uncertain" dir for later review, even if correct.

Of course, I consider this perfection. Others who like to do tedious point-and-click work may disagree.
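The fallback order described in the EDIT can be sketched as a small routing function. This is a hedged illustration, not the script itself: `route`, `lookup_isbn` and `ocr_guess` are hypothetical stand-ins for the real metadata fetch and OCR step, which the post doesn't show. Only the "certain" and "uncertain" directory names come from the post.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, Optional


def _file_into(path: Path, dest_dir: Path, title: str) -> Path:
    """Move path into dest_dir as '<title><original extension>' without overwriting."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    safe = title.replace("/", "_")  # keep a title from being read as a subdirectory
    target = dest_dir / (safe + path.suffix)
    n = 1
    while target.exists():  # two books with the same title must not clobber each other
        target = dest_dir / f"{safe}_{n}{path.suffix}"
        n += 1
    return Path(shutil.move(str(path), str(target)))


def route(
    path: Path,
    embedded_isbns: Iterable[str],
    lookup_isbn: Callable[[str], Optional[str]],  # hypothetical metadata lookup
    ocr_guess: Callable[[Path], Optional[str]],  # hypothetical OCR-based guess
    out_root: Path,
) -> Path:
    # 1. An ISBN found inside the file resolved to real metadata: trust it.
    for isbn in embedded_isbns:
        title = lookup_isbn(isbn)
        if title:
            return _file_into(path, out_root / "certain", title)
    # 2. No usable ISBN: fall back to a best guess from OCR'd pages, which is
    #    never trusted automatically, even when it happens to be right.
    guess = ocr_guess(path)
    if guess:
        return _file_into(path, out_root / "uncertain", guess)
    # 3. Nothing to go on: park the file under its old name for manual review.
    return _file_into(path, out_root / "uncertain", path.stem)
```

The design choice is the one argued for later in the thread: a correct guess that lands in "uncertain" costs a quick review, while a wrong guess that lands in "certain" would silently pollute the library.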

Quote:
Originally Posted by davidfor
Of course, you used someone else's test to show how this is "perfect". Of course, that person reported 99% accuracy, which, last time I checked, isn't perfect.
There goes that reading comprehension again. The *comic* was correctly renamed but yes, it was placed in the "uncertain" directory so the user could review.

And yes, I used someone else's test. That seems to be the best way, when one can't test for themselves.


Quote:
Originally Posted by davidfor
And to me that is worse, that 1 book that failed was actually correct but treated as an error. That demonstrates the scripts are not perfect.
Well then, you have met my expectations of you. The script errs on the side of caution. It didn't treat it as an "error", it treated it as "uncertain". That's exactly what it's supposed to do. Unless you prefer things not to err on the side of caution, especially when automated? That doesn't seem too wise to me. Everything that gets put in the "certain" dir is extremely accurate.

Or, rather than just running a script and going to sleep, you can point and click in your GUI all day... You can't do it during work hours, although I can... You can't do it while sleeping, although I can...

Of course, you are free to believe that a person would turn that script loose on a 2-million-ebook collection without being very sure that it wouldn't end in disaster... well... please... feel free to believe whatever reality your mind can concoct. I'm glad it's not mine.

Last edited by sealbeater; 08-09-2018 at 10:52 PM.