08-08-2018, 11:46 PM | #61 | |||||
Grand Sorcerer
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
Quote:
Quote:
All that makes me amazed that your scripts are perfect and need no intervention. You are happy with your scripts. I am happy with my methods. Maybe if I was interested in collecting millions of books I would have to change the process. But, I am only collecting books that I have some expectation of reading in the near future. I rarely add more than a couple of books at a time. Quote:
Of course, maybe the problem is that we don't have the benefit of your "perfect" scripts. We have to stick with the terrible tools and processes that we have available. How about you publish your scripts so that we can all see the light? |
|||||
08-09-2018, 12:27 AM | #62 |
null operator (he/him)
Posts: 20,650
Karma: 26966376
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
08-09-2018, 12:37 AM | #63 | |||||||||||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Well, looking back over the context, it was in response to your comment that read:
And my comment was: I hope that clears things up for you. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Well, first of all, again, they aren't "my" scripts. I've adapted them to my needs. Second, check the thread; the git repo is listed. Last edited by sealbeater; 08-09-2018 at 12:41 AM. |
|||||||||||
08-09-2018, 02:58 AM | #64 | |||||||||
Grand Sorcerer
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
So yes, I apparently "just don't know" and would need you to explain the difference. Quote:
Quote:
That file name is just an example. Amazon use their own names, and anything downloaded with ADE or from other stores uses other names that might not be unique when looking for metadata. Quote:
Quote:
Quote:
Quote:
Quote:
And yes, that is deliberately insulting. Throughout this discussion you have gone to great lengths to say how perfect your scripts are and how they get perfect results and that anyone using calibre is an idiot (OK, my word but I don't think it is a stretch). But, those scripts are using calibre to download the metadata. And that means that the difference between what you do and what someone using the calibre interface does is that you don't check the metadata before accepting it. What that means is that your definition of "perfect" is actually "whatever calibre finds for me".
You must also be either sourcing books only from places that have already done some work on the file names to make metadata downloading reasonably accurate, or pre-editing them. Which of course you can't be, as that would be "manually editing metadata". You did mention a script "ebook-reader-prep.sh" which isn't in that package. Maybe that is what is doing the hard work of preparing the books to be able to get "perfect" metadata.
And just a last comment: I know how the calibre metadata download works. If you have two million books and have not vetted them before downloading the metadata using calibre and not checked them afterwards, you have some books with wrong metadata. The only chance that there are no errors is if you had an ISBN for every book before you tried to get the metadata.
And to the moderators, I'll stop now. |
|||||||||
08-09-2018, 04:01 AM | #65 |
Grand Sorcerer
Posts: 6,262
Karma: 11768331
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
|
Duplicated with previous post.
|
08-09-2018, 05:35 AM | #66 |
null operator (he/him)
Posts: 20,650
Karma: 26966376
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Reading this thread for the past couple of weeks has been not unlike reading the Reader Comments on the UK Daily Telegraph Opinion pieces for the past couple of years.
What was that Italian aphorism Voltaire famously used? Ah, yes: "Il meglio è l'inimico del bene" BR |
08-09-2018, 11:12 AM | #67 |
Handy Elephant
Posts: 1,736
Karma: 26785668
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Thinkpad E595, Ubuntu Mate, Huawei Mediapad 5, Bouye Likebook Plus
|
OK!
I have tested it on around 100 books and I am quite impressed. I let it work on some books from my baen monthly bundles and some non-fiction pdfs and some random files. And the script organize-ebooks.sh did a great job. I will definitely start using this to preprocess all my ebook files before importing to calibre!
I had the script place books it was uncertain about in a specific folder. Only one book, a comic book I have no idea where I got, was placed there. But it was correctly renamed...
However I never got -oft (output filename template) working. I tried quoting it in many different ways, including copying from the docs, but it never worked. So all the books are in a single folder. If you, sealbeater, could post a snippet showing how to get organize-ebooks.sh to use -oft and/or output books to subfolders it would be great!
A few caveats:
The scripts are intended for actual published books. With ISBN. Fan fiction or self-published stuff I suspect would be very hit-and-miss depending on how metadata is stored in the file. Specify -owi to attempt to organize books without ISBN.
The WorldCat ISBN calibre plugin had to be rezipped to a more shallow structure to be installed in calibre. And it has a limit of 1000 lookups per day. Fine for my needs...
Tesseract (OCR-software) in Ubuntu 18.04 is the latest 4.0, so it can be installed from normal repositories. Seems to OCR fine!
If you have the source books in subfolders the subfolders are not removed as the renamed books are removed.
Some familiarity with linux and bash scripting is needed.
But, as said, I am very pleased. Thanks to sealbeater for making me look at these scripts a little closer. I've seen them before, but not tried them. Just point the script at a folder with books and have it chug away. A few seconds per book unless it has to be OCR:ed. Then import to calibre. I still have to manually organize by genre... Perhaps also download cover. Last edited by Adoby; 08-09-2018 at 11:23 AM. |
08-09-2018, 12:11 PM | #68 | |
creator of calibre
Posts: 43,966
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
To any innocent bystanders reading this, please do not use these scripts. You will place undue load on the various servers calibre downloads metadata from, making the site owners' lives more difficult and making it more likely that they will implement more rigorous bot detection algorithms which in turn means that metadata download will stop working for all calibre users, not just users of these "scripts". And everybody that uses those services will have to solve more captchas. |
|
08-09-2018, 04:10 PM | #69 | |||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Quote:
--output-filename-template='"${d[AUTHORS]// & /, }/${d[AUTHORS]// & /, } - ${d[SERIES]:+[${d[SERIES]//:/ -}] - }${d[TITLE]//:/ -}${d[PUBLISHED]:+ (${d[PUBLISHED]%%-*})}${d[ISBN]:+ [${d[ISBN]}]}.${d[EXT]}"' Quote:
You can install proxychains and put "proxychains -q" in front of any script commands to proxy any connection requests through Tor and avoid IP banning. I hope you find it useful. Last edited by sealbeater; 08-09-2018 at 04:25 PM. |
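For anyone puzzling over the template posted above: it is ordinary bash parameter expansion over an associative array named `d`. The following is a minimal sketch, assuming the script fills `d` with the keys that appear in the template; the sample values and the `render_name` helper are made up for illustration and are not part of the scripts. The outer single quotes in the posted option presumably keep your own shell from expanding the template before the script sees it, but that is an inference from the quoting, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays). Sample metadata, invented for this example.
declare -A d=(
  [AUTHORS]="Jane Doe & John Roe"
  [SERIES]="Example Saga: Part One"
  [TITLE]="A Title: With a Subtitle"
  [PUBLISHED]="2015-03-01"
  [ISBN]="9780000000002"
  [EXT]="epub"
)

# Hypothetical helper: expands the same expression as the posted template.
#   ${var// & /, }   replace every " & " with ", "
#   ${var//:/ -}     replace every ":" with " -"
#   ${var:+text}     emit text only if var is set and non-empty
#   ${var%%-*}       cut everything from the first "-" (keeps the year)
render_name() {
  printf '%s\n' "${d[AUTHORS]// & /, }/${d[AUTHORS]// & /, } - ${d[SERIES]:+[${d[SERIES]//:/ -}] - }${d[TITLE]//:/ -}${d[PUBLISHED]:+ (${d[PUBLISHED]%%-*})}${d[ISBN]:+ [${d[ISBN]}]}.${d[EXT]}"
}

render_name
```

With the sample values this prints `Jane Doe, John Roe/Jane Doe, John Roe - [Example Saga - Part One] - A Title - With a Subtitle (2015) [9780000000002].epub`. The part before the slash is what would become the author subfolder; the series, year and ISBN pieces simply vanish when those fields are empty.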
|||
08-09-2018, 04:22 PM | #70 | |||||||||||||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Quote:
Quote:
Quote:
It is perfect. When it finds a match, it's matched, when it doesn't it doesn't. Perhaps you should try it for yourself and that would make things clearer. If you are capable of it. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
The ebook-reader-prep.sh script is my wrapper around it. It does nothing special that anyone who was scripting for their own use would do. And just a last comment: Quote:
If only saying so made it so. Well, read others' comments on how it works and see for yourself. Or live in a reality created in your own mind. Up to you which. Last edited by sealbeater; 08-09-2018 at 04:51 PM. Reason: s/doesn't/does |
|||||||||||||
08-09-2018, 04:24 PM | #71 | |
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
|
|
08-09-2018, 05:30 PM | #72 |
Handy Elephant
Posts: 1,736
Karma: 26785668
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Thinkpad E595, Ubuntu Mate, Huawei Mediapad 5, Bouye Likebook Plus
|
I might throttle the script myself. Since it can run unattended I am fine with it matching one book per minute, or whatever, instead of one every few seconds. It can run while I sleep or work.
I got -oft working. I tried setting the environment variable inside the script, but never got it working. I resorted to making a script for the script and setting it as a parameter, and it works fine! |
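The throttling idea above could be sketched roughly like this. It is only an illustration: `process_book` is a hypothetical placeholder for whatever command actually does the per-book lookup (for example a call to organize-ebooks.sh on one file), and the 60 second default just mirrors the "one book per minute" figure.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder for the real per-book command; here it only
# reports the file it was given.
process_book() {
  printf 'processing %s\n' "$1"
}

# Run process_book on every regular file in a folder, pausing between files.
# $1 = folder, $2 = delay in seconds between books (default 60).
throttled_run() {
  local dir=$1 delay=${2:-60} first=1 f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip subfolders and unmatched globs
    # Sleep before every book except the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    process_book "$f"
  done
}
```

Something like `throttled_run ~/incoming 60` would then handle one book per minute while you sleep or work, which also keeps the load on the metadata servers down.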
08-09-2018, 09:47 PM | #73 | |
Grand Sorcerer
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
You did comment on reading comprehension. And I can make the same comment. You seem to think that I am saying these scripts don't work. I have never said that. I questioned the word "perfect". And that is based on my understanding of calibre and how it fetches metadata. The simple fact is, that is not perfect. And the only way to get close to perfect is to already have nearly perfect metadata. You need the correct ISBN or a title and author that is correct. Any spelling mistakes in them and the likelihood of getting the wrong book, or no book, is high. Throw a colon in the title and again, the likelihood of an error is high. Especially if the first part is a series name.
All that means is that you have to do some preprocessing to get the correct results. Your implication has always been that you just run the script against a directory and there are no errors. But, you finally seem to be admitting that there are and that you have to fix them some other way. After all, why would you have a problem directory?
You dismissed my statement about the file called "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub" saying that I should try it and see. I don't need to try it, I can read code, I know how it works and I can see its limitations. But, what I wanted to know was how you handled these. Your statements imply that you don't do any manual processing but rely completely on the scripts. And that isn't possible with the file names that are coming from the shops. So, how do you handle this? There is nothing in that file name that tells you what the book is. And the book doesn't have an ISBN inside it. So, how do you handle this?
Of course, you used someone else's test to show how this is "perfect". Of course, that person reported 99% accuracy, which last time I checked, isn't perfect. And to me that is worse, that 1 book that failed was actually correct but treated as an error. That demonstrates the scripts are not perfect. |
|
08-09-2018, 10:27 PM | #74 | |||||||||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Quote:
Quote:
Do you really think someone could have 2 million books...import half a million into calibre...decide that this method was a better way and blow out a 1.5 TB calibre dir and start over and not have some clue as to what they were doing? Perhaps more clue than you. That damn character flaw of mine again. Quote:
Quote:
binary deduplication.
file name sanitizing (by which I mean file renaming, i.e., removing all spaces and lower-casing filenames).
I then move all files into the root of the working dir and then remove all subdirectories. I make a temp dir and move the files in one by one and run the script on them. (I find it's better to do it this way than deal with filesystem limits when dealing with lots of small files.) This is all automated of course. Quote:
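The preparation steps listed above (deduplicate by content, strip spaces and lower-case the names, flatten into the root, drop the subdirectories) might look something like the sketch below. This is not sealbeater's wrapper, which was not posted; the function names are hypothetical, and it assumes bash 4.4+ plus GNU `find` and `sha256sum` as found on a typical Linux install.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from any posted script.

# Strip all spaces from a file name and lower-case it.
sanitize_name() {
  local name=${1// /}
  printf '%s\n' "${name,,}"
}

# Flatten a tree into its root, dropping byte-identical duplicates.
# $1 = working directory
flatten_and_dedupe() {
  local root=$1 f sum target
  local -a files=()
  local -A seen=()
  # Collect the full file list first, so files moved later are not revisited.
  mapfile -d '' -t files < <(find "$root" -type f -print0)
  for f in "${files[@]}"; do
    sum=$(sha256sum < "$f" | cut -d' ' -f1)
    if [ -n "${seen[$sum]:-}" ]; then
      rm -- "$f"                     # same content as a file already kept
      continue
    fi
    seen[$sum]=1
    target="$root/$(sanitize_name "$(basename "$f")")"
    if [ "$f" != "$target" ]; then
      if [ -e "$target" ]; then
        # Different content but the same sanitized name: do not overwrite.
        printf 'name clash, left in place: %s\n' "$f" >&2
      else
        mv -- "$f" "$target"
      fi
    fi
  done
  # Remove subdirectories that are now empty; non-empty ones are left alone.
  find "$root" -mindepth 1 -depth -type d -empty -delete
}
```

Try it on a copy of the working dir first, since it deletes duplicates and renames files in place. Files whose sanitized names collide are reported and left where they are, so their subdirectory survives for a manual look.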
Quote:
EDIT: The script would probably resort to OCR'ing at that point and make a best guess...this would probably mean it would end up in the "uncertain" dir for later review even if correct. Of course, I consider this perfection..others who like to do tedious point and click work may disagree. Quote:
And yes, I used someone else's test. That seems to be the best way, when one can't test for themselves. Quote:
Or, rather than just running a script and going to sleep, you can point and click your gui all day...can't do it during work hours, although I can...can't do it while sleeping, although I can... Of course, you are free to believe that a person would turn that script loose on a 2 million ebook collection, without being very sure that it wouldn't end in disaster...well....please...feel free to believe whatever reality your mind can concoct. I'm glad it's not mine. Last edited by sealbeater; 08-09-2018 at 10:52 PM. |
|||||||||
08-10-2018, 02:37 AM | #75 |
Handy Elephant
Posts: 1,736
Karma: 26785668
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Thinkpad E595, Ubuntu Mate, Huawei Mediapad 5, Bouye Likebook Plus
|
My first test was very biased since it was made mainly on books I had earlier successfully imported to calibre. I have used the script on more books now and there are several more books in the uncertain folder. For some reason many mobi-files. Also the scripts don't even attempt to process periodicals like magazines. Junk files fail.
Still, I find the scripts very useful. |
|