Pros & Cons of using the Calibre GUI - Page 5

davidfor · 08-08-2018, 11:46 PM

Quote:

Originally Posted by sealbeater

it was, I'm not sure where that came from, perhaps an error during editing.

It was in your post, so you typed it.

Quote:

I said cute because you seem to be implying a gui window full of text is analogous to an ssh terminal session. That's....very debatable.

If they are displaying the same information, how are they not analogous?

Quote:

Fair enough. I would never force my way on anyone, I will however talk about my way.

I do think my paradigm is better than everyone else's. I apologize for being insulting...I was a bit on the defensive. I hope you can forgive me. But yea, my way is better. For me, because I know most of you aren't there but better in an overall sense as well.

In that case, check what you are typing. It is coming off as "there is only one way to do it and everyone else is doing it wrong and are idiots".

Quote:

It's automated. I don't have to do anything but feed it a directory of books. I can spawn as many threads as my hardware and network can handle. I don't have to hand edit anything. I don't even have to check. If it doesn't find enough information to identify the book positively, it doesn't get moved to my sorted dir.

I don't have to. That's the whole point. My scripts so far, are perfect. Every time I add a new book, they get exactly the right metadata, I don't need to examine it, why would I?

It amazes me you guys do it any other way.

And it amazes me that you have some magical source of metadata that is perfect. Or it means you have prepared the books ahead of time to make sure that the data that is being used to look up the metadata is perfect. Or maybe your source for the books has already done the clean-up. If that is the case, then it explains the difference in how we work. I am importing a book from some source that may or may not have an accurate title or author. Or include an ISBN. You just need to look at the inconsistent titles used by the shops and other sources. I first need to fix those details. And only then can I expect to get the right metadata from the metadata sources I use. And even then, those sources can be inconsistent. The series name is the most common problem (Ben Aaronovitch has a series that is called "Rivers of London", "Peter Grant" or "PC Peter Grant" depending on the source).

All that makes me amazed that your scripts are perfect and need no intervention.

You are happy with your scripts. I am happy with my methods. Maybe if I was interested in collecting millions of books I would have to change the process. But, I am only collecting books that I have some expectation of reading in the near future. I rarely add more than a couple of books at a time.

Quote:

What do you need? You have the series, the title, the author, the year it was published and the ISBN. I could adjust for more but I have no need to. If you had that need, I'm sure a way could be found to include it.

If I want Stephen King, it's not hard to find...if I want The Expanse, it's not hard to find. If I want Odd Thomas or Xanth, same.

But, that isn't how I choose a book. Yes, sometimes I go looking for the next in a series or an author or a particular book. But, I'm much more likely to be after a genre or style or something. Generally, I look at the list of books and see what catches my eye. It might be the title, the author or the series. Or I filter the list a bit based on something. And in the past when I did this with paper books, it might be the cover (or simply what was easy to grab on my way out the door for my commute to work when I remembered I was nearly finished my current book). Something catches my attention and I look at the details. Using calibre or my ereader, it is easy to read the synopsis and maybe a quick glance at the tags. And maybe the page count if I'm after a quick read. The choice of a book to read is based on my mood and seeing something that fits that. That doesn't work well with the file listing, unless, as I suggested, that file name was several thousands of characters long.

Of course, maybe the problem is that we don't have the benefit of your "perfect" scripts. We have to stick with the terrible tools and processes that we have available. How about you publish your scripts so that we can all see the light?

BetterRed · 08-09-2018, 12:27 AM

Quote:

Originally Posted by davidfor

How about you publish your scripts so that we can all see the light?

@davidfor - see post #54.

BR

sealbeater · 08-09-2018, 12:37 AM

Quote:

Originally Posted by davidfor

It was in your post, so you typed it.

Well, looking back over the context, it was in response to your comment that read:

Quote:

Originally Posted by davidfor

And I know your response will be you use 100% of the window.

And my comment was:

Quote:

Originally Posted by sealbeater

That actually wasn't my response.

I hope that clears things up for you.

Quote:

Originally Posted by davidfor

If they are displaying the same information, how are they not analogous?

Well, if you don't know, you just don't know.

Quote:

Originally Posted by davidfor

In that case, check what you are typing. It is coming off as "there is only one way to do it and everyone else is doing it wrong and are idiots".

I think my way is better. I've been able to discuss the benefits so far. Perhaps there is a better way. Perhaps *it* is a better way. How will we know if we don't discuss?

Quote:

Originally Posted by davidfor

And it amazes me that you have some magical source of metadata that is perfect. Or it means you have prepared the books ahead of time to make sure that the data that is being used to look up the metadata is perfect. Or maybe your source for the books has already done the clean-up.

Or, none of the above.

Quote:

Originally Posted by davidfor

If that is the case, then it explains the difference in how we work. I am importing a book from some source that may or may not have an accurate title or author. Or include an ISBN.

Yea, it handles that. If it comes across a book it's unsure of, it moves it to an "uncertain" directory.

Quote:

Originally Posted by davidfor

You just need to look at the inconsistent titles used by the shops and other sources. I first need to fix those details.

You aren't hearing me. Regardless of the source, after the scripts are done, the title is the same.

Quote:

Originally Posted by davidfor

And only then can I expect to get the right metadata from the metadata sources I use. And even then, those sources can be inconsistent. The series name is the most common problem (Ben Aaronovitch has a series that is called "Rivers of London", "Peter Grant" or "PC Peter Grant" depending on the source).

Still not hearing me.

Quote:

Originally Posted by davidfor

All that makes me amazed that your scripts are perfect and need no intervention.

I'm sorry you live in such a bleak and dreary world that an automated script working as intended is greeted with amazement.

Quote:

Originally Posted by davidfor

You are happy with your scripts. I am happy with my methods. Maybe if I was interested in collecting millions of books I would have to change the process. But, I am only collecting books that I have some expectation of reading in the near future. I rarely add more than a couple of books at a time.

What makes you think I'm not reading the books I'm collecting? I have lots of time since I don't have to worry about manually editing metadata.

Quote:

Originally Posted by davidfor

But, that isn't how I choose a book. Yes, sometimes I go looking for the next in a series or an author or a particular book. But, I'm much more likely to be after a genre or style or something. Generally, I look at the list of books and see what catches my eye. It might be the title, the author or the series.

Yea, I do the same. This is no impediment to that at all.

Quote:

Originally Posted by davidfor

Or I filter the list a bit based on something. And in the past when I did this with paper books, it might be the cover (or simply what was easy to grab on my way out the door for my commute to work when I remembered I was nearly finished my current book). Something catches my attention and I look at the details. Using calibre or my ereader, it is easy to read the synopsis and maybe a quick glance at the tags. And maybe the page count if I'm after a quick read. The choice of a book to read is based on my mood and seeing something that fits that. That doesn't work well with the file listing, unless, as I suggested, that file name was several thousands of characters long.

More power to you.

Quote:

Originally Posted by davidfor

Of course, maybe the problem is that we don't have the benefit of your "perfect" scripts. We have to stick with the terrible tools and processes that we have available. How about you publish your scripts so that we can all see the light?

Well, first of all, again, they aren't "my" scripts. I've adapted them to my needs. Second off, check the thread, the git-repo is listed.

davidfor · 08-09-2018, 02:58 AM

Quote:

Originally Posted by sealbeater

Well, looking back over the context, it was in response to your comment that read:

And my comment was:

I hope that clears things up for you.

Not really as you did post the "cute" comment and then said that what I said wasn't going to be your reaction without stating what it was. A bit later you stated that you didn't think a GUI display of text was the same as a command line display, so I suppose that was your reaction.

Quote:

Well, if you don't know, you just don't know.

Well I see a line of well formatted text in one place and a line of well formatted text in another. They appear in windows on my screen and I can scroll those windows to see other lines. Or maybe the command-line can't if it is sized differently.

So yes, I apparently "just don't know" and would need you to explain the difference.

Quote:

I think my way is better. I've been able to discuss the benefits so far. Perhaps there is a better way. Perhaps *it* is a better way. How will we know if we don't discuss?

Or, none of the above.

Yea, it handles that. If it comes across a book it's unsure of, it moves it to an "uncertain" directory.

Sorry, I thought you said it was perfect. If so, why do you have an uncertain directory? Doesn't that mean you have to do some manual checking? Doesn't that mean it isn't "perfect"?

Quote:

You aren't hearing me. Regardless of the source, after the scripts are done, the title is the same.

And you are not hearing me. Based on the metadata I see with books I download, or the file names I see when I download the books, that is not possible. The scripts are relying on file names. I just downloaded a book that I purchased from Kobo. The file name is "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub". How does your script handle that? Or do you have to set the file name to something close to what is desired first? But, if you did, that would mean you would need manual editing and that the scripts aren't perfect.

That file name is just an example. Amazon use their own names, and anything downloaded with ADE or from other stores uses other names that might not be unique when looking for metadata.

Quote:

Still not hearing me.

And you are not hearing me. The sources for metadata are inconsistent and a simple take on downloading them doesn't produce what I think is perfect data.

Quote:

I'm sorry you live in such a bleak and dreary world that an automated script working as intended is greeted with amazement.

Because I've written too many scripts and applications to believe that perfection in something like this is actually achievable. Because I maintain more than one metadata source plugin for calibre and know through my testing that even within a site, the data is inconsistent.

Quote:

What makes you think I'm not reading the books I'm collecting? I have lots of time since I don't have to worry about manually editing metadata.

I never said you weren't reading. I just stated you were collecting more books than you could possibly read. I'll admit that I'm probably collecting more than I can read before I die, but probably by only a factor of two or three. Or maybe 10. And of course, that still leaves me with heaps of time to read.

Quote:

Yea, I do the same. This is no impediment to that at all.

More power to you.

Again condescending little phrases that have no purpose other than to be insulting.

Quote:

Well, first of all, again, they aren't "my" scripts. I've adapted them to my needs. Second off, check the thread, the git-repo is listed.

I missed this and someone else pointed it out to me. So, I've had a look at them. And well, if I had seen this earlier, I wouldn't have responded because I would have been capable of it because I would have been

And yes, that is deliberately insulting.

Throughout this discussion you have gone to great lengths to say how perfect your scripts are and how they get perfect results and that anyone using calibre is an idiot (OK, my word but I don't think it is a stretch). But, those scripts are using calibre to download the metadata. And that means that the difference between what you do and what someone using the calibre interface is doing is that you don't check the metadata before accepting it. What that means is that your definition of "perfect" is actually "whatever calibre finds for me".

You also must be either only sourcing books from places that have already done some work on the file names to make metadata downloading reasonably accurate. Or are pre-editing them. Which of course you can't be as that would be "manually editing metadata". You did mention a script "ebook-reader-prep.sh" which isn't in that package. Maybe that is what is doing the hard work of preparing the books to be able to get "perfect" metadata.

And just a last comment:

I know how the calibre metadata download works. If you have two million books and have not vetted them before downloading the metadata using calibre and not checked them afterwards, you have some books with wrong metadata. The only chance that there are no errors is if you had an ISBN for every book before you tried to get the metadata.

And to the moderators, I'll stop now.

Terisa de morgan · 08-09-2018, 04:01 AM

Duplicated with previous post.

BetterRed · 08-09-2018, 05:35 AM

Quote:

Originally Posted by davidfor

<snip>

And to the moderators, I'll stop now.

Reading this thread for the past couple of weeks has been not unlike reading the Reader Comments on the UK Daily Telegraph Opinion pieces for the past couple of years.

What was that Italian aphorism Voltaire famously used?

Ah, yes: "Il meglio è l'inimico del bene" (the best is the enemy of the good).

BR

Adoby · 08-09-2018, 11:12 AM

OK!

I have tested it on around 100 books and I am quite impressed.

I let it work on some books from my Baen monthly bundles and some non-fiction PDFs and some random files. And the script organize-ebooks.sh did a great job. I will definitely start using this to preprocess all my ebook files before importing to calibre!

I had the script place books it was uncertain about in a specific folder. Only one book, a comic book I have no idea where I got, was placed there. But it was correctly renamed...

However I never got -oft (output filename template) working. I tried several ways, with quotes in many different ways, including copied from the docs, but it never worked. So all the books are in a single folder. If you, sealbeater, could post a snippet showing how to get organize-ebooks.sh to use -oft and/or output books to subfolders it would be great!

A few caveats: The scripts are intended for actual published books. With ISBN. Fan fiction or self-published stuff I suspect would be very hit-and-miss depending on how metadata is stored in the file.

Specify -owi to attempt to organize books without ISBN.

The WorldCat ISBN calibre plugin had to be rezipped to a more shallow structure to be installed in calibre. And it has a limit of 1000 lookups per day. Fine for my needs...

Tesseract (OCR-software) in Ubuntu 18.04 is the latest 4.0, so it can be installed from normal repositories. Seems to OCR fine!

If you have the source books in subfolders the subfolders are not removed as the renamed books are removed.

Some familiarity with linux and bash scripting is needed.

But, as said, I am very pleased. Thanks to sealbeater for making me look at these scripts a little closer. I've seen them before, but not tried them.

Just point the script at a folder with books and have it chug away. A few seconds per book unless it has to be OCRed.

Then import to calibre. I still have to manually organize by genre... Perhaps also download cover.

kovidgoyal · 08-09-2018, 12:11 PM

Quote:

Originally Posted by davidfor

Throughout this discussion you have gone to great lengths to say how perfect your scripts are and how they get perfect results and that anyone using calibre is an idiot (OK, my word but I don't think it is a stretch). But, those scripts are using calibre to download the metadata. And that means that the difference between what you do and what someone using the calibre interface is doing is that you don't check the metadata before accepting it. What that means is that your definition of "perfect" is actually "whatever calibre finds for me".

The scripts are basically written to get around the deliberate rate limiting I do in the calibre GUI as a courtesy to site owners. The scripts use fetch-ebook-metadata.exe from calibre to download metadata and ebook-meta.exe from calibre to read/write metadata. Being disrespectful of site owners' resources is pretty much the extent of the great innovation in these scripts.

To any innocent bystanders reading this, please do not use these scripts. You will place undue load on the various servers calibre downloads metadata from, making the site owners' lives more difficult and making it more likely that they will implement more rigorous bot detection algorithms which in turn means that metadata download will stop working for all calibre users, not just users of these "scripts". And everybody that uses those services will have to solve more captchas.

sealbeater · 08-09-2018, 04:10 PM

Quote:

Originally Posted by Adoby

I let it work on some books from my Baen monthly bundles and some non-fiction PDFs and some random files. And the script organize-ebooks.sh did a great job. I will definitely start using this to preprocess all my ebook files before importing to calibre!

Exactly! It works very well for that I'm finding.

Quote:

Originally Posted by Adoby

However I never got -oft (output filename template) working. I tried several ways, with quotes in many different ways, including copied from the docs, but it never worked. So all the books are in a single folder. If you, sealbeater, could post a snippet showing how to get organize-ebooks.sh to use -oft and/or output books to subfolders it would be great!

This is what I use:

--output-filename-template='"${d[AUTHORS]// & /, }/${d[AUTHORS]// & /, } - ${d[SERIES]:+[${d[SERIES]//:/ -}] - }${d[TITLE]//:/ -}${d[PUBLISHED]:+ (${d[PUBLISHED]%%-*})}${d[ISBN]:+ [${d[ISBN]}]}.${d[EXT]}"'
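For anyone who finds the bash parameter expansions in that template hard to read, here is a rough Python rendering of what it appears to build. It is only a sketch: the field names (AUTHORS, SERIES, TITLE, PUBLISHED, ISBN, EXT) are taken from the template above, while the function name and the sample metadata are made up for illustration and are not part of the scripts.

```python
def build_output_path(d):
    """Approximate the template: Authors/Authors - [Series] - Title (Year) [ISBN].ext"""
    # ${d[AUTHORS]// & /, } : replace every " & " with ", "
    authors = d.get("AUTHORS", "").replace(" & ", ", ")
    series = d.get("SERIES", "")
    # ${d[TITLE]//:/ -} : replace every ":" with " -"
    title = d.get("TITLE", "").replace(":", " -")
    published = d.get("PUBLISHED", "")
    isbn = d.get("ISBN", "")

    name = f"{authors} - "
    if series:  # ${d[SERIES]:+...} : only emitted when the field is non-empty
        name += "[" + series.replace(":", " -") + "] - "
    name += title
    if published:  # ${d[PUBLISHED]%%-*} : keep only the part before the first "-"
        name += " (" + published.split("-", 1)[0] + ")"
    if isbn:
        name += f" [{isbn}]"
    # The leading "Authors/" is what puts each author's books in a subfolder.
    return f"{authors}/{name}.{d.get('EXT', '')}"


# Hypothetical metadata, invented for this example.
sample = {
    "AUTHORS": "Jane Doe & John Roe",
    "SERIES": "Example Series",
    "TITLE": "A Title: A Subtitle",
    "PUBLISHED": "2018-08-09",
    "ISBN": "9780000000002",
    "EXT": "epub",
}
print(build_output_path(sample))
```

With the sample above this prints "Jane Doe, John Roe/Jane Doe, John Roe - [Example Series] - A Title - A Subtitle (2018) [9780000000002].epub". Optional parts (series, year, ISBN) simply drop out when the field is empty.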

Quote:

Originally Posted by Adoby

But, as said, I am very pleased. Thanks to sealbeater for making me look at these scripts a little closer. I've seen them before, but not tried them.

No worries! I'm glad someone sees the utility and gets some use out of them.

You can install proxychains and put "proxychains -q" in front of any script commands to proxy any connection requests through Tor and avoid IP banning.

I hope you find it useful.

sealbeater · 08-09-2018, 04:22 PM

Quote:

Originally Posted by davidfor

Not really as you did post the "cute" comment and then said that what I said wasn't going to be your reaction without stating what it was. A bit later you stated that you didn't think a GUI display of text was the same as a command line display, so I suppose that was your reaction.

Sigh. Reading comprehension would serve you well here.

Quote:

Originally Posted by davidfor

Well I see a line of well formatted text in one place and a line of well formatted text in another. They appear in windows on my screen and I can scroll those windows to see other lines. Or maybe the command-line can't if it is sized differently.

You can try to belabor this point but if you seriously are telling me that you are incapable of seeing a difference between a gui application and a text based session, there's nothing I can do for you.

Quote:

Originally Posted by davidfor

So yes, I apparently "just don't know" and would need you to explain the difference.

I see no reason why I would try. It would be like explaining the colour blue to someone blind from birth.

Quote:

Originally Posted by davidfor

Sorry, I thought you said it was perfect. If so, why do you have an uncertain directory? Doesn't that mean you have to do some manual checking? Doesn't that mean it isn't "perfect"?

It is perfect. When it finds a match, it's matched, when it doesn't it doesn't. Perhaps you should try it for yourself and that would make things clearer.

If you are capable of it.

Quote:

Originally Posted by davidfor

And you are not hearing me. Based on the metadata I see with books I download, or the file names I see when I download the books, that is not possible. The scripts are relying on file names. I just downloaded a book that I purchased from Kobo. The file name is "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub". How does your script handle that? Or do you have to set the file name to something close to what is desired first? But, if you did, that would mean you would need manual editing and that the scripts aren't perfect.

Someone else tried for themselves and reported their experience. Why don't you read it and find out?

Quote:

Originally Posted by davidfor

That file name is just an example. Amazon use their own names, and anything downloaded with ADE or from other stores uses other names that might not be unique when looking for metadata.

And you are not hearing me. The sources for metadata are inconsistent and a simple take on downloading them doesn't produce what I think is perfect data.

It's very boring having a conversation with someone who doesn't know what they are talking about.

Quote:

Originally Posted by davidfor

Because I've written too many scripts and applications to believe that perfection in something like this is actually achievable. Because I maintain more than one metadata source plugin for calibre and know through my testing that even within a site, the data is inconsistent.

Well, let me know when you try it for yourself...if you ever do.

Quote:

Originally Posted by davidfor

I never said you weren't reading. I just stated you were collecting more books than you could possibly read. I'll admit that I'm probably collecting more than I can read before I die, but probably by only a factor of two or three. Or maybe 10. And of course, that still leaves me with heaps of time to read.

Again condescending little phrases that have no purpose other than to be insulting.

Yes. It's a character flaw.

Quote:

Originally Posted by davidfor

I missed this and someone else pointed it out to me. So, I've had a look at them. And well, if I had seen this earlier, I wouldn't have responded because I would have been capable of it because I would have been

And yes, that is deliberately insulting.

I find it's more effective when you don't give the game away but ok...fools do enjoy laughter.

Quote:

Originally Posted by davidfor

Throughout this discussion you have gone to great lengths to say how perfect your scripts are and how they get perfect results and that anyone using calibre is an idiot (OK, my word but I don't think it is a stretch).

It really doesn't help your position when you have to put words in your opponent's mouth.

Quote:

Originally Posted by davidfor

But, those scripts are using calibre to download the metadata. And that means that the difference between what you do and what someone using the calibre interface is doing is that you don't check the metadata before accepting it. What that means is that your definition of "perfect" is actually "whatever calibre finds for me".

They do. That's not the only difference tho.

Quote:

Originally Posted by davidfor

You also must be either only sourcing books from places that have already done some work on the file names to make metadata downloading reasonably accurate. Or are pre-editing them. Which of course you can't be as that would be "manually editing metadata". You did mention a script "ebook-reader-prep.sh" which isn't in that package. Maybe that is what is doing the hard work of preparing the books to be able to get "perfect" metadata.

The more you talk, the more you show how much you don't know.

The ebook-reader-prep.sh script is my wrapper around it.

It does nothing special that anyone who was scripting for their own use wouldn't do.

And just a last comment:

Quote:

Originally Posted by davidfor

I know how the calibre metadata download works. If you have two million books and have not vetted them before downloading the metadata using calibre and not checked them afterwards, you have some books with wrong metadata. The only chance that there are no errors is if you had an ISBN for every book before you tried to get the metadata.

And to the moderators, I'll stop now.

If only saying so made it so. Well, read others' comments on how it works and see for yourself. Or live in a reality created in your own mind. Up to you which.

sealbeater · 08-09-2018, 04:24 PM

Quote:

Originally Posted by kovidgoyal

To any innocent bystanders reading this, please do not use these scripts. You will place undue load on the various servers calibre downloads metadata from, making the site owners' lives more difficult and making it more likely that they will implement more rigorous bot detection algorithms which in turn means that metadata download will stop working for all calibre users, not just users of these "scripts". And everybody that uses those services will have to solve more captchas.

People can do what they want but I've been using these scripts for a year with no trouble. I do proxy through Tor using proxychains and I highly recommend using it if doing so.

Adoby · 08-09-2018, 05:30 PM

I might throttle the script myself. Since it can run unattended I am fine with it matching one book per minute, or whatever, instead of one every few seconds. It can run while I sleep or work.
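A minimal sketch of that kind of throttling, assuming the per-book work is wrapped in a function: it keeps at least a fixed interval between the start of one lookup and the start of the next. Everything here (process_throttled, handle, min_interval) is a hypothetical name for illustration and not something the scripts provide.

```python
import time


def process_throttled(items, handle, min_interval=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call handle(item) for each item, starting at most one call per min_interval seconds.

    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Wait out whatever remains of the interval since the previous start.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(handle(item))
    return results
```

In practice handle could be a small function that runs the organizing script on a single file, and min_interval=60.0 gives the "one book per minute" pace mentioned above. Since it runs unattended, a slow pace costs nothing and keeps the load on the metadata sites low.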

I got -oft working. I tried setting the environment variable inside the script, but never got it working. I resorted to making a script for the script and setting it as a parameter, and it works fine!

davidfor · 08-09-2018, 09:47 PM

Quote:

Originally Posted by sealbeater

If only saying so made it so. Well, read others' comments on how it works and see for yourself. Or live in a reality created in your own mind. Up to you which.

Yes, it was a pretty stupid thing to say considering how I knew you would answer. Basically everything you said was a condescending little quip or something to avoid an actual answer. But, maybe I didn't make things clear enough.

You did comment on reading comprehension. And I can make the same comment. You seem to think that I am saying these scripts don't work. I have never said that. I questioned the word "perfect". And that is based on my understanding of calibre and how it fetches metadata. The simple fact is, that is not perfect. And the only way to get close to perfect is to already have nearly perfect metadata. You need the correct ISBN or a title and author that is correct. Any spelling mistakes in them and the likelihood of getting the wrong book, or no book, is high. Throw a colon in the title and again, the likelihood of an error is high. Especially if the first part is a series name.

All that means is that you have to do some preprocessing to get the correct results. Your implication has always been that you just run the script against a directory and there are no errors. But, you finally seem to be admitting that there are and that you have to fix them some other way. After all, why would you have a problem directory?

You dismissed my statement about the file called "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub" saying that I should try it and see. I don't need to try it, I can read code, I know how it works and I can see its limitations. But, what I wanted to know was how you handled these. Your statements imply that you don't do any manual processing but rely completely on the scripts. And that isn't possible with the file names that are coming from the shops. So, how do you handle this? There is nothing in that file name that tells you what the book is. And the book doesn't have an ISBN inside it. So, how do you handle this?

Of course, you used someone else's test to show how this is "perfect". Of course, that person reported 99% accuracy, which last time I checked, isn't perfect. And to me that is worse, that 1 book that failed was actually correct but treated as an error. That demonstrates the scripts are not perfect.

sealbeater · 08-09-2018, 10:27 PM

Quote:

Originally Posted by davidfor

Yes, it was a pretty stupid thing to say considering how I knew you would answer. Basically everything you said was a condescending little quip or something to avoid an actual answer. But, maybe I didn't make things clear enough.

I'm sorry, did you try it? Do you know what you are talking about? Are you basing your responses to me on what you *know* is actual reality or only what you *think* reality is?

Quote:

Originally Posted by davidfor

You did comment on reading comprehension. And I can make the same comment. You seem to think that I am saying these scripts don't work. I have never said that. I questioned the word "perfect". And that is based on my understanding of calibre and how it fetches metadata.

So, not based on any observable repeatable tests you have conducted yourself.

Quote:

Originally Posted by davidfor

The simple fact is, that is not perfect. And the only way to get close to perfect is to already have nearly perfect metadata. You need the correct ISBN or a title and author that is correct.

Yes, I think that is understood by all. You aren't giving anyone any new revelations just like you didn't when you said I would be IP banned for hammering websites.

Do you really think someone could have 2 million books...import half a million into calibre...decide that this method was a better way and blow out a 1.5 TB calibre dir and start over and not have some clue as to what they were doing?

Perhaps more clue than you. That damn character flaw of mine again.

Quote:

Originally Posted by davidfor

Any spelling mistakes in them and the likelihood of getting the wrong book, or no book, is high. Throw a colon in the title and again, the likelihood of an error is high. Especially if the first part is a series name.

So boring....so very very very tedious and boring...like arguing with a child.

Quote:

Originally Posted by davidfor

All that means is that you have to do some preprocessing to get the correct results. Your implication has always been that you just run the script against a directory and there are no errors. But, you finally seem to be admitting that there are and that you have to fix them some other way. After all, why would you have a problem directory?

The only preprocessing I do is the following (and I say this for others' edification, not your own):

binary deduplication.
file name sanitizing (by which I mean file renaming, i.e. removing all spaces and lower-casing filenames)
I then move all files into the root of the working dir and then remove all subdirectories.

I make a temp dir and move the files in one by one and run the script on them. (I find it's better to do it this way than deal with filesystem limits when dealing with lots of small files.)

This is all automated of course.
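For anyone who wants to try the three preprocessing steps listed above, here is a minimal Python sketch. It is not the code actually used in this thread. The function names, the choice of SHA-256 for the byte-for-byte comparison, and the policy of prefixing a name with part of the hash when two different files sanitize to the same name are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def sanitize(name: str) -> str:
    """Remove all spaces from a file name and lower-case it."""
    return name.replace(" ", "").lower()


def preprocess(workdir: Path) -> list[Path]:
    """Deduplicate, rename and flatten every file under workdir."""
    seen: set[str] = set()
    kept: list[Path] = []
    # Take a snapshot of the tree first, because files move while we loop.
    for src in sorted(p for p in workdir.rglob("*") if p.is_file()):
        digest = file_digest(src)
        if digest in seen:
            src.unlink()  # byte-identical copy of a file we already kept
            continue
        seen.add(digest)
        dest = workdir / sanitize(src.name)
        if dest != src and dest.exists():
            # Different content, same sanitized name: keep both (assumed policy).
            dest = workdir / f"{digest[:8]}_{sanitize(src.name)}"
        if dest != src:
            shutil.move(str(src), str(dest))
        kept.append(dest)
    # Remove the now-empty subdirectories, deepest first.
    for d in sorted((p for p in workdir.rglob("*") if p.is_dir()), reverse=True):
        d.rmdir()
    return kept
```

Calling `preprocess(Path("/path/to/workdir"))` returns the list of surviving files, all sitting in the root of the working directory. It deletes duplicates and renames files in place, so try it on a copy first. The one-file-at-a-time temp dir step is not covered here.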

Quote:

Originally Posted by davidfor

You dismissed my statement about the file called "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub" saying that I should try it and see. I don't need to try it, I can read code, I know how it works and I can see its limitations.

Oh dear. I dismissed your statement about the file called "smashwords-epub-a030871b-dc97-493b-b4e8-7ebe19777523.epub" because I, unlike you, know that the file name doesn't matter to the script. It could be called "7aLzN3dNqlUcqYVv1e+WyWXGfn+e0IuTA1CwJO/pm0NOxkkdc+i5gqdpB9zA.epub" and it would be identified correctly. I know this because I have tried it for myself, and because of that, I know you haven't. Therefore, one of us knows what he is talking about and the other doesn't. I wonder who...?

Quote:

Originally Posted by davidfor

But, what I wanted to know was how you handled these. Your statements imply that you don't do any manual processing but rely completely on the scripts. And that isn't possible with the file names that are coming from the shops. So, how do you handle this? There is nothing in that file name that tells you what the book is. And the book doesn't have an ISBN inside it. So, how do you handle this?

If the book doesn't have an ISBN, that's going to be tricky. Let's give it a proper test. Since you seem incapable of testing this for yourself and finding out for yourself (a sad position to be in, depending on others to discover truth for you), why don't you send me the book and I'll run the script on it and we'll all find out. I, too, am curious now.

EDIT: The script would probably resort to OCR'ing at that point and make a best guess...this would probably mean it would end up in the "uncertain" dir for later review even if correct.

Of course, I consider this perfection...others who like to do tedious point and click work may disagree.
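The reason the file name is irrelevant is that identification starts from an ISBN found inside the book. As an illustration only, here is a minimal Python sketch of that idea: pull ISBN-like strings out of already-extracted text and keep those whose check digit is valid. The regular expression and the function names are assumptions for this example and are not taken from the scripts discussed here; the check-digit rules themselves are the standard ISBN-10 and ISBN-13 ones.

```python
import re

# An optional 978/979 prefix, then nine digits and a final digit (or X for
# ISBN-10). A single hyphen or space may follow each digit.
_ISBN_RE = re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b", re.ASCII)


def is_valid_isbn(digits: str) -> bool:
    """Check the ISBN-13 or ISBN-10 check digit of a separator-free string."""
    if len(digits) == 13 and digits.isdecimal():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be 0 mod 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
        return total % 10 == 0
    if len(digits) == 10 and digits[:9].isdecimal() and digits[9] in "0123456789X":
        # ISBN-10: weights run 10 down to 1; X stands for 10; sum must be 0 mod 11.
        values = [int(d) for d in digits[:9]]
        values.append(10 if digits[9] == "X" else int(digits[9]))
        return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0
    return False


def find_isbns(text: str) -> list[str]:
    """Return the valid ISBNs found in text, without separators, in order."""
    found: list[str] = []
    for match in _ISBN_RE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group()).upper()
        if is_valid_isbn(digits) and digits not in found:
            found.append(digits)
    return found
```

For example, `find_isbns("ISBN 978-0-306-40615-7 (hardcover)")` returns `["9780306406157"]`. A book whose text yields an empty list is the tricky case described above, where something like OCR and a best guess would be the only fallback.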

Quote:

Originally Posted by davidfor

Of course, you used someone else's test to show how this is "perfect". Of course, that person reported 99% accuracy, which last time I checked, isn't perfect.

There goes that reading comprehension again. The *comic* was correctly renamed but yes, it was placed in the "uncertain" directory so the user could review.

And yes, I used someone else's test. That seems to be the best way, when one can't test for themselves.

Quote:

Originally Posted by davidfor

And to me that is worse, that 1 book that failed was actually correct but treated as an error. That demonstrates the scripts are not perfect.

Well then, you have met my expectations of you. The script errs on the side of caution. It didn't treat it as an "error", it treated it as "uncertain". That's exactly what it's supposed to do. Unless you prefer things not to err on the side of caution, especially when automated? That doesn't seem too wise to me. Everything that gets put in the "certain" dir is extremely accurate.

Or, rather than just running a script and going to sleep, you can point and click your gui all day...can't do it during work hours, although I can...can't do it while sleeping, although I can...

Of course, you are free to believe that a person would turn that script loose on a 2 million ebook collection, without being very sure that it wouldn't end in disaster...well....please...feel free to believe whatever reality your mind can concoct. I'm glad it's not mine.

Adoby · 08-10-2018, 02:37 AM

My first test was very biased since it was made mainly on books I had earlier successfully imported to calibre. I have used the script on more books now and there are several more books in the uncertain folder. For some reason, many of them are mobi files. Also, the scripts don't even attempt to process periodicals like magazines. Junk files fail.

Still, I find the scripts very useful.

Adoby · 08-09-2018, 11:12 AM

OK! I have tested it on around 100 books and I am quite impressed. I let it work on some books from my Baen monthly bundles, some non-fiction PDFs and some random files. And the script organize-ebooks.sh did a great job. I will definitely start using this to preprocess all my ebook files before importing to calibre!

I had the script place books it was uncertain about in a specific folder. Only one book, a comic book I have no idea where I got, was placed there. But it was correctly renamed...

However, I never got -oft (output filename template) working. I tried several ways, with quotes in many different ways, including copied from the docs, but it never worked. So all the books are in a single folder. If you, sealbeater, could post a snippet showing how to get organize-ebooks.sh to use -oft and/or output books to subfolders it would be great!

A few caveats:

The scripts are intended for actual published books. With ISBN. Fan fiction or self-published stuff I suspect would be very hit-and-miss depending on how metadata is stored in the file. Specify -owi to attempt to organize books without ISBN.
The WorldCat ISBN calibre plugin had to be rezipped to a more shallow structure to be installed in calibre. And it has a limit of 1000 lookups per day. Fine for my needs...
Tesseract (OCR software) in Ubuntu 18.04 is the latest 4.0, so it can be installed from normal repositories. Seems to OCR fine!
If you have the source books in subfolders, the subfolders are not removed as the renamed books are removed.
Some familiarity with Linux and bash scripting is needed.

But, as said, I am very pleased. Thanks to sealbeater for making me look at these scripts a little closer. I've seen them before, but not tried them.

Just point the script at a folder with books and have it chug away. A few seconds per book unless it has to be OCR'ed. Then import to calibre. I still have to manually organize by genre... Perhaps also download cover.

Last edited by Adoby; 08-09-2018 at 11:23 AM.


Terisa de morgan · 08-09-2018, 04:01 AM

Duplicated with previous post.

Adoby · 08-09-2018, 05:30 PM

I might throttle the script myself. Since it can run unattended I am fine with it matching one book per minute, or whatever, instead of one every few seconds. It can run while I sleep or work.

I got -oft working. I tried setting the environment variable inside the script, but never got it working. I resorted to making a script for the script and setting it as a parameter, and it works fine!
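The throttling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_throttled` and its parameters are hypothetical names, and `process` stands in for whatever handles a single book (for example, a call that runs the organizing script on one file). It is not something the scripts provide.

```python
import time
from typing import Callable, Iterable


def run_throttled(
    items: Iterable[str],
    process: Callable[[str], None],
    min_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call process(item) for each item, starting calls at least
    min_interval seconds apart. Returns the number of items processed."""
    count = 0
    last_start = None
    for item in items:
        if last_start is not None:
            # Time still owed since the previous call started.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        process(item)
        count += 1
    return count
```

With `min_interval=60.0` this gives at most one book per minute, which is plenty for an unattended overnight run and keeps lookups well spaced. The `clock` and `sleep` parameters exist only so the pacing can be tested without actually waiting.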

