MobileRead Forums - View Single Post

unboggling · 08-12-2011, 11:43 PM

---

Link to latest version: Version 0.80, 2011-09-16, Post #243.

---

KISS (version 0.21)

Hello, new users of calibre. Keep It Simple. KISS your calibre use.

After that, everything else equates to gradually learning details of how to use various features and complexities and trying them one by one in small steps. Once I decided to KISS my calibre use I managed to avoid feeling overwhelmed by apparently insurmountable complexities and learning curves. If you feel overwhelmed, don't blame it on calibre or even on yourself; KISS things instead. From a relatively KISSy baseline of calibre use I can move towards any particular complexity in small steps.

I'll probably never use most calibre features or complexities because the software accommodates a range of users from new to very advanced, with many features that meet lots of different user needs for managing eBooks, eBook reader devices, and subscription feeds. I consider myself a calibre beginner and an eBook beginner after seven months. I've barely scratched the surface. I've noticed there is rarely just one "right" or "good" way to accomplish something. And as I learn more, the ways I use calibre gradually evolve.

General Advice.

Tangents. Avoid running off on complex time-consuming tangents that take you away from your main purpose, whatever that is. My main purpose is managing and reading fiction eBooks. Tangent examples: I spent a week writing and revising a script for printing the calibre catalog in Comma Separated Value (CSV) format exported to Excel; now printouts aren't practical anymore. I investigated drivers and front end interfaces to calibre's backend database engine, and I don't even like using Structured Query Language anyway. I developed a whole elaborate tagging schema of my own, spent weeks revising it then trying to comply with it; I don't use it now. I did those things to avoid feeling overwhelmed. Once I settled into my main purpose rather than tangents, I felt overwhelmed until I KISSed things.

Automation. Do a process manually for a while, until you're familiar with it, before trying to automate it with scripts, regular expressions, templates, computed columns, or whatever. Attempting automation without first understanding the manual mechanics usually wastes a lot more time than it saves. Not knowing what I was doing but trying to combine different types of automation all at once anyway led to me wasting time and feeling frustrated and overwhelmed. Until I KISSed things.

Raw Books. Keep them. I keep all downloaded files that got copied by calibre when I Added them. I throw them in a "Raw Books" folder on an external drive - I've found myself searching Raw Books numerous times and re-adding books into calibre for one reason or another. They have bad metadata or haven't been format-cleaned but at least they are the original incoming formats; keeping them available is an insurance policy against future need.

Backups. Have backup software automatically back up your calibre libraries at least hourly to a different disk from the one your calibre libraries are on. Buy an external drive if necessary. Do it even if you use a file hosting/syncing service such as Dropbox, because what if they go bankrupt, or their servers go down, or you lose internet connectivity for a while and you suddenly need the backup right away? I've had to recover from backups three different times after I made various blunders.
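
For anyone who wants to see the idea in code, here is a minimal sketch of a timestamped copy of one library folder onto another disk. It is not a replacement for real backup software, and it assumes calibre is closed while the copy runs. The function name, the paths, and the folder naming scheme are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_library(library_dir, backup_root, now=None):
    """Copy a whole library folder into a new timestamped folder.

    Hypothetical helper: the naming scheme is an assumption, and the
    caller supplies both paths.
    """
    library_dir = Path(library_dir)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    target = Path(backup_root) / f"{library_dir.name}_{stamp}"
    # copytree refuses to write into an existing folder by default,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(library_dir, target)
    return target
```

Scheduling it hourly would be left to the operating system's task scheduler; each run produces a separate folder, so old copies have to be pruned by hand.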

Metadata Loss. Don't make any decisions that result in losing metadata until you're more experienced and become aware of potential ramifications. Some small examples that require lots of work later to rectify: Deleting The, An, A from title. Changing Spectra to Bantam so publishers are more consistent. Using only one of the 3 or 4 co-authors of an anthology where the actual editor's name isn't available, rather than taking the trouble to use all the names. I don't recommend KISSing things to the extreme of losing data. I wasted time, having to fix things later.

Some of My Current calibre Work Habits.

This section is not advice, but I offer it because it might give you some ideas you could adapt into your own calibre use. These habits are just some of the ways I've been doing things, and I don't claim that they are good, or best practices, or the KISSiest possible. Though they're KISSier and improved over what they were months ago.

Habits Change. Here are two significant examples. I recently learned on the forum that my habit of converting everything to ePub is a bad practice that results in lower quality eBooks than keeping the original for some formats, and that multiple conversions can engender problems and lose things, similar to photocopying copies of copies. When I learn new things like that, which is an ongoing process, I want to revise my relevant practices. Actually doing that requires consideration of ramifications across all of my calibre libraries, and learning details and testing the revised method, so there is a lag-time for implementation on new books as well as a decision to be made about how much work to do (if any) to bring older books in the library up to par. People much more experienced than me suggested a strategy of not converting anything until just before reading and always keeping the original incoming format in calibre rather than deleting it, because disk space is cheap and technology and the user's conversion skills improve over time. That is opposite to what I've been doing, and the copies of copies problem concerns me because I do a lot of conversions in sequence on one book during cleanup. So I want to change those habits more towards best practices. But haven't yet. I've marked those habits currently needing revision with: ((Habit needs revision)).

Preferred Format. ((Habit needs revision, delay conversions, keep original incoming formats)) I standardize on ePub, convert everything to it, consider it the master copy once it's cleaned up, delete any other formats, and generate whatever formats I need for various devices from that master on the fly as I need them. I load my devices with only a few books at a time, the author or series I'm currently reading, and delete them from device when done. I add a content-rating tag to the book's record in my main library and delete those extra formats. Saves disk space and backup time. My "masters" are ePub rather than mobi even though my primary reading device is Kindle, because I noticed the calibre viewer opens ePubs faster, and calibre conversions seem to complete faster from ePub.

Format Exceptions. ((Habit needs revision, broaden exceptions)) The only exceptions I make to keeping everything ePub are items that don't convert well, such as books with complex graphics, computer language books, textbooks, scientific journal articles with equations, etc - all of which I currently keep out of my calibre libraries. If I had enough of them or subscribed to technical newsfeeds I'd keep a separate Technical Library in which technical items are stored in their native formats, but for now I don't worry about it because being retired I mostly just read fiction.

Templates. I currently use one template only and that one adds series and series index to title for my Kindle.
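
The actual template is written in calibre's own template language and is not shown here. As a rough illustration of the effect only, the Python sketch below builds a title with series and series index in front. The function name and the "Series N - Title" layout are assumptions, not the template the post uses.

```python
def kindle_title(title, series=None, series_index=None):
    """Prefix the title with series name and index when the book has a series."""
    if not series:
        return title
    if series_index is None:
        return f"{series} - {title}"
    # The "g" format drops a trailing ".0", so 1.0 shows as 1 and 2.5 stays 2.5.
    return f"{series} {series_index:g} - {title}"


print(kindle_title("Example Title", "Valdemar", 1.0))
print(kindle_title("Example Title"))
```

Books without a series keep their plain title, so the same rule can be applied to a whole device load.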

Regular Expressions (regex). I am only now starting to incorporate search/replace regex into my workflow. The only regex I used for 6 months was calibre-menu-supplied for importing books by filename or putting series info into title on Kindle.
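
To show what an import-by-filename regex does, here is a small Python sketch. The pattern is my own illustration for files named "Author - Title.ext"; it is not the expression that calibre's menu supplies, and the file names are made up.

```python
import re

# Assumed filename layout: "Author - Title.ext" (hypothetical convention).
FILENAME_PATTERN = re.compile(r"^(?P<author>.+?) - (?P<title>.+)\.(?P<ext>\w+)$")


def parse_filename(filename):
    """Return author, title and extension as a dict, or None if no match."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


print(parse_filename("Jane Doe - Example Book.epub"))
print(parse_filename("no_separator.epub"))
```

Trying a pattern like this on a handful of real file names before a bulk import fits the do-it-manually-first advice above.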

Custom columns. I use only 4 simple custom columns:
#isbn, to see at a glance.
#format, to see at a glance.
#act, for temporary working tags when processing a group of books one way or another.
#notes, for variant titles, pseudonyms, miscellaneous.

Series. I tried multiple series columns but eventually went back to using just the one default series column. When it's important to get several subseries in the right order, I handle multi-level series like this: SeriesUniverseAbbreviation; Series Name (a); SubseriesName. For example, for Star Wars: SW; Clone Wars (b); SubseriesName. It's easy to search on "SW;" while ANDing any other desired keywords. If I use the (a)'s, (b)'s, etc. correctly, sorts by series come out in chronological order by original pubdate or in recommended reading order, whichever I initially preferred. But for most books in most series I don't worry or care about all that, and just use the smallest/lowest-level subseries name. Sometimes I use only the broadest series name, put in reading order, such as "Valdemar", but I prefer to do that only when I'm certain of the reading order and the series is complete. When a series is up to date, in Tags I add %su, which means none missing and up to date through the most recent series member in the library - a particularly useful tag in the case of a multi-author series. For multi-author series I use the author name rather than the series name in the author column; if I want to see all series members in a list I do a search or just sort by series.
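
The multi-level naming scheme can be sketched in a few lines of Python. This only demonstrates why the (a), (b) letters control the order; plain string sorting stands in for calibre's own sort, and the subseries names "Zeta Arc" and "Alpha Arc" are hypothetical.

```python
def series_name(universe, series, order_letter, subseries):
    """Build 'Universe; Series Name (letter); SubseriesName'."""
    return f"{universe}; {series} ({order_letter}); {subseries}"


names = [
    series_name("SW", "Clone Wars", "b", "Alpha Arc"),
    series_name("SW", "Clone Wars", "a", "Zeta Arc"),
    "Valdemar",
]

# The letter comes before the subseries name, so it decides the order
# even when the subseries names alone would sort the other way.
for name in sorted(names):
    print(name)

# Searching on the universe prefix finds every member of that universe.
star_wars = [name for name in names if name.startswith("SW;")]
print(star_wars)
```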

Metadata Grabs. I grab only pubdate, publisher, isbn, and cover. I always grab the cover, even when the book has an internal cover. I keep only a few sources checked (figuring the more checked, the slower the grab). Amazon has seemed more consistently accurate, with broader item availability, than the others. I recently added Goodreads as a source; I'm not sure yet how it compares to Amazon. Also by default I use ISBNdb and Open Library. Others I keep unchecked and only use on a case-by-case basis when necessary. Since several of the genres I read are speculative fiction, when the bulk metadata grab results don't please me I manually use ISFDB to get good covers and ISBN13s where available - it's amazing how much I use that site - and it's much faster that way than going one by one in "Edit Metadata Individually" using the Download Metadata button.

Format Quality. ((Habit needs revision re cleaning location (Add vs Main) & when to clean.)) I decided early on that I would only put cleaned-up or clean formats into my main library. So I have two working libraries, Main and Add. Add is for incomings and cleaning up their metadata and formats. When a batch has been cleaned in Add, I save them out with covers and opfs, delete them from Add, and add them (using metadata instead of filename) to Main. That forces all the opf, format, and calibredb metadata to be consistent, at least until the next time I change any of that metadata in Main. While working on them in Add, I examine each book format once and then assign it a format quality code. This works for me at format level because I keep only one format per record. And in the future, for any multiple formats, I trust that initial evaluation and don't examine it again. I choose the format with the best quality code over the others. I may have a _q3 sitting in Main for 6 months and then a _q4 for that title shows up in Add, so eventually I replace the _q3 format in Main with the _q4 from Add. I rarely use anything except _q0, _q3, or _q4. Doing it this way, worse formats have a chance of getting replaced by better ones later. That applies to anything that has something wrong with it: Advance Reader Copies, incomings in any format that have no bold/italics (usually caused by it previously having been a text format), problem formats of any kind, all of which are convenient to keep as placeholders.

Format Quality Tags.
_q0 wishlist item. I also use it to color that book record's text red. And for bad formats, saves the trouble of creating an empty book record or empty book placeholder format.
_q1 not used.
_q2 mostly not used, a few cases = more than minor annoyance, not fixable, retained anyway.
_q3 okay, readable with only minor annoyance.
_q4 good, readable with no annoyance.
_q5 excellent, I don't bother with this, except for a few examples.
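
The replace-when-better rule from the Format Quality paragraph can be sketched like this. It is only an illustration of the decision; the function names are hypothetical and nothing here talks to calibre.

```python
import re

# Matches the quality tags listed above: _q0 through _q5.
QUALITY_TAG = re.compile(r"^_q([0-5])$")


def quality_of(tags):
    """Return the numeric quality from a tag list, or None if no _qN tag."""
    for tag in tags:
        match = QUALITY_TAG.match(tag)
        if match:
            return int(match.group(1))
    return None


def should_replace(main_tags, incoming_tags):
    """True when the incoming copy has a strictly higher quality code."""
    current = quality_of(main_tags)
    incoming = quality_of(incoming_tags)
    if incoming is None:
        return False
    return current is None or incoming > current


print(should_replace(["_q3", "((fn"], ["_q4"]))
print(should_replace(["_q4"], ["_q3"]))
```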

Format cleanups. ((Habit needs revision re when to clean & fewer conversions.)) I'm not a publisher, distributor, or editor. Any cleanup I do takes valuable time. My goal isn't to make it perfect, but to spend the least amount of time possible to make it "readable by me with as little annoyance as possible." I examine all incomings for format quality, and then tag each with a format-quality code. If it looks like I won't be able to clean it up in 5 minutes or less, I scrap it as not worth it or tag it _q0 and add a tag code for the type of format problems it has. I always work on a copy rather than on the main format in calibre. I haven't worried about or cleaned up most Tables of Contents (TOCs) because for most books except big omnibuses I don't use or care about TOCs. I do strip out header/footer/page# when I can without causing a lot of split paragraphs; otherwise I scrap it or code it _q0. I'm comfortable in Word so I follow this format-conversion route: calibre epub --> rtf saved out --> Word (search/replace) docx --> Open Office odt --> add to calibre --> ePub. The conversions from rtf to docx to odt each clean out Microsoft format garbage and reduce size considerably. If I were more comfortable in Open Office I'd simplify that workflow a lot, but I'm not yet. This way works fairly well for a beginner like me. I don't bother going the html fix-it-up path because for my purposes that's overkill and I'm not that comfortable with it yet, in Sigil or even with html tags in general. Eventually I'll probably switch over to doing it the html way, which is apparently more precise, more flexible, more similar to internal calibre conversions, and avoids some conversion problems, but I'm not there yet.

Tags. Since beginning with calibre I gradually used the tag browser less and the search box more. At about 3 or 4 months in I stopped using a lot of columns for genre, booktype, seriesstatus etc and started using the default tags column, tag prefixes to designate tag type, and abbreviations. Example of how I use tags now on a book:
_FormatQuality, ((GenrePrimary, (GenreSecondaries, [Type, %status, miscellaneous
_q4, ((fn, (mgc, (ya, [om, %sma, %su, r3
That translates to: Format good, fantasy with magic, young adult, omnibus, in multiauthor series, series up to date, myRating=3stars (and I've read it, otherwise it wouldn't be rated).
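
The prefix scheme above is regular enough to decode mechanically. Here is a minimal Python sketch that groups a tag string by prefix; the group labels and function names are my own, and the only subtlety is that "((" has to be tested before "(".

```python
# Order matters: "((" must be checked before "(".
PREFIXES = [
    ("((", "primary genre"),
    ("(", "secondary genre"),
    ("[", "type"),
    ("%", "status"),
    ("_", "format quality"),
]


def classify(tag):
    """Return the tag type implied by the tag's prefix."""
    for prefix, kind in PREFIXES:
        if tag.startswith(prefix):
            return kind
    return "miscellaneous"


def group_tags(tag_string):
    """Split a comma-separated tag string and group the tags by type."""
    groups = {}
    for tag in (part.strip() for part in tag_string.split(",")):
        if tag:
            groups.setdefault(classify(tag), []).append(tag)
    return groups


print(group_tags("_q4, ((fn, (mgc, (ya, [om, %sma, %su, r3"))
```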

Empty books. I don't use the Empty Book command or Empty Book records. I created a folder containing empty files (originally text, later converted to epub) titled Empty01 through Empty10 by author AAA, TBD. When I need to, I add a group of 10 "empty" books and change the metadata of one or several appropriately, keeping the format code _q0. I do this because empty books don't get included when you SaveToDisk a selection of books. If you want them included in saves that eventually get added to a different library, book records indicating wishlist items need to hold a format. If you want to avoid all that, just use CopyToLibrary instead, which does copy empty books without formats.

Plugins. I use these Plugins frequently now - Find Dupes, Open With, Search Internet, Quality Check.

Collections on Devices. I don't use or want them. I prefer referring to tags in main library when necessary.

News. I haven't ever used a newsfeed. Maybe someday I will if I feel a need.

CSS. I haven't used style sheets. CSS is one of the complexities I want to take baby steps into soon.

Bottom-Line.

KISS your calibre use, then gradually learn any particular complexity in small steps from your KISSed baseline.

08-12-2011, 11:43 PM	#52
unboggling Wizard Posts: 1,065 Karma: 858115 Join Date: Jan 2011 Device: Kobo Clara, Kindle Paperwhite 10
Last edited by unboggling; 09-17-2011 at 03:28 AM. Reason: Link to newer version.