Old 08-08-2011, 08:40 PM   #1
unboggling
by the bootstraps
Posts: 1,055
Karma: 858115
Join Date: Jan 2011
Location: Southeast US
Device: PRS-T2, Nexus 7, KindleT, iPad1, Kindle3KB
KISS for New calibre Users (a work in progress…)

-----

Please link to the latest thread & revision:

How I Manage eBooks with calibre

-----


This is a first draft:

KISS for New calibre Users (version 0.10)

I consider myself still a "noob" after seven months of using calibre intensively.

There is only one overarching, bottom-line, all encompassing rule. And that is, Keep It Simple (KISS). I KISS things, I'm happy. If I unKISS things too often or too deeply, then I'm sad. For me, all of my basic calibre practices should derive from that.

In the first few weeks I went stampeding off on many tangents, nearly all at once. That was a way for me to avoid feeling overwhelmed by calibre's complexity and its different pieces' learning curves. Now I wish I hadn't wasted my time (and others') pursuing tangents. Examples of that in my case: (1) Wrote a script for printing the booklist from calibre's catalog CSV format exported to Excel; now my main calibre library contains too many books for printed lists to be practical. (2) Investigated ODBC drivers and front-end interfaces to calibre's backend database engine for no purpose whatsoever related to managing books; and I don't even like using Structured Query Language anyway. (3) Added lots of custom fields, each for specific reasons that seemed to make sense at the time, redoing everything to use them; now nearly all of those custom fields are deleted and the workflows have evolved and settled into much simpler flows. (4) Developed a whole elaborate tagging schema of my own and spent weeks revising it, then trying to comply with it; maybe it was a worthwhile thought exercise, but I don't use it now and what I do use is much KISSier, simpler, easier to remember, harder to make mistakes with.

I also argued with the calibre forum moderators and other experts when I didn't know what I was talking about, yet I stubbornly and pridefully persisted. I still catch myself arguing sometimes, from a not-too-solid basis for argument. Their overall patience for stubborn/prideful fools like me continues to amaze me every time. I'd guess that I'm still on some folks' ignore lists here.

KISS Tips:

Do everything manually for two or three months before trying to automate anything with scripts, regex, templates, computed columns, or whatever. Attempting to automate a process before you thoroughly understand its manual mechanics usually wastes lots more time than it saves.

Pick one format, ePub or mobi, standardize on it, convert everything to it, consider it the master copy once it's cleaned up, delete any other formats, and generate whatever formats you need for various devices from that master on the fly as you need them. I now load my devices with only a few books at a time, the author or series I'm currently reading, and delete them from the device when done. I add a content-rating tag to the book's record in my main library and delete those extra formats. That saves disk space and backup time. My "masters" are ePub rather than mobi even though my primary reading device is a Kindle, because I noticed the calibre viewer opens ePubs faster, and calibre conversions seem to complete faster from ePub.

The only exceptions to the convert-to-one-master-format idea are items that don't convert well, such as books with complex graphics, computer language books, textbooks, scientific journal articles with equations, etc - all of which I currently keep out of my calibre libraries. If I had enough of them or subscribed to technical newsfeeds I'd keep a separate Technical Library in which technical items are stored in their native formats, but for now I don't worry about it because being retired I mostly just read fiction.

Copies. I always work on a copy, never directly on a "master" format. Also, I keep all downloaded files that got copied by calibre when I Added them. I throw them in a "Raw Books" folder on an external drive - I've found myself going back to search Raw Books numerous times for one reason or another. They have bad metadata or haven't been format-cleaned but at least they are the original incoming formats; keeping them available is a sort of insurance policy against future need arising from blunders made while tired or whatever.

Backups. If you don't have backup software automatically backing up your calibre libraries at least hourly to a different disk than the one your calibre libraries are on, then get backup software (and an external drive if necessary) and start doing so. I've had to recover from backups three different times after various blunders I made.

Data loss. Don't make any decisions that result in losing data until you're more experienced and become aware of potential ramifications. Some small examples that require lots of work later to rectify: Deleting The, An, A from title. Changing Spectra to Bantam so publishers are more consistent. Using only one of the 3 or 4 co-authors of an anthology where the actual editor's name isn't available, rather than taking the trouble to use all the names. KISSing things to the point of losing data is strongly not recommended unless you like wasting hours or days fixing things later.

Templates. I currently use one template only and that one adds series and series index to title for my Kindle. I don't bother with Kindle collections, preferring to refer to the tags I use in calibre's booklist.
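As a rough illustration of what a series-in-title template produces, here is a minimal Python sketch. It is not calibre's template language, and the function name, separator, and output layout are assumptions made up for the example; calibre's own template would be set up in its send-to-device settings.

```python
def title_with_series(title, series=None, series_index=None):
    """Prefix a title with its series name and index, if it has a series."""
    if not series:
        # Books without a series keep their plain title.
        return title
    # Treat a missing index as 1 (an assumption for this sketch).
    index = float(series_index if series_index is not None else 1)
    # Show whole numbers without a trailing ".0" (1.0 -> "1", 2.5 -> "2.5").
    index_text = str(int(index)) if index.is_integer() else str(index)
    return f"{series} {index_text} - {title}"


print(title_with_series("Book Title", "Valdemar", 1.0))
print(title_with_series("Standalone Title"))
```

Putting the series first means a title-sorted list on the device groups a series together in reading order, which is the point of the template.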

Regex. I am only now starting to incorporate search/replace regex into my workflow. Until a couple weeks ago, the only regex I used for 6 months was calibre-menu-supplied for importing books by filename or putting series info into title on Kindle.
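For anyone curious what an import-by-filename regex does, here is a small Python sketch. The filename layout ("Author - Series 03 - Title.ext") and the pattern itself are assumptions for illustration, not the pattern calibre ships with; the idea of named groups such as title and author is what carries over.

```python
import re

# Hypothetical filename layout: "Author - Series 03 - Title.epub"
# The series part is optional.
PATTERN = re.compile(
    r"^(?P<author>[^-]+?) - "
    r"(?:(?P<series>.+?) (?P<series_index>\d+(?:\.\d+)?) - )?"
    r"(?P<title>.+?)\.(?P<ext>\w+)$"
)


def parse_filename(name):
    """Return the metadata found in a filename, or None if it doesn't match."""
    m = PATTERN.match(name)
    if m is None:
        return None
    # Drop groups that did not take part in the match (e.g. no series).
    return {k: v for k, v in m.groupdict().items() if v is not None}


print(parse_filename("Jane Doe - Example Saga 03 - The Third Book.epub"))
print(parse_filename("Jane Doe - Standalone Novel.epub"))
```

Note that this simple pattern gives up on author names containing a hyphen, which is the kind of surprise that makes it worth doing imports by hand for a while first.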

Custom columns. I use only 5 simple custom columns, soon reducing them to 4:
#isbn and #format, to see them at a glance.
#act, for temporary working tags when I'm processing a group of books one way or another.
#notes, occasional brief notes about an author or series.
#aka, for pseudonyms or author real names, but I don't use it much and will delete it soon because I really don't care much about that info and in cases where I do, I can put it in #notes with prefix "aka: ".

Series. I tried multiple series columns but eventually went back to using just the one default series column. When it's important to get several subseries in the right order, I handle multi-level series like this: SeriesUniverseAbbreviation; Series Name (a); SubseriesName. For example for Star Wars, as follows. SW; Clone Wars (b); SubseriesName. It's easy to search on "SW;" while ANDing any other desired keywords. If I use (a)'s, (b)'s, etc correctly, sorts by series come out in chronological order per original pubdate or recommended reading order, whichever I initially preferred it to be. But for most books in most series I don't worry or care about all that, and just use the smallest/lowest level subseries name. Sometimes I use the broadest series name only, put in reading order, such as "Valdemar" - I prefer to do that only when I'm certain of reading order and a series is complete. When a series is complete, in Tags I add %su which means that series or subseries has nothing missing and is up to date through the most recent series member in the library - a particularly useful tag in the case of a multiauthor series. For multiauthor series I use author name rather than seriesname in author column; if I want to see all series members in a list I do a search or just sort by series.
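A quick Python sketch of why the (a), (b) letters work: a plain text sort puts the lettered subseries in order even when their names would sort the other way alphabetically. The subseries names here are invented, and plain string sorting is an assumption that may differ in detail from how calibre sorts its series column.

```python
# Invented series values following the "Universe; Series (letter); Subseries" pattern.
series_values = [
    "SW; Clone Wars (b); Alpha Arc",
    "Valdemar",
    "SW; Clone Wars (a); Zeta Arc",
]


def in_universe(values, prefix):
    """Keep only the series values starting with the prefix, in sorted order."""
    return sorted(v for v in values if v.startswith(prefix))


for value in in_universe(series_values, "SW;"):
    print(value)
```

"Zeta Arc" comes out ahead of "Alpha Arc" because the comparison reaches "(a)" and "(b)" before it ever gets to the subseries names.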

Metadata grabs. I grab only pubdate, publisher, and ISBN. I keep only a few sources checked (figuring the more checked, the slower the grab). Amazon has seemed more consistently accurate, with broader item availability, than the others. I recently added Goodreads as a source, not sure how it compares to Amazon yet. Also by default I use ISBNdb and Open Library. Others I keep unchecked and only use on a case-by-case basis when necessary. Since several of the genres I read are speculative fiction, when the bulk metadata grab results don't please me I manually use ISFDB a lot to get good covers and, for books published since the late seventies, ISBN-13s where available. It's amazing how much I use that site, and it's much faster that way than going one by one in "Edit Metadata Individually" using the Download Metadata button.

Format Issues:

Format quality. I decided early on that I would only put cleaned-up or clean formats into my main library. So I have two working libraries, Main and Add. Add is for incomings and cleaning up their metadata and formats. When a batch has been cleaned in Add, I save them out with covers and opfs, delete them from Add, and add them (using metadata instead of filename) to Main. That forces all the opf, format, and calibredb metadata to be consistent, at least until the next time I change any of that metadata in Main. While working on them in Add, I examine each book format once, then assign it a format quality code. This works for me at format level because I keep only one format per record. And in the future, for any multiple formats, I trust that initial evaluation and don't examine it again. I choose the format with the best quality code over the others. I may have a _q3 sitting in Main for 6 months and then a _q4 for that title shows up in Add, so eventually I replace the _q3 format in Main with the _q4 from Add. I rarely use anything except _q0, _q3, or _q4. Doing it this way, worse formats have a chance of getting replaced by better ones later. That applies to anything that has something wrong with it: Advance Reader Copies, previously converted-from-text items with no bold/italics, problem formats, etc., all of which are convenient to keep as placeholders.

Format quality tags.
_q0 wishlist item. I also use it to color that book record's text red. For bad formats, it also saves the trouble of creating an empty book record or an empty placeholder format.
_q1 not used.
_q2 mostly not used, a few cases = more than minor annoyance, not fixable, retained anyway.
_q3 okay, readable with only minor annoyance.
_q4 good, readable with no annoyance.
_q5 excellent (I don't bother with this, except for a few examples)
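The replace-if-better rule behind these tags can be sketched in a few lines of Python. This is only an illustration of the decision; the function names are made up, and nothing here talks to calibre. In practice the comparison is done by eye in the booklist.

```python
import re

# Matches the quality tags _q0 through _q5 and captures the digit.
QUALITY_TAG = re.compile(r"^_q([0-5])$")


def quality_of(tags):
    """Return the numeric quality from a book's tags, or None if untagged."""
    for tag in tags:
        m = QUALITY_TAG.match(tag)
        if m:
            return int(m.group(1))
    return None


def should_replace(main_tags, add_tags):
    """True when the copy in Add is rated higher than the copy in Main."""
    main_q, add_q = quality_of(main_tags), quality_of(add_tags)
    if add_q is None:
        # An unrated incoming copy never replaces anything.
        return False
    return main_q is None or add_q > main_q


print(should_replace(["_q3", "fantasy"], ["_q4"]))
print(should_replace(["_q4"], ["_q3"]))
```

Because _q0 is the lowest code, a wishlist placeholder is replaced by any real copy that gets rated, which matches how the placeholders are meant to be used.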

Format cleanups. I'm not a publisher or distributor or editor or author. Any cleanup I do takes valuable time. My goal isn't to make it perfect, but to spend the least amount of time possible to make it "readable by me with as little annoyance as possible." I examine all incomings for format quality, and then tag each with a format-quality code. If it looks like I won't be able to clean it up in 5 minutes or less, I scrap it as not worth it or tag it _q0 and add a tag code for the type of format problems it has. I haven't worried about or cleaned up most Tables of Contents (TOCs) because for most books except big omnibuses I don't use or care about TOCs. I do strip out header/footer/page# when I can without causing a bunch of split paragraphs, otherwise I scrap it or code it _q0. I'm comfortable in Word, so my format-conversion workflow goes this route: calibre epub --> rtf saved out --> Word (search/replace) docx --> Open Office odt --> add to calibre --> epub. The conversions from rtf to docx to odt each clean out Microsoft format garbage and reduce size considerably. If I were more comfortable in Open Office I'd simplify that workflow a lot, but I'm not. I don't bother going the html fix-it-up path because for my purposes that's overkill and I'm not that comfortable with it yet, in Sigil or even with html tags in general. Eventually I'll probably switch over to doing it the html way, which is apparently more precise, more flexible, and more similar to internal calibre conversions, but I'm not there yet. So for now I don't worry about it, when doing it the way I'm most comfortable works pretty well.

Empty books. I don't use Empty Book command or Empty Book records. Based on someone's advice here early on, I created a folder containing empty files (originally text, later converted to epub) titled Empty01 through Empty10 by author AAA, TBD. When I need to, I add a group of 10 "empty" books and change the metadata of one or several appropriately, keeping it format code _q0. The reason to use a file format instead of an actual empty book record with no format is simple. Empty books don't get included when you save a selection of books out. If you want them included in saves that eventually get added to a different library, book records indicating wishlist items need to hold a format. Edit/Correction: Per Kiwidude, Copy to Library does copy empty books holding no format, while SaveToDisk doesn't.
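The folder of placeholder files can be made by hand in a minute, but for illustration here is a minimal Python sketch that creates them. The filename layout, the file contents, and the function name are assumptions; the post only specifies the titles Empty01 through Empty10 and the author "AAA, TBD".

```python
from pathlib import Path


def make_placeholders(folder, count=10):
    """Create small text files named Empty01..EmptyNN for use as placeholder formats."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for n in range(1, count + 1):
        # Hypothetical naming: title first, then the placeholder author.
        path = folder / f"Empty{n:02d} - AAA, TBD.txt"
        path.write_text("Placeholder for a wishlist book.\n", encoding="utf-8")
        created.append(path)
    return created


if __name__ == "__main__":
    for p in make_placeholders("EmptyBooks"):
        print(p.name)
```

The files would still need to be added to calibre and, if desired, converted to epub, as described above.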

CSS. Edit/Correction: Per Starson, it was a wrong assumption on my part to say the Conversion CSS style sheet box applied only to the viewer, and not to the format. So now that's one of the complexities I want to learn soon.

KISS is your friend. KISS is my friend. I detailed some of the ways I do things now to illustrate that as time went on my workflow and habits got KISSier rather than more complex. I'm not advocating that you do things my way. But I am advising new calibre users to consciously use KISS as much as possible, and that there's no need to get hung up in the complexities. Eventually, if something more complex makes sense to you, by all means start using it. But don't feel like you have to know everything there is to know about regex or templates or CSS or whatever all right away up front. I made the mistake of letting complexities get in my way at first instead of (if I were smarter) just ignoring them. When I KISSed my calibre use things started going a lot smoother for me. And now when one of those complexities starts to beckon, I have some relatively KISSy baseline calibre knowledge and practices, which I can take little steps away from while beginning to explore that particular complexity.

Last edited by unboggling; 09-22-2012 at 08:04 AM. Reason: Link to newest thread and version.