08-21-2011, 12:19 AM, post #119
unboggling
by the bootstraps
 
Posts: 1,053
Karma: 858115
Join Date: Jan 2011
Location: Southeast US
Device: PRS-T2, Nexus 7, KindleT, iPad1, Kindle3KB
---

Link to latest version: Version 0.80, 2011-09-16, Post #243.

---



KISS (v 0.31)


Project Information:

Project. KISS my use of eBooks. KISS my use of calibre.

Purpose of Project. Learn about eBooks. Determine strategies and methods for gathering, managing, cleaning, and reading eBooks, and determine relevant "best practices." Learn to use calibre and associated software better. Manage eBooks better.

Baseline. I am starting from the baseline a brand-new user of eBooks and calibre might have, after doing these initial tasks as soon as possible: read the Quick-Start Guide thoroughly; set the system's auto-scanning antivirus to exclude calibre libraries; learn how to get the desired metadata onto the user's current primary reading device, and set that up (for example, the metadata plugboard for my Kindle); and learn how to deal with Digital Rights Management (DRM) as it relates to converting eBooks to the format the chosen reading device needs.

Request Feedback. All strategies and work habits here are proposals, offered not as advice to anyone but as examples of what one new user is doing, struggling with, or trying to do. I hope this may be useful to other new users, becoming more useful over time as it is refined in successive iterations. I request feedback from experienced users correcting bad assumptions and suggesting better strategies, methods, work habits, or workflow. Feedback from new users is also welcome. Please post feedback and discussion in this thread or send me a private message.

Explanation of KISS. I'm using the principle "Keep It Simple Stupid" (K.I.S.S.) as a verb meaning to simplify a complex project or task. The word "Stupid" in the principle is not used or intended in a pejorative manner against myself or anyone else; instead it has a connotation of praise. Wikipedia has a brief explanation.

Reasons to KISS. When I started out new to eBooks and calibre in January 2011, I frequently felt overwhelmed. Paper books (pBooks) are fundamentally different from eBooks, so I needed different strategies and methods for using eBooks, which I hadn't worked out yet. The calibre application for managing eBooks is simple to use and accommodates more advanced users with many features and complexities, but I was overwhelmed at first because I didn't know much about eBooks in general and I let myself get tangled in calibre's complexities. So I sidetracked into tangents for a while before settling in. Later I noticed that the more I learned about eBooks in general and the more I consciously simplified my use of calibre, the more successful I was in managing eBooks.

Project Summary To Date. At 7 months in, I realized I needed feedback on my then-current strategies, methods, and work habits, so I laid out what I was doing hoping to get discussion. Details, discussion, and KISS versions 0.10 and 0.21 are available in this forum thread "KISS for New calibre Users". Per that discussion, I'm recasting the KISS posts from "giving advice to new users" to "documenting what I'm doing, as one new user." Also per discussion there, I scrapped everything and started over at baseline zero with both eBooks and calibre. I intend to do things better ways than I had been, during this next iteration cycle of my calibre use.

Status at Present. I use a MacBook Pro 17 running the latest released version of OS X Snow Leopard with the latest released binary version of calibre for OS X, and an older MacBook set up the same way, both on an AirPort wireless network. Both computers use auto-updated ESET Cybersecurity on auto-scan with calibre libraries excluded. My Kindle3, iPad1, and iPhone3 are also up to date with their operating systems and firmware. I have one library, named Main, containing one eBook in a clean new calibre installation. I have added plugins, configured preferences, entered user-specific info (such as user name or ISBNdb key), added custom columns, and added one template in metadata plugboards; I have added no new regex menu items and no tweaks. Conversion settings are at default except Input Options/ComicBook, where I set "Disable conversion of images to black/white" to yes. In Look and Feel/Column Coloring I used the tag "_q0" (a format quality rating) to color authors, series, title, and tags red. In Behavior I checked all listed formats, figuring I can always uncheck any later. My Raw Books folder outside of calibre is empty, and I now have only a few miscellaneous eBooks scattered around: the calibre Quick-Start Guide (in Main), the Kindle User Guide, various user guides in pdf format, and a few technical, reference, and other old pdfs.



Strategy for eBooks (Proposed, for me):

Overall Strategy. Obtain, cleanup, and read "on demand." I plan to make exceptions where warranted for good reasons such as "found a great edition that's hard to find" or "it's temporarily on sale" and "I want it enough to break the rule." But my primary purpose is to read eBooks, not to gather, clean, and hold them.

Gathering Strategy. Gather books to support the read-on-demand strategy as much as possible, primarily by browsing the internet: places like Amazon, Barnes & Noble, Baen, and so on.

Cleaning Strategy. Lean and mean according to my skill level. Minimize multiple conversions in sequence. Retain the original incoming format. As a matter of habitual hygiene, always do major clean-up work on copies saved outside of calibre first.



Work Habits, General (Proposed, for me):

Change. When I learn a new strategy, method, or skill, my strategies, methods, and work habits suddenly or gradually change to accommodate whatever I learned.

Cybersecurity. My anti-virus software is set to auto-scan all volumes, with specific settings to exclude calibre libraries from scans. That prevents the security software from causing performance slow-downs when calibre accesses the book files for one reason or another, which it does frequently. The books that were added to calibre had been scanned at download, then scanned again if they were accessed by any other application such as a compression expander or a reader application, then scanned again during Add Books to calibre.

Backups. I use backup software (Time Machine) to automatically back up my internal disk every hour to an external drive. The calibre application and all associated files are on my internal disk. I have file hosting/syncing services (MobileMe, Dropbox, SugarSync) but haven't used them with calibre because I didn't want another layer of complexity yet. In the future, if I do use a file hosting/syncing service, I will continue to do an automated backup of my own rather than depending on a server owned by someone else.

Raw Books. I keep every downloaded file after calibre has copied it into the library during Add Books. After adding, I put the files in folders with this structure: Raw Books/Source/DL Group/Original Filename. DL Group identifies a group of files downloaded at once, or a broader category such as Baen Free CDs. I've found myself searching Raw Books numerous times and re-adding books into calibre for one reason or another. These files may have bad metadata or not have been cleaned up, but at least they are in the original incoming formats; keeping them available is an insurance policy against future need.
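The Raw Books layout is regular enough to script once the manual routine is familiar. This is a minimal sketch, assuming Python 3 and that files are moved only after calibre has already copied them in; the function names `raw_books_path` and `file_away` are hypothetical, not part of calibre.

```python
from pathlib import Path
import shutil

def raw_books_path(root: Path, source: str, dl_group: str, filename: str) -> Path:
    """Raw Books/<Source>/<DL Group>/<Original Filename>"""
    return root / source / dl_group / filename

def file_away(downloaded: Path, root: Path, source: str, dl_group: str) -> Path:
    """Move a download (already added to calibre) into the Raw Books tree, keeping its name."""
    target = raw_books_path(root, source, dl_group, downloaded.name)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Never silently replace an earlier original with the same name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(downloaded), str(target))
    return target
```

Keeping the original filename untouched preserves whatever author/title clues it carried, which matters if the book ever has to be re-added by filename.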

Tangents. I want to avoid running off on complex, time-consuming tangents that take me away from my main purpose, reading eBooks. Tangent examples: I spent a week writing and revising a script for printing the calibre catalog after exporting it in Comma Separated Values (CSV) format to Excel; printouts aren't practical once the number of books exceeds a couple thousand. I investigated drivers and front-end interfaces to calibre's back-end database engine, even though I don't like using Structured Query Language anyway. I developed a whole elaborate tagging schema of my own and spent weeks revising it and then trying to comply with it; I don't use it now. I did those things initially to avoid feeling overwhelmed.

Automation. I do a process manually for a while until I'm familiar with it before trying to automate it with scripts, regular expressions, templates, computed columns, or whatever. Attempting automation without first understanding the manual mechanics thoroughly usually wastes far more of my time than it saves. Not knowing what I was doing, but trying to combine different types of automation all at once anyway, left me feeling frustrated and overwhelmed.



Work Habits, Devices (Proposed, for me):

Device Choices Per Type of Reading. I use Kindle most for fiction and iPad for technical and graphics eBooks. I haven't used iPhone for reading eBooks, although it's nice to know it's set up with readers in case of emergency with the other devices.

Conversion Before Device Loading. Delay conversion until just before reading, then convert on the fly to whatever format the desired device needs. The preferred conversion source is the original incoming format retained after adding to calibre, if it didn't need fixing; otherwise, a cleaned-up format. I leave nearly all conversion settings at default until I've tested a setting change enough to know what it does.

Device Loading. I load only 2 or 3 books at a time onto a device. When I finish reading one, I delete it from the device using calibre (preferred if possible), directly from iTunes (which I haven't tried), or directly on the device. After reading, I assign a content rating in calibre metadata, then decide whether there's even a slight chance I'll read it again. If not, I'll probably delete it from Main unless it can be useful as a placeholder item.



Work Habits, calibre (Proposed, for me):

Metadata Loss. I don't make any decisions that result in losing metadata, until I'm more experienced and become aware of potential ramifications. Examples of what I did that needed considerable work later to fix: Deleted The, An, A from title. Changed Spectra to Bantam so publishers were more consistent. Used only one of the 3 or 4 co-authors of an anthology where the actual editor's name wasn't available, rather than taking the trouble to use all the names. I don't want to KISS things to the extreme of losing data.

Mouse Tips in calibre. (The little boxes that come up when I hover the cursor over something.) I plan to pay more attention to them than I used to. Their info is important and much more up to date than the manual due to calibre going through revisions and updates so rapidly.

Preferred Format. I prefer ePub format because it opens fast in the calibre viewer, works on my iPad without conversion, usually easily converts to Mobi format for my Kindle, and is useful for clean-up purposes.

Formats To Keep. Mobi (for my Kindle), ePub (for my iPad, initial evaluation, and possible clean-up), original incoming format (for clean-up-related or other conversion).

Formats To Read. When the device supports it, I want to read the book in its native incoming format if it doesn't need clean-up; otherwise I'll read a conversion made after any necessary clean-up.

Timing for Obtaining eBooks. I want to acquire on demand shortly before reading a book, if I don't already have it. Methods to determine what I shortly want to read: generally, browse the internet; specifically, To Be Determined (TBD).

Timing for Tasks. Using the calibre viewer, I evaluate an incoming eBook's original format shortly after it's added, then assign a format quality tag. If it's clean-up-able or better, I delay any conversions for clean-up or for reading until shortly before reading. If it's not, I tag it as a wishlist/placeholder item and go looking for a better format for that title, either right away or later. I delay clean-up and conversions until just before reading for several reasons. My skill at it will be better in the future than now. And conversions in sequence, for whatever reason, are similar to photocopying copies of copies: the result is worse than copying the original once.

Templates. I currently use one template. It's in Metadata Plugboards and puts the series name and a zero-padded series index in front of the title for my Kindle: {series}{series_index:0>2s| - | - }{title}
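To check what that plugboard template should produce before sending a book to the Kindle, here is a small Python emulation. It is only a sketch of the idea, not calibre's template engine: `kindle_title` and `fmt_series_index` are hypothetical names, and the behavior for a book with no series (title only) is an assumption to verify in calibre itself.

```python
def fmt_series_index(idx: float) -> str:
    # Whole-number indices are shown without a decimal part (3.0 -> "3")
    return str(int(idx)) if float(idx).is_integer() else str(idx)

def kindle_title(title: str, series: str = "", series_index: float = 1.0) -> str:
    """Approximate {series}{series_index:0>2s| - | - }{title} for a single book."""
    if not series:
        # Assumption: with no series, only the title remains
        return title
    # "0>2s": right-align in 2 characters, padding with "0"
    padded = format(fmt_series_index(series_index), "0>2s")
    # "| - | - ": prefix " - " and suffix " - " around the index
    return f"{series} - {padded} - {title}"
```

For example, `kindle_title("Example Title", "Valdemar", 3.0)` gives `"Valdemar - 03 - Example Title"`. The zero padding is what keeps book 10 from sorting before book 2 on a device that sorts titles alphabetically.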

Regular Expressions (regex). I am just beginning to incorporate search/replace regex into my workflow. For 6 months, apart from the Kindle title template, the only regex I used was the one calibre supplies in its menu for importing books by filename.

Adding Books. The method depends on a group of books' place of origin (source) and their filename structure. If the source provides good metadata inside the files, it's easiest for me to add them to calibre by reading that internal metadata rather than the filename. If not, there are several filename-based methods. I can fix all the author/series/title info in the filenames out in the operating system (OS), either manually or with renaming tools (which also use regex) by batching together files with similar structures, then Add Books to calibre once the filenames match one of the Add Books by filename regexes in the menu. If I knew enough regex, I could write a different regex for each varying filename structure, use them in turn, and import the right info into the right field during Add Books, but I don't know regex well enough yet. There are also scripts and tools on this forum that help solve this problem by standardizing names before import. Or I could add that group of books as a mess without fixing names in the OS, then fix everything selectively using regex in the Edit Metadata in Bulk Search/Replace window. I've tried all of those ways except renaming tools and scripts, and they all take work, but for someone like me who isn't adept at regex yet, importing the whole mess into calibre without standardizing names first seems to take even more time to fix up than the other methods.
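A regex that matches one filename structure can be tested outside calibre before it is used for adding or renaming. The sketch below assumes a hypothetical layout "Author - Series NN - Title.ext"; the named groups follow the style calibre's filename regexes use (`title`, `author`, `series`, `series_index`), as far as I know, but check the Adding books preferences for the exact names. `parse_filename` and `standardized_name` are hypothetical helpers, and the "Title - Author" target name is only an example.

```python
import re
from typing import Optional

# Hypothetical layout: "Author - Series NN - Title.ext"
FILENAME_RE = re.compile(
    r"^(?P<author>[^_]+?) - (?P<series>.+?) (?P<series_index>\d+) - (?P<title>.+)$"
)

def parse_filename(filename: str) -> Optional[dict]:
    """Return author/series/series_index/title parsed from a filename, or None."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension
    match = FILENAME_RE.match(stem)
    return match.groupdict() if match else None

def standardized_name(fields: dict, ext: str) -> str:
    """Rebuild one consistent name, e.g. for a batch rename before Add Books."""
    return f"{fields['title']} - {fields['author']}.{ext}"
```

Files that return `None` don't fit this structure and belong in a different batch with its own regex, which is the "one regex per filename structure" approach described above.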

Custom columns. I use these.
#isbn, to see at a glance (computed from other columns).
#format, to see at a glance (computed from other columns).
#act, for temporary working tags to batch process groups of books (comma separated text like tags).
#notes, for variant titles, pseudonyms, miscellaneous (text, show in tag browser).
#source, for origin of book (comma separated text like tags).

Series. I tried multiple series columns but eventually went back to using just the one default series column. When it's important to get several subseries in the right order, I handle multi-level series like this: SeriesUniverseAbbreviation; Series Name (a); SubseriesName. For example, for Star Wars: SW; Clone Wars (b); SubseriesName. It's easy to search on "SW;" while ANDing any other desired keywords. If I use (a)'s, (b)'s, etc. correctly, sorts by series come out in chronological order by original pubdate or in recommended reading order, whichever I initially chose. But for most books in most series I don't worry about all that, and just use the smallest/lowest-level subseries name. Sometimes I use only the broadest series name, put in reading order, such as "Valdemar", but I prefer to do that only when I'm certain of the reading order and the series is complete. When a series is up to date, I add %su in Tags, meaning none are missing and the series is up to date through the most recent member in the library; this is a particularly useful tag for a multi-author series. For multi-author series I put the actual author's name in the author column rather than the series name; if I want to see all series members in a list, I do a search or just sort by series.
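The multi-level naming scheme can be checked mechanically. This is a minimal sketch, assuming Python 3 and that every structured series value looks exactly like "Universe; Series Name (x); Subseries" with a single lowercase order letter. The search is a plain-Python emulation of "SW;" ANDed with keywords, not calibre's search syntax, the sort key uses the (a)/(b) marker explicitly, and all names (`parse_series`, `search`, the sample "Era One" series) are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class SeriesParts(NamedTuple):
    universe: str   # e.g. "SW"
    name: str       # e.g. "Clone Wars"
    order: str      # the (a), (b), ... reading-order marker
    subseries: str

# Matches "Universe; Series Name (x); Subseries"
SERIES_RE = re.compile(r"^(?P<u>[^;]+); (?P<name>.+?) \((?P<order>[a-z])\); (?P<sub>.+)$")

def parse_series(value: str) -> Optional[SeriesParts]:
    m = SERIES_RE.match(value)
    if m is None:
        return None
    return SeriesParts(m["u"], m["name"], m["order"], m["sub"])

def reading_key(series: str) -> tuple:
    parts = parse_series(series)
    # Series strings that don't follow the scheme sort after the ones that do
    return (parts.order, parts.subseries) if parts else ("~", series)

def search(books: list, prefix: str, *keywords: str) -> list:
    """Keep (title, series) pairs whose series starts with prefix AND that contain every keyword."""
    hits = [
        (title, series)
        for title, series in books
        if series.startswith(prefix)
        and all(k.lower() in f"{title} {series}".lower() for k in keywords)
    ]
    return sorted(hits, key=lambda book: reading_key(book[1]))
```

A value that `parse_series` rejects is a quick sign that a series name was typed inconsistently.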

Metadata Downloads. I grab only published date, publisher, comments, cover, and ISBN (ISBN is calibre-automatic). I always grab the cover, even when the book has an internal cover. I keep only a few sources checked (figuring the more sources checked, the slower the grab). Amazon's results seemed more consistently accurate, with broader item availability, than the others. By default I also use ISBNdb and Open Library. Others I keep unchecked and use only case by case when necessary. Since several of the genres I read are speculative fiction, when the bulk metadata grab results don't please me I manually use ISFDB to get good covers and ISBN13s where available - it's amazing how much I use that site - and that's faster than going one by one in "Edit Metadata Individually" with the Download Metadata button.

Format Quality Tag Abbreviations. In a Format Quality Tag, the way I use them, for example "_q4" - the "_" sorts it to the beginning of the list of comma separated tags in the Tag column. The "q" reminds me that it's a format-Quality rating, and the "4" is the rating. I don't suggest anyone else do it that way, necessarily. The reason I use abbreviations in tags is to see them all in a small space in one relatively narrow column without needing to scroll through a long wide column or long sequence of different columns.

Format Quality. I want to achieve the highest level of format quality my skills allow, but I also want to minimize the time I spend fixing things. My clean-up skills gradually improve over time so older "cleaned" books in my library tend to have lower format quality levels than newer books. That's why I want to delay clean-ups until just before reading. For each new added book I examine it once then assign a format quality rating tag, as well as a separate tag identifying it as needing cleaning if it needs it. That quality rating applies to all the formats retained in that book record. When I want to keep formats for that title that have different format quality ratings, I make sure I put them in their own record with a higher or lower format quality rating. When I find and compare title duplicates the main criterion I use is the format quality rating and retain the record holding the higher quality format. I may have a _q3 sitting in the library for 6 months and then a _q4 for that title shows up, so eventually I replace the _q3 format with the _q4. I rarely use anything except _q0, _q3, or _q4. Doing it this way, worse formats have a chance of getting replaced by better later. That applies to anything that has something wrong with it: Advance Reader Copies, incomings in any format that have no bold/italics (usually caused by it previously having been a text format), problem formats of any kind, all of which are convenient to keep as placeholders.

Format Quality Tags.
_q0 wishlist item. I also use it to color that book record's text red. For bad formats, it saves the trouble of creating an empty book record or an empty placeholder format.
_q1 not used.
_q2 mostly not used, a few cases = more than minor annoyance, not fixable, retained anyway.
_q3 okay, readable with only minor annoyance.
_q4 good, readable with no annoyance.
_q5 excellent, I don't bother with this, except for a few examples.
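Because every record carries exactly one _qN tag, choosing which duplicate record to keep can be reduced to comparing those tags. A minimal sketch, assuming Python 3 and that duplicates for one title have already been found (for example with Find Dupes) and exported as (record id, tag list) pairs; `quality` and `record_to_keep` are hypothetical helpers, and treating an untagged record as lower than _q0 is a design choice, not part of the scheme above.

```python
import re
from typing import Optional

QUALITY_RE = re.compile(r"^_q([0-5])$")

def quality(tags: list) -> Optional[int]:
    """Return N from the first _qN tag, or None if the record has none."""
    for tag in tags:
        m = QUALITY_RE.match(tag.strip())
        if m:
            return int(m.group(1))
    return None

def rank(record: tuple) -> int:
    q = quality(record[1])
    return -1 if q is None else q   # untagged records lose to any rated one

def record_to_keep(records: list):
    """records are (record_id, tags) pairs for one title; keep the highest-rated one."""
    return max(records, key=rank)[0]
```

On a tie, `max` returns the first record listed, so the older record survives unless the newcomer is strictly better.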

Format cleanups. I'm not a publisher, distributor, or editor. Any cleanup I do takes valuable time. My goal isn't to make it perfect, but to spend the least time possible making it "readable by me with as little annoyance as possible." I examine all incomings for format quality, then tag each with a format quality tag. If it looks like I won't be able to clean it up in 5 minutes or less, I scrap it as not worth it, or tag it _q0 and add a tag for the type of format problems it has. I always work on a copy rather than on the main format in calibre. I haven't worried about or cleaned up most Tables of Contents (TOCs), because for most books except big omnibuses I don't use or care about TOCs. I do strip out headers, footers, and page numbers when I can without causing a lot of split paragraphs; otherwise I scrap the book or tag it _q0. I'm comfortable in Word, so I had been using this format-conversion route: calibre epub --> rtf saved out --> Word (search/replace) docx --> Open Office odt --> add to calibre --> ePub. The conversions from rtf to docx to odt each reduced the size considerably. During this next iteration of calibre use, I want to reduce the number of conversions and simplify that process, so I'm now looking for better and simpler routes. I can simplify that workflow by eliminating Word, opening the rtf directly in Open Office, and fixing it up there. Once I've learned enough HTML to be comfortable with it, I can switch to doing clean-ups going epub --> clean-up in Sigil --> epub. The Sigil route seems most efficient, with the fewest conversions, so I'm making it a high priority to learn.
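Stripping page numbers and running headers is a pattern-matching job, whether it's done with Word's search/replace, Open Office, or a regex search in Sigil. As a rough illustration only, here is a Python sketch that works on plain-text lines; it assumes a page number is a line holding nothing but 1 to 4 digits and that running headers are known exact strings. `strip_headers` is a hypothetical helper, not a feature of any of those tools.

```python
import re

# A line containing only a 1-4 digit number is treated as a page number
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")

def strip_headers(lines: list, running_headers: tuple = ()) -> list:
    """Drop page-number lines and lines equal (case-insensitively) to a known running header."""
    headers = {h.strip().lower() for h in running_headers}
    return [
        line for line in lines
        if not PAGE_NUMBER.match(line) and line.strip().lower() not in headers
    ]
```

Note what it does not do: a paragraph that was interrupted by a page break stays split in two (for example "It was" and "a dark night." remain separate lines), which is exactly the split-paragraph problem that decides whether a cleanup fits in 5 minutes. It would also drop a chapter heading that is only a number, so spot-check the result.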

Tags. Since beginning with calibre I gradually used the tag browser less and the search box more. At about 3 or 4 months in I stopped using a lot of columns for genre, booktype, series-status and so on, and started using the default tags column, with tag prefixes to designate tag type, accompanied by abbreviations. Example of how I use tags now on a book:
_FormatQuality, ((GenrePrimary, (GenreSecondaries, [Type, %status, miscellaneous
_q4, ((fn, (mgc, (ya, [om, %sma, %su, r3
That translates to: Format good, fantasy with magic, young adult, omnibus, in multiauthor series, series up to date, myRating=3stars (and I've read it, otherwise it wouldn't be rated).
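The prefix convention is regular enough that a tag string can be decoded by rule, which is also a quick way to catch a mistyped prefix. A minimal sketch, assuming Python 3 and exactly the prefixes in the example above; `classify` and `decode` are hypothetical helpers, and anything without a known prefix (such as the r3 rating) falls into "miscellaneous".

```python
# Order matters: "((" must be checked before "("
PREFIXES = [
    ("_", "format quality"),
    ("((", "primary genre"),
    ("(", "secondary genre"),
    ("[", "type"),
    ("%", "status"),
]

def classify(tag: str) -> tuple:
    for prefix, kind in PREFIXES:
        if tag.startswith(prefix):
            return kind, tag[len(prefix):]
    return "miscellaneous", tag

def decode(tag_field: str) -> dict:
    """Group a comma-separated calibre tag string by tag type."""
    groups = {}
    for raw in tag_field.split(","):
        tag = raw.strip()
        if tag:
            kind, value = classify(tag)
            groups.setdefault(kind, []).append(value)
    return groups
```

Decoding the example line gives format quality q4, primary genre fn, secondary genres mgc and ya, type om, statuses sma and su, and r3 as miscellaneous.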

Empty books. I don't use the Empty Book command or empty book records. I created a folder containing empty files (originally text, later converted to epub) titled Empty01 through Empty10 by author AAA, TBD. When I need to, I add a group of 10 "empty" book formats and change the metadata of one or several appropriately, keeping the format quality tag _q0. I do this because empty books don't get included when I SaveToDisk a selection of books. When I want them included in saves that eventually get added to a different library, book records indicating wishlist items need to hold a format. If I want to avoid all that, I can just use CopyToLibrary instead, which does copy empty books without formats.
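Recreating the placeholder folder is easy to script if it's ever lost. A minimal sketch, assuming Python 3 and plain-text files that are then converted to epub in calibre as described above; `placeholder_names` and `make_placeholders` are hypothetical helpers, and the author (AAA, TBD) still has to be set in calibre after adding.

```python
from pathlib import Path

def placeholder_names(count: int = 10) -> list:
    """Empty01.txt ... EmptyNN.txt, zero-padded to two digits."""
    return [f"Empty{n:02d}.txt" for n in range(1, count + 1)]

def make_placeholders(folder: Path, count: int = 10) -> list:
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in placeholder_names(count):
        path = folder / name
        # One short line of text so each file has some content to convert
        path.write_text(f"Placeholder: {path.stem}\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Zero-padding the numbers keeps Empty10 after Empty09 in any alphabetical listing.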

Plugins. I use these Plugins frequently - Find Dupes, Open With, Search Internet, Quality Check.

Collections on Devices. I don't use or want them. I prefer referring to tags in main library when necessary.

Features not used or incorporated in workflow. Fetch News, Get Books, collections in calibre or on devices, calibre Server, CSS, Sigil in conjunction with calibre for clean-ups.



Short Term Goals (Proposed, for me):

Learn. Learn CSS and Sigil. Begin learning HTML and become familiar with HTML tags. Keep learning better strategies and methods on MobileRead, on an ongoing basis.

Last edited by unboggling; 09-17-2011 at 03:29 AM. Reason: Link to newer version.