08-13-2012, 06:53 AM #1
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
[GUI Plugin] Import List

This wizard-based plugin allows you to match existing books, or create empty ones, in calibre based on lists of books from external sources. Some users may want to import an existing list of their own reading/books; others may want to import lists of bestsellers, popular books, genre recommendations, award winners and so on. Once the books are matched, you can integrate with the Reading List plugin to record them as a list to read, send them to a device, or just view them on screen.
  • Import from Clipboard - paste in a list of book title/authors copied to your clipboard, such as from a website forum post of favourite books.
  • Import from CSV File - many users have lists of books stored in applications like Excel, which can export to CSV, or use websites like Goodreads/LibraryThing, which can also export to CSV.
  • Import from Web Page - over 100 predefined websites are configured (including Goodreads, Amazon etc.) or you can add your own. The predefined websites include Fantastic Fiction, which is an easy way to scrape title, author, series and pubdate metadata for all books by an author. Note that for this and any other predefined website you are not limited to the specific URL configured. You can navigate to the page of interest in your browser, then drag/drop or copy its URL into the Web Page tab of this plugin. Most websites use the same layout for all their pages of a given type, so no other configuration needs to be changed.
Using the wizard is a three-step process (more detail in example spoilers below):
  • STEP 1: Select/configure a list source - either choose a predefined source or configure your own.
  • STEP 2: Resolve matches - the plugin uses fuzzy matching algorithms to find the best match against existing books in your library. You can then fine-tune the results with further searches and/or choose to add empty books for those that do not exist in your library.
  • STEP 3: Display/save the results - with the matched results you can create/append to a Reading List plugin list or just display temporarily on screen. You also have the option of saving your customised configuration as user settings for future reuse.
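For anyone curious what a "progression" of matching passes in STEP 2 can look like, here is a minimal Python sketch. It is not the plugin's actual code: the `match_book` and `normalize` helpers, the dict-based library and the 0.85 cutoff are all assumptions for illustration, and difflib's similarity ratio stands in for whatever fuzzy algorithms the plugin really uses.

```python
import difflib
import re
import unicodedata

def normalize(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def match_book(title, author, library, cutoff=0.85):
    """Return (book_id, pass_name) for the first/best match, else (None, None).

    `library` is a hypothetical dict of book_id -> (title, author).
    """
    # Pass 1: identical title and author.
    for book_id, (lib_title, lib_author) in library.items():
        if lib_title == title and lib_author == author:
            return book_id, "identical"
    # Pass 2: identical once case, accents and punctuation are ignored.
    key = (normalize(title), normalize(author))
    for book_id, (lib_title, lib_author) in library.items():
        if (normalize(lib_title), normalize(lib_author)) == key:
            return book_id, "similar"
    # Pass 3: best fuzzy similarity ratio at or above the cutoff.
    target = " ".join(key)
    best_id, best_score = None, cutoff
    for book_id, (lib_title, lib_author) in library.items():
        candidate = normalize(lib_title) + " " + normalize(lib_author)
        score = difflib.SequenceMatcher(None, target, candidate).ratio()
        if score >= best_score:
            best_id, best_score = book_id, score
    return (best_id, "fuzzy") if best_id is not None else (None, None)

if __name__ == "__main__":
    library = {1: ("The Hobbit", "J. R. R. Tolkien"), 2: ("Dune", "Frank Herbert")}
    print(match_book("the hobbit!", "j r r tolkien", library))
```

Running the cheaper, stricter passes first means most books are resolved without fuzzy scoring, and a fuzzy hit is only accepted when nothing exact was found. Anything left unmatched is what you would then resolve by hand in the wizard.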

Main Features of v1.1.5
  • Import lists of books from Clipboard, CSV files or websites.
  • Choose from over 100 predefined websites and/or add your own configurations.
  • Import into standard fields, identifiers or custom columns
  • Option to update metadata of existing books
  • Predefined websites can be viewed as a list or grouped by category
  • Websites can be directly opened in a web browser
  • Supports importing title, author, series, series index and pubdate (all but title are optional)
  • Customise clipboard imports with regular expressions (common examples available on a dropdown)
  • Customise CSV imports by specifying the column numbers to import and other options such as delimiters.
  • Customise website imports using XPath expressions, with highlighting available to show matches.
  • Website URLs support template expressions to allow automatic substitution of values such as dates. For an example look at the Goodreads Popular This Month/Year settings.
  • Automatically match books in your library using a progression of identical/similar/fuzzy matching algorithms
  • User can manually search/refine matches, create empty books, remove books from the list etc.
  • Optionally put the resulting matched books into a list for use with the Reading List plugin, or just display on screen
  • Configurations can be exported/imported for sharing with other users.
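As a rough illustration of the URL template idea, the sketch below fills date placeholders into a URL before it is downloaded. The `{year}`/`{month}` placeholder names, the example.com URL and the `expand_url` helper are hypothetical; the plugin's real template syntax may differ, so look at the Goodreads Popular This Month/Year settings for the actual form.

```python
from datetime import date

def expand_url(template, today=None):
    """Fill hypothetical {year}/{month} placeholders from a date (default: today)."""
    if today is None:
        today = date.today()
    # Zero-padded month so the URL stays stable, e.g. "08" rather than "8".
    return template.format(year=today.strftime("%Y"), month=today.strftime("%m"))

if __name__ == "__main__":
    print(expand_url("https://www.example.com/popular/{year}/{month}", date(2014, 8, 19)))
```

The benefit is that a saved setting keeps pointing at "this month" without you editing the URL each time.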

Special Notes:
  • Requires Calibre 0.8.57 or later.

Installation Notes:
Usage Example 1: Workflow for importing from a website
Spoiler:
  • Choose the Predefined setting tab
  • To see the webpage in your browser for that site, click on the Browser button (optional step)
  • Double-click or click "Preview" to see the titles/authors for that website link
  • Click Next to see what books have automatically been matched against those in your library. It uses a variety of special fuzzy algorithms to attempt this initial pass.
  • For any books that haven't yet matched, you can double-click on them in the top grid to execute a calibre search showing results in the bottom grid. Refine the search if needed and double-click on the book in the bottom grid to select it as the match. Alternatively you can add an empty book or remove that book from the list. Of course you don't have to match/delete every book - it is up to you how much of the "list" you want to keep before moving to the next wizard step.
  • Click Next to be given the choice of optionally saving all the matched books to a reading list. You can also save your list configuration (more relevant when using your own list sources).
  • Click Finish to see the books displayed in your library.

Usage Example 2: Getting books for an author from Fantastic Fiction
Spoiler:
  • Bring up the web page for that author in your browser. I suggest using the Search the Internet plugin as the fastest way of doing so.
  • Start the Import List plugin
  • From the Predefined tab, choose the "Fantastic Fiction" setting and click Edit
  • Either drag/drop or paste the URL from your browser into the "Download from url" combo at the top of the Web Page tab.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

Usage Example 3: Loading books from the clipboard
Spoiler:
  • Use the Clipboard tab.
  • Paste in your text
  • Specify your regular expression to extract the title/author. At a minimum you must have a title.
  • Some predefined expressions are available from the script button to the right of the dropdown.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
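A minimal sketch of how a clipboard regular expression can pull out titles and authors. It assumes one book per line and Python-style named groups called `title` and `author`; that naming is an assumption here, so check the predefined expressions in the dropdown for the exact form the plugin expects. `parse_clipboard` and the sample pattern are hypothetical.

```python
import re

# Hypothetical pattern for lines such as "1. The Hobbit by J. R. R. Tolkien".
# The title group is greedy, so it extends up to the last " by " on the line.
PATTERN = re.compile(r"^\s*\d+\.\s*(?P<title>.+)\s+by\s+(?P<author>.+?)\s*$")

def parse_clipboard(text, pattern=PATTERN):
    """Apply the pattern to each line; title is required, author is optional."""
    books = []
    for line in text.splitlines():
        match = pattern.match(line)
        if not match:
            continue  # lines the expression does not recognise are ignored
        groups = match.groupdict()
        title = (groups.get("title") or "").strip()
        if title:
            books.append({"title": title,
                          "author": (groups.get("author") or "").strip()})
    return books

if __name__ == "__main__":
    pasted = "Top books:\n1. The Hobbit by J. R. R. Tolkien\n2. Dune by Frank Herbert\n"
    for book in parse_clipboard(pasted):
        print(book)
```

A title-only list works the same way with a simpler pattern such as `^- (?P<title>.+)$`; the author then just comes back empty.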

Usage Example 4: Load from a CSV file
Spoiler:
  • Use the CSV File tab.
  • Browse to the file, click Preview and the columns should be displayed.
  • Alter your separator if required and specify the title/author column numbers.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
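The CSV step can be sketched in a similar way. This assumes 1-based column numbers, an optional header row, and that blank rows are skipped (as the v1.1.1 notes describe for the preview). `parse_csv`, `_cell` and their parameters are hypothetical, not the plugin's API.

```python
import csv
import io

def _cell(row, column):
    """Return the stripped value at a 1-based column number, or "" if absent."""
    if column and 0 < column <= len(row):
        return row[column - 1].strip()
    return ""

def parse_csv(text, title_col, author_col=None, delimiter=",", skip_first_row=True):
    """Extract title/author pairs from CSV text using the given column numbers."""
    books = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for index, row in enumerate(reader):
        if index == 0 and skip_first_row:
            continue  # header row
        if not any(cell.strip() for cell in row):
            continue  # blank row
        title = _cell(row, title_col)
        if title:
            books.append({"title": title, "author": _cell(row, author_col)})
    return books

if __name__ == "__main__":
    data = "Title;Author\nThe Hobbit;J. R. R. Tolkien\n\nDune;Frank Herbert\n"
    print(parse_csv(data, title_col=1, author_col=2, delimiter=";"))
```

For a real file, read it with the correct encoding first; exports from sites such as LibraryThing are not always UTF-8.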

Usage Example 5: Scrape from a custom website
Spoiler:
  • Select the Web Page tab.
  • Click the Clear button to remove any previous website settings.
  • Drag/drop or paste the URL into the Url combobox.
  • Click Preview to see the underlying HTML that you will be writing XPath expressions for.
  • You might find it useful to jump to the part of the source HTML of interest by typing, for instance, the first book name in the list into the "Find" text box in this dialog and clicking the Find button.
  • You can *either* specify an XPath to what I call a "row" for each book, and then use a relative XPath to the title/author, *or* you can just specify a direct XPath to the title/author. Which approach works best depends on the site.
  • Use the marker icon on the right of each XPath combo to preview what text your expression is going to select. You can step forward/backward through the matches.
  • Your Title and Author expressions must extract the text(). The optional regular expression in the Strip field will then be applied, along with a number of special cleanups coded within the plugin to strip common unnecessary characters.
  • If desired, you can reverse the list order, for instance turning a countdown from 50 to 1 on the web page into a list running from 1 to 50.
  • There are some additional, more complex options for dealing with difficult websites that use non-UTF-8 encodings or require JavaScript to execute to load the page content. Some of the predefined Amazon settings use these options, for an example.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
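To illustrate the "row" XPath plus relative title/author XPath approach, the Strip regular expression and the reverse-order option, here is a sketch using Python's standard-library ElementTree. ElementTree supports only a limited XPath subset and needs well-formed markup, so `itertext()` stands in for `text()` here; real web pages generally need a full HTML parser (calibre bundles lxml). The `scrape` function, the sample markup and the paths are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def _text(element):
    """Concatenate all text inside an element (a stand-in for XPath text())."""
    return "".join(element.itertext()).strip() if element is not None else ""

def scrape(markup, row_path, title_path, author_path, strip=None, reverse=False):
    """Extract books using a row path plus paths relative to each row."""
    root = ET.fromstring(markup)
    books = []
    for row in root.iterfind(row_path):
        title = _text(row.find(title_path))
        author = _text(row.find(author_path))
        if strip:
            # Optional "Strip" regex, e.g. to drop leading "3. " numbering.
            title = re.sub(strip, "", title).strip()
            author = re.sub(strip, "", author).strip()
        if title:  # rows without a title are not kept
            books.append({"title": title, "author": author})
    if reverse:
        books.reverse()  # e.g. turn a 50..1 countdown into 1..50
    return books

if __name__ == "__main__":
    page = ("<html><body><table>"
            "<tr class='book'><td><a>2. Dune</a></td><td><span>Frank Herbert</span></td></tr>"
            "<tr class='ad'><td>Sponsored</td></tr>"
            "<tr class='book'><td><a>1. Neuromancer</a></td><td><span>William Gibson</span></td></tr>"
            "</table></body></html>")
    print(scrape(page, ".//tr[@class='book']", "td[1]/a", "td[2]/span",
                 strip=r"^\d+\.\s*", reverse=True))
```

The row path filters out non-book rows (the "Sponsored" row above), which is the main reason the row-plus-relative approach is often more robust than two independent direct XPaths.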


Paypal Donations:
  • If you find this or any of my other plugins useful please feel free to show your appreciation. I have spent many thousands of unpaid hours in their development and support so any encouragement for me to continue is appreciated!

Version History:
Spoiler:

Version 1.1.5 - 19 Aug 2014
Support for upcoming calibre 2.0

Version 1.1.4 - 13 Oct 2013
Fix for search matched books list right clicks not working correctly

Version 1.1.3 - 30 Sep 2013
Submission from wolf23 to support comma separated values for any type of custom column that supports multiple values.

Version 1.1.2 - 29 Sep 2013
Remove the select next matched/unmatched etc buttons/menus. Replace with Show All/Matched/Unmatched radio buttons that filter.
Add a predefined setting for the Goodreads search results page
Fix various broken predefined settings from changes to websites being scraped
Change logic so that if scraped website book data has no title it is not added to the right-hand side preview books list
Always put a horizontal scroll bar on both lists in the Resolve page of the wizard to reduce the chance of vertical scrolling getting out of sync
If an XPath returns multiple values for a tags field when scraping a website, separate the values with a comma and store them as a single value.

Version 1.1.1 - 13 Dec 2012
Skip blank rows in CSV files when previewing
Always display headers on CSV tab rather than hiding when Skip first row is checked

Version 1.1 - 25 Oct 2012
Allow importing into custom columns
Allow updating metadata of existing books in your library from data off the list (standard or custom columns)
Allow clipboard, csv and web page import to retrieve into a dynamic set of calibre standard columns including identifier fields
Dynamically change the Preview, Search and Match columns presented to match what is configured for the import source
Remove the ability to specify columns to display on the configuration screen
Fix matched count on the Resolve step so multiple matches are not counted as matched
Fix for importing non-UTF-8 CSV files, such as titles exported from LibraryThing
Allow direct editing of plugin configuration data from config dialog (use at your own risk!)

Version 1.0 - 13 Aug 2012
Add a "Select all matched" option to the right-click menu

Version 0.4 - 16 Jul 2012
If a book has a series name but no series index, default to a series index of zero

Version 0.3 - 15 Jul 2012
Add Series and Series Index columns to the CSV page, Web Page tab and preview; change Fantastic Fiction (FF) to scrape into these columns.
Add pubdate, series and series_index to the Clipboard tab
Fix highlighting bug involving self-closing tags
Fix incorrect XPath for the Goodreads Shelves lists
Additional change to the way encodings are handled, simplifying to UTF-8
Change the icon size to 24 rather than 16 on the predefined/user settings

Version 0.2 - 15 Jul 2012
Allow author XPath expressions to not be relative to a row XPath
Add a Pubdate column to the CSV page, Web Page tab and preview; change Fantastic Fiction (FF) to scrape the year into this column.

Version 0.1 - 14 Jul 2012
Initial beta release of Import List plugin

Attached Thumbnails
  • Page1a_PredefinedList.png (75.1 KB)
  • Page1b_PredefinedTree.png (58.6 KB)
  • Page1c_UserSettings.png (47.8 KB)
  • Page1d_ImportClipboard.png (56.7 KB)
  • Page1e_ImportCSV.png (62.5 KB)
  • Page1f_ImportWeb.png (64.8 KB)
  • Page2Matches.png (51.1 KB)
  • Page3Save.png (30.7 KB)
  • Page1g_Fields.png (15.8 KB)
  • Page1h_Regex.png (11.4 KB)
  • Page2b_UpdateMetadata.png (13.0 KB)
Attached Files
File Type: zip Import List-qt5.zip (116.7 KB, 6447 views)

Last edited by kovidgoyal; 08-19-2014 at 02:39 AM. Reason: v1.1.5 Released