07-14-2012, 03:05 PM   #19
kiwidude
calibre/Sigil Developer
v0.1 Beta

Here at last is a version for people to play with. I've updated the screenshots on the first page - quite a number of things have changed over the last month as I have refined things.

There's a lot of subtle, hidden behaviour that I won't bore people with at this point. Be sure to right-click on the various grids to see some of it. Also, in most lists/grids double-clicking on an item tends to act as a shortcut for the usual action. I have also incorporated chaley's template language, so a number of the URLs (such as those for Goodreads) now dynamically resolve their dates into the URL, for lists such as "Popular this Month" or "Popular this Year". You can customize which columns are displayed on the "Resolve" page of the wizard using the wizard's "Options" button (you must close and reopen the wizard for the change to take effect). Note that any columns you add will be read-only.

For a quick example of the workflow when using a predefined web page list setting:
  • Choose the predefined setting tab
  • To see the webpage in your browser for that site, click on the Browser button (optional step)
  • Double-click or click "Preview" to see the titles/authors for that website link
  • Click Next to see what books have automatically been matched against those in your library. It uses a variety of special fuzzy algorithms to attempt this initial pass.
  • For any books that haven't yet matched, you can double-click on them in the top grid to execute a calibre search showing results in the bottom. Refine the search if needed and double-click on the book in the bottom to select it as the match. Alternatively you can add an empty book or remove that book from the list. Of course you don't have to match/delete every book - it is up to you how much of the "list" you want to keep before moving to the next wizard step.
  • Click Next to be given the choice of optionally saving all the matched books to a reading list. You can also save your list configuration (more relevant when using your own list sources).
  • Click Finish to see the books displayed in your library.
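
The post does not say which fuzzy algorithms the plugin uses for the initial matching pass, so the following is only a minimal sketch of the general idea, using Python's standard `difflib`. The function names, the normalization rules and the 0.8 threshold are all assumptions made for illustration, not the plugin's code.

```python
import difflib
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, drop a leading article."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"^\s*(the|a|an)\s+", "", text)
    return " ".join(text.split())


def best_match(title, author, library, threshold=0.8):
    """Return the library entry most similar to (title, author), or None.

    `library` is a list of dicts with "title" and "author" keys; this layout
    is hypothetical and stands in for whatever the real library lookup returns.
    """
    wanted = normalize(title + " " + author)
    best, best_score = None, 0.0
    for book in library:
        candidate = normalize(book["title"] + " " + book["author"])
        score = difflib.SequenceMatcher(None, wanted, candidate).ratio()
        if score > best_score:
            best, best_score = book, score
    # Below the (assumed) threshold the book is left for manual matching.
    return best if best_score >= threshold else None


library = [
    {"id": 1, "title": "The Hobbit", "author": "J. R. R. Tolkien"},
    {"id": 2, "title": "Dune", "author": "Frank Herbert"},
]
match = best_match("Hobbit, The", "J.R.R. Tolkien", library)
no_match = best_match("Some Unknown Book", "Nobody", library)
```

Anything that falls below the threshold comes back as `None`, which corresponds to the books you would then resolve by hand in the top grid.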

Another example - getting books for an author via Fantastic Fiction:
  • Bring up the web page for that author in your browser. I suggest using the Search the Internet plugin as the fastest way of doing so.
  • Start the Import List plugin
  • From the Predefined tab, choose the "Fantastic Fiction" setting and click Edit
  • Either drag/drop or paste the url from your browser into the "Download from url" combo at the top of the Web Page tab.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

If you want to load your own list of text from the clipboard (such as copied from a forum post or web page):
  • Use the Clipboard tab.
  • Paste in your text
  • Specify your regular expression to extract the title/author. At a minimum you must have a title.
  • Some predefined expressions are available from the script button to the right of the dropdown.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
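
To give a feel for what such an expression does, here is a small sketch in plain Python. The "Title by Author" pattern and the named groups `title` and `author` are assumptions chosen for the example; check the predefined expressions in the plugin for the exact form it expects.

```python
import re

# Hypothetical pattern: one book per line, "Title by Author".
# The author part is optional, since only a title is required.
PATTERN = re.compile(
    r"^(?P<title>.+?)(?:[ \t]+by[ \t]+(?P<author>.+))?$",
    re.MULTILINE,
)


def extract_books(text):
    """Return a list of (title, author) tuples; author may be None."""
    books = []
    for m in PATTERN.finditer(text):
        author = m.group("author")
        books.append((m.group("title").strip(),
                      author.strip() if author else None))
    return books


clipboard_text = """Dune by Frank Herbert
Neuromancer by William Gibson

Standalone Title
"""
books = extract_books(clipboard_text)
for title, author in books:
    print(title, "|", author)
```

Blank lines are skipped because the title group needs at least one character. A title that itself contains " by " would be split too early by this simple pattern, which is the kind of thing the Preview step lets you catch.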

To load from a CSV file (such as a calibre CSV catalog file, a Goodreads export or whatever):
  • Use the CSV File tab.
  • Browse to the file, click Preview and the columns should be displayed.
  • Alter your separator if required and specify the title/author column numbers.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
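
The same idea in code, as a rough sketch with Python's standard `csv` module. The function name, the 1-based column numbering and the header handling are assumptions for illustration; the plugin's own settings may count columns differently.

```python
import csv
import io


def read_books(csv_text, title_col, author_col, delimiter=",", has_header=True):
    """Pick the title/author columns out of CSV text.

    Column numbers are 1-based here (an assumption), as you would count
    them when looking at a preview grid.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if has_header:
        rows = rows[1:]
    books = []
    for row in rows:
        if len(row) < max(title_col, author_col):
            continue  # skip short or empty rows
        title = row[title_col - 1].strip()
        if title:  # a title is the minimum needed for a list entry
            books.append((title, row[author_col - 1].strip()))
    return books


sample = (
    "id;title;authors\n"
    "1;Dune;Frank Herbert\n"
    '2;"Good Omens";Terry Pratchett & Neil Gaiman\n'
)
books = read_books(sample, title_col=2, author_col=3, delimiter=";")
```

Changing `delimiter` plays the role of altering the separator: with the wrong one, every row collapses into a single column and nothing is extracted.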

To scrape from a different website not already configured in this plugin:
  • Select the Web Page tab.
  • Drag/drop or paste the url into the top combo.
  • Click Preview to see the underlying html that you will be specifying xpath expressions for.
  • You might find it useful to get to the part of the source html of interest by using the "Find" text in this dialog, specifying for instance the first book name in the list and clicking on the Find button.
  • You can *either* specify an xpath to what I call a "row" for each book, and then use a relative xpath to the title/author, *or* you can just specify a direct xpath to the title/author. It depends on the site as to which approach works best.
  • Use the marker icon on the right of each xpath combo to preview what text your expression is going to select. You can step forward/backward through the matches.
  • Your Title and Author expressions must extract the text(). The optional regular expression in the Strip field will then be applied, along with a number of special cleanups coded within the plugin to strip common unnecessary characters.
  • If desired you can reverse the list order - for instance a countdown from 50 to 1 on the web page you might want in your list as 1 to 50.
  • There are some additional complex options for dealing with difficult websites which use non-UTF-8 encodings, or which require javascript to execute to load the page content. Some Amazon pages make use of these settings, for an example.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
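
The "row" approach, the Strip expression and the reversed order can be sketched together as below. This is an illustration only, not the plugin's code: it uses Python's standard `xml.etree.ElementTree`, which supports just a subset of XPath and has no `text()` step, so the sketch reads each element's text with `findtext` instead. The sample page, the expressions and the names are all made up, and a real web page would need an HTML-tolerant parser (lxml, for instance) because real html is rarely well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page: a countdown list from 3 to 1.
PAGE = """<html><body>
<ol>
  <li class="book"><a class="title">3. Dune</a> <span class="author">by Frank Herbert</span></li>
  <li class="book"><a class="title">2. Emma</a> <span class="author">by Jane Austen</span></li>
  <li class="book"><a class="title">1. Ulysses</a> <span class="author">by James Joyce</span></li>
</ol>
</body></html>"""

ROW_XPATH = ".//li[@class='book']"      # one "row" per book
TITLE_XPATH = "a[@class='title']"       # relative to the row
AUTHOR_XPATH = "span[@class='author']"  # relative to the row
TITLE_STRIP = r"^\d+\.\s*"              # hypothetical Strip expression
AUTHOR_STRIP = r"^by\s+"                # hypothetical Strip expression


def scrape(page, reverse=False):
    """Return (title, author) tuples, optionally in reversed page order."""
    root = ET.fromstring(page)
    books = []
    for row in root.findall(ROW_XPATH):
        title = row.findtext(TITLE_XPATH, default="")
        author = row.findtext(AUTHOR_XPATH, default="")
        title = re.sub(TITLE_STRIP, "", title).strip()
        author = re.sub(AUTHOR_STRIP, "", author).strip()
        if title:
            books.append((title, author))
    if reverse:
        books.reverse()  # turn a 3..1 countdown into 1..3
    return books


books = scrape(PAGE, reverse=True)
```

The direct approach would skip `ROW_XPATH` and select every title and every author with two absolute expressions, pairing them up by position. That is simpler, but it goes wrong as soon as one book on the page is missing an author, which is one reason the row approach can work better on some sites.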

Just have a play around and experiment - you can't harm your library in any way (at worst you will create some empty books if you choose to do so and click Finish on the last wizard page). It may be that you never actually "import" a list, and instead just use the plugin as a quick way to launch various websites from the category view of predefined sites.

There are probably around 100 website pages all preconfigured at this point, covering various types of lists, be they "popular", "bestsellers", "new releases", "top xxx" or indeed just bibliography style with Fantastic Fiction. I'll add to this over time - if you have a site/page not covered and want to see it included just feel free to ask - I don't expect everyone to be bothered with figuring out xpath expressions, though it can be a fun challenge at times to do so if you are so inclined...

Last edited by kiwidude; 07-15-2012 at 10:22 AM. Reason: Removing attachment as later version in this thread