Old 08-13-2012, 05:53 AM   #1
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
[GUI Plugin] Import List

This wizard-based plugin lets you match existing books, or create empty books, in calibre based on lists of books from external sources. Some users will want to import an existing list of their own reading/books; others will want to import lists of bestsellers, popular books, genre recommendations, award winners and so on. Once matched, you can integrate with the Reading List plugin to record the books as a list to read, send them to a device, or just view them on screen.
  • Import from Clipboard - paste in a list of book title/authors copied to your clipboard, such as from a website forum post of favourite books.
  • Import from CSV File - many users have lists of books stored in applications like Excel which can export to CSV, or use websites like Goodreads/Library Thing which also have export to CSV capability.
  • Import from Web Page - over 100 predefined websites are configured (including Goodreads, Amazon etc) or you can add your own. Included in the predefined websites is Fantastic Fiction, which can be an easy way to scrape title, author, series and pubdate metadata for all books by an author. Note that for this and any of the other predefined websites you are not limited to the specific URL configured. You can navigate to the website page of interest in your browser, then drag/drop or copy the URL into the Web Page tab of this plugin. Most websites use the same layout across their pages, so no other configuration needs to be changed.
Using the wizard is a three-step process (more detail in example spoilers below):
  • STEP 1: Select/configure a list source - either choose a predefined source or configure your own.
  • STEP 2: Resolve matches - the plugin uses fuzzy logic algorithms to best match against existing books in your library. You can then fine tune the results with further searches and/or choose to add empty books for those that do not exist in your library.
  • STEP 3: Display/save the results - with the matched results you can create/append to a Reading List plugin list or just display temporarily on screen. You also have the option of saving your customised configuration as user settings for future reuse.
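
To give a feel for what a progression of identical/similar/fuzzy matching in STEP 2 might look like, here is a minimal sketch. It is not the plugin's actual code: the `normalise` and `match_title` helpers, the 0.8 cutoff and the use of Python's `difflib` are all assumptions made for illustration.

```python
import difflib
import re


def normalise(text):
    """Lower-case and drop punctuation so trivial differences are ignored."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def match_title(wanted, library_titles, cutoff=0.8):
    """Return (library title, stage), or (None, 'unmatched') if nothing fits."""
    # Stage 1: identical strings.
    if wanted in library_titles:
        return wanted, "identical"
    # Stage 2: similar, meaning equal once case and punctuation are ignored.
    by_norm = {normalise(t): t for t in library_titles}
    key = normalise(wanted)
    if key in by_norm:
        return by_norm[key], "similar"
    # Stage 3: fuzzy, using difflib's similarity ratio (hypothetical cutoff).
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
    if close:
        return by_norm[close[0]], "fuzzy"
    return None, "unmatched"


library = ["The Hobbit", "Dune", "Ender's Game"]
for title in ["Dune", "enders game", "The Hobit", "Neuromancer"]:
    print(title, "->", match_title(title, library))
```

Trying the cheap exact comparison first and only falling back to the fuzzy ratio keeps obvious matches fast and leaves the doubtful ones for manual review.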

Main Features of v1.1.4
  • Import lists of books from Clipboard, CSV files or websites.
  • Choose from over 100 predefined websites and/or add your own configurations.
  • Import into standard fields, identifiers or custom columns.
  • Option to update metadata of existing books
  • Predefined websites can be viewed as a list or grouped by category
  • Websites can be directly opened in a web browser
  • Supports importing title, author, series, series index and pubdate (all but title are optional)
  • Customise clipboard imports with regular expressions (common examples available on a dropdown)
  • Customise CSV imports to define the column numbers and other options such as delimiters.
  • Customise website imports using XPath expressions, with highlighting available to show matches.
  • Website URLs support template expressions to allow automatic substitution of values such as dates. For an example look at the Goodreads Popular This Month/Year settings.
  • Automatically match books in your library using a progression of identical/similar/fuzzy matching algorithms
  • User can manually search/refine matches, create empty books, remove books from the list etc.
  • Optionally put the resulting matched books into a list for use with the Reading List plugin, or just display on screen
  • Configurations can be exported/imported for sharing with other users.
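
The URL template feature in the list above can be pictured with a small sketch. The `{year}`/`{month}` placeholder syntax and the `expand_url` helper are hypothetical stand-ins, not the plugin's real template syntax; check the predefined Goodreads Popular This Month/Year settings for the actual form. The URL path follows the Goodreads "popular by date" link pattern.

```python
from datetime import date


def expand_url(template, today=None):
    """Fill {year} and {month} placeholders (hypothetical syntax) from a date."""
    today = today or date.today()
    return template.format(year=today.year, month=today.month)


# Illustrative template only; the placeholder names are an assumption.
template = "http://www.goodreads.com/book/popular_by_date/{year}/{month}"
print(expand_url(template, date(2012, 8, 13)))
```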

Special Notes:
  • Requires Calibre 0.8.57 or later.

Installation Notes:
Usage Example 1: Workflow for importing from a website
Spoiler:
  • Choose the Predefined setting tab
  • To see the webpage in your browser for that site, click on the Browser button (optional step)
  • Double-click or click "Preview" to see the titles/authors for that website link
  • Click Next to see what books have automatically been matched against those in your library. It uses a variety of special fuzzy algorithms to attempt this initial pass.
  • For any books that haven't yet matched, you can double-click on them in the top grid to execute a calibre search showing results in the bottom. Refine the search if needed and double-click on the book in the bottom to select it as the match. Alternatively you can add an empty book or remove that book from the list. Of course you don't have to match/delete every book - it is up to you how much of the "list" you want to keep before moving to the next wizard step.
  • Click Next to be given the choice of optionally saving all the matched books to a reading list. You can also save your list configuration (more relevant when using your own list sources).
  • Click Finish to see the books displayed in your library.

Usage Example 2: Getting books for an author from Fantastic Fiction
Spoiler:
  • Bring up the web page for that author in your browser. I suggest using the Search the Internet plugin as the fastest way of doing so.
  • Start the Import List plugin
  • From the Predefined tab, choose the "Fantastic Fiction" setting and click Edit
  • Either drag/drop or paste the url from your browser into the "Download from url" combo at the top of the Web Page tab.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

Usage Example 3: Loading books from the clipboard
Spoiler:
  • Use the Clipboard tab.
  • Paste in your text
  • Specify your regular expression to extract the title/author. At a minimum you must have a title.
  • Some predefined expressions are available from the script button to the right of the dropdown.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
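
As a rough illustration of the kind of regular expression the Clipboard tab expects, the sketch below pulls a title and an optional author out of each pasted line. The pattern, the `parse_clipboard` helper and the "Title by Author" / "Title - Author" line shapes are assumptions for the example, not one of the plugin's predefined expressions.

```python
import re

# Hypothetical pattern for lines like "Title by Author" or "Title - Author".
# Only the title group is required; the author part is optional.
PATTERN = re.compile(r"^(?P<title>.+?)(?:\s+(?:by|-)\s+(?P<author>.+))?$")


def parse_clipboard(text):
    """Return a list of (title, author) tuples, one per non-blank line."""
    books = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the pasted text
        m = PATTERN.match(line)
        books.append((m.group("title"), m.group("author")))
    return books


pasted = """Dune by Frank Herbert
The Hobbit - J.R.R. Tolkien

Neuromancer
"""
print(parse_clipboard(pasted))
```

A pattern this simple splits at the first " by " it finds, so a title that itself contains " by " would be cut short; that is the sort of case where Preview helps before moving on.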

Usage Example 4: Load from a CSV file
Spoiler:
  • Use the CSV File tab.
  • Browse to the file, click Preview and the columns should be displayed.
  • Alter your separator if required and specify the title/author column numbers.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
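
The CSV steps above amount to something like the following sketch using Python's standard `csv` module. The `read_books` helper, its parameter names and the 1-based column numbering are assumptions for illustration; the plugin's own options may differ.

```python
import csv
import io


def read_books(stream, title_col, author_col=None, delimiter=",", skip_first_row=True):
    """Yield (title, author) using 1-based column numbers, skipping blank rows."""
    reader = csv.reader(stream, delimiter=delimiter)
    for i, row in enumerate(reader):
        if skip_first_row and i == 0:
            continue  # header row
        if not any(cell.strip() for cell in row):
            continue  # blank row
        title = row[title_col - 1].strip()
        author = row[author_col - 1].strip() if author_col else None
        if title:
            yield title, author


# A semicolon-separated sample with a header and one blank line.
sample = io.StringIO("Title;Author\nDune;Frank Herbert\n\nEmma;Jane Austen\n")
books = list(read_books(sample, title_col=1, author_col=2, delimiter=";"))
print(books)
```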

Usage Example 5: Scrape from a custom website
Spoiler:
  • Select the Web Page tab.
  • Click the Clear button to remove any previous website settings.
  • Drag/drop or paste the url into the Url combobox.
  • Click preview to see the underlying html that you will be specifying XPath expressions for.
  • You might find it useful to get to the part of the source html of interest by using the "Find" text in this dialog, typing for instance the first book name in the list and clicking on the Find button.
  • You can *either* specify an XPath to what I call a "row" for each book, and then use a relative XPath to the title/author, *or* you can just specify a direct XPath to the title/author. It depends on the site as to which approach works best.
  • Use the marker icon on the right of each XPath combo to preview what text your expression is going to select. You can step forward/backward through the matches.
  • Your Title and Author expressions must extract the text(). The optional regular expression in the Strip field will then be applied, along with a number of special cleanups coded within the plugin to strip common unnecessary characters.
  • If desired you can reverse the list order - for instance a countdown from 50 to 1 on the web page you might want in your list as 1 to 50.
  • There are some additional complex options for dealing with difficult websites which use non utf-8 encodings, or require javascript to execute to load the page content. Some of the predefined Amazon settings make use of these options if you want an example.
  • Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.
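
The "row" plus relative XPath approach described above can be sketched as follows. This uses Python's standard `xml.etree.ElementTree`, which only supports a limited XPath subset (no `text()`, so the element's `.text` is read instead) and needs well-formed markup; a real web page would need an HTML-tolerant parser with full XPath support. The sample markup, class names and the `scrape` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html><body><table>
  <tr><td class="title">Dune</td><td class="author">Frank Herbert</td></tr>
  <tr><td class="title">Emma</td><td class="author">Jane Austen</td></tr>
</table></body></html>
"""


def scrape(page, row_xpath, title_xpath, author_xpath, reverse=False):
    """Find one element per book, then resolve title/author relative to it."""
    root = ET.fromstring(page.strip())
    books = []
    for row in root.findall(row_xpath):
        title = row.find(title_xpath)
        author = row.find(author_xpath)
        if title is None or not (title.text or "").strip():
            continue  # an entry without a title is skipped
        has_author = author is not None and author.text
        books.append((title.text.strip(), author.text.strip() if has_author else None))
    # Optionally flip the order, e.g. a 50-to-1 countdown into 1-to-50.
    return books[::-1] if reverse else books


books = scrape(PAGE, ".//tr", "td[@class='title']", "td[@class='author']")
print(books)
```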


Paypal Donations:
  • If you find this or any of my other plugins useful please feel free to show your appreciation. I have spent many thousands of unpaid hours in their development and support so any encouragement for me to continue is appreciated!

Version History:
Spoiler:
Version 1.1.4 - 13 Oct 2013
Fix for search matched books list right clicks not working correctly

Version 1.1.3 - 30 Sep 2013
Submission from wolf23 to support comma separated values for any type of custom column that supports multiple values.

Version 1.1.2 - 29 Sep 2013
Remove the select next matched/unmatched etc buttons/menus. Replace with Show All/Matched/Unmatched radio buttons that filter.
Add a predefined setting for the Goodreads search results page
Fix various broken predefined settings from changes to websites being scraped
Change logic so that if scraped website book data has no title it is not added to the right-hand side preview books list
Always put a horizontal scroll bar on both lists in the Resolve page of wizard to reduce chance of vertical scrolling out of sync
If multiple values for a tags field from xpath when scraping off the web, separate the values with a comma and store as a single value.

Version 1.1.1 - 13 Dec 2012
Skip blank rows in CSV files when previewing
Always display headers on CSV tab rather than hiding when Skip first row is checked

Version 1.1 - 25 Oct 2012
Allow importing into custom columns
Allow updating metadata of existing books in your library from data off the list (standard or custom columns)
Allow clipboard, csv and web page import to retrieve into a dynamic set of calibre standard columns including identifier fields
Dynamically change the Preview, Search and Match columns presented to match what is configured for the import source
Remove the ability to specify columns to display on the configuration screen
Fix matched count on the Resolve step so multiple matches are not counted as matched
Fix for importing non utf-8 csv files such as for titles from LibraryThing
Allow direct editing of plugin configuration data from config dialog (use at your own risk!)

Version 1.0 - 13 Aug 2012
Add a "Select all matched" option to the right-click menu

Version 0.4 - 16 Jul 2012
If a book has a series name but no series index, default to a series index of zero

Version 0.3 - 15 Jul 2012
Add Series and Series Index columns to the csv page, web page tab and preview, change FF to scrape into this column.
Add pubdate, series and series_index to the Clipboard tab
Fix bug of highlighting involving self-closing tags
Fix incorrect XPath for the Goodreads Shelves lists
Additional change to way encodings are handled to simplify into utf-8
Change the icon size to 24 rather than 16 on the predefined/user settings

Version 0.2 - 15 Jul 2012
Allow author XPath expressions to not be relative to a row XPath
Add a Pubdate column to the csv page, web page tab and preview, change FF to scrape the year into this column.

Version 0.1 - 14 Jul 2012
Initial beta release of Import List plugin

Attached Thumbnails:
  • Page1a_PredefinedList.png
  • Page1b_PredefinedTree.png
  • Page1c_UserSettings.png
  • Page1d_ImportClipboard.png
  • Page1e_ImportCSV.png
  • Page1f_ImportWeb.png
  • Page2Matches.png
  • Page3Save.png
  • Page1g_Fields.png
  • Page1h_Regex.png
  • Page2b_UpdateMetadata.png

Attached Files:
  • Import List.zip (404.1 KB)

Last edited by kiwidude; 10-13-2013 at 08:24 AM. Reason: v1.1.4 Released
Old 08-13-2012, 08:04 AM   #2
loximuthal
Connoisseur
Posts: 67
Karma: 12960
Join Date: Jan 2011
Location: Maryland
Device: NST, Kindle Fire, iPad2
Looks exciting! I can't wait to get home and try it out.
Old 08-14-2012, 09:24 AM   #3
nynaevelan
eBook Junkie
Posts: 1,337
Karma: 1459924
Join Date: May 2010
Location: USA
Device: Kindle Fire HD 2012, Kindle PW2, Galaxy Tab 10.1
Wow, Kiwidude, this is awesome and just when I discovered Listopia on Goodreads. Great Job!!
Old 08-14-2012, 09:36 AM   #4
nynaevelan
eBook Junkie
Kiwidude:

A question: When a list is longer than 100 books, how do I get the plugin to import the entire list, not just the first 100?
Old 08-14-2012, 11:06 AM   #5
kiwidude
calibre/Sigil Developer
@Nyn - the limit is not "100 books", it is whatever is displayed on the web page you are pulling it from.

If you want further books appended, just run the wizard again with the url for the second page. You will see that if you are putting these books on a Reading List there is an option to Append to the list, rather than recreating it.
Old 08-14-2012, 06:13 PM   #6
nynaevelan
eBook Junkie
Quote:
Originally Posted by kiwidude View Post
@Nyn - the limit is not "100 books", it is whatever is displayed on the web page you are pulling it from.

If you want further books appended, just run the wizard again with the url for the second page. You will see that if you are putting these books on a Reading List there is an option to Append to the list, rather than recreating it.
Thanks
Old 08-16-2012, 09:21 PM   #7
Gunnerp245
Gadget Freak
Posts: 1,108
Karma: 1043832
Join Date: Nov 2007
Location: US
Device: Sony 700; Entourage Edge, Kindle 3, Pocket Edge
Authorization needed?

@kiwidude

How do I import my unread list from Goodreads? When I use the 'webpage' tab I get no results, though inspecting the data in the source html window finds:
  • "Sorry, that person's shelf is private"
  • "sign in"
  • "This Profile Is Restricted to Goodreads Users."
* I added the bold and red entries in the spoiler to highlight the areas listed above. *

Edit:
1. I reauthorized the Goodreads plugin API and get the same results.
2. Even being logged into the Goodreads site does not change the results.
3. The predefined Goodreads url 'does' return books.

Spoiler:
<html><head><title>Rich (GunnerP245) - (305 Books, 3.93 Average Rating)</title></head><body><div class="content">
<div class="uitext" id="siteheader">
<div class="mainContent">
<ul class="nav" id="usernav"><li>
<strong>
<a href="/user/new" class="navlink" rel="nofollow">register</a>
</strong>
</li>
<li>
<a href="/about/how_it_works" class="navlink" rel="nofollow">tour</a>
</li>
<li>
<a href="/user/sign_in?returnurl=%2Fuser%2Fshow%2F4051996-rich" class="navlink" rel="nofollow">sign in</a>
</li>
</ul><div id="logo">
<a href="/">
<img alt="Goodreads: Book reviews, recommendations, and discussion" border="0" src="http://www.goodreads.com/assets/layout/goodreads_logo_140-a9873c579d01dd64752423bdba10276c.png"></a>
</div>
<div id="sitesearch">

<div>
</div>
<div class="auto_complete_field_wrapper">
<div id="sitesearch_autocomplete">
</div>
<img alt="Loading-trans" class="loading" id="sitesearch_field_loading" src="http://www.goodreads.com/assets/loading-trans-3e04cd6ed6ad31063972e688820d7866.gif"></div>
<a class="submitLink" href="#">
<img alt="search" src="http://www.goodreads.com/assets/layout/magnifying_glass-d9f211c02aa67acf63fa5f27a1876619.png" title="Title / Author / ISBN" width="16"></a>

</div>
<ul class="nav" id="sitenav"><li>
<a href="/" class="navlink">Home</a>
</li>
<li>
<a href="/review/list" class="navlink" rel="nofollow">My Books</a>
</li>
<li>
<a href="/friend" class="navlink" rel="nofollow">Friends</a>
</li>
<li>
<a href="/recommendations" class="navlink" rel="nofollow">Recommendations</a>
</li>
<li class="withsubnav">
<div class="subnav">
<ul class="content"><li>
<a href="/list" title="Popular lists of books">listopia</a>
</li>
<li>
<a href="/giveaway" title="Free book giveaways">giveaways</a>
</li>
<li>
<a href="/book/popular_by_date/2012/8" title="Popular New Releases">popular</a>
</li>
<li>
<a href="/voice">goodreads voice</a>
</li>
<li>
<a href="/ebooks">ebooks</a>
</li>
</ul><ul class="content"><li>
<b>fun</b>
</li>
<li>
<a href="/trivia">trivia</a>
</li>
<li>
<a href="/quizzes">quizzes</a>
</li>
<li>
<a href="/quotes">quotes</a>
</li>
</ul><ul class="content"><li>
<b>community</b>
</li>
<li>
<a href="/group">groups</a>
</li>
<li>
<a href="/story">creative writing</a>
</li>
<li>
<a href="/user/online_now" rel="nofollow">people</a>
</li>
<li>
<a href="/event">events</a>
</li>
</ul></div>
<a href="/book" class="navlink" title="Explore books">Explore</a>
<a class="subnavlink inlineblock" href="#">*</a>
</li>
</ul></div>
</div>
<div class="mainContentContainer ">
<div class="mainContent">
<div class="mainContentFloat">
<div id="flashContainer">
<div id="header_error_container">
<div class="box noticeBox errorBox">
<a class="greyText" href="#">[x]</a>
Sorry, that person's shelf is private.
</div>
<br></div>
</div>
<h1>Rich (GunnerP245)'s profile</h1>
<table width="100%" cellspacing="3" border="0"><tr valign="top"><td width="120px">
<img alt="Rich" src="http://photo.goodreads.com/users/1336826458p3/4051996.jpg" title="Rich"><br><div class="smallText">
171 ratings |
28 reviews
<br><a href="#">avg rating: 3.93</a>
<div class="floatingBox" id="ratingDistribution4051996">
</div>
</div>
</td>
<td>
<div class="bigGreyBox">
<div class="bigGreyBoxBody">
<div class="bigGreyBoxContent">
<div id="privateProfile" class="mediumText">
This Profile Is Restricted to Goodreads Users.
<br><br>
Sign in to Goodreads to Learn More About Rich.
<br><br><div>
<a href="/user/new?remember=true" class="button" rel="nofollow">sign up »</a>
</div>
</div>
</div>
</div>
<div class="bigGreyBoxBottom">
</div>
</div>
<br class="clear"><br><span id="userActions">
<a href="/friend/add_as_friend/4051996?return_url=%2Fuser%2Fshow%2F4051996-rich" class="button" rel="nofollow">add as a friend</a>
</span>
*
<a href="/message/new/4051996-rich" rel="nofollow">send message</a>
</td>
</tr></table><br><br><br><br><br><br><br><br><br><br><a href="/user/flag_photo/4051996" class="smallText greyText" rel="nofollow">flag photo as inappropriate</a>
<a class="greyText smallText" href="#">?</a>
<div id="flagExplanation4051996" class="floatingBox">Flagging an image will send it to the Goodreads Customer Care team for review. We take great care to make Goodreads a place that everyone can visit. Only flag photos that clearly need our attention. We will only consider removing photos that are extremely offensive in nature (pornographic, pro-Nazi, etc). Our policy on nudity is if you can walk down the street in it without getting a ticket we'll let it stand.</div>
|
</div>
<div class="clear">
</div>
</div>
<div class="clear">
</div>
</div>
<div class="clear">
</div>
<div class="footerContainer">
<div class="footer">
<div class="copyright">© 2012 Goodreads Inc</div>
<div class="adminLinksContainer">
<ul class="adminLinks"><li>
<a href="/about/us" class="first" rel="nofollow">about us</a>
</li>
<li>
<a href="/advertisers" rel="nofollow">advertise</a>
</li>
<li>
<a href="/author/program" rel="nofollow">author program</a>
</li>
<li>
<a href="/jobs" rel="nofollow">jobs</a>
</li>
<li>
<a href="/api" rel="nofollow">api</a>
</li>
<li>
<a href="/blog">our blog</a>
</li>
<li>
<a href="/about/terms" rel="nofollow">terms</a>
</li>
<li>
<a href="/about/privacy" rel="nofollow">privacy</a>
</li>
<li>
<a href="/help" class="last" rel="nofollow">help</a>
</li>
</ul><br><br></div>
</div>
</div>
</div><div id="overlay">
</div><div id="box">
<img id="close" src="/assets/close.gif" alt="Close" title="Close this window"><div id="boxContents">
</div>
<div id="boxContentsLeftovers">
</div>
<a class="right actionLinkLite smallText greyText" href="#" id="lightBoxRightClose">close</a>
<div class="clear">
</div>
</div><div id="fbSigninNotification">
<p>Welcome back. Just a moment while we sign you in to your Goodreads account.</p>
<img alt="Login_animation" src="http://www.goodreads.com/assets/facebook/login_animation-e6b8fb8dad5d9614794b0d13e31aa9ba.gif"></div><div id="fb-root">
</div><noscript>
<img src="http://pixel.quantserve.com/pixel/p-0dUe_kJAjvkoY.gif" border="0" height="1" width="1" alt="Quantcast"></noscript><noscript>
<img src="http://b.scorecardresearch.com/b?c1=2&amp;c2=6035830&amp;c3=&amp;c4=&amp;c5=&amp; c6=&amp;c15=&amp;cv=1.3&amp;cj=1" width="0" height="0" alt=""></noscript></body></html>

Last edited by Gunnerp245; 08-16-2012 at 09:27 PM. Reason: Additional information.
Old 08-17-2012, 02:56 AM   #8
kiwidude
calibre/Sigil Developer
@GunnerP - the plugin cannot use any cookies or authenticated sessions from your browser. The way around this is to save the html page to disk from your authenticated browser, and then drag/drop the html file into the url field.

It is a hidden feature that I had to incorporate for testing (the proxies at my work prevent calibre from accessing the internet directly) but as it turns out it should help solve such a problem. You don't need to save the whole web page content of course, just the html file itself. My plugin looks at the URL field, and if the url starts with file:// (which it will after a drag/drop) then it loads that html file locally rather than loading via a web browser...
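
The file:// check described above boils down to something like this sketch. It is an illustration only, not the plugin's code: the `load_html` helper is hypothetical and plain `urllib` stands in for however the plugin actually downloads pages.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname, urlopen


def load_html(url):
    """Read a saved page from disk for file:// URLs, otherwise download it."""
    if url.startswith("file://"):
        path = url2pathname(urlparse(url).path)
        return Path(path).read_text(encoding="utf-8", errors="replace")
    with urlopen(url) as response:  # plain fetch, no browser cookies involved
        return response.read().decode("utf-8", errors="replace")


# Simulate a page saved from an authenticated browser session.
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "shelf.html"
    saved.write_text("<html><body>saved shelf</body></html>", encoding="utf-8")
    html = load_html(saved.as_uri())
print(html)
```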

Last edited by kiwidude; 08-17-2012 at 05:39 AM.
Old 08-17-2012, 08:09 AM   #9
Gunnerp245
Gadget Freak
@kiwidude
Thanks, I shall try later today after work.
How is the Goodreads plugin able to access one's shelves?
Old 08-17-2012, 09:40 AM   #10
kiwidude
calibre/Sigil Developer
The Goodreads plugin uses an OAUTH library and a specific Goodreads API to retrieve the shelf information (getting json/xml responses) - it doesn't scrape from the web pages.
Old 08-17-2012, 02:55 PM   #11
Gunnerp245
Gadget Freak
Personal GoodReads Shelf Import

Quote:
Originally Posted by kiwidude View Post
The Goodreads plugin uses an OAUTH library and a specific Goodreads API to retrieve the shelf information (getting json/xml responses) - it doesn't scrape from the web pages.
Okay. I got it to work.
I created a separate 'import library' so as to not mess up my regular library while I experimented.
For others, here are the entries I used to produce the list after saving the applicable 'shelf' as a web page, as kiwidude suggested:
  • Row: //table/tbody/tr
  • Title: td[@class="field title"]/div[@class="value"]/a/text()
  • Author: td[@class="field author"]/div[@class="value"]/a/text()
  • Series: td[@class="field title"]/div[@class="value"]/a/span[@class="darkGreyText"]/text()
  • Series Idx: text()
  • Pubdate: td[@class="field date_pub"]/div[@class="value"]/text()

However, the Series and Series Index regexes need to be tweaked to remove the parentheses and pound sign and to put the number in the correct column.
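
As a rough illustration of the clean-up needed, the sketch below splits a scraped series string into a name and an index with one regular expression. It assumes the scraped text looks like `(Series Name, #3)`, which is a guess at the Goodreads format based on the description above; `split_series` is a hypothetical helper, not part of the plugin.

```python
import re

# Assumed shape of the scraped text: "(Series Name, #3)".
SERIES_RE = re.compile(
    r"^\s*\(\s*(?P<name>.+?)\s*,?\s*#\s*(?P<index>\d+(?:\.\d+)?)\s*\)\s*$"
)


def split_series(raw):
    """Return (series_name, series_index), or (None, None) if raw does not match."""
    match = SERIES_RE.match(raw)
    if match is None:
        return None, None
    return match.group("name"), float(match.group("index"))
```

Text that does not fit the assumed shape comes back as `(None, None)`, so such rows could simply be left with blank series columns.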
Spoiler:


Last edited by Gunnerp245; 08-17-2012 at 07:31 PM. Reason: Put picture in spoiler tags
Old 08-17-2012, 03:18 PM   #12
kiwidude
Haha, glad you are having fun. Did you use the highlighting buttons etc. to help you along the way? You have no idea how long it took me to get that working, but when figuring out XPath expressions I found it invaluable.

So long as you don't hit Finish on the wizard you can safely experiment without using a test library. After all, this plugin only makes changes to your library if you specifically click the buttons to create empty books and then click Finish.

The series is indeed a problem in this situation. Of course this plugin didn't originally have series as an option, it was only when I added FF as a source and looked at the "scraping" possibilities that it came in. My thoughts were always that you could just do a metadata download to get series information populated like you would with any other import of books into your library, so in your situation I would just leave the series/series idx columns blank.

*If* the strip field could be applied at the field level (allowing different regexes for title, author, series etc.) then you would have a solution for both of your series columns above: you could strip out what you don't want in each case, and it would be more flexible. I'm not overly against changing the plugin to support this (it always felt a bit filthy applying the same "strip" regex to both title and author) but it is a question of how/where to configure this on that screen. Maybe it would have to be a popup dialog - so where it says strip it just tells you which fields have been given regexes for stripping, and a button pops up a grid where you can edit the regex for each field... how does that sound?
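
To make the per-field idea concrete, here is a minimal sketch of how field-level strip regexes might behave. It is not the plugin's code: `STRIP_PATTERNS`, `apply_strips`, and the sample patterns are hypothetical, chosen only to show a different regex being applied to each column.

```python
import re

# Hypothetical per-field strip patterns: whatever matches is removed.
STRIP_PATTERNS = {
    "series": r"[()]|,?\s*#\s*[\d.]+",  # drop brackets and the ", #3" suffix
    "series_index": r"[^\d.]",          # keep only digits and the decimal point
}


def apply_strips(row, patterns=STRIP_PATTERNS):
    """Return a copy of row with each field's strip regex removed from its value."""
    cleaned = {}
    for field, value in row.items():
        pattern = patterns.get(field)
        cleaned[field] = re.sub(pattern, "", value).strip() if pattern else value
    return cleaned
```

Fields without a pattern pass through untouched, so title and author no longer have to share one regex. Stripping is blunt, though: a series name that itself contains digits would leak into the index column.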

I can also appreciate how in the situation of scraping from goodreads (or from FF for that matter) that it would be darned useful to scrape the goodreads id as an identifier. So then when you do a metadata download you are getting data for exactly that same book edition. Hmmm...
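
A sketch of what scraping the identifier could involve: pulling the numeric id out of a book link. It assumes Goodreads book links have the form `/book/show/<digits>...`; that URL shape and the helper name `goodreads_id_from_href` are assumptions for illustration, not something the plugin is confirmed to do.

```python
import re

# Assumed URL shape: .../book/show/<numeric id>, optionally followed by a title slug.
_BOOK_ID_RE = re.compile(r"/book/show/(\d+)")


def goodreads_id_from_href(href):
    """Return the numeric id as a string, or None if href is not a book link."""
    match = _BOOK_ID_RE.search(href)
    return match.group(1) if match else None
```

The result could then be stored against the book as a `goodreads` identifier, so that a later metadata download targets the same edition.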
Old 08-17-2012, 07:30 PM   #13
Gunnerp245
@kiwidude

Not too sure of the 'fun' part but it was enlightening. In reality, I had no notion of XPath, other than the small area in calibre when converting books. But I reviewed the expressions in the predefined web pages to see what was matched and work out why. With a lot of trial and error I got one to work and just kept plugging from there.

The highlighting was invaluable in seeing exactly what was matched.

I realized the test library was not needed if 'Finish' was not selected, but I wanted to see the results in calibre.

If there is no simple way to enter an expression in the plugin entries, a popup dialog would be good.

In the meantime, a bulk metadata edit will remove the parentheses and pound sign.

Last edited by Gunnerp245; 08-17-2012 at 07:33 PM.
Old 09-13-2012, 12:42 AM   #14
Sidetrack
Enthusiast
Posts: 31
Karma: 10
Join Date: Jan 2009
Location: South Pacific
Device: Kindle DX
Goodreads Awards Pages

Has anyone looked at settings for the Goodreads Award pages (http://www.goodreads.com/award)?

I'm looking at them and trying to figure out XPath notation, but not getting very far very fast. Row: //table[@class="tableList"] seems correct, but pulling a title or author is beyond me. Seems as if Title: a[@class="bookTitle"]/span/text() should do the trick... but no.

I'd also like to pull the Award Name/Year as Series/Index, but needless to say I haven't managed that either.
Old 09-13-2012, 03:28 AM   #15
kiwidude
@Sidetrack - you will need the next version to do this properly, because you will hit the same issue that Gunnerp245 had: not being able to separate the year from the name of the award into two separate fields.

To show you what is close:
Row: //table[@class="tableList"]/tr/td[2]
Title: a[@class="bookTitle"]/span/text()
Author: span/a[@class="authorName"]/span/text()
Series: em/text()
Series Idx: text()
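
To see how the row expression and the row-relative field expressions fit together, the sketch below runs them over a tiny hand-written fragment with Python's standard `xml.etree.ElementTree`. The fragment is an invented, well-formed stand-in for the awards page, and because ElementTree supports only a subset of XPath, the trailing `/text()` steps are replaced by reading `.text` and `.tail`; the plugin itself may evaluate the expressions differently.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the real page is larger and may not be well-formed XML.
SAMPLE = """
<html><body>
<table class="tableList">
  <tr>
    <td><img src="cover.jpg"/></td>
    <td>
      <a class="bookTitle" href="/book/show/1"><span>Sample Title</span></a>
      <span><a class="authorName" href="/author/show/2"><span>Sample Author</span></a></span>
      <em>Sample Award</em> (2012)
    </td>
  </tr>
</table>
</body></html>
"""


def scrape(markup):
    """Apply the row expression, then each field expression relative to the row."""
    root = ET.fromstring(markup)
    books = []
    # Row: the second cell of every row in the table.
    for row in root.findall(".//table[@class='tableList']/tr/td[2]"):
        title = row.find("a[@class='bookTitle']/span")
        author = row.find("span/a[@class='authorName']/span")
        series = row.find("em")
        # Stand-in for the bare text() step: text sitting directly inside the cell.
        own_text = [row.text or ""] + [child.tail or "" for child in row]
        books.append({
            "title": title.text,
            "author": author.text,
            "series": series.text,
            "series_index": " ".join("".join(own_text).split()),
        })
    return books
```

On this sample the series index comes back as `(2012)`, brackets included, which is the leftover that still has to be stripped before the value is usable as a number.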

In the next version you can apply a regex to strip at individual column level. So for series you could strip the trailing brackets, and for series index you could strip everything except the numbers.

Thanks for the suggestion btw, I will make sure it is in the predefined settings when I release it.