Import List plugin idea thread - Page 2

ElizabethN · 07-02-2012, 09:57 PM

I was just thinking of the implications of importing CSV files; all the other uses mentioned would be very handy. I hadn't realized that this plugin would let us create lists from websites. What a time saver.

For example, when I find an author new to me, I'll go to their website first for their backlist. Amazingly, many authors don't list their backlist or haven't updated their website in years. So then I start compiling titles from research done with Goodreads and LibraryThing, supplemented by FictFact and Fantastic Fiction. Usually I have the calibre author profile open on one side of the screen and the pertinent website on the other side as I type away in calibre.

I'm brain-dead at the moment, so I can't think of many compilation websites except those mentioned above. Here are a few individual author sites that come to mind, as I remember typing lots of their books into calibre.

http://www.pinbeambooks.com/ebooks-y...niverse%C2%AE/

http://michellesagara.com/bibliography/

http://www.dendarii.com/inprint.html

http://www.jdrobb.com/books/allbooks.php

Some really nice authors have downloadable lists, thank you JD Robb, but many only have titles and images. I'm assuming that if I wanted to import from a website and the book data couldn't be pulled by the plugin, the plugin would just give an error message, so that I would know that manual entry was needed.

kiwidude · 07-03-2012, 10:20 AM

@ElizabethN - thanks for the sites, I shall take a look at home where I won't be battling against work web filters. I will say that provided an author is at Fantastic Fiction then there isn't normally any need to use an author specific site. For the authors it covers the FF site is generally very good. And it should save you the data entry you mention.

Title/author is all the data it needs - in fact it is all the plugin currently allows you to extract, be it from clipboard, CSV or the web. It also supports "title only" matching, with the obvious downsides that will have. In theory I could add support for other metadata fields, but as this is not intended as a replacement for metadata download and it would clutter the UI, title/author should be sufficient. You can let the plugin create the empty books, and then do a metadata download to get the rest of the data.

Just on your final point. If you point the plugin at a website that isn't bundled with it, then you almost certainly won't get any meaningful data from the page. Every website needs its own configuration, because every website displays different html which we scrape the data from. Just like we have different metadata download plugins for different websites.

The good news is that frequently a single website will use the same html template for all its pages - hence why a single template can cater for any author from Fantastic Fiction rather than a different one for each author for instance. Creating the configuration for a website does require some xpath knowledge so I don't anticipate every user out there diving in to do it but the ability is there for those who want to and export it. The more websites over time I bundle with this plugin the more generally useful it may be out of the box.

ElizabethN · 07-03-2012, 09:59 PM

Quote:

Originally Posted by kiwidude

Title/author is all the data it needs - in fact it is all the plugin currently allows you to extract, be it from clipboard, CSV or the web. It also supports "title only" matching, with the obvious downsides that will have....You can let the plugin create the empty books, and then do a metadata download to get the rest of the data.

This is what I currently do when I load empty books into calibre. It'll be nice to have the plugin create the book title/author rather than manually typing them all, so that all I then have to do is the metadata and fine-tuning.

Sites like Fantastic Fiction and Goodreads are usually my first choice for an author's backlist as the info on an author's website can range from very detailed to just a book image.

Sounds like another useful plugin, looking forward to its release. No rush though, as the problem that I've discovered with plugins is that each additional plugin increases the time I spend manipulating data, which then decreases the time I spend reading. If only I didn't feel the need to keep perfecting my library or have to sleep or work...

kiwidude · 07-14-2012, 02:05 PM

Here at last is a version for people to play with. I've updated the screenshots on the first page - quite a number of things have changed over the last month as I have refined things.

There's a lot of subtle, hidden behaviour that I won't bore people with at this point. Be sure to look at right-clicks on the various grids to see some of it. Also in most lists/grids double-clicking on things tends to shortcut a lot of the action. I have also incorporated chaley's template language so a number of the URLs such as with Goodreads now dynamically resolve their dates to the URL using that, such as "Popular this Month" or "Popular this Year". You can customize what columns to display in the "Resolve" page of the wizard using the "Options" button for the wizard (you must close/reopen it to take effect). Note that any columns you add will be read-only.

For a quick example of how the workflow works for using a predefined web page list setting:

Spoiler:

Another example - getting books for an author via Fantastic Fiction:

Spoiler:

If you want to load your own list of text from the clipboard (such as copied from a forum post or web page):

Spoiler:
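As a rough illustration of the clipboard route, a regular expression with named groups can pull a title and an optional author out of each pasted line. The "Title by Author" line layout, the pattern and the `extract_books` helper are all assumptions made for this sketch; they are not the plugin's predefined expressions.

```python
import re

# Hypothetical pattern: "Title by Author", where " by Author" is optional.
LINE_RE = re.compile(r'^(?P<title>.+?)(?:\s+by\s+(?P<author>.+))?$')


def extract_books(text):
    """Return (title, author) pairs from pasted text; author may be None."""
    books = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        match = LINE_RE.match(line)
        if match:
            books.append((match.group('title'), match.group('author')))
    return books


pasted = """Along Came a Spider by James Patterson

Kiss the Girls by James Patterson
Untitled Anthology"""
print(extract_books(pasted))
```

A title is the minimum that must match, so the author group is optional. A pattern this simple will misread titles that themselves contain " by ", which is one reason a preview step before importing is useful.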

To load from a CSV file (such as a calibre CSV catalog file, a Goodreads export or whatever):

Spoiler:
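The CSV route boils down to choosing a separator and saying which column numbers hold the title and author. A minimal sketch of that idea with Python's standard `csv` module follows; the `read_title_author` helper, its parameters and the sample data are hypothetical, and this is not the plugin's own code.

```python
import csv
import io


def read_title_author(csv_file, title_col, author_col, delimiter=',',
                      skip_header=True):
    """Yield (title, author) from a CSV stream using 1-based column numbers."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    if skip_header:
        next(reader, None)  # drop the header row if there is one
    for row in reader:
        if len(row) < max(title_col, author_col):
            continue  # skip blank or short rows
        title = row[title_col - 1].strip()
        author = row[author_col - 1].strip()
        if title:  # a title is the minimum needed to match a book
            yield title, author


sample = io.StringIO(
    'id;title;authors\n'
    '1;Along Came a Spider;James Patterson\n'
    '2;Kiss the Girls;James Patterson\n'
)
print(list(read_title_author(sample, title_col=2, author_col=3, delimiter=';')))
```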

To scrape from a different website not already configured in this plugin:

Spoiler:
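To give a feel for what a site configuration involves, here is a small sketch of the "row" approach: one expression selects each book's row, relative expressions pick out title and author, a strip regex cleans them up, and the list can optionally be reversed. The page markup, the expressions and the `scrape` helper are all invented for illustration. The sketch uses the standard library's ElementTree, which supports only a subset of XPath (element paths cannot end in `text()`), so it reads `.text` instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical countdown page, written as well-formed XML for the stdlib parser.
PAGE = """
<ol>
  <li class="book"><span class="title">3. Third Place</span><span class="author">by C. Author</span></li>
  <li class="book"><span class="title">2. Second Place</span><span class="author">by B. Author</span></li>
  <li class="book"><span class="title">1. First Place</span><span class="author">by A. Author</span></li>
</ol>
"""

ROW_XPATH = ".//li[@class='book']"        # one match per book
TITLE_XPATH = "span[@class='title']"      # relative to the row
AUTHOR_XPATH = "span[@class='author']"    # relative to the row
TITLE_STRIP = re.compile(r'^\d+\.\s*')    # drop the leading "3. "
AUTHOR_STRIP = re.compile(r'^by\s+')      # drop the leading "by "


def scrape(page, reverse=False):
    """Return (title, author) pairs, optionally in reversed page order."""
    root = ET.fromstring(page.strip())
    books = []
    for row in root.findall(ROW_XPATH):
        title = TITLE_STRIP.sub('', row.findtext(TITLE_XPATH, '').strip())
        author = AUTHOR_STRIP.sub('', row.findtext(AUTHOR_XPATH, '').strip())
        if title:
            books.append((title, author))
    return books[::-1] if reverse else books


print(scrape(PAGE, reverse=True))
```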

Just have a play around and experiment - you can't harm your library in any way (at worst you will create some empty books if you choose to do so and click Finish on the last wizard page). It may be that you never actually "import" a list, and instead just use the plugin as a quick way to launch various websites from the category view of predefined sites.

There are probably around 100 website pages all preconfigured at this point, covering various types of lists, be they "popular", "bestsellers", "new releases", "top xxx" or indeed just bibliography style with Fantastic Fiction. I'll add to this over time - if you have a site/page not covered and want to see it included just feel free to ask - I don't expect everyone to be bothered with figuring out xpath expressions though it can be a fun challenge at times to do so if you are so inclined...

Perkin · 07-14-2012, 05:51 PM

Just gave it a try on a new author for my mother's library and found it worked very well (used the FantasticFiction route).

One extra function I'd request, even though I know it'd be difficult, would be that on import as well as Author & Title, to also include some other fields (Series, Series# and PubDate), especially when getting info from FF - as the info is on the page, just a matter of being able to scrape and use it.

Many thanks for your work on this, even as it is it's a great time-saver.

kiwidude · 07-15-2012, 05:00 AM

@Perkin - thanks for giving it a whirl and the feedback. Yeah I briefly mentioned my thoughts about other metadata fields above to ElizabethN - there are two issues with it. The first is the extra clutter it would add to the UI for something that is so rarely available in a usable fashion. The second is actually getting a quality source for it. From a CSV file no problem. However from a web page very few pages that display books in a list will put the series information in a reliable structured fashion. Everything becomes very bespoke and series data is ordinarily scraped from the individual page for a book (in fact my FF metadata plugin does not scrape the web page for it - it fires the same database query that FF uses to construct the page, which returns a JSON result). You can see just looking at the FF page the difficulties involved - series name is just placed in a <strong> tag that appears there "sometimes"; their HTML is not structured very nicely at all.

Edit - actually getting the series name is not that difficult (though I found a bug in the plugin while doing so) - it is series # that is difficult. Still experimenting...

Pubdate on the other hand would be easy to scrape and would at least give a reliable source instead of the too frequent garbage dates we get from Worldcat through metadata download (at the cost of it only being a year - at least it is the correct year!). However if I was going to offer Pubdate I would "want" to do series as well.

I shall do some experimentation and see if I can figure out some new xpath combinations that would generically work for the FF screen. TBH that is probably about the only site this would work with, since most sites will just list series name/# as part of the book title and then that means a regex to extract it (like on the clipboard tab) rather than xpath. Which is a whole different level of additional UI complexity!
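For sites that fold the series into the title text, the regex route could look something like the sketch below. The "Title (Series Name, #2)" layout, the pattern and the `split_series` helper are assumptions for illustration only; each real site would need its own expression.

```python
import re

# Hypothetical layout: "Title (Series Name, #2)" or "Title (Series Name #2.5)".
SERIES_RE = re.compile(
    r'^(?P<title>.*?)\s*\((?P<series>[^()]+?),?\s*#(?P<index>\d+(?:\.\d+)?)\)\s*$'
)


def split_series(raw_title):
    """Return (title, series, index); series and index are None if absent."""
    match = SERIES_RE.match(raw_title)
    if not match:
        return raw_title.strip(), None, None
    return (match.group('title'),
            match.group('series').strip(),
            float(match.group('index')))


for raw in ('Kiss the Girls (Alex Cross, #2)', 'Along Came a Spider'):
    print(split_series(raw))
```

Titles without a "#" inside trailing parentheses are passed through untouched, so a bracketed year or subtitle is not mistaken for a series.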

kiwidude · 07-15-2012, 05:25 AM

Ok, here is an xpath "challenge" for someone (I need to go do some other things so if someone solves it for me in the meantime I shall be happy!)... let's say you have this html:

Code:

          <strong>Alex Cross</strong>
          <br>
          1.
          <a href="/p/james-patterson/along-came-spider.htm">Along Came a Spider</a>
          <span class="year">
            (
            <a href="/years/1992.htm">1992</a>
            )
          </span>
          <br>
          2.
          <a href="/p/james-patterson/kiss-girls.htm">Kiss the Girls</a>
          <span class="year">
            (
            <a href="/years/1994.htm">1994</a>
            )
          </span>
          <br>

Now let's say that you are iterating through each book in that page, using the <a> tag for the title above as your "root". Then you can extract the following with xpath:

title: text()
pubdate: following-sibling::span[@class="year"]/a/text()
series name: ../strong/text()
series #: ???

For series number I thought I could do something like:
preceding-sibling::text()

but that doesn't give me any results. Any other suggestions?
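For anyone wanting to try the three working expressions outside the plugin, here is a small lxml sketch run against the sample above. Wrapping the fragment in a `sectionleft` div and selecting title links with `a[contains(@href, ".htm")]` are assumptions made so the fragment has a parent to search in, and the `[1]` on the year span is an addition so that each title only picks up its own year rather than every later one.

```python
from lxml import html

# The fragment from the post, wrapped in a <div> so it has a single parent.
SAMPLE = """
<div class="sectionleft">
  <strong>Alex Cross</strong>
  <br>
  1.
  <a href="/p/james-patterson/along-came-spider.htm">Along Came a Spider</a>
  <span class="year">(<a href="/years/1992.htm">1992</a>)</span>
  <br>
  2.
  <a href="/p/james-patterson/kiss-girls.htm">Kiss the Girls</a>
  <span class="year">(<a href="/years/1994.htm">1994</a>)</span>
  <br>
</div>
"""

doc = html.fromstring(SAMPLE)
rows = []
# Only direct <a> children are titles; the year links sit inside the spans.
for a in doc.xpath('//div[@class="sectionleft"]/a[contains(@href, ".htm")]'):
    title = a.xpath('text()')[0]
    # [1] keeps only the nearest following year span.
    pubdate = a.xpath('following-sibling::span[@class="year"][1]/a/text()')
    series = a.xpath('../strong/text()')
    rows.append((title,
                 pubdate[0] if pubdate else None,
                 series[0] if series else None))

for row in rows:
    print(row)
```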

chaley · 07-15-2012, 07:31 AM

The number is part of the parent, not a sibling, because <br> is self closing.

It isn't obvious to me how to isolate those numbers. If you know that there is a number for each title, and if the numbers are sequential, then you can do it by counting them, but I suspect there are too many 'if's involved. You might be able to do it by getting the text of the parent and counting lines.

What does the parent html block look like?

kiwidude · 07-15-2012, 09:22 AM

Hi Charles,

Yeah I had considered a fallback to auto-number the series index based on whether they have a series name. But that has a few problems - such as when FF list a book in a series that is written with other authors and only show the book written by that author - it would always make it "number 1" when it isn't. So it really needs the associated number off the page.

Here is the URL being parsed in this example above:
http://www.fantasticfiction.co.uk/p/james-patterson/

The parent expression I am using to identify only the titles on the page that are of interest is:
//div[@class="sectionleft"]/a[contains(@href,".htm")]

You will see that unfortunately there is no true "parent" for each "row". There are just a number of div sections for each series or grouping of titles, with a title contained within the a href. Hence why I am using that <a> tag as my row identifier and then grabbing data relative to that.

I've attached a new version 0.2 below - this adds the Pubdate implementation and fixes a couple of bugs.

chaley · 07-15-2012, 09:57 AM

The following seems to work, but I make no guarantees. It produces a list of numbers and a list of titles. The cruft in the middle is necessary to filter out ancillary text such as "aka". As far as I can tell from brief looks, the numbers and titles correspond until the numbers run out. The titles after the numbers run out seem to be anthologies or other "non-numbered" books.

This script runs with calibre-debug -e

Code:

from contextlib import closing

from lxml import html
from calibre import browser

url = 'http://www.fantasticfiction.co.uk/p/james-patterson/'
br = browser()
with closing(br.open(url, timeout=10)) as f:
    doc = html.fromstring(f.read())

# Each "sectionleft" div holds one series or grouping of titles.
for data in doc.xpath('//div[@class="sectionleft"]'):
    numbers = []
    # The loose text nodes of the div include the "1.", "2." series numbers.
    for x in data.xpath('./text()'):
        try:
            numbers.append(int(float(x)))
        except (ValueError, OverflowError):
            pass  # ancillary text such as "aka" is not a number
    books = data.xpath('a[contains(@href,".htm")]/text()')
    print(len(numbers), len(books), numbers, books)

kiwidude · 07-15-2012, 01:32 PM

Hi Charles,

Thanks for that. And yeah it requires a bit of rejigging of the way I currently iterate through matches to try to accommodate it - since my previous "assumption" was that if a user specified a "row xpath" then there would only be one result for a title/author etc xpath. However on the FF site it does all have to be treated rather differently, and a "Row" is really a "section" of the document, with potentially multiple matches inside it.
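One way to picture treating a "row" as a section with several matches inside it: walk the section's children in order, remember the loose text seen just before each title link, and parse that as the series number. This is only a sketch of the idea, not the plugin's code. It uses the standard library's ElementTree, so the sample is rewritten as well-formed XML; the third, unnumbered entry and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: sample rewritten as well-formed XML (<br/>) for the stdlib parser;
# a real page would need an HTML parser such as lxml.
SECTION = """
<div class="sectionleft">
  <strong>Alex Cross</strong><br/>
  1. <a href="/p/james-patterson/along-came-spider.htm">Along Came a Spider</a>
  <span class="year">(<a href="/years/1992.htm">1992</a>)</span><br/>
  2. <a href="/p/james-patterson/kiss-girls.htm">Kiss the Girls</a>
  <span class="year">(<a href="/years/1994.htm">1994</a>)</span><br/>
  <a href="/p/james-patterson/unnumbered.htm">An Unnumbered Title</a><br/>
</div>
"""  # the last entry is hypothetical, added to show a title with no number


def parse_index(text):
    """Turn loose text such as ' 1. ' into 1.0, or None if it is not a number."""
    try:
        return float((text or '').strip().rstrip('.'))
    except ValueError:
        return None


def books_in_section(section):
    """Return (series, index, title) for each title link directly in a section."""
    strong = section.find('strong')
    series = strong.text if strong is not None else None
    books = []
    pending = section.text  # loose text seen since the previous child element
    for child in section:
        if child.tag == 'a' and '.htm' in child.get('href', ''):
            books.append((series, parse_index(pending), child.text))
        pending = child.tail  # text that follows this child, e.g. " 2. "
    return books


for book in books_in_section(ET.fromstring(SECTION.strip())):
    print(book)
```

Because each number is read from the text immediately in front of its own link, a title without a number simply gets `None` instead of shifting every later number by one.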

I'm hacking the code around to see if I can make it all work without breaking everything else, we shall see what falls out at the end... thanks again.

kiwidude · 07-15-2012, 01:57 PM

Success...

Now I have to plumb in all the rest of the series support through the rest of the wizard...

kiwidude · 07-15-2012, 04:40 PM

Here is a new version with series name/index fully plumbed in along with pubdate. So you can now for instance import all the data available from a Fantastic Fiction page for an author into empty books and get "proper" publication years as well as the series information.

I've also fixed a few other bugs I found and some predefined settings that needed tweaking. This is probably close enough to a 1.0 release by my standards but I will let it sit as a beta for a while to see if anything else comes up in terms of feedback.

Perkin · 07-15-2012, 05:55 PM

Reminder to users: after uninstalling older beta, restart calibre and then install the beta 0.3, then restart calibre again.

Without the restart(s) the xpath expressions weren't correctly filled.

@kiwidude, with regards the FF import, tried it on a few authors with mixed amounts of series/individual novels etc, worked perfectly. Most impressive.

One major gripe: why couldn't you have done this last year, and saved me hours of tedious tracking down author lists?

Gonna test it some more.

Many thanks.

Perkin · 07-15-2012, 06:18 PM

Just found one problem: if the book is in a series but hasn't got a number, then no series or # is generated. Would it be possible in those cases to use a '0' for the # and still keep the series? (It may just be a tweak to the expression, probably not, but I thought I'd ask anyway, just in case.)

When I did a test on the page for Sir Arthur Conan Doyle, I noticed it didn't generate the series for the Gerard stories, as they have no numbering.

kiwidude · 07-14-2012, 02:05 PM · #19 · v0.1 Beta

For a quick example of how the workflow works for using a predefined web page list setting:

Spoiler:
Choose the predefined setting tab.
To see the webpage in your browser for that site, click on the Browser button (optional step).
Double-click or click "Preview" to see the titles/authors for that website link.
Click Next to see what books have automatically been matched against those in your library. It uses a variety of special fuzzy algorithms to attempt this initial pass.
For any books that haven't yet matched, you can double-click on them in the top grid to execute a calibre search showing results in the bottom. Refine the search if needed and double-click on the book in the bottom to select it as the match. Alternatively you can add an empty book or remove that book from the list. Of course you don't have to match/delete every book - it is up to you how much of the "list" you want to keep before moving to the next wizard step.
Click Next to be given the choice of optionally saving all the matched books to a reading list. You can also save your list configuration (more relevant when using your own list sources).
Click Finish to see the books displayed in your library.

Another example - getting books for an author via Fantastic Fiction:

Spoiler:
Bring up the web page for that author in your browser. I suggest using the Search the Internet plugin as the fastest way of doing so.
Start the Import List plugin.
From the Predefined tab, choose the "Fantastic Fiction" setting and click Edit.
Either drag/drop or paste the url from your browser into the "Download from url" combo at the top of the Web Page tab.
Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

If you want to load your own list of text from the clipboard (such as copied from a forum post or web page):

Spoiler:
Use the Clipboard tab.
Paste in your text.
Specify your regular expression to extract the title/author. At a minimum you must have a title. Some predefined expressions are available to help in the script button to the right of the dropdown.
Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

To load from a CSV file (such as a calibre CSV catalog file, a Goodreads export or whatever):

Spoiler:
Use the CSV File tab.
Browse to the file, click Preview and the columns should be displayed.
Alter your separator if required and specify the title/author column numbers.
Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

To scrape from a different website not already configured in this plugin:

Spoiler:
Select the Web Page tab.
Drag/drop or paste the url into the top combo.
Click Preview to see the underlying html that you will be specifying xpath expressions for. You might find it useful to get to the part of the source html of interest by using the "Find" text in this dialog, specifying for instance the first book name in the list and clicking on the Find button.
You can either specify an xpath to what I call a "row" for each book, and then use a relative xpath to the title/author, or you can just specify a direct xpath to the title/author. It depends on the site as to which approach works best.
Use the marker icon on the right of each xpath combo to preview what text your expression is going to select. You can step forward/backward through the matches.
Your Title and Author expressions must extract the text(). The optional regular expression in the Strip field will then be applied, along with a number of special cleanups coded within the plugin to strip common unnecessary characters.
If desired you can reverse the list order - for instance a countdown from 50 to 1 on the web page you might want in your list as 1 to 50.
There are some additional complex options for dealing with some difficult websites which use non utf-8 encodings, or require javascript to execute to load the page content. Some Amazon pages make use of these settings for an example.
Click Preview again to see what titles/authors have been extracted, then Next to continue with the rest of the wizard as per the instructions above.

Last edited by kiwidude; 07-15-2012 at 09:22 AM. Reason: Removing attachment as later version in this thread

