#1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Mar 2019
Location: Paris, France
Device: Kobo Aura Edition 2
Looking up the item in a list that begins with...
Hi,
In the Comments field, I have a lot of information formatted in HTML (scraped via the noosfere plugin; nooSFere is a French SF database), for example: original title, series and issue number, translator, cover artist and more. I would like to extract some of this data into specific custom fields. So, as a newbie, I've tried to learn regex and to use Python in calibre, but I'm now lost ;(

Example of a comments field:
Code:
<div>
<p>Référence: <a href="https://www.noosfere.org/livres/niourf.asp?numlivre=2146591238">https://www.noosfere.org/livres/niourf.asp?numlivre=2146591238 </a></p>
<p>Couverture: <a href="https://images.noosfere.org/couv/b/bragelonne857-2015.jpg">https://images.noosfere.org/couv/b/bragelonne857-2015.jpg </a></p>
<p>
Titre original : <em>Judgment of Tears / Dracula Cha Cha Cha, 1998 </em>
Cycle : <a href="https://www.noosfere.org/livres/serie.asp?numserie=1507">Anno Dracula </a><a href="https://www.noosfere.org/livres/editionslivre.asp?numitem=4312"><<== </a>vol. 3
Traduction de <a href="https://www.noosfere.org/livres/auteur.asp?NumAuteur=1588">Thierry ARSON </a>& <a href="https://www.noosfere.org/livres/auteur.asp?NumAuteur=2147190331">Leslie DAMANT-JEANDEL </a>
Illustration de <a href="https://www.noosfere.org/livres/auteur.asp?NumAuteur=2147190818">Noëmie CHEVALIER </a>
<a href="https://www.noosfere.org/livres/editeur.asp?numediteur=-24371077">BRAGELONNE </a>(Paris, France), coll. <a href="https://www.noosfere.org/livres/collection.asp?NumCollection=1975550487&numediteur=-24371077">L'Ombre </a>
</p>

With a first re(val, pattern, replacement) call I can delete the HTML tags:
Code:
re(field('comments'), '<.+?>', '')

The result:
Code:
Référence: https://www.noosfere.org/livres/niourf.asp?numlivre=2146591238
Couverture: https://images.noosfere.org/couv/b/bragelonne857-2015.jpg
Titre original : Judgment of Tears / Dracula Cha Cha Cha, 1998
Cycle : Anno Dracula <<== vol. 3
Traduction de Thierry ARSON & Leslie DAMANT-JEANDEL
Illustration de Noëmie CHEVALIER
BRAGELONNE (Paris, France), coll. L'Ombre

I now have a list of items separated by end-of-line characters (\n in regex, if I'm correct). So far, so good.

To extract, for example, the cover artist, I first tried to use sublist(previous result,7,8,\n) to pull 'Illustration de Noëmie CHEVALIER' out of the previous result, and then re(val, pattern, replacement) again to delete 'Illustration de '.

The problem is that this item is not always at the 7th position (for example, a French stand-alone book will have no original title, no series and no translator).

Is there a function to look up the item in the list that begins with 'Illustration de ' (regex ^Illustration de), or with some other prefix for the other data I want to retrieve?

I have thought of using switch(val, [pattern, value,]+ else_value) or lookup(val, [pattern, field,]+ else_field), but I don't understand very well how to use these functions.

Perhaps I have to use a for loop, something like this pseudocode?
Code:
for x in range(9):
    if contains(sublist(previous result, x, x+1, \n), ^'Illustration de ', true, false)
        sublist(previous result, x, x+1, \n)
    fi

Regards
#2 |
Grand Sorcerer
Posts: 12,444
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
One way to do it is with a loop. Something like
Code:
program:
    for i in '0,1,2,3,4,5,6,7,8,9':
        v = list_item($comments, i, '\n');
        # 'Illustration de ' is 16 characters long
        if substr(v, 0, 16) == 'Illustration de ' then
            # compute the result
            result = 'whatever';
            break
        fi
    rof

Another way is with a single regular expression. Something like
Code:
re($comments, '(?ms)(?:^|.*\n)<p>bar(.*?)(\n|$)', '\1')

The first method will be much slower, but it gives you more control over what happens if there isn't a match. The speed doesn't matter if you are doing the operation in search/replace or in the Action Chains plugin.
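For readers who want to compare the two approaches outside calibre, here is a minimal Python sketch of the same ideas: a line-by-line loop and a single multiline regex. It is not calibre template code. The function names and the shortened, already tag-stripped sample are assumptions made for illustration.

```python
import re

# Tag-stripped sample text, one item per line (shortened from the example in post #1).
SAMPLE = "\n".join([
    "Référence: https://www.noosfere.org/livres/niourf.asp?numlivre=2146591238",
    "Titre original : Judgment of Tears / Dracula Cha Cha Cha, 1998",
    "Traduction de Thierry ARSON & Leslie DAMANT-JEANDEL",
    "Illustration de Noëmie CHEVALIER",
    "BRAGELONNE (Paris, France), coll. L'Ombre",
])


def find_by_loop(text, prefix):
    """Approach 1: walk the lines and return what follows the first matching prefix."""
    for line in text.split("\n"):
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return ""  # the loop gives explicit control over the no-match case


def find_by_regex(text, prefix):
    """Approach 2: one regex; with re.MULTILINE, ^ and $ anchor at line boundaries."""
    pattern = r"^\s*" + re.escape(prefix) + r"(.*?)\s*$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else ""


artist_loop = find_by_loop(SAMPLE, "Illustration de ")
artist_regex = find_by_regex(SAMPLE, "Illustration de ")
missing = find_by_loop(SAMPLE, "Cycle : ")  # this prefix is absent from SAMPLE
```

Both functions return an empty string when no line starts with the prefix, which covers the stand-alone book case where some items are absent.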
#3 |
Junior Member
Hi chaley,

Thanks a lot for your help!! But the list_item function doesn't work. I've tried this:
Code:
program: list_item(re(field('comments'),'<.+?>','') , 0, '\n')

I've also tried r'\n', ''\n'', r''\n'' and r"[\n]" as the separator, but none of them work. This is what I get:
Code:
Référence: https://www.noosfere.org/livres/niourf.asp?numlivre=2146572761 \nCouverture: https://images.noosfere.org/couv/a/atalante420-2009.jpg \nTitre original : Ausgebrannt, 2007 Première parution : Bastei-Lübbe, 2007 Traduction de Frédéric WEINMANN Illustration de Matthias KULKA
#4 |
Grand Sorcerer
Sorry, I forgot that the template language parser doesn't handle escaped characters. Use the function character('newline') instead of '\n'.
You might be happier/more successful implementing this in Python as a user-defined template function.
#5 |
Junior Member
It works, thanks a lot!!