#121 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
Good morning, M. Sarmat89,
That helps, thank you. That line of code, then, is the first code to be applied. However, I would like to get this right before starting, so that I have a good shot at getting this to work. Can you confirm that, after using the long line of code to unfold the lines, I run the four lines of code on the resulting file? I assume the four lines are still necessary. The file resulting from the four lines, the csv file, would then be the file to put into pyglossary. Is there any other code to apply to this csv file after the four lines and before putting it into pyglossary?
I am truly embarrassed to have to ask you again to confirm all this, for I am sure you can't believe how "slow" I am in taking this all in. You have put a lot of your time and effort into helping me and I just want to avoid "operator" errors. And a very big thank you for what you have done so far.
cordially, pz
Last edited by pzack; 10-01-2022 at 09:11 AM. |
#122 |
Fanatic
Posts: 515
Karma: 2268308
Join Date: Nov 2015
Device: none
|
No other code should be necessary.
|
#123 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
recent conversion pyglossary
Hello, M. Sarmat89,
Thank you for your response. I used the original djvu.txt file with your long line of code, and that was all; I didn't use any other code. The files created did not show question marks, except for the synonyms file. Pyglossary continues to report untabbed lines; I am not sure whether that is a problem, as I suppose not all the lines would be tabbed. However, when I try the dictionary and do some word searches, nothing is found; the dictionary does not appear to be seen. The lines appear unfolded. I have attached a large section of the tsv file created from your code. Please have a look. I am not sure if any tabs made it into the file; perhaps I am not using notepad properly to see these signs. I think that the tabs would be red arrows pointing toward the bracket. You would know better than I. Cordially, pz |
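One way to settle whether any tabs made it into the file, without relying on how an editor displays them, is to count them directly. The following is a minimal Python sketch, not something posted in this thread; the function name `tab_report` and the sample rows are made up for illustration, and the file name in the final comment is only an assumption.

```python
def tab_report(lines):
    """Return (number of lines containing a tab, line numbers without one)."""
    tabbed = 0
    untabbed = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text:
            continue  # blank lines are ignored
        if "\t" in text:
            tabbed += 1
        else:
            untabbed.append(number)
    return tabbed, untabbed


# Hypothetical rows: the first has a tab between headword and definition,
# the second has none, the third is blank.
sample = ["headword\t[pron] definition\n", "a line with no tab\n", "\n"]
tabbed, untabbed = tab_report(sample)
print("tabbed lines:", tabbed)
print("untabbed line numbers:", untabbed)

# For a real file (name assumed), something like:
#   with open("new.tsv", encoding="utf-8") as f:
#       print(tab_report(f))
```

If the count of tabbed lines is zero, the headword/definition split never happened and an empty index would be the expected result. A handful of untabbed lines is more likely to be page headers or other stray text.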
#124 |
Fanatic
Posts: 515
Karma: 2268308
Join Date: Nov 2015
Device: none
|
Interesting. Please also attach the same piece of text from the source file.
|
#125 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
text pyglossary conversion
Good evening, M. Sarmat89,
I have, per your request, attached a section of the file newtsv.txt, which is the final output text for pyglossary, and djvu1txt.txt, which is a section of the original file djvu.txt. Both files are different from what you have, as I wanted to give you fairly matched files. Please look for the word "affadissant, e" near the beginning of both files. The unfolded file, newtsv.txt, makes it a wee bit harder to find that headword, but it is near the beginning. Unless I am mistaken, the headwords do not appear to begin the unfolded lines, but I don't know how pyglossary reads the tab for building the index. I certainly hope this helps. We at least have unfolded lines. Cordially, pz |
#126 |
Bibliophagist
Posts: 44,778
Karma: 168765399
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Hmmm... looking at the djvu1txt.txt file, there are 43 instances of "GRAND LAROUSSE DE LA LANGUE FRANÇAISE"; in the newTSV.txt file, there are 42 instances of that phrase. This strongly suggests that this is an attempt to pirate the Grand Larousse de la Langue Française dictionary.
Sorry, but I'm out of this discussion. |
#127 |
Fanatic
Posts: 515
Karma: 2268308
Join Date: Nov 2015
Device: none
|
Code:
perl -pe 's/^$/%%/sg' < djvu1txt.txt | perl -pe 's/\n/ /sg' | perl -pe 's/%%/\n/sg' | perl -pe "s:^ +::" | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv
|
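For readers who find the one-liner hard to follow, here is a rough Python sketch of what the five stages appear to do: mark empty lines, join everything onto one line, split again at the marks, trim leading spaces, and put a tab between the text before the first "[" and the bracket. The function name `unfold` and the sample text are hypothetical. Unlike the Perl pipeline, this sketch also trims trailing spaces and drops empty entries, so treat it as an illustration rather than an exact replacement.

```python
import re


def unfold(text):
    """Join each blank-line-separated block into one line, then tab the headword."""
    entries, current = [], []
    # The extra "" at the end flushes the last block.
    for line in text.splitlines() + [""]:
        if line == "":
            joined = " ".join(current).strip()
            if joined:
                entries.append(joined)
            current = []
        else:
            current.append(line)
    # Same pattern as the last Perl stage: the text before the first "[",
    # minus trailing spaces, is followed by a tab. Entries with no "["
    # (or starting with "[") do not match and stay untabbed.
    pattern = re.compile(r"^([^\[]+?) *(?=\[)")
    return [pattern.sub(r"\1\t", entry, count=1) for entry in entries]


# Hypothetical input, not taken from the real file.
sample = "headword\n[pron] first line\nsecond line\n\nno bracket here\n"
for row in unfold(sample):
    print(repr(row))
```

Note that any block without a "[" comes out with no tab at all, which is one plausible source of "untabbed line" messages from pyglossary.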
#128 | |
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
Links on the Internet Archive suggest that it is open source:
Quote:
pdf-link to the internet archive
djvu.txt-link to the internet archive
Why did it take 128 posts before this became apparent? Bad @pzack! You should really give better info.
Last edited by Markismus; 10-02-2022 at 08:27 AM. |
|
#129 | |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Unlike Project Gutenberg, archive.org doesn't check the copyright status of documents and will only take them down after repeated DMCA takedown requests. |
|
#130 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
Hello M. Markismus,
Thank you for your message, and, as I have indicated before, I am thankful for all your help towards this attempted conversion. I am astonished that I am being accused of attempted pirating! However, two members called my attention to possible copyright issues and use of complete files in the forum. I am new to forums and I don't know all the rules and limitations when dealing with electronic texts. It should be obvious to you now and to any member following the thread that we are working with open source and public domain material. If this is not the case, then I have had no indication to the contrary. Will you permit me to say that I don't think that I merit your admonishment. Please accept my sincere apologies, however, if you thought that I was withholding information from you. But, does this change anything? It appears that we are still working with a poor text that resists a pyglossary conversion. I am, as I have said before, very appreciative of the time and efforts that you have freely given in trying to help me with this stardict conversion attempt. Very cordially, pz |
#131 | |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Unless you find an official statement by the publisher of the Larousse dictionaries that the dictionary that you found has been released into the Public Domain, you might want to assume that it's still in copyright. |
|
#132 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
Good afternoon, DOITSU,
Given the massive quantity of electronic material made available to the public, I, like countless general users of the internet, usually do not have the time or means to track down copyright issues or violations. We are not talking about "deep sites" but widely known and used electronic information outlets that do provide educational material. This forum is not using or distributing, nor could it, any file or files that I personally have presented in this forum, and what little I have provided is for my own personal use. And what I have used is not, knowingly, illegal, pirated or anything to that effect. That you want to make an issue of this is your lookout. And please, could we not waste our wind on this. cordially, pz |
#133 |
Grand Sorcerer
Posts: 12,336
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
The dictionary isn't Public Domain. It is published electronically on the site "Gallica", which is the official site of the French National Library. What Gallica is (in French) and the dictionary on their site. It doesn't have a "Download" button.
Here is an image of the copyright page of the dictionary. Spoiler:
Here is what Gallica says about this dictionary. I put the part about rights in bold. Spoiler:
|
#134 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
Hello, M Sarmat89,
I will try the new code, thank you. It seems that a little can of worms has been opened about the material that we are working on. Nevertheless, I would like to continue, confident that no violations are at hand here. However, I realise that you may, and possibly soon, tell me that, given the state of the material, the conversion cannot be effected. I am willing to continue to try if you still think that it is worth trying. I wish that I could assist you but, as you know, I have absolutely no expertise in manipulating the text with perl. Maybe, this time, there will be some luck. Cordially, pz |
#135 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
|
M. DOITSU,
Perhaps it bears repeating that nothing is being sold or distributed for sale or use in this forum (nor could it be) or anywhere else, and that it is being used strictly, solely and unequivocally for personal use. cordially, pz |
Tags |
pyglossary |
|