Old 10-01-2022, 09:07 AM   #121
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Good morning, M. Sarmat89,

That helps, thank you. So that line of code is the first to be applied. However, since I would like to get this right before starting and have a good shot at getting this to work, I want to ask you to confirm that, after using the long line of code to unfold the lines, I then run the four lines of code on the resulting file. I assume the four lines are still necessary.

The file resulting from the four code lines, the csv file, would then be the file to put into pyglossary.

There isn't any other code to apply to this csv file after the four lines and before putting it into pyglossary?

I am truly embarrassed to have to ask you again to confirm all this for I am sure you can't believe how "slow" I am in taking this all in. You have put a lot of your time and effort in this to help me and I just want to avoid "operator" errors.

And a very big thank you for what you have done so far.

cordially,
pz

Last edited by pzack; 10-01-2022 at 09:11 AM.
Old 10-01-2022, 02:19 PM   #122
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
No other code should be necessary.
Old 10-01-2022, 03:53 PM   #123
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
recent conversion pyglossary

Hello, M. Sarmat89,

Thank you for your response.

I used the original djvu.txt file with your long line of code. And that was all; I didn't use any other code.

The files created did not show question marks except for the synonyms file.

Pyglossary continues to indicate untabbed lines; I am not sure if that is a problem or not, as not all the lines would be tabbed, I suppose.

However, when I try the dictionary and search for some words, nothing is found; the dictionary does not appear to be seen.

The lines appear unfolded. I have attached a large section of the tsv file created from your code. Please have a look. I am not sure if any tabs made it into the file; perhaps I am not properly using notepad to see these signs. I think that the tabs would be red arrows pointing toward the bracket. You would know better than I.

Cordially,
pz
Attached Files
File Type: txt nouveau 5.txt (211.4 KB, 46 views)
Old 10-01-2022, 07:39 PM   #124
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Interesting. Please attach this same piece of text in the source file also.
Old 10-02-2022, 12:03 AM   #125
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
text pyglossary conversion

Good evening M. Sarmat89,

I have, per your request, attached a section of the file newtsv.txt which is the final output text for pyglossary and the djvu1txt.txt which is a section of the original file djvu.txt.

Both files are different from what you have as I wanted to give you fairly matched files. Please look for the word "affadissant, e" near the beginning of both files.

The unfolded file, newtsv.txt makes it a wee bit harder to find that headword but it is near the beginning.

Unless I am mistaken, it appears that the headwords do not begin the unfolded lines, but I don't know how pyglossary reads the tab when building the index.

I certainly hope this helps. We at least have unfolded lines.

Cordially,
pz
Attached Files
File Type: txt newTSV.txt (352.1 KB, 47 views)
File Type: txt djvu1txt.txt (365.8 KB, 43 views)
Old 10-02-2022, 12:21 AM   #126
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
 
Posts: 35,401
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Hmmm... looking at the djvu1txt.txt file, there are 43 instances of "GRAND LAROUSSE DE LA LANGUE FRANÇAISE"; in the newTSV.txt file, there are 42 instances of that phrase. This strongly suggests that this is an attempt to pirate the Grand Larousse de la Langue Française dictionary.

Sorry but I'm out of this discussion.
Old 10-02-2022, 05:41 AM   #127
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Code:
perl -pe 's/^$/%%/sg' < djvu1txt.txt | perl -pe 's/\n/ /sg' | perl -pe 's/%%/\n/sg' | perl -pe 's:^ +::' | perl -pe 's:^([^[]+?) *(?=\[):$1\t:' > new.tsv
But again, the text quality is low and it requires manual correction to obtain useful results.
Old 10-02-2022, 08:20 AM   #128
Markismus
Guru
Markismus causes much rejoicing
 
Posts: 897
Karma: 149877
Join Date: Jul 2013
Location: Netherlands
Device: Cracked HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Links in the internet archive suggest that it is open source:
Quote:
Grand L... de la langue française
Publication date 1989
Topics français, langue française, français (langue), orthographe, dictionnaire, orthographe d’usage, encyclopédies, recte gallice loquor
Collection opensource
Language French

pdf-link to the internet archive
djvu.txt-link to the internet archive

Why did it take 128 posts before this became apparent? Bad @pzack! You should really give better info.

Last edited by Markismus; 10-02-2022 at 08:27 AM.
Old 10-02-2022, 11:26 AM   #129
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Markismus
It most certainly is not Open Source, because, AFAIK, the publishers of the Larousse dictionaries have never released older dictionaries into the Public Domain.
Unlike Project Gutenberg, archive.org doesn't check the copyright status of documents and will only take documents down after repeated DMCA takedown requests.
Old 10-02-2022, 12:15 PM   #130
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello M. Markismus,

Thank you for your message, and, as I have indicated before, I am thankful for all your help towards this attempted conversion.

I am astonished that I am being accused of attempted pirating! However, two members called my attention to possible copyright issues and use of complete files in the forum.

I am new to forums and I don't know all the rules and limitations when dealing with electronic texts.

It should be obvious to you now and to any member following the thread that we are working with open source and public domain material. If this is not the case, then I have had no indication to the contrary.

Will you permit me to say that I don't think that I merit your admonishment.

Please accept my sincere apologies, however, if you thought that I was withholding information from you. But, does this change anything? It appears that we are still working with a poor text that resists a pyglossary conversion.

I am, as I have said before, very appreciative of the time and efforts that you have freely given in trying to help me with this stardict conversion attempt.

Very cordially,
pz
Old 10-02-2022, 12:34 PM   #131
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by pzack
It should be obvious to you now and to any member following the thread that we are working with open source and public domain material. If this is not the case, then I have had no indication to the contrary.
The person who uploaded the Larousse dictionary to archive.org has also uploaded many other copyrighted textbooks and dictionaries there.
Unless you find an official statement by the publisher of the Larousse dictionaries that the dictionary that you found has been released into the Public Domain, you might want to assume that it's still in copyright.
Old 10-02-2022, 12:34 PM   #132
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Good afternoon, DOITSU,

Given the massive quantity of electronic material made available to the public, I, like countless general users of the internet, usually do not have the time or means to track down copyright issues or violations.

We are not talking about "deep sites" but widely known and used electronic information outlets that do provide educational material.

This forum is neither using nor distributing, nor could it, any file or files that I personally have presented here, and what little I have provided is for my own personal use. And what I have used is not, knowingly, illegal, pirated, or anything to that effect.

That you want to make an issue of this is your lookout.

And please, could we not waste our wind on this.

cordially,
pz
Old 10-02-2022, 12:40 PM   #133
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,741
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
The dictionary isn't Public Domain. This dictionary is published electronically on the site "Gallica", which is the official site of the French National Library. See "What Gallica is" (in French) and "The dictionary on their site". It doesn't have a "Download" button.

Here is an image of the copyright page of the dictionary.
[Attached image: Clipboard01.jpg, 122.5 KB]


Here is what Gallica says about this dictionary. I put the part about rights in bold.
Spoiler:
Title : Grand dictionnaire des lettres ; 1-7. Grand Larousse de la langue française. Tome 1, A-CIP / [sous la dir. de Louis Guilbert,..., René Lagane,..., Georges Niobey,...]
Author : Larousse. Auteur du texte
Publisher : (Paris)
Publication date : 1989
Contributor : Guilbert, Louis (1912-1977). Directeur de publication
Contributor : Lagane, René. Directeur de publication
Contributor : Niobey, Georges. Directeur de publication
Set notice : http://catalogue.bnf.fr/ark:/12148/cb373349405
Relationship : Titre d'ensemble : Grand dictionnaire des lettres
Relationship : http://catalogue.bnf.fr/ark:/12148/cb374592134
Type : text
Type : monographie imprimée
Language : french
Format : 7 vol. (XCVI-6528 p.) ; 29 cm
Format : Nombre total de vues : 826
Description : Avec mode texte
Description : Dictionnaires
Rights : restricted use
Identifier : ark:/12148/bpt6k1200532b
Source : Larousse, 2012-144936
Provenance : Bibliothèque nationale de France
Online date : 24/12/2012
Old 10-02-2022, 12:44 PM   #134
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, M. Sarmat89,

I will try the new code, thank you.

Seems that a little can of worms has opened about the material that we are working on.

Nevertheless, I would like to continue, confident that no violations are at hand here.

However, I realise that you may, and possibly soon, tell me that, given the state of the material, the conversion cannot be effected.

I am willing to continue to try if you still think that it is worth trying. I wish that I could assist you but, as you know, I have absolutely no expertise in manipulating the text with perl.

Maybe this time there will be some luck.


Cordially,
pz
Old 10-02-2022, 01:23 PM   #135
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
M. DOITSU,

Perhaps it bears repeating that nothing is being sold or distributed, for sale or for use, in this forum (nor could it be) or anywhere else, and that the material is being used strictly, solely and unequivocally for personal use.

cordially,
pz