MobileRead Forums > E-Book Formats > Workshop

Old 09-10-2022, 11:57 AM   #16
pzack
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Dear Sarmat89,

Adding to what I just posted in reply to your response: would you need to write the code so that the opening bracket is the indicator of the headword or, more exactly, of the beginning of the line that contains the headword? Sometimes there are two headwords on the same line when there are masculine and feminine endings. But there is always at least one headword before the first (leading) bracket. These brackets do not appear in the text of the definitions. There may be parentheses, but not brackets, which are used only for the pronunciation of the headword.

So, do we need a find-and-replace (or insert?) instruction that puts a tab somewhere relative to the leading bracket and the headword before it?

pz
Old 09-10-2022, 01:27 PM   #17
pzack
Dear Markismus (and Sarmat89),

Here is the real skinny. I couldn't figure out how to attach a file, but below is the actual text taken from the full text file.

zymogène [zims3en] adj. (de zymo- et de
-gène, du gr.gennân, engendrer, produire ;
1888, Larousse, comme qualificatif d’une
substance qui produit un ferment soluble,
par une transformation spontanée ; sens
actuel, 1964, Larousse). Pouvoir zymogène,
propriété des cellules de fabriquer leurs
propres enzymes ; propriété des glandes
spécialisées de produire les enzymes néces-
saires à l'organisme.


© n. m. (1964, Robert). Précurseur inactif
d'un enzyme. (Syn. PROENZYME.)


zymotechnie [zimotekni] n. f. (de zymo-
et de -fechnie, du gr. tekhné, art [manuel],
industrie, métier ; 1762, Acad.). Art de
produire et de diriger une fermentation.


zymotechnique [zimoteknik] adj. (de
zymotechnie ; 1872, Littré). Qui se rapporte
à la zymotechnie.


zymotique [zimotik] adj. (gr. zumôtikos,
propre à faire fermenter, de zumôtos, fer-
menté, dér. de zumoün, faire fermenter, de
zum, levain ; 1855 [d'après Robert, 1977],
puis 1868, Souviron, 585). Qui se rapporte
aux ferments solubles.


zythum {zitsm] ou zython [zit5] n.m.
(lat. zythum, bière, boisson faite avec de
l'orge, du gr. zuthos, décoction d'orge,
bière ; 1710, Richelet — additions —
[zythum], et 1923, Larousse [zython]). Bière
que les Égyptiens préparaient avec de l’orge
fermentée.

Very cordially,
pz
Old 09-10-2022, 01:30 PM   #18
pzack
Dear Markismus,

I wanted to make clear that the text just sent to you is the actual text as it appears in the full text file, copied from Bloc-notes under Windows 11. No alterations on my part.

pz
Old 09-10-2022, 03:13 PM   #19
DNSB
Bibliophagist
Posts: 35,507
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
zymotique [zimotik] adj. (gr. zumôtikos,


zythum {zitsm] ou zython [zit5] n.m.

Is there a reason for the use of [ and {? An error in your original .xml file?

Again: click the Manage Attachments button, then click Browse. Locate and select the file that you want to attach (it must be one of the supported file types). Once you have selected the file, click Upload.
Old 09-10-2022, 05:13 PM   #20
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
You need to unfold the lines first. Try replacing
Code:
(?<=\S)\n(?=\S)
with a space. (If the expression fails to match, insert "\r" before "\n".)
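For readers who would rather run a script than use an editor's search box, the same substitution can be sketched in Python. This is only an illustration of the expression above, not part of the original advice: the function name `unfold` and the sample string are made up for the example, and it assumes that entries are separated by at least one blank line.

```python
import re

def unfold(text: str) -> str:
    """Join hard-wrapped lines; blank lines between entries are left alone."""
    # Normalise Windows line endings first, so "\r\n" needs no special casing.
    text = text.replace("\r\n", "\n")
    # A newline with non-whitespace on both sides is a wrap inside an entry.
    return re.sub(r"(?<=\S)\n(?=\S)", " ", text)

# Hypothetical sample built from two entries of the posted text.
sample = (
    "zymotechnique [zimoteknik] adj. (de\n"
    "zymotechnie ; 1872, Littré). Qui se rapporte\n"
    "à la zymotechnie.\n"
    "\n"
    "\n"
    "zymotique [zimotik] adj. (gr. zumôtikos,\n"
)
print(unfold(sample))
```

The blank lines survive because a newline that touches another newline has whitespace on one side, so neither lookaround matches there.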
Old 09-10-2022, 08:43 PM   #21
pzack
Dear David (DNSB),

Thank you for responding. I didn't notice the change in bracket style. However, I think what is important is that the headword(s) will be located before the first leading bracket, no matter the style.

I guess the code could be written to look for the first leading bracket in the two styles.
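As a rough sketch of that idea in Python: look for the first "[" or "{" on an unfolded entry and treat everything before it as the headword. The function names are hypothetical, and the tab-separated output is an assumption about what the converter expects, based only on the mention of a tab earlier in the thread.

```python
import re

def split_headword(entry: str):
    """Return (headword, rest), or None when the entry has no opening bracket."""
    match = re.search(r"[\[{]", entry)  # first bracket in either style
    if match is None:
        return None
    return entry[:match.start()].rstrip(), entry[match.start():]

def to_tab_line(entry: str) -> str:
    """Put a tab between the headword and the rest of the entry."""
    parts = split_headword(entry)
    if parts is None:
        return entry  # no bracket at all (such as the "©" entry): leave untouched
    headword, rest = parts
    return f"{headword}\t{rest}"

print(to_tab_line("zythum {zitsm] ou zython [zit5] n.m."))
```

Two caveats. A second headword such as "zython" sits after the first bracket, so it stays in the definition part. An entry with no pronunciation bracket but a bracket later in its definition would be split in the wrong place, so the result would need checking.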

I hope that this is a help.

Cordially,
pz
Old 09-10-2022, 09:00 PM   #22
pzack
Dear Sarmat89,

Thank you for responding.

I must ask you to forgive my obtuseness when it comes to programming and code. For example, "unfold the lines first": how do I "unfold a line"? And "Replacing code": where, and what code, am I replacing?

Please don't assume that I know the technical language that you are probably comfortable with.

May I ask you, since you have the real sample of the text to work with, to actually illustrate what you suggest needs to be done, using the provided text?

You are communicating with a first-grader when it comes to this type of programming. I have to be led by the hand here.

Hopefully, you have the patience to walk me through this. I sense some impatience among some of the respondents, and I understand this, but I am not at the level of expertise of my respondents.

Very cordially,
pz
Old 09-10-2022, 09:07 PM   #23
pzack
Dear Sarmat89,

Adding to the message just sent to you: I also don't know which text editor you are using, or what program (and how to obtain it) would be doing the text modifications.

I have Bloc-notes under Windows 11. Perhaps I need a different text editor?

pz
Old 09-10-2022, 09:52 PM   #24
DNSB
I would suggest installing Notepad++ (I'm not absolutely certain, but is Bloc-notes the same as Notepad?).

As for unfolding lines, what is meant is to take:

Code:
zymotique [zimotik] adj. (gr. zumôtikos,
propre à faire fermenter, de zumôtos, fer-
menté, dér. de zumoün, faire fermenter, de
zum, levain ; 1855 [d'après Robert, 1977],
puis 1868, Souviron, 585). Qui se rapporte
aux ferments solubles.
and convert it to a single line:

Code:
zymotique [zimotik] adj. (gr. zumôtikos, propre à faire fermenter, de zumôtos, fer- menté, dér. de zumoün, faire fermenter, de zum, levain ; 1855 [d'après Robert, 1977], puis 1868, Souviron, 585). Qui se rapporte aux ferments solubles.
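Note that the unfolded line still contains "fer- menté", because the word was hyphenated across a line break. A possible extra step, sketched here in Python, removes such end-of-line hyphens while unfolding. This is an assumption-laden illustration and not something proposed in the thread: the names `hyphen_breaks`, `unfold_dehyphenate` and the `keep` parameter are invented, and the rule is deliberately naive. The posted sample also contains "zymo-" at a line end, where the hyphen is genuine, so it seems wise to list the candidates first and name the exceptions by hand.

```python
import re

# Letters, a hyphen at the end of a line, then letters on the next line.
HYPHEN_BREAK = re.compile(r"([^\W\d_]+)-\n([^\W\d_]+)")

def hyphen_breaks(text: str) -> list:
    """List (left, right) fragments split by a hyphen at a line end, for review."""
    return HYPHEN_BREAK.findall(text.replace("\r\n", "\n"))

def unfold_dehyphenate(text: str, keep: tuple = ()) -> str:
    """Unfold lines, dropping end-of-line hyphens unless the left part is in `keep`."""
    def join(match):
        left, right = match.group(1), match.group(2)
        if left in keep:
            return f"{left}- {right}"  # genuine hyphen: keep it, just unfold
        return left + right            # wrap hyphen: glue the word back together
    text = HYPHEN_BREAK.sub(join, text.replace("\r\n", "\n"))
    # Remaining single newlines inside an entry become spaces.
    return re.sub(r"(?<=\S)\n(?=\S)", " ", text)

sample = "(de zymo-\net de -fechnie)\nde zumôtos, fer-\nmenté, dér."
print(hyphen_breaks(sample))
print(unfold_dehyphenate(sample, keep=("zymo",)))
```

Without the `keep` entry, "zymo-" and "et" would be glued into "zymoet", which is why a review pass over `hyphen_breaks` is suggested before trusting the result on 100,000 entries.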
Old 09-11-2022, 07:55 AM   #25
Markismus
Guru
Posts: 897
Karma: 149877
Join Date: Jul 2013
Location: Netherlands
Device: Cracked HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
So the problem is of course in the assumptions.

Conversion to csv-file
I've used Sublime Text 3, because it supports Perl regex. However, with a bit of googling you'll find the slight differences in regex implementation between editors. I've also included the Perl commands.

If you apply the following substitutions in order, you get a csv-file.

Find --> Replace ALL, e.g. perl -0777 -pe 's/\n\n+/||/sg'
'\n\n+' --> '||' , masking of the blank lines separating articles
'\n' --> ' ' , removal of the <EOL>-characters inside an article
'\|\|' --> '\n' , insertion of an <EOL>-character at the end of an article. Each article is now on 1 line.
'^(\S+)' --> '$1|,|$1' , repeating the first word and introducing a delimiter, e.g. |,|. The reason for a complex delimiter is that it will not occur naturally in the article.
'^(\S+)' --> '$1,' , splitting off the first word by introducing a comma

The last two replacements are alternatives; use one or the other.
I've added the original text-file and the intermediate results.
You can recreate them with the commands
Code:
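# -0777 makes perl read the whole file at once; line by line, '\n\n+' can never match.
perl -0777 -pe 's/\n\n+/\|\|/sg' < original.txt > output1.txt
perl -pe 's/\n/ /sg' < output1.txt > output2.txt
perl -pe 's/\|\|/\n/sg' < output2.txt > output3.txt
# '$1,' gives the comma-delimited result shown below; use '$1|,|$1' for the |,| variant.
perl -pe 's/^(\S+)/$1,/' < output3.txt > output4.csv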
A final result in the classical csv-format is this:
Code:
zymogène, [zims3en] adj. (de zymo- et de -gène, du gr.gennân, engendrer, produire ; 1888, Larousse, comme qualificatif d’une substance qui produit un ferment soluble, par une transformation spontanée ; sens actuel, 1964, Larousse). Pouvoir zymogène, propriété des cellules de fabriquer leurs propres enzymes ; propriété des glandes spécialisées de produire les enzymes néces- saires à l'organisme.
©, n. m. (1964, Robert). Précurseur inactif d'un enzyme. (Syn. PROENZYME.)
zymotechnie, [zimotekni] n. f. (de zymo- et de -fechnie, du gr. tekhné, art [manuel], industrie, métier ; 1762, Acad.). Art de produire et de diriger une fermentation.
zymotechnique, [zimoteknik] adj. (de zymotechnie ; 1872, Littré). Qui se rapporte à la zymotechnie.
zymotique, [zimotik] adj. (gr. zumôtikos, propre à faire fermenter, de zumôtos, fer- menté, dér. de zumoün, faire fermenter, de zum, levain ; 1855 [d'après Robert, 1977], puis 1868, Souviron, 585). Qui se rapporte aux ferments solubles.
zythum, {zitsm] ou zython [zit5] n.m. (lat. zythum, bière, boisson faite avec de l'orge, du gr. zuthos, décoction d'orge, bière ; 1710, Richelet — additions — [zythum], et 1923, Larousse [zython]). Bière que les Égyptiens préparaient avec de l’orge fermentée.
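The same four substitutions can also be sketched in Python, which avoids shell quoting altogether. This is an illustrative equivalent under stated assumptions, not the script used for the attached files: the function name `articles_to_csv` and its `delimiter` and `repeat` parameters are invented for the example, and it assumes that blank lines separate articles.

```python
import re

def articles_to_csv(text: str, delimiter: str = ",", repeat: bool = False) -> str:
    """Put each article on one line and mark the end of its first word."""
    text = text.replace("\r\n", "\n").strip("\n")
    # Substitutions 1 to 3 in one go: split on blank lines, then drop the
    # line breaks that remain inside each article.
    articles = [a.replace("\n", " ") for a in re.split(r"\n\n+", text)]

    def mark(match):
        word = match.group(1)
        # repeat=True mimics '$1|,|$1', repeat=False mimics '$1,'.
        return word + delimiter + (word if repeat else "")

    # Substitution 4: only the first word of each article is touched.
    lines = [re.sub(r"^(\S+)", mark, a, count=1) for a in articles]
    return "\n".join(lines) + "\n"

sample = (
    "zymotechnique [zimoteknik] adj. (de\n"
    "zymotechnie ; 1872, Littré). Qui se rapporte\n"
    "à la zymotechnie.\n"
    "\n"
    "\n"
    "zymotique [zimotik] adj.\n"
)
print(articles_to_csv(sample))
```

Calling `articles_to_csv(sample, "|,|", repeat=True)` would give the variant with the repeated headword and the |,| delimiter.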
Problem
So what's the problem? You now have an article under the key '©', which thereby gets a quite new meaning. Apparently, some articles have subsections that are separated from the main article in the same way that articles are separated from each other.
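One possible way around this, sketched in Python, is to glue any article that starts with the marker back onto the article before it. Whether '©' always introduces a subsection of the preceding headword is a guess based on this one sample; the function name `merge_subentries` and the `marker` parameter are hypothetical.

```python
def merge_subentries(articles, marker="©"):
    """Append articles that start with `marker` to the preceding article."""
    merged = []
    for article in articles:
        if article.startswith(marker) and merged:
            # Assumed to be a subsection of the previous headword.
            merged[-1] = merged[-1] + " " + article
        else:
            merged.append(article)
    return merged

articles = [
    "zymogène [zims3en] adj. Pouvoir zymogène.",
    "© n. m. (1964, Robert). Précurseur inactif d'un enzyme.",
    "zymotechnie [zimotekni] n. f. Art de produire une fermentation.",
]
for line in merge_subentries(articles):
    print(line)
```

This would have to run on the one-line articles before the delimiter is inserted, so that the merged text ends up under the real headword.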

Stardict
Using my script I've added to the txt-file a csv-extension and ran it using
Code:
perl pocketbookdic.pl  zymogène.S-delimiter .txt.csv fr '|,|'
The results, in both xml and zipped binary form, are also uploaded.

The screen output (with '$isTestingOn = 1;' in the script) is like this:
Attached Thumbnails
Screenshot from 2022-09-11 14-32-07.png (257.0 KB)
Attached Files
File Type: txt zymogène.txt (1.3 KB, 78 views)
File Type: txt n-||.txt (1.2 KB, 84 views)
File Type: txt n- .txt (1.2 KB, 78 views)
File Type: txt n.txt (1.2 KB, 92 views)
File Type: txt zymogène.S-delimiter .txt (1.3 KB, 80 views)
File Type: txt zymogène.S-, .txt (1.2 KB, 83 views)
File Type: zip zymogène.S-delimiter .txt_reconstructed.zip (1.8 KB, 91 views)
File Type: xml zymogène.S-delimiter .txt_reconstructed.xml (2.2 KB, 106 views)

Last edited by Markismus; 09-11-2022 at 08:50 AM.
Old 09-11-2022, 12:07 PM   #26
pzack
Dear David (DNSB),

It's Sunday and I don't know if you want to "work" on Sunday.

I see your example, thank you. Now, what is the reason for the one line, and how do I actually do this in Notepad++ and have it go through the more than 100,000 listed words and their attached definitions?

Looking at the text, you see that some lengthy definitions are separated into paragraphs with space between them; how would the separate paragraphs be included in the one line?

After everything is put on one line for each headword, what would be the next step for getting pyglossary to convert the file to stardict? Would I be putting a tab somewhere, and if so, how would this be done?

Do I need a special "sub-editor" to work inside Notepad++?

As for Notepad++, it may be the same as Bloc-notes; however, I will try to install Notepad++.

I assume, then, that you prefer to have me work under Windows 11 with Notepad++ rather than under Linux.

Whatever is the most simple is best for me.

Cordially,
pz
Old 09-11-2022, 01:01 PM   #27
pzack
Dear Markismus,

You have put not a little work into your response to me, and I am very appreciative of your efforts to help me.

Let me try to understand what you are proposing.

Firstly, I need to build a csv file. You applied the four lines of Perl code to convert the sample text to a csv-formatted file. Thus, do I plug the original text file name into the first of your four lines of code (perl -pe 's/\n\n+/\|\|/sg' <original.txt> output1.txt) and then follow through to the fourth line, inserting the actual file names?

This, then, would give me a complete csv file of the full text file of which you have the example?

Secondly, I quote you:

"Stardict
Using my script I've added to the txt-file a csv-extension and ran it using
Code:

perl pocketbookdic.pl zymogène.S-delimiter .txt.csv fr '|,|'"

I thought that we had already built the csv file with your four lines of Perl code. What txt file are you now adding a csv extension to? And what am I doing with the .xml and .zip files? Have I created these with your code?

Do I understand correctly that pyglossary will convert the csv file created? Does this side-step the tab-delimiting of the text file, or was this accomplished in your code?

Where do I find Perl, and is it an instruction set to be used in a particular text editor? Is this done in a Linux terminal? What text editor are you using? Is sublime3 the editor? I am a little confused about what I actually need in order to implement what you want me to do.

I hope that my understanding (or what little there is of it) is not completely off base!

cordially,
pz
Old 09-11-2022, 01:47 PM   #28
pzack
Dear Markismus,

Adding to the message I just sent this Sunday: I have installed Notepad++ and have installed Perl in it. This is under Windows 11.

cordially,
pz
Old 09-11-2022, 04:44 PM   #29
pzack
Dear Markismus,

I was finally able to install ActivePerl for windows. I copied your first line of code into the command line for perl to execute it and it gave me back this message:

[ActiveState/ActivePerl-5.28] C:\Users\k\ActivePerl-5.28>perl -pe 's/\n\n+/\|\|/sg' grandl.txt output1.txt
'\' n’est pas reconnu en tant que commande interne
ou externe, un programme exécutable ou un fichier de commandes.

This means that '\' is not recognized as an internal or external command, an executable program, or a batch file.

How do I execute the code, then, that you wrote?

I didn't want to impose upon you but would you convert the full text file that I have into a stardict dictionary for me? Otherwise, I can continue on this way-with your guidance. It is a learning experience in any event.

Thus, I think I have Perl installed under windows but I am stuck executing the code that you wrote.

pz
Old 09-11-2022, 05:05 PM   #30
Markismus
You can post a link to the full txt-file.