MobileRead Forums > E-Book Formats > Workshop

Closed Thread

09-29-2022, 12:30 AM   #106
DNSB
Bibliophagist
Posts: 24,260
Karma: 111597955
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by pzack
Good evening, M Sarmat89

Thank you for your message. I have attached some untabbed headwords.

Cordially,

pz
Looking at your sample, you still haven't unfolded the lines, which you need to do to see more than the first line of each entry in the dictionary display.

09-29-2022, 04:43 AM   #107
Markismus
Guru
Posts: 864
Karma: 144987
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, OnyxNotePro, Note5, Kobo Glo
Well, that doesn't seem so bad. Odd, those spaces in front of every line. Still, with the two conditions 1) preceded by two EOL characters and 2) followed by [letters], you should be able to identify the start of an article.

@pzack Why don't you send me the whole file, and I'll have another look at it?

09-29-2022, 10:58 AM   #108
pzack
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, DNSB-David,

Thank you for your response. I thought that the lines were being unfolded!

Do you know of some code that would properly unfold the lines? You have been following what I have been doing; I thought some of the code provided was meant to unfold the lines.

I do not understand the workings of pyglossary and the tabbing of lines well enough. The index produced in the StarDict format appears corrupt, because only certain headwords are found even though they are tabbed. And I don't understand why many headwords are not tabbed.

I don't have the knowledge to correct these problems.

At least pyglossary is building the StarDict files, even though they are skewed. The three big issues remaining are getting all the headwords tabbed, getting them found, and having all the lines of each headword's definition recognised.

The code used so far has been Perl code.

Cordially,
pz

09-29-2022, 11:13 AM   #109
pzack
Connoisseur
Hello, M Markismus,

Thank you for your response. I am thankful to you, DNSB and Sarmat89 for your continued interest and help in my efforts to get a working StarDict dictionary.

I am not sure about copyright. Let me send you a very large portion of the file, taken from Notepad++.

Should I send a large portion of the TSV file I created, or of the original text file?

Cordially,
pz

09-29-2022, 12:05 PM   #110
Markismus
Guru
If you send the link via PM, you won't have to worry about copyright, since it won't be posted on MobileRead. The original TXT or PDF file will be fine.

09-29-2022, 01:43 PM   #111
Sarmat89
Evangelist
Posts: 421
Karma: 2146264
Join Date: Nov 2015
Device: none
You didn't run my code.

09-29-2022, 05:09 PM   #112
pzack
Connoisseur
Hello M Sarmat89,

Thank you for your message.

Well, I am not surprised; I may have gotten a little lost about which code to use. May I trouble you to give me that exact code again?

As I understand it, I need to unfold the lines first, then use the 4 lines of code supplied by M. Markismus, and then use your other line of code, which produces the final file for pyglossary.

To be certain: I believe that you want me to use the TSV file that was generated before. And here is where I have some confusion: which file exactly do you want me to start with in this process, beginning with the line unfolding?

Once again, to be sure: is that last line of code you gave me to be used on the file just before putting it into pyglossary?

Very cordially,
pz

09-29-2022, 05:30 PM   #113
Sarmat89
Evangelist
No, "s:^ +::" needs to be run before merging lines by those 4 lines, and adding tabs.

Please include all commands that you've used, in full in the posts in the future.

09-29-2022, 05:32 PM   #114
pzack
Connoisseur
Pyglossary conversion to stardict Codes

Hello, M Sarmat89,

I thought it best to add to the message I just sent you. Here are the codes as I understand I am to use them:

^([^[]+?) *(?=\[)

replaced with
Code:

\1\t

The codes above were from one of your earlier posts. I don't think I am to use them, as I think you provided me with some code that supersedes them.

This is for unfolding the lines and is to be used first:

(?<=\S)\n(?=\S)

Then I use these 4 lines of code:

perl -pe 's/\n\n+/\|\|/sg' <original.txt> output1.txt
perl -pe 's/\n/ /sg' <output1.txt> output2.txt
perl -pe 's/\|\|/\n/sg' <output2.txt> output3.txt
perl -pe 's/^(\S+)/$1 /sg' <output3.txt> output4.csv

Finally, I use this line of code, which produces the final file for use in pyglossary:

perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv

Kindly let me know if all this is correct; otherwise, please point out the error(s).

Once again, please indicate which file I am to use to begin this process, which starts with the line unfolding.

You had provided some other pieces of code in earlier posts, but I assume that what is above supersedes that code.

Thank you for your patience with all this.

Cordially,
pz

Last edited by pzack; 09-29-2022 at 05:35 PM.

09-29-2022, 08:22 PM   #115
Sarmat89
Evangelist
Try this:

Code:
perl -pe "s:^ +::" < destination.tsv | perl -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv
Then use pyglossary on "new.tsv"

09-30-2022, 09:19 AM   #116
pzack
Connoisseur
Hello, M Sarmat89,

Thank you for your message.

I am referring to your posts 113 and 115.

113; No, "s:^ +::" needs to be run before merging lines by those 4 lines, and adding tabs.

Please include all commands that you've used, in full in the posts in the future.

I am not sure what this means; is this the full code to use before the 4 lines? Is this for unfolding lines?

Does "s:^ +::" replace (?<=\S)\n(?=\S) for unfolding lines and to be used just before the 4 lines?

I assume, then, that perl -pe "s:^ +::" < destination.tsv | perl -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv

replaces perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv

Do I start with the original full TXT file? I am a little confused about which file I should begin with. And I begin with the unfolding of lines, correct?

Thus, is my post 114 correct apart from your replacements above? I want to be certain about the code that I am to use for unfolding lines.

I am to disregard this code from your earlier post: ^([^[]+?) *(?=\[)

replaced with
Code:

\1\t

Is it correct that this code has been superseded?


Cordially,
pz

Last edited by pzack; 09-30-2022 at 09:27 AM.

09-30-2022, 02:39 PM   #117
jmurphy
Connoisseur
Posts: 55
Karma: 221652
Join Date: Sep 2007
Device: ipaq
Eight pages and it stopped being about Calibre on page one. You guys are saints!
Any chance one of you could just create a Notepad++ macro to do the work? Then the steps are reduced to: open Notepad++, install the macro, load the original text file, run the macro, save the final text file.

( https://xkcd.com/1171/ )

09-30-2022, 04:23 PM   #118
Sarmat89
Evangelist
That command can also be applied to the source file. That one long command should do all the necessary transformations.

Please post the exact entire commands you've entered in your terminal, with real filenames and all.

09-30-2022, 09:32 PM   #119
pzack
Connoisseur
Good evening M Sarmat89,

I am sorry but I still need clarification.

First:

"That command can be applied etc." Please, which command are you referring to, and where am I using it?

Second:

Is perl -pe "s:^ +::" <original.txt>output1.txt

or;

Is perl -pe (?<=\S)\n(?=\S)<original.txt>output1.txt

to be used first, as the first operation on the original TXT file, before using the resulting file as the input to the four lines of code?

Third:

This line of code; perl -pe "s:^ +::" < destination.tsv | perl -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv

would use output4.csv (created by the four lines of Perl code), which gets transformed into new.tsv. The new.tsv file goes to pyglossary for the StarDict conversion.

Thus, please clarify the very first code to be used: is it a step before the line unfolding, or is it the line unfolding itself?

It appears then, that the process is;

one line of code; line unfolding

four lines of code; creation of the .csv file

one long line of code; creation of the .tsv file

pyglossary using the final .tsv file

working StarDict dictionary (we hope)

Please let me know where the error(s) are, and/or confirm that it's good to go, so that I can start in on this.

Cordially,
pz

10-01-2022, 05:27 AM   #120
Sarmat89
Evangelist
The command from post 115 should do all the necessary operations to produce an unfolded, tab-separated file for pyglossary. You can use either the file you have now, or your original text file. Use the name of the source file you want to use instead of "destination.tsv" in the beginning of the command.
Tags
pyglossary

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.