#106 |
Bibliophagist
Posts: 24,260
Karma: 111597955
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
#107 |
Guru
Posts: 864
Karma: 144987
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, OnyxNotePro, Note5, Kobo Glo
Well, that doesn't seem so bad. Those spaces in front of every line are odd, though. However, with the two conditions that a headword is 1) preceded by two EOL characters and 2) followed by [letters], you should be able to identify the start of an article.
@pzack Why don't you send me the whole file so that I can have another look at it?
#108 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, DNSB-David,
Thank you for your response. I thought that the lines were being unfolded! Do you know of some code that would properly unfold the lines? You have been following what I have been doing; I thought that some of the code provided was directed towards unfolding the lines. I do not understand the workings of pyglossary and the tabbing of lines well enough. The index produced in the stardict format appears corrupt, because only certain headwords are found even though they are tabbed. And I don't understand why many headwords are not tabbed. I don't have the knowledge to correct these problems. At least pyglossary is building the stardict files, even though they are skewed. The three big issues remaining are getting all headwords tabbed, getting them all seen, and having all the lines of each headword's definition recognised. The code used thus far has been perl code.
Cordially, pz
#109 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, M Markismus,
Thank you for your response. I am thankful to you, DNSB and Sarmat89 for your continued interest and help in my efforts to get a working stardict dictionary. I am not sure about copyright. Let me send you a very large portion of the file, taken out of Notepad++. Should I send a large portion of the tsv file that was created, or of the original text file?
Cordially, pz
#110 |
Guru
Posts: 864
Karma: 144987
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, OnyxNotePro, Note5, Kobo Glo
If you send the link via PM, you won't have to worry about copyright, since the file won't be on MobileRead. The original txt or pdf file will do.
#111 |
Evangelist
Posts: 421
Karma: 2146264
Join Date: Nov 2015
Device: none
You didn't run my code.
#112 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, M Sarmat89,
Thank you for your message. Well, I am not surprised; I may have gotten a little lost over which code to use. May I trouble you to give me that exact code again? As I understand it, I need to unfold the lines first, then use the 4 lines of code supplied by M. Markismus, and then use your other line of code, which finally produces the file to use in pyglossary. To be certain, I believe that you want me to use the tsv file that was generated before. And here is where I have some confusion: which file exactly do you want me to start with in this process, beginning with the line unfolding? Once again, and to be sure, that last line of code that you gave me is to be used on the file just prior to putting it into pyglossary?
Very cordially, pz
#113 |
Evangelist
Posts: 421
Karma: 2146264
Join Date: Nov 2015
Device: none
No, "s:^ +::" needs to be run before merging the lines with those 4 commands, and before adding tabs.
In future, please include in your posts all the commands that you've used, in full.
#114 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Pyglossary conversion to stardict: codes
Hello, M Sarmat89,
I thought it best to add to the message I just sent you. Here are the codes as I understand I am to use them:
^([^[]+?) *(?=\[)
replaced with
Code:
\1\t
The codes above were from one of your earlier posts. I don't think I am to use them, as I believe you have since provided code that supersedes them.
This is for unfolding the lines and is to be used first:
(?<=\S)\n(?=\S)
Then I use these 4 lines of code:
Code:
perl -pe 's/\n\n+/\|\|/sg' < original.txt > output1.txt
perl -pe 's/\n/ /sg' < output1.txt > output2.txt
perl -pe 's/\|\|/\n/sg' < output2.txt > output3.txt
perl -pe 's/^(\S+)/$1 /sg' < output3.txt > output4.csv
Finally, I use this line of code, which produces the final file for use in pyglossary:
Code:
perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv
Please let me know whether all this is correct; otherwise, please point out the error(s). Once again, please indicate which file I am to use to begin this process, which starts with the line unfolding. You provided some other pieces of code in earlier posts, but I assume that what is above supersedes them.
Thank you for your patience with all this.
Cordially, pz
Last edited by pzack; 09-29-2022 at 05:35 PM.
#115 |
Evangelist
Posts: 421
Karma: 2146264
Join Date: Nov 2015
Device: none
Try this:
Code:
perl -pe "s:^ +::" < destination.tsv | perl -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv
#116 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, M Sarmat89,
Thank you for your message. I am referring to your posts 113 and 115.
113: No, "s:^ +::" needs to be run before merging lines by those 4 lines, and adding tabs. Please include all commands that you've used, in full in the posts in the future.
I am not sure what this means. Is this the full code to use before the 4 lines? Is it for unfolding lines? Does "s:^ +::" replace (?<=\S)\n(?=\S) for unfolding lines, to be used just before the 4 lines?
I assume, then, that
Code:
perl -pe "s:^ +::" < destination.tsv | perl -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv
replaces
Code:
perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv
Am I using the original full txt file to start with? I am a little confused about which file I should begin with, and I begin with the unfolding of lines, correct? Thus, is my post 114 correct except for your replacements above? I want to be certain about the code that I am to use for unfolding lines.
I am to disregard this code from your earlier posting:
^([^[]+?) *(?=\[)
replaced with
Code:
\1\t
Is it correct that this code has been superseded?
Cordially, pz
Last edited by pzack; 09-30-2022 at 09:27 AM.
#117 |
Connoisseur
Posts: 55
Karma: 221652
Join Date: Sep 2007
Device: ipaq
Eight pages, and it stopped being about Calibre on page one. You guys are saints!
Any chance one of you could just create a Notepad++ macro to do the work? Then the steps are reduced to: open Notepad++, install macro, load original text file, run macro, save final text file. ( https://xkcd.com/1171/ )
#118 |
Evangelist
Posts: 421
Karma: 2146264
Join Date: Nov 2015
Device: none
That command can also be applied to the source file. That one long command should do all the necessary transformations.
Please post the exact, complete commands you've entered in your terminal, with real filenames and all.
#119 |
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Good evening, M Sarmat89,
I am sorry, but I still need clarification.
First: "That command can be applied etc." Please, which command are you referring to, and where am I to use it?
Second: Is
perl -pe "s:^ +::" <original.txt >output1.txt
or
perl -pe (?<=\S)\n(?=\S) <original.txt >output1.txt
to be used first, as the first operation on the original txt file, before the resulting file is used as the first file in the four lines of code?
Third: This line of code,
Code:
perl -pe "s:^ +::" < destination.tsv | perl -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv
would use output4.csv (created by the four lines of perl code), which gets transformed into new.tsv. new.tsv then goes to pyglossary for the stardict conversion.
Thus, please clarify the very first code to be used: is it for pre-line unfolding or is it for the line unfolding itself?
It appears, then, that the process is:
one line of code: line unfolding
four lines of code: creation of the .csv file
one long line of code: creation of the .tsv file
pyglossary using the final .tsv file
working stardict dictionary (we hope)
Please let me know where the error(s) is and/or confirm that it's good to go, so that I can start in on this.
Cordially, pz
#120 |
Evangelist
Posts: 421
Karma: 2146264
Join Date: Nov 2015
Device: none
The command from post 115 should do all the necessary operations to produce an unfolded, tab-separated file for pyglossary. You can use either the file you have now or your original text file. At the beginning of the command, replace "destination.tsv" with the name of the source file you want to use.