Go Back   MobileRead Forums > E-Book Software > Calibre > Conversion

Today, 12:30 AM   #106
DNSB
Bibliophagist
Posts: 21,492
Karma: 101629577
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by pzack
Good evening, M Sarmat89

Thank you for your message. I have attached some untabbed headwords.

Cordially,

pz
Looking at your sample, you still haven't unfolded the lines. Until you do, the dictionary display will show only the first line of each entry.

Today, 04:43 AM   #107
Markismus
Guru
Posts: 826
Karma: 144987
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, cracked OnyxNotePro, lots of cracked Kobo's
Well, that doesn't seem so bad. Those spaces in front of every line are odd. However, by combining two conditions, 1) preceded by two EOL characters and 2) followed by [letters], you should be able to identify the start of an article.
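As a rough illustration of those two conditions, here is a minimal Python sketch. The sample text, the name `ARTICLE_START` and the assumption that blank lines are truly empty (no stray spaces) are hypothetical; the real file may need the pattern adjusted.

```python
import re

# Hypothetical sample: every line is indented, entries are separated by an empty line.
sample = (
    " abandon [verb] to leave\n"
    " behind completely\n"
    "\n"
    " abate [verb] to become\n"
    " less intense\n"
)

# An article starts at the beginning of the text or right after two EOL characters,
# may be indented with spaces, and has a headword followed by "[" and a letter.
ARTICLE_START = re.compile(r"(?:\A|\n\n) *([^\[\n]+?) *\[[A-Za-z]")

headwords = [m.group(1) for m in ARTICLE_START.finditer(sample)]
print(headwords)
```

Continuation lines such as " behind completely" are skipped because they follow only one EOL character.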

@pzack Why don't you send me the whole file so I can have another look at it?

Today, 10:58 AM   #108
pzack
Connoisseur
Posts: 68
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello, DNSB-David,

Thank you for your response. I thought that the lines were being unfolded!

Do you know some code that would properly unfold the lines? You have been following what I have been doing; I thought that some of the code provided was directed towards unfolding the lines.

I do not understand enough about the workings of pyglossary and the tabbing of lines. The index produced in the stardict format appears corrupt because only certain headwords are found even though they are tabbed. And I don't understand why many headwords are not tabbed.

I don't have the knowledge to correct these problems.

At least pyglossary is building the stardict files even though they are skewed. The three big issues remaining are getting all headwords tabbed, getting them all found, and having all the lines of each headword's definition recognised.

The code thus far used has been perl code.

Cordially,
pz

Today, 11:13 AM   #109
pzack
Hello, M Markismus,

Thank you for your response. I am grateful to you, DNSB and Sarmat89 for your continued interest and help in my efforts to get a working stardict dictionary.

I am not sure about copyright. Let me send you a very large portion of the file taken out of notepad++.

Should I send a large portion of the tsv file that was created, or of the original text file?

Cordially,
pz

Today, 12:05 PM   #110
Markismus
If you send the link via PM, you won't have to worry about copyright, because the file won't be on MobileRead. The original txt or pdf file will be good.

Today, 01:43 PM   #111
Sarmat89
Addict
Posts: 373
Karma: 2145408
Join Date: Nov 2015
Device: none
You didn't run my code.

Today, 05:09 PM   #112
pzack
Hello M Sarmat89,

Thank you for your message.

Well, I am not surprised; I may have gotten a little lost as to which code to use. May I trouble you to give me that exact code again?

As I understand this, I need to unfold the lines first, then use the 4 lines of code supplied by M. Markismus and then use your other line of code which finally produces the file to use in pyglossary.

To be certain, I believe that you want me to use the tsv file that was generated before. And here is where I have some confusion: what file exactly do you want me to start with in this process, beginning with the line unfolding?

Once again, and to be sure, that last line of code that you gave me is to be used on the file just prior to putting it into pyglossary?

Very cordially,
pz

Today, 05:30 PM   #113
Sarmat89
No, "s:^ +::" needs to be run first, before merging the lines with those 4 commands and before adding the tabs.
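To see why the order matters, here is a small Python sketch of the same two substitutions applied to one line. The sample line and the helper names `add_tab` and `strip_leading_spaces` are hypothetical; the patterns are the ones quoted in this thread.

```python
import re

def strip_leading_spaces(line: str) -> str:
    # Python equivalent of the Perl substitution s:^ +::
    return re.sub(r"^ +", "", line)

def add_tab(line: str) -> str:
    # Python equivalent of s:^([^[]+?) *(?=\[):\1\t: applied to a single line
    return re.sub(r"^([^\[]+?) *(?=\[)", r"\1\t", line, count=1)

line = " abandon [verb] to leave"  # note the leading space

headword_raw = add_tab(line).split("\t")[0]
headword_clean = add_tab(strip_leading_spaces(line)).split("\t")[0]

print(repr(headword_raw))    # the headword still carries the leading space
print(repr(headword_clean))  # the headword is clean
```

If the spaces are not stripped first, they end up inside the headword field, which is one plausible reason a lookup would fail to find the word.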

In future posts, please include in full all the commands that you've used.

Today, 05:32 PM   #114
pzack
Pyglossary conversion to stardict Codes

Hello, M Sarmat89,

I thought it best to add to the message just sent to you. Here is the code that, as I understand it, I am to use:

^([^[]+?) *(?=\[)

replaced with
Code:

\1\t

The code above was from one of your earlier posts. I don't think that I am to use it, as I think that you provided me with some code that supersedes it.

This is for unfolding the lines and is to be used first:

(?<=\S)\n(?=\S)
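A minimal Python sketch of how this pattern behaves. The post does not say what the match is replaced with, so the single-space replacement is an assumption, as are the sample text and the name `unfold`.

```python
import re

# A newline with a non-space character directly before and after it.
UNFOLD = re.compile(r"(?<=\S)\n(?=\S)")

def unfold(text: str) -> str:
    # Assumption: each matched newline is replaced by one space.
    return UNFOLD.sub(" ", text)

sample = "abandon [verb] to leave\nbehind completely\n\nabate [verb] to lessen\n"
print(unfold(sample))

# Indented lines are left untouched, because the character after the
# newline is a space and the lookahead (?=\S) fails.
print(unfold(" a\n b\n") == " a\n b\n")
```

The blank line between entries survives, since neither of its newlines has a non-space character on both sides.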

Then I use these 4 lines of code:

perl -0777 -pe 's/\n\n+/\|\|/sg' < original.txt > output1.txt
perl -pe 's/\n/ /sg' < output1.txt > output2.txt
perl -pe 's/\|\|/\n/sg' < output2.txt > output3.txt
perl -pe 's/^(\S+)/$1 /sg' < output3.txt > output4.csv
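Here is a Python sketch of what the four substitutions are meant to do when applied to the whole text at once. Note that `perl -p` hands the script one line at a time, so a pattern such as `\n\n+` can only match if the input is read in one piece (for example with perl's `-0777` switch). The function name `merge_entries` and the sample text are hypothetical.

```python
import re

def merge_entries(text: str) -> str:
    # 1. Mark every run of two or more newlines (a blank line) with "||".
    text = re.sub(r"\n\n+", "||", text)
    # 2. Turn every remaining newline into a space, joining folded lines.
    text = text.replace("\n", " ")
    # 3. Turn each "||" marker back into a single newline: one entry per line.
    text = text.replace("||", "\n")
    # 4. Add a space after the first word of each line.
    return re.sub(r"^(\S+)", r"\1 ", text, flags=re.MULTILINE)

sample = "abandon [verb] to leave\nbehind completely\n\nabate [verb] to lessen\n"
merged = merge_entries(sample)
print(merged)
```

The sketch assumes that "||" never occurs in the dictionary text itself; if it does, a different marker would be needed.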

Finally, I use this line of code, which produces the final file for use in pyglossary:

perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv

Kindly let me know if all this is correct; otherwise, please point out the error(s).

Once again, please indicate what file I am to use to begin this process that begins with the line unfolding.

You had provided some other pieces of code in earlier posts, but I assume that what is above supersedes the other code.

Thank you for your patience with all this.

Cordially,
pz

Last edited by pzack; Today at 05:35 PM.

Today, 08:22 PM   #115
Sarmat89
Try this:

Code:
perl -pe "s:^ +::" < destination.tsv | perl -0777 -pe 's/\n\n+/\|\|/sg' | perl -pe 's/\n/ /sg' | perl -pe 's/\|\|/\n/sg' | perl -pe 's/^(\S+)/$1 /sg' | perl -pe "s:^([^[]+?) *(?=\[):\1\t:" > new.tsv
Then use pyglossary on "new.tsv"
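Before running pyglossary, it may help to check that every line of the result really has the shape headword, tab, definition. The following Python sketch does that; the function name `check_tsv` and the sample lines are hypothetical, and it assumes one entry per line with a single tab separating the two fields.

```python
def check_tsv(lines):
    """Return (line_number, line) pairs that are not 'headword<TAB>definition'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines
        head, sep, definition = line.partition("\t")
        # A good line has a tab, with non-empty text on both sides of it.
        if not sep or not head.strip() or not definition.strip():
            bad.append((number, line))
    return bad

sample = [
    "abandon\t[verb] to leave behind completely\n",
    "abate [verb] to lessen\n",          # no tab at all
    "\t[noun] orphan definition\n",      # tab but no headword
]
problems = check_tsv(sample)
print(problems)
```

On the real file this could be called as `check_tsv(open("new.tsv", encoding="utf-8"))`; any line it reports is a headword that was not tabbed.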
All times are GMT -4. The time now is 11:19 PM.

