Old 09-16-2022, 07:45 AM   #61
Sarmat89
Evangelist
Posts: 404
Karma: 2146264
Join Date: Nov 2015
Device: none
Your source is too low quality to rely on automatic procedures. You already have a file with long lines: open it in Notepad++ or the like, paste the expressions into the "Search" and "Replace" boxes, set the "Regular expression" switch, and press "Replace All". Then find where the expression failed and fix those places manually.

What "spaces" are you talking about?
Old 09-16-2022, 01:49 PM   #62
pzack
Connoisseur
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Good afternoon M. Sarmat89,

Thank you for your response.

Would you give me some concrete examples of the search and replace commands in Notepad++? I have never really worked with text in Notepad.

The "spaces" are the empty lines between pieces of definition text. I don't know why a definition is not made up of consecutive lines of text, or whether these empty lines have to be eliminated.

In plain English, the instructions would be something like this: on each line, look for the headword string at the beginning of the line, followed by the bracket "["; if it is found, insert a tab, otherwise continue to the next line. That (or something similar) is what would have to be put into code to tab-delimit the text file. I am not sure where you would insert the tab.

What would have been a "high quality" text file?

Are you really suggesting manual modification to this voluminous text file?

Cordially,
dk
Old 09-16-2022, 03:53 PM   #63
Sarmat89
Your file contains OCR errors.

Code:
^([^[]+?) *(?=\[)
replaced with
Code:
\1\t
will insert the tab after the text preceding the first [ in the line. The spaces before the [ are dropped, and the [ itself is kept.

If you are going to use perl, try
Code:
perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv
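For readers who prefer Python to Perl, the same substitution can be sketched as below. This is only a sketch under assumptions: the function name `add_tab` and the sample line are made up for illustration and do not come from the dictionary file, and the bracket inside the character class is escaped (`[^\[]`), which matches the same characters as `[^[]`.

```python
import re

# Headword = everything before the first "[" on the line, without trailing spaces.
# The lookahead (?=\[) checks that the bracket follows without consuming it.
HEADWORD = re.compile(r"^([^\[]+?) *(?=\[)")


def add_tab(line: str) -> str:
    """Put a tab between the headword and the first '[' of one line.

    Lines that contain no '[' (or that start with one) are returned unchanged.
    """
    return HEADWORD.sub(r"\1\t", line, count=1)


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    print(repr(add_tab("abaisser  [abese] v.t. to lower")))
```

Lines the pattern does not match are left untouched, so they can be found afterwards by searching for lines that still contain no tab.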
Old 09-17-2022, 02:48 PM   #64
pzack
Hello M. Sarmat89,

Thank you for your response and the Perl code. I think you have me creating a TSV file. However, as far as I know, pyglossary supports csv files for conversion and not tsv. According to GitHub, the tsv extension is not listed among the extensions pyglossary supports.

Any suggestions?

Cordially,
pz
Old 09-17-2022, 04:49 PM   #65
Sarmat89
Works fine here.
Old 09-18-2022, 12:34 PM   #66
pzack
pyglossary conversion of tsv file

Good afternoon M. Sarmat89,

Pyglossary did indeed accept the tsv file. Two images are attached (I hope correctly).

There are lots of "no tab" errors, not on the headwords but on lines of plain text. And the index .idx file is probably corrupt.

I didn't put the files into KOReader to try them, thinking it useless to do so.

Please take a look at the attached images. Please note the question marks on the index and synonym files.

Cordially,
pz
Attached thumbnails: py1.jpg, py2.jpg

Last edited by pzack; 09-18-2022 at 12:37 PM.
Old 09-18-2022, 01:12 PM   #67
pzack
pyglossary tsv file conversion message 2

M. Sarmat89,

Here is another screenshot of the pyglossary tsv conversion summary.


Cordially,
pz
Attached thumbnail: py3.jpg
Old 09-18-2022, 01:57 PM   #68
Sarmat89
You seem to have spurious line breaks in your destination.tsv file. Take the file with long lines obtained in post 35 and add the tabs to that one instead.
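One way to locate such breaks before running the conversion is to list every non-empty line that has no tab. In a clean tab-delimited file each entry sits on one line as headword, tab, definition, so a line without a tab is most likely a fragment split off the previous entry. A minimal Python sketch, where the function name and the sample lines are hypothetical:

```python
from typing import Iterable, List, Tuple


def lines_without_tab(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every non-empty line that has no tab."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Blank lines are skipped; only lines with content but no tab are reported.
        if text.strip() and "\t" not in text:
            problems.append((number, text))
    return problems


if __name__ == "__main__":
    # Hypothetical sample: the second line is a fragment split off its entry.
    sample = ["chat\t[x] n.m. cat\n", "domestic animal\n", "\n", "chien\t[y] n.m. dog\n"]
    for number, text in lines_without_tab(sample):
        print(f"line {number}: {text}")
```

To check a real file, pass an open file object, for example `lines_without_tab(open("destination.tsv", encoding="utf-8"))`; the UTF-8 encoding is an assumption about the file.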
Old 09-21-2022, 01:36 PM   #69
pzack
Pyglossary conversion to stardict

Hello M. Sarmat89,

I think that I have been trying to drown a fish; the pyglossary conversion of my file(s) has not been working. I will admit that there may be "operator error" (myself, to be exact).

You have been gracious enough to give me some Perl code to try to convert the csv and txt files, but the indexes are not constructed properly.

I have an .xml file (I am not sure whether it is the complete dictionary), but pyglossary only accepts the formats and extensions listed here:


Format Extension Read Write
ABBYY Lingvo DSL .dsl X
AppleDict Source .xml X
Babylon .bgl X
Babylon Source .gls X
DictionaryForMIDs X
DICTD dictionary server .index X X
FreeDict .tei X
Gettext Source .po X X
SQLite MDic .m2 Sib .sdb X X
Octopus MDic .mdx X
Octopus MDic Source .txt X X
Omnidic X X
PMD X X
Sdictionary Binary .dct X
Sdictionary Source .sdct X
SQL X
StarDict .ifo X X
Tabfile .txt, .dic X X
TreeDict X
XDXF .xdxf X
xFarDic .xdb
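Of the formats in this list, Tabfile is the one that the tab-delimited text discussed earlier in the thread is aiming at: one entry per line, made of the headword, a tab, and the definition. A minimal Python sketch of producing such lines follows. The escaping of backslashes, newlines and tabs inside a definition is an assumption about the convention and should be checked against the PyGlossary documentation; the function names and the sample entry are hypothetical.

```python
from typing import Iterable, Tuple


def to_tabfile_line(headword: str, definition: str) -> str:
    """Format one entry as 'headword<TAB>definition' on a single line."""
    # Assumption: raw newlines and tabs must not appear inside the definition,
    # so they are written as the two-character sequences \n and \t.
    escaped = (
        definition.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return f"{headword.strip()}\t{escaped}"


def write_tabfile(entries: Iterable[Tuple[str, str]], path: str) -> None:
    """Write (headword, definition) pairs to a UTF-8 text file, one per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for headword, definition in entries:
            handle.write(to_tabfile_line(headword, definition) + "\n")


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    print(repr(to_tabfile_line(" chat ", "n.m.\ncat")))
```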

The xml that I have is probably not in the AppleDict Source format, though I can't say for certain. I tried it in pyglossary, but pyglossary seems to hang and nothing appears in the window.

There are also the stardict and penelope converters, but I cannot get anyone to tell me how to install them on Windows or Linux, much less how to work with them. Plus, I am not sure that these apps would succeed where pyglossary did not.

I am not a programmer, which puts me at a disadvantage, and I am certainly thankful for the time and help that you and M. Markismus have given me.

I am disappointed, to say the least, that I couldn't succeed in getting this dictionary into stardict format for KOReader. The pdf version is searchable but is nowhere near as convenient as a stardict dictionary in KOReader.

I don't know what else can be done.

Very cordially,
pz

Last edited by pzack; 09-21-2022 at 01:38 PM.
Old 09-21-2022, 01:50 PM   #70
Markismus
Guru
Posts: 848
Karma: 144987
Join Date: Jul 2013
Location: Netherlands
Device: Cracked HiSenseA5ProCC, cracked OnyxNotePro, lots of cracked Kobo's
Well, you could upload the file and have us take a look at it.
Old 09-21-2022, 08:08 PM   #71
pzack
Good evening, M. Markismus,

Good to hear from you.

I will try to upload a folder containing several files including txt and xml.

Or, please let me know if it might be easier to upload a torrent file containing all the files.

Cordially,

pz
Old 09-21-2022, 08:16 PM   #72
pzack
up torrent of grand dictionnaire for conversion to stardict

Good evening M. Markismus, M. Sarmat89,

Perhaps, you will succeed where I did not. It would give me great pleasure if you can convert this dictionary to stardict for use under Koreader.

Have at it! And good luck!

Please kindly confirm that you have received the attached torrent.

Very cordially,
pz
Old 09-21-2022, 08:21 PM   #73
PeterT
Grand Sorcerer
Posts: 12,026
Karma: 71684510
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Before sharing that dictionary, is it public domain or copyrighted?

If not public domain you should not be sharing it with other people.

Old 09-22-2022, 06:58 AM   #74
Markismus
@pzack No torrent is attached. If the source PDF file is copyrighted, it would be better to send a link via PM, so that MobileRead isn't hosting data derived from copyrighted material.
Old 09-22-2022, 10:39 AM   #75
Sarmat89
Quote:
Originally Posted by pzack
I think that I have been trying to drown a fish; the pyglossary conversion of my file(s) has not been working. I will admit that there may be "operator error" (myself, to be exact).
What exactly did you try to do, what was the result, and what went wrong, step by step?