
MobileRead Forums > E-Book Formats > Workshop


Closed Thread
 
Old 09-16-2022, 06:45 AM   #61
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Your source is too low quality to rely on automatic procedures. You already have a file with long lines: open it in Notepad++ or a similar editor, paste the expressions into the "Find what" and "Replace with" boxes, select the "Regular expression" search mode, and press "Replace All". Then find the places where the expression failed and fix those manually.

What "spaces" are you talking about?
Old 09-16-2022, 12:49 PM   #62
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Good afternoon M. Sarmat89,

Thank you for your response.

Would you give me some concrete examples of the search and replace commands in Notepad++? I have never really worked with text in Notepad++.

The "spaces" are the empty lines between the lines of definition text. I don't know why the definitions are not on consecutive lines, or whether these blank lines have to be eliminated.

Look for the first string (the headword) at the beginning of the line that is followed by the bracket "["; if found, insert a tab; otherwise, continue to the next line. This is the English-language instruction set (or something similar) that would have to be put into code to tab-delimit the text file. I am not sure where exactly the tab should be inserted.
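The rule described above translates fairly directly into code. The sketch below is in Python rather than Perl and assumes the file has already been read into a list of lines; `tab_delimit` is a hypothetical helper name, and the sample lines are made up. It also drops the blank lines mentioned earlier, which is an assumption about what the final file should look like.

```python
def tab_delimit(lines):
    """Tab-delimit dictionary lines (hypothetical helper, for illustration).

    Blank lines are dropped. On every other line, the first "[" is located;
    if some headword text precedes it, the spaces between that text and the
    "[" are replaced by a single tab. Lines with no "[", or starting with
    one, are passed through unchanged.
    """
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip the empty lines between definition text
        pos = line.find("[")
        headword = line[:pos].rstrip(" ") if pos > 0 else ""
        if headword:
            out.append(headword + "\t" + line[pos:])
        else:
            out.append(line)
    return out


# Hypothetical sample input, not taken from the actual dictionary file.
sample = ["headword  [pron] definition\n", "\n", "more definition text\n"]
for row in tab_delimit(sample):
    print(repr(row))
```

Lines that still come out without a tab (the "more definition text" case) are exactly the ones that would need manual attention.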

What would have been a "high quality" text file?

Are you really suggesting manual modification to this voluminous text file?

Cordially,
dk
Old 09-16-2022, 02:53 PM   #63
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Your file contains OCR errors.

Code:
^([^[]+?) *(?=\[)
replaced with
Code:
\1\t
will insert a tab between the text preceding the first [ on the line and the [ itself; any spaces just before the [ are removed, and the [ is kept.

If you are going to use perl, try
Code:
perl -pe "s:^([^[]+?) *(?=\[):\1\t:" <your-file-here >destination.tsv
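For anyone who would rather not install Perl, here is a rough Python equivalent of the one-liner above. It is a sketch: it assumes the input file is UTF-8, and `add_tab` and `convert` are hypothetical helper names. The only change to the pattern is that the [ inside the character class is escaped, to avoid any ambiguity in Python's `re` syntax.

```python
import re

# Same pattern as the Perl one-liner: lazily capture the text before the
# first "[", let " *" absorb the spaces, and only look ahead at the "[".
ENTRY_START = re.compile(r"^([^\[]+?) *(?=\[)")


def add_tab(line: str) -> str:
    """Replace the spaces before the first "[" with a tab (no-op if no match)."""
    return ENTRY_START.sub(r"\1\t", line)


def convert(src_path: str, dst_path: str) -> None:
    """Process the file line by line, as `perl -p` does."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(add_tab(line))
```

Usage would be something like `convert("your-file-here.txt", "destination.tsv")`, with both file names standing in for the real ones. As with `perl -p`, each line is handled on its own, so a line without a [ is copied unchanged.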
Old 09-17-2022, 01:48 PM   #64
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Hello M. Sarmat89,

Thank you for your response and the Perl code. I think you have me creating a TSV file. However, as far as I know, pyglossary supports CSV files for conversion but not TSV: according to GitHub, the .tsv extension is not among pyglossary's supported extensions.

Any suggestions?
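If a comma-separated file were ever really needed instead, Python's standard `csv` module can convert the tab-separated output. This is a sketch with a hypothetical helper name that works on text in memory; the reader treats quotation marks in the TSV literally, and the writer quotes any field containing commas or quotation marks so the two columns stay intact.

```python
import csv
import io


def tsv_to_csv_text(tsv_text: str) -> str:
    """Re-encode tab-separated rows as comma-separated rows."""
    # QUOTE_NONE: quotation marks inside definitions are ordinary characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Default QUOTE_MINIMAL: only fields with commas/quotes/newlines are quoted.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For a large file, the same reader/writer pair can be pointed at files opened with `newline=""` instead of `StringIO` objects.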

cordially,
pz
Old 09-17-2022, 03:49 PM   #65
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Works fine here.
Old 09-18-2022, 11:34 AM   #66
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
pyglossary conversion of tsv file

Good afternoon M. Sarmat89,

Pyglossary did indeed accept the TSV file. I have attached two screenshots (I hope they are attached correctly).

There are lots of "no tab" errors, not on the headwords but on lines of plain definition text. And the index (.idx) file is probably corrupt.
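To locate the lines behind the "no tab" errors before running pyglossary again, a small check can list every non-blank line that lacks a tab. This is a sketch with a hypothetical helper name; it assumes every valid line should have the form headword<TAB>definition.

```python
def lines_without_tab(lines):
    """Yield (line_number, text) for non-blank lines that contain no tab.

    Line numbers are 1-based, matching what a text editor shows, so the
    offending lines can be jumped to and fixed by hand.
    """
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.strip() and "\t" not in text:
            yield number, text
```

For example, iterating over `open("destination.tsv", encoding="utf-8")` with this function and printing the first few results shows where the spurious line breaks are.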

I didn't put the files into KOReader to try them, as I thought it would be useless.

Please take a look at the enclosed images. Please note the question marks on the index and synonym files.

Cordially,
pz
Attached thumbnails: py1.jpg (144.1 KB), py2.jpg (80.5 KB)

Last edited by pzack; 09-18-2022 at 11:37 AM.
Old 09-18-2022, 12:12 PM   #67
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
pyglossary tsv file conversion message 2

M. Sarmat89,

Here is another screenshot of the pyglossary TSV conversion summary.


Cordially,
pz
Attached thumbnail: py3.jpg (127.8 KB)
Old 09-18-2022, 12:57 PM   #68
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
You seem to have spurious line breaks in your destination.tsv file. Use the file with long lines that you obtained in post #35 to add the tabs.
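If that long-line file is not at hand, one way to undo spurious line breaks is to glue each wrapped line back onto the entry it belongs to. The heuristic in this Python sketch is purely an assumption, not what was done in post #35: a line with some text before its first "[" starts a new entry, and every other non-blank line continues the previous one. Definition text that happens to contain a "[" would be misread as a new entry, so the result still needs checking before the tabs are added.

```python
def join_wrapped_lines(lines):
    """Rejoin entries broken over several lines (hypothetical helper).

    Assumption: a line whose first "[" is preceded by text starts a new
    entry; any other non-blank line is appended to the previous entry
    with a single space. Blank lines are dropped.
    """
    entries = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        starts_entry = text.find("[") > 0
        if starts_entry or not entries:
            entries.append(text)
        else:
            entries[-1] += " " + text
    return entries
```

The returned entries can be written back with `"\n".join(entries) + "\n"` and then passed through the tab-inserting regex.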
Old 09-21-2022, 12:36 PM   #69
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Pyglossary conversion to stardict

Hello M. Sarmat89,

I think that I have been trying to drown a fish; the pyglossary conversion of my file(s) has not been working. I will admit that there may be "operator error", myself to be exact.

You have been gracious enough to give me some Perl code to try to convert the CSV and TXT files, but the indexes are not constructed properly.

I have an .xml file (I am not sure whether it is the complete dictionary), but pyglossary requires one of the supported formats and extensions listed here:


Format Extension Read Write
ABBYY Lingvo DSL .dsl X
AppleDict Source .xml X
Babylon .bgl X
Babylon Source .gls X
DictionaryForMIDs X
DICTD dictionary server .index X X
FreeDict .tei X
Gettext Source .po X X
SQLite MDic .m2 Sib .sdb X X
Octopus MDic .mdx X
Octopus MDic Source .txt X X
Omnidic X X
PMD X X
Sdictionary Binary .dct X
Sdictionary Source .sdct X
SQL X
StarDict .ifo X X
Tabfile .txt, .dic X X
TreeDict X
XDXF .xdxf X
xFarDic .xdb

The XML that I have is probably not in AppleDict Source format, but I can't say for certain. I tried it in pyglossary, but pyglossary seems to hang and nothing appears in the window.
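A cheap first check on what kind of XML the file actually is: look only at its root element and compare it with what the chosen pyglossary input format expects. This Python sketch uses only the standard library (the helper name is hypothetical) and reads just the beginning of the file, so it stays quick even on a very large dictionary.

```python
import io
import xml.etree.ElementTree as ET


def xml_root_tag(source):
    """Return the root element's tag without parsing the whole document.

    `source` is a filename or a binary file object, as accepted by
    ET.iterparse. The first "start" event is always the root element, so
    the loop returns at once and only the start of the file is read.
    Namespaced tags come back as "{namespace-uri}localname".
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        return elem.tag
    raise ValueError("no elements found")


# Self-contained demo on an in-memory document; for the real file, pass its
# path instead, e.g. xml_root_tag("dictionary.xml") (hypothetical name).
demo = io.BytesIO(b"<dictionary><entry>word</entry></dictionary>")
print(xml_root_tag(demo))
```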

There are the StarDict and Penelope converters, but I cannot get anyone to tell me how to install them on Windows or Linux, much less how to work with these applications. Plus, I am not sure that these apps would succeed where pyglossary did not.

I am not a programmer, which puts me at a disadvantage, and I am certainly thankful for the time and help that you and M. Markismus have given me.

I am disappointed, to say the least, that I couldn't get this dictionary into StarDict format for use in KOReader. The PDF version is searchable, but it is nowhere near as convenient to use as a StarDict dictionary in KOReader.

I don't know what else can be done.

Very cordially,
pz

Last edited by pzack; 09-21-2022 at 12:38 PM.
Old 09-21-2022, 12:50 PM   #70
Markismus
Guru
Markismus causes much rejoicing
 
 
Posts: 897
Karma: 149877
Join Date: Jul 2013
Location: Netherlands
Device: Cracked HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Well, you could upload the file and have us take a look at it.
Old 09-21-2022, 07:08 PM   #71
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Good evening, M. Markismus,

Good to hear from you.

I will try to upload a folder containing several files, including the TXT and XML files.

Or please let me know if it would be easier to upload a torrent file containing all the files.

Cordially,

pz
Old 09-21-2022, 07:16 PM   #72
pzack
Connoisseur
pzack began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Aug 2022
Device: kobo sage,elipsa
Uploaded torrent of the Grand Dictionnaire for conversion to StarDict

Good evening M. Markismus, M. Sarmat89,

Perhaps you will succeed where I did not. It would give me great pleasure if you could convert this dictionary to StarDict for use in KOReader.

Have at it! And good luck!

Please kindly confirm that you have received the attached torrent.

Very cordially,
pz
Old 09-21-2022, 07:21 PM   #73
PeterT
Grand Sorcerer
PeterT ought to be getting tired of karma fortunes by now.
 
 
Posts: 12,167
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Before sharing that dictionary: is it in the public domain or copyrighted?

If it is not in the public domain, you should not be sharing it with other people.

Old 09-22-2022, 05:58 AM   #74
Markismus
Guru
Markismus causes much rejoicing
 
 
Posts: 897
Karma: 149877
Join Date: Jul 2013
Location: Netherlands
Device: Cracked HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@pzack No torrent is attached. If the source PDF file is copyrighted, it would be better to send a link via PM, so that MobileRead isn't hosting data derived from copyrighted material.
Old 09-22-2022, 09:39 AM   #75
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by pzack View Post
I think that I have been trying to drown a fish; the pyglossary conversion of my file(s) has not been working. I will admit that there may be "operator error", myself to be exact.
What exactly did you try to do, what was the result, and what went wrong, step by step?

Tags
pyglossary




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.