10-02-2022, 04:08 PM   #136
Markismus
If you put the file from archive.org through ABBYY FineReader 15, you can choose to save it as HTML-formatted text, without headers/footers, without pictures, and without line breaks or hyphens.
The result for the first 100 pages looks like this.

Simply put, to convert a file to another format, the source file should itself have been generated by an algorithm. Otherwise there are too many irregularities. With algorithmic output, everything is within brackets and easily recognisable.

Now every entry is encapsulated in p-tags.
A new entry starts with a bold word followed by [a word between brackets], or by 'ou', another word, and [a word between brackets].
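
To make the two rules above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: that the export uses lowercase `<p>` and `<b>` (or `<strong>`) tags, and that the second word after 'ou' may or may not be bold. The names `iter_entries` and `parse_entry` and the sample markup are hypothetical, not taken from the actual file.

```python
import re
from html import unescape

# Assumed (hypothetical) markup: <p><b>headword</b> [bracket] body</p>
# or <p><b>headword</b> ou <b>variant</b> [bracket] body</p>
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
ENTRY_START = re.compile(
    r"""^\s*<(b|strong)\b[^>]*>(?P<head>.+?)</\1>\s*   # bold headword
        (?:ou\s+(?P<variant>[^\[\]]+?)\s*)?             # optional 'ou' + second word
        \[(?P<bracket>[^\]]*)\]                         # the word between brackets
        (?P<body>.*)$                                   # rest of the paragraph
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_tags(fragment):
    """Remove markup and decode HTML entities."""
    return unescape(TAG.sub("", fragment)).strip()


def parse_entry(paragraph_html):
    """Return a dict for a paragraph that starts a new entry, else None."""
    m = ENTRY_START.match(paragraph_html)
    if m is None:
        return None
    variant = m.group("variant")
    return {
        "headword": strip_tags(m.group("head")),
        "variant": strip_tags(variant) if variant else None,
        "bracket": m.group("bracket").strip(),
        "body": strip_tags(m.group("body")),
    }


def iter_entries(html):
    """Yield parsed entries; paragraphs that match neither rule are skipped."""
    for inner in PARAGRAPH.findall(html):
        entry = parse_entry(inner)
        if entry is not None:
            yield entry
```

Paragraphs that do not start with a bold word are simply dropped here; a fuller version would probably append them to the preceding entry as continuation text.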

You can filter out the very large capitals at the beginning of each new letter because they use a much larger font, in this case 61pt.
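
A rough sketch of such a font-size filter, in Python. It assumes the size is written as an inline `font-size:…pt` style on a `<span>` tag, which may not match what the export actually produces; `drop_large_text` and the sample markup are hypothetical, and nested spans are not handled.

```python
import re

# Assumption: sizes appear as inline styles, e.g. <span style="font-size:61pt">A</span>
SIZED_SPAN = re.compile(
    r"<span\b[^>]*?font-size:\s*(?P<size>\d+(?:\.\d+)?)pt[^>]*>(?P<text>.*?)</span>",
    re.IGNORECASE | re.DOTALL,
)


def drop_large_text(html, threshold_pt=61.0):
    """Remove spans whose inline font size is at or above threshold_pt."""
    def repl(match):
        if float(match.group("size")) >= threshold_pt:
            return ""              # drop the oversized capital
        return match.group(0)      # keep everything else unchanged
    return SIZED_SPAN.sub(repl, html)
```

Running this before any entry detection keeps the section capitals from being mistaken for entry text.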

I'll have a go at it in the coming week, to see how far these simple rules get us.

Last edited by Markismus; 10-02-2022 at 04:32 PM.
10-02-2022, 04:38 PM   #137
chaley
Quote:
Originally Posted by pzack View Post
M. DOITSU,

Perhaps it bears repeating that nothing is being sold nor distributed for sale nor use in this forum (nor could it be) or anywhere else and is being used strictly and solely and unequivocally for personal use.

cordially,
pz
The problem is that "personal use" exemptions don't apply unless you own a copy of the physical books.

In effect you are saying "Because I am not selling or distributing the dictionary I can do what I want." Well, no. Downloading copyrighted content doesn't give you any right to use it unless you can claim a "fair use exemption", which by the way the EU doesn't seem to recognize (see this FAQ answer that makes it clear that the copyright owner must approve). I agree that you didn't know in the beginning that the dictionary was pirated. Regardless, you and MobileRead now know. Section 8 of MobileRead's rules says that no one can help someone use pirated material.

An example: I download a copy of a copyrighted ebook for personal use, where I don't have the physical book and am therefore not format-shifting. In this case I have "pirated" the book, and MobileRead's rules rule out assisting me with reformatting or otherwise making this downloaded book "better" for my own use. Simply asking can get me permanently banned from MobileRead.

In the end this is a question of whether you think copyright is valid when dealing with electronic copies. I know people who say "No, it isn't. There isn't any harm." I know others (including me) who say "The author gets to choose."

I am not a MobileRead moderator so cannot make any final decisions, but such moderators are here looking at things.
10-02-2022, 05:01 PM   #138
BetterRed
Moderator Notice

Given this thread hasn't mentioned calibre since Doitsu's post #2, I'm moving it to the Workshop Forum.

BR
10-02-2022, 08:55 PM   #139
DNSB
Quote:
Originally Posted by pzack View Post
Given the massive quantity of electronic material made available to the public, I, like countless general users of the internet, usually do not have the time or means to track down copyright issues or violations.
Quote:
Originally Posted by pzack View Post
I am astonished that I am being accused of attempted pirating! However, two members called my attention to possible copyright issues and use of complete files in the forum.
It is the downloader's responsibility to check that a book is actually out of copyright or was released to the public domain. In the case of the book in question, the copyright page is in the upload to the Internet Archive, leaving no question that this is pirated material. You did read the first few pages of the ebook after you downloaded it?

Quote:
Originally Posted by pzack View Post
This forum is not using nor distributing, nor could it, any file or files that I personally have presented in this forum and what little that I have provided is for my own personal use. And what I have used is not, knowingly, illegal, pirated or anything to that effect.

That you want to make an issue of this is your lookout.
I hate to burst your bubble, but considering the repetitions of "GRAND LAROUSSE DE LA LANGUE FRANÇAISE" in the small sample of the 6400+ pages in the original, I find it hard to believe that you didn't happen to notice them. And personal use is not an excuse for piracy. Are you claiming the files you finally attached are not well beyond the limits of fair use?

As for making an issue of it? I would prefer that MobileRead not be involved in a copyright infringement case.

Quote:
Originally Posted by pzack View Post
Perhaps it bears repeating that nothing is being sold nor distributed for sale nor use in this forum (nor could it be) or anywhere else and is being used strictly and solely and unequivocally for personal use.
For what it is worth, someone uploading a copy of an ebook to the internet does not make it public domain nor does downloading that ebook give you any rights. The Grand Larousse de la Langue Française is a copyrighted work. Period. End of discussion.

This does not fall into the case where someone is attempting to make an ebook from a physical book they own which, as far as I am aware, is not permitted within the EU. What we have here is a plain and simple act of piracy committed by an incompetent.
10-03-2022, 01:12 PM   #140
pzack
Hello, M Sarmat89,

I tried the new code on the original text file. The index file had that question mark; some words were found, others not. In some cases several lines of a definition were included, but the definitions themselves are quite long.

As you said, the file would need a lot of manual modification, something that I am not going to devote time to.

Please let me know if you have any other suggestions, maybe you might think of something and I certainly am open to try anything that might have a chance to work.

But manually editing such a large file would, I think, be out of the question.

Very cordially,
pz

Last edited by pzack; 10-03-2022 at 01:38 PM.
10-03-2022, 01:29 PM   #141
pzack
Hello M Markismus,

Sorry to get back to you a little late as I was finally able to try some new code that M Sarmat89 gave me. Unfortunately, nothing really changed.

I looked at your file and it looks real nice, better than the pdf!

You have put the file into a different format, but of course we will need to have something that PyGlossary will accept for conversion to StarDict.

I look forward to what you might come up with in a week's time.

On another note, the moderator is moving this thread as it wasn't about or related to Calibre. Frankly, this thread was never about Calibre; I certainly never mentioned Calibre.

I don't know what happens when it is moved and how to find it or how to contact you if I can't connect to the thread.

Would you, at your earliest convenience, explain this to me?

Glad that you are still sticking with helping me with a StarDict conversion. Thank you, and thanks to M Sarmat89, who has put a lot of effort, as you have, into this conversion, or rather attempted conversion, up to now.

Very cordially,
pz
10-03-2022, 04:53 PM   #142
jmurphy
Quote:
Originally Posted by pzack View Post
Frankly, this thread was never about Calibre; I certainly never mentioned Calibre.
You posted in the Calibre subforum dedicated to questions about using Calibre to convert from one supported ebook format to another. If your message was never about Calibre, you should not have posted it in a Calibre subforum.

Regarding the moved thread:
Quote:
Originally Posted by pzack View Post
I don't know what happens when it is moved and how to find it
You found it.


jmurphy
10-03-2022, 06:23 PM   #143
issybird
Moderator Notice
Closing the thread. MR does not condone pirating copyrighted works nor helping those who pirate. This also serves as a warning; a similar request will result in a ban.