12-06-2011, 03:50 PM   #1
Hanspl
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2011
Location: South Germany
Device: Sony PRS-T1
Conversion - Input Encoding MacRoman

Hi,

I'm struggling to convert some really nasty German-language PDFs containing umlauts (äöüÄÖÜß). They seem to contain several different encodings: in one file I've seen the 'ü' converted to four different wrong glyphs, but never to the right one...

The bulk of the documents seems to be encoded in MacRoman (code page 10000? see http://en.wikipedia.org/wiki/Mac_OS_Roman), but this is not selectable in Calibre's Preferences.

Is there a way to select Input Encoding MacRoman, or could it be added to Calibre?

Thanks, Hans
12-06-2011, 03:59 PM   #2
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
The encoding list only contains the common ones. You can type in any encoding you need. Look up the Python encoding documentation to see what it's called.
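
For reference, a minimal sketch of how one might check a codec name before typing it into the Input Encoding field. It assumes only Python's standard codecs module; the helper name is_known_encoding and the sample bytes are made up for illustration and are not part of calibre.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Illustrative sample: the Mac OS Roman byte values for "äöüÄÖÜß".
raw = bytes([0x8A, 0x9A, 0x9F, 0x80, 0x85, 0x86, 0xA7])

# Decoding with the right codec recovers the umlauts ...
text = raw.decode("mac_roman")

# ... while a wrong guess (here Windows-1252) yields unrelated characters.
wrong = raw.decode("cp1252")

print(is_known_encoding("mac_roman"))
print(text)
print(wrong)
```

If the name is accepted here, the same spelling should be usable wherever a Python codec name is expected.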
12-06-2011, 04:17 PM   #3
Hanspl
Oh, wow, thanks - it's called 'mac_roman'!

But this doesn't help. There must be additional problems: it makes no difference what I choose for Input Encoding, my umlauts are always converted wrong.

These are funny PDFs: in all PDF viewers the umlauts are all right. I can convert them to PostScript (pdftops) and they are still all right (in evince), and back to PDF (ps2pdf), still all right. But pdftotext, mark/copy/paste and Calibre all give me wrong umlauts... I'm stumped.

Thank you, Hans
12-06-2011, 10:26 PM   #4
kovidgoyal
creator of calibre
Posts: 45,198
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That will be a PDF that uses non-Unicode character codes along with an embedded font.
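
To illustrate why the Input Encoding setting would have no effect in such a case, here is a toy model, not real PDF parsing. The two font tables and their codes are invented for illustration; the assumption is that each embedded font maps its own private character codes to glyphs and that the file carries no mapping back to Unicode.

```python
# Toy model only: hypothetical private code -> glyph tables for two
# embedded fonts. Real PDFs store this differently.
FONT_A = {0x01: "ü", 0x02: "ä"}
FONT_B = {0x01: "ä", 0x7B: "ü"}


def render(font: dict, codes: list) -> str:
    """What a viewer shows: each code is looked up in the embedded font."""
    return "".join(font[c] for c in codes)


def extract(codes: list, encoding: str) -> str:
    """What a naive extractor yields: raw codes decoded with one encoding."""
    return bytes(codes).decode(encoding, errors="replace")


# Both fonts display a correct "ü", but from different codes.
shown_a = render(FONT_A, [0x01])
shown_b = render(FONT_B, [0x7B])

# No single input encoding turns both raw codes back into "ü".
for enc in ("mac_roman", "cp1252", "latin-1"):
    print(enc, repr(extract([0x01], enc)), repr(extract([0x7B], enc)))
```

In this model the viewer output is right because it goes through the font, while any extractor that decodes the raw codes gets something else, and a different something per font. That would match one 'ü' turning into several different wrong glyphs.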
All times are GMT -4.