TLUG Mailing List

Re: [tlug] iconv / Python / unicode question

To: tlug@example.com

Subject: Re: [tlug] iconv / Python / unicode question

From: Ben Gertzfield <che@example.com>

Date: Tue, 02 Apr 2002 15:28:40 +0900

Content-type: text/plain; charset=us-ascii

In-reply-to: <20020401111432.GA30783@example.com> (Frank Bennett's message of "Mon, 1 Apr 2002 20:14:32 +0900")

Organization: Debian GNU/Linux

References: <20020401111432.GA30783@example.com>

Sender: ben@example.com

User-agent: Gnus/5.090006 (Oort Gnus v0.06) XEmacs/21.4 (Civil Service,i386-debian-linux)
>>>>> "Frank" == Frank Bennett <bennett@example.com> writes:

    Frank> Is there a toggle in the python unicode object that will
    Frank> just drop non-conforming characters on the floor?  Failing
    Frank> that, is there __any__ filter that will strip out these
    Frank> blocking characters from a file, so that it can be run
    Frank> through these tools without blowing them up?

When converting to Unicode, pass 'replace' or 'ignore' as the errors
argument to the built-in function unicode(). 'ignore' is the one that
drops the offending bytes on the floor:

http://www.python.org/doc/lib/built-in-funcs.html

unicode(object[, encoding[, errors]])

    Return the Unicode string version of object using one of the
    following modes:

    If encoding and/or errors are given, unicode() will decode the
    object which can either be an 8-bit string or a character buffer
    using the codec for encoding. The encoding parameter is a string
    giving the name of an encoding. Error handling is done according
    to errors; this specifies the treatment of characters which are
    invalid in the input encoding. If errors is 'strict' (the
    default), a ValueError is raised on errors, while a value of
    'ignore' causes errors to be silently ignored, and a value of
    'replace' causes the official Unicode replacement character,
    U+FFFD, to be used to replace input characters which cannot be
    decoded. See also the codecs module.

    [snip]
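
For anyone reading this on a newer interpreter, here is a rough sketch
of the same idea. It assumes Python 3, where unicode() no longer exists
and bytes.decode() takes the same errors argument. The sample bytes and
the clean_bytes() helper are made up for illustration, not something
from Frank's files.

```python
# Assumption: Python 3. bytes.decode() stands in for Python 2's unicode().
raw = b"caf\xe9 au lait"  # Latin-1 bytes; the lone 0xE9 is not valid UTF-8

# errors="strict" is the default and raises on the bad byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:  # a subclass of ValueError
    strict_failed = True

# 'ignore' silently drops the bytes that cannot be decoded.
dropped = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")


def clean_bytes(data, encoding="utf-8"):
    """Hypothetical filter: return data with undecodable bytes removed."""
    return data.decode(encoding, errors="ignore").encode(encoding)


print(strict_failed)     # True
print(repr(dropped))     # 'caf au lait'
print(repr(replaced))    # 'caf\ufffd au lait'
print(clean_bytes(raw))  # b'caf au lait'
```

Something like clean_bytes() could be run over a file's contents before
handing it to stricter tools, at the cost of silently losing data.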

When converting a Unicode string to some other encoding, pass the same
errors argument to your_unistring.encode():

http://www.python.org/doc/lib/string-methods.html

encode([encoding[,errors]])

    Return an encoded version of the string. Default encoding is the
    current default string encoding. errors may be given to set a
    different error handling scheme. The default for errors is
    'strict', meaning that encoding errors raise a ValueError. Other
    possible values are 'ignore' and 'replace'. New in version 2.0.

Ben

-- 
Brought to you by the letters Q and S and the number 11.
"Wuzzle means to mix."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/