Re: [tlug] iconv / Python / unicode question
- To: tlug@example.com
- Subject: Re: [tlug] iconv / Python / unicode question
- From: Ben Gertzfield <che@example.com>
- Date: Tue, 02 Apr 2002 15:28:40 +0900
- Content-type: text/plain; charset=us-ascii
- In-reply-to: <20020401111432.GA30783@example.com> (Frank Bennett's message of "Mon, 1 Apr 2002 20:14:32 +0900")
- Organization: Debian GNU/Linux
- References: <20020401111432.GA30783@example.com>
- Sender: ben@example.com
- User-agent: Gnus/5.090006 (Oort Gnus v0.06) XEmacs/21.4 (Civil Service,i386-debian-linux)
>>>>> "Frank" == Frank Bennett <bennett@example.com> writes:

    Frank> Is there a toggle in the python unicode object that will
    Frank> just drop non-conforming characters on the floor? Failing
    Frank> that, is there __any__ filter that will strip out these
    Frank> blocking characters from a file, so that it can be run
    Frank> through these tools without blowing them up?

When converting to Unicode, pass in 'replace' or 'ignore' for the
errors param to the built-in function unicode():

http://www.python.org/doc/lib/built-in-funcs.html

  unicode(object[, encoding[, errors]])

    Return the Unicode string version of object using one of the
    following modes:

    If encoding and/or errors are given, unicode() will decode the
    object which can either be an 8-bit string or a character buffer
    using the codec for encoding. The encoding parameter is a string
    giving the name of an encoding. Error handling is done according
    to errors; this specifies the treatment of characters which are
    invalid in the input encoding. If errors is 'strict' (the
    default), a ValueError is raised on errors, while a value of
    'ignore' causes errors to be silently ignored, and a value of
    'replace' causes the official Unicode replacement character,
    U+FFFD, to be used to replace input characters which cannot be
    decoded. See also the codecs module.

    [snip]

When converting a Unicode string to some other encoding, do the same
when calling your_unistring.encode():

http://www.python.org/doc/lib/string-methods.html

  encode([encoding[,errors]])

    Return an encoded version of the string. Default encoding is the
    current default string encoding. errors may be given to set a
    different error handling scheme. The default for errors is
    'strict', meaning that encoding errors raise a ValueError. Other
    possible values are 'ignore' and 'replace'. New in version 2.0.

Ben

-- 
Brought to you by the letters Q and S and the number 11.
"Wuzzle means to mix."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
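A rough sketch of the three error modes described above, in case a
concrete run helps. The sample bytes and variable names are made up
for illustration. It uses the .decode() method on a byte string, which
takes the same encoding and errors arguments as unicode() and should
also work on later Pythons where the unicode() built-in is gone (that
portability is an assumption on my part, not something from the docs
quoted above).

```python
# Sketch only: sample data and names are hypothetical.

# Mostly-ASCII byte string with two stray non-ASCII bytes in it.
raw = b"caf\xc3\xa9 au lait"

# errors='strict' is the default: decoding raises a ValueError
# (a subclass of it, on newer Pythons) at the first bad byte.
try:
    raw.decode("ascii")
    strict_failed = False
except ValueError:
    strict_failed = True

# errors='ignore' drops the undecodable bytes on the floor.
dropped = raw.decode("ascii", "ignore")

# errors='replace' puts U+FFFD in place of each undecodable byte.
replaced = raw.decode("ascii", "replace")

# Going the other way, encode() accepts the same errors argument.
text = u"caf\u00e9 au lait"
ascii_dropped = text.encode("ascii", "ignore")    # character removed
ascii_replaced = text.encode("ascii", "replace")  # character becomes '?'
```

For Frank's file case, the same idea would be to read the file's
bytes, decode with 'ignore', and write the result back out before
handing it to the other tools.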