Mailing List Archive
Re: [tlug] Character encoding stuff
- Date: Fri, 31 May 2013 08:07:01 +0900
- From: Darren Cook <darren@example.com>
- Subject: Re: [tlug] Character encoding stuff
- References: <51A76F77.9030306@imaginatorium.org>
- User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130330 Thunderbird/17.0.5
> (1) In particular, when scraping jigsaw puzzle manufacturer websites, I
> want to know what characters I'm looking at.
...

I'll mention this as useful for character encoding work, but I don't know if it helps for what you are doing:
  http://php.net/manual/en/book.intl.php

This is a heavy-duty set of functions, the ICU library, developed by IBM originally (IIRC). It is built in to PHP 5.4.x, and can be added as a PECL module for earlier versions.

> But it would be nice to get more than just numbers: stuff like
> "Cyrillic", "Punctuation" etc.

Is this a tool to use interactively? To satisfy your curiosity? Or do you want to normalize/simplify/transliterate, to make your pattern matching simpler?

Darren

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
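
For the "more than just numbers" wish quoted above, here is a minimal sketch of the interactive kind of tool. It is an illustration only, not something from this thread: it uses Python's standard unicodedata module rather than the PHP intl extension, the helper name describe_chars is hypothetical, and the sample string is made up. Python's stdlib does not expose the Unicode script property, so the first word of the character name ("CYRILLIC", "HIRAGANA", ...) is only a rough proxy for it; the two-letter general category (Lu, Ll, Pd, ...) is the real Unicode property.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, general category, name) for each distinct
    non-ASCII character in text, sorted by code point.

    Hypothetical helper for illustration; the name is not from any library.
    """
    rows = []
    for ch in sorted(set(text)):
        if ord(ch) < 128:
            continue  # skip plain ASCII, which is rarely the surprise
        # unicodedata.name() raises ValueError for unnamed characters
        # unless a default is supplied.
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)  # e.g. "Lu", "Ll", "Pd"
        rows.append((f"U+{ord(ch):04X}", category, name))
    return rows


# Made-up sample: a Cyrillic word, an en dash, and some ASCII.
for row in describe_chars("Пазл – 1000 pieces"):
    print(*row)
```

The category codes are the standard Unicode general categories (L* letters, P* punctuation, N* numbers, and so on), so grouping by the first letter gives coarse labels such as "Punctuation".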
- Follow-Ups:
- Re: [tlug] Character encoding stuff
- From: Stephen J. Turnbull
- References:
- [tlug] Character encoding stuff
- From: Brian Chandler