Mailing List Archive
Re: [tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]
- Date: Fri, 29 Jun 2018 15:25:38 +0900
- From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>
- Subject: Re: [tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]
- References: <23345.41167.951877.900876@turnbull.sk.tsukuba.ac.jp> <23345.44414.330392.350450@turnbull.sk.tsukuba.ac.jp> <CAKXLc7c-LzgY5AtE8XrZzKUrr206nXtmxdtKQC0q8PkcMjiF7A@mail.gmail.com> <CABHGxq6CkEeQVHy7rjjbTP72mOm_QQ5xtthPuPR-QASYQoS_ag@mail.gmail.com> <07A05935-BBD8-4C13-AEF6-667D653EBE45@brightblack.net> <23346.65438.401753.15741@turnbull.sk.tsukuba.ac.jp> <CABHGxq5mnJgiSxGKEXZ4KYBAuVB0YBUwMqi+duoTk1iSeXj9PQ@mail.gmail.com> <23348.26994.89985.547640@turnbull.sk.tsukuba.ac.jp> <CABHGxq6uUOvhMGi50tev5ckXTo=R+bxykDLOOCeCiiAUVHF=wQ@mail.gmail.com>
Jim Breen writes:

 > I don't think [fixed-width 3-octet] would be awkward at all. Much
 > of my recent text-processing work has used UTF-8 throughout and
 > it's not been a problem.

OK. A lot of the issues with Emacs and odd octet widths come from
generic memory management, where many systems really like power-of-2
alignment, and from certain kinds of string matching, which it turns
out can be greatly sped up if you do them 32 or 64 bits at a time :-).

 > > Python 3 moved to a content-dependent fixed-width type. If your
 > > string is all ISO-8859-1, it's encoded as an array of octets. If
 > > it contains even one astral character, it's UTF-32. Everything
 > > else is UCS-2 (aka the subset of UTF-16 excluding surrogates).
 >
 > That approach sort-of makes sense, but I'd hate to be maintaining
 > it.

A plausible take, but that kind of code has been very stable in my
experience. Once you have the (simple) code for accessing and mutating
the array of characters correct, and the (also simple) widening and
narrowing code correct, optimizations tend to be very local and easy
to do correctly. Of course you have to do things through the API,
which slightly limits how efficiently you can access and mutate the
underlying storage, but it's still wicked fast compared to Emacs. ;-)

 > Anyway there'll be no "successor maintainers" for wwwjdic. I'll instruct
 > my executors to put it on the bonfire, along with my used toothbrushes
 > and underpants.

As Kori Schake[1] likes to say, "Jim, I did not need that visual!"

Footnotes:
[1] https://twitter.com/deepstateradio
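
For readers who want to see the content-dependent fixed-width rule quoted above in concrete terms, here is a minimal sketch in Python. It only models the rule as stated in the message (ISO-8859-1 gets one octet per character, any astral character forces UTF-32, everything else is UCS-2). The function name `storage_width` is hypothetical, and the code does not inspect the interpreter's real internal storage.

```python
def storage_width(s: str) -> int:
    """Return the per-character width in octets that the rule would pick.

    Hypothetical helper: it models the rule described in the message,
    it does not look at how the interpreter actually stores the string.
    """
    # The widest code point in the string decides the width for all of it.
    highest = max(map(ord, s), default=0)
    if highest <= 0xFF:
        return 1  # everything fits in ISO-8859-1: array of octets
    if highest <= 0xFFFF:
        return 2  # everything fits in the BMP: UCS-2
    return 4      # at least one astral character: UTF-32


if __name__ == "__main__":
    samples = ["plain ASCII", "caf\u00e9", "\u65e5\u672c\u8a9e", "a\U0001F600"]
    for text in samples:
        print(repr(text), storage_width(text))
```

The point of the sketch is that the whole string takes the width of its widest character, so indexing stays a simple array lookup whichever of the three widths is in use. "Widening" and "narrowing" in the message correspond to a mutation moving a string from one branch of this function to another.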