tlug.jp Mailing List Archive
Re: [tlug] When is a line feed really a line feed?
- Date: Sat, 04 Dec 2010 15:58:27 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] When is a line feed really a line feed?
- References: <4CF92E9D.3020004@example.com> <1291400314.14141.1408520355@example.com>
David J Iannucci writes:

 > I'm no authority on this stuff, but I think that \n doesn't refer to an
 > actual character... I think it is an abstraction referring to whatever
 > is the line terminator used by the OS at hand (making the other guy's
 > statement somewhat tautological :-)

No, it refers to LF. Indeed the "n" is probably supposed to be mnemonic for "newline", but in every language I know of it means LF. The language definitions (e.g., ISO C, the Python Language Reference, the Emacs Lisp Reference, ...) say so. However, in streams opened in text mode it will be silently converted by the I/O subsystem to the platform EOL marker. That's (mostly) why Unix doesn't need to distinguish text from binary files, and (I would guess) why the Mac doesn't use CR as a line terminator any more.

It's possible that because of platform-specific I/O behavior the interpretation you give is widespread, but technically it's incorrect.

 > The actual characters are CR (ASCII 13) and LF (ASCII 10).

In fact there is a whole pile of such characters, including CR, LF, NEL (ISO 6429, 0x85), and the Unicode LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).

 > Mac uses only CR

Not since the introduction of Mac OS X, it doesn't.

Note: In many modern environments there is a "universal newlines" mode (Python's name for it) which conforms more or less to UAX #14, "Unicode Line Breaking Algorithm" (part of the Unicode Standard), regarding parsing of newlines. In summary, *all* of CR, LF, CRLF, LINE SEPARATOR, and PARAGRAPH SEPARATOR are regarded as separating lines. There are also a few relatively unusual characters that act as line separators and to which Unicode assigns no other semantics, such as ASCII VT (vertical tabulation, ASCII 11) and ASCII FF (form feed, ASCII 12). However, these often get other semantics in applications. So gedit and Emacs also conform, by detecting the EOL convention in use and displaying them as newlines.
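To make the difference between "\n is LF" and "the I/O layer converts it" concrete, here is a minimal Python 3 sketch. The helper names `write_text`, `read_universal` and `demo` are hypothetical, made up for illustration. It uses the `newline` argument of `open()` to simulate CRLF and CR platforms on any OS, so the translation shows up in the raw bytes. One caveat: Python's file-level universal newlines only recognizes CR, LF and CRLF; the wider set (VT, FF, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR) is recognized by `str.splitlines()`.

```python
import os
import tempfile
from typing import Optional

# In Python, "\n" is simply LF (ASCII 10), not an abstract "platform newline".
assert "\n" == "\x0a" and ord("\n") == 10


def write_text(path: str, text: str, newline: Optional[str]) -> bytes:
    """Write `text` in text mode, then return the raw bytes that reached disk."""
    with open(path, "w", encoding="ascii", newline=newline) as f:
        f.write(text)
    with open(path, "rb") as f:
        return f.read()


def read_universal(path: str) -> str:
    """Read in universal-newlines mode: CR, LF and CRLF all come back as LF."""
    with open(path, "r", encoding="ascii", newline=None) as f:
        return f.read()


def demo() -> dict:
    text = "one\ntwo\n"
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.txt")
        lf = write_text(path, text, "")        # "" means no translation (Unix-style LF)
        crlf = write_text(path, text, "\r\n")  # simulate a CRLF platform
        cr = write_text(path, text, "\r")      # simulate classic (pre-OS X) Mac CR
        back = read_universal(path)            # the file now holds CR-only line ends
    return {"lf": lf, "crlf": crlf, "cr": cr, "back": back}


print(demo())

# str.splitlines() knows the wider set: VT, FF, NEL (U+0085), LS and PS too.
mixed = "a\rb\r\nc\nd\x0be\x0cf\x85g\u2028h\u2029i"
print(mixed.splitlines())  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
print(mixed.split("\n"))   # splits only at LF, giving 3 pieces
```

The same program writes different bytes depending only on the `newline` setting, while the string in memory always contains plain LF.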
Emacs, at least, also treats VT and FF as line breaks, with additional semantics in some modes.

Output of newlines is still hairy, because most environments don't come close to conforming to Unicode (which strongly recommends use of the unambiguous LINE SEPARATOR for hard line breaks and PARAGRAPH SEPARATOR where you expect the software to provide appropriate line breaks for you at display time). So all user-friendly environments convert to the platform convention by default.

As Dave observes, this can be annoying because it's hard to see which convention is used in the editor. Emacs provides an EOL indicator in the mode line, and if you're worried about mixed EOL conventions, you can specify the coding system as "undecided-unix" to enforce Unix EOL, in which case CR displays as "^M" in the buffer. It becomes *really* obvious which lines use which convention. :-)

While I can't necessarily recommend Emacs to everybody, there's a very good chance that you usually use YFE[1], and I've heard that YFE has a similar feature. :-)

Footnotes:
[1] Your Favorite Editor.
- References:
- [tlug] When is a line feed really a line feed?
- From: Dave M G
- Re: [tlug] When is a line feed really a line feed?
- From: David J Iannucci