Mailing List Archive
Re: [tlug] Limits on file numbers in sort -m
- Date: Fri, 30 May 2014 03:52:58 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Limits on file numbers in sort -m
- References: <CABHGxq7jYkDDLkF8uzzNK8WeU+37t1wgpVhk6VD2HQKyEi7wBw@mail.gmail.com> <CAJMSLH618MfmhL9ufAOfLXxw52i4STpF8dsc_+xe-2GRB3JM8g@mail.gmail.com> <87bnui8sky.fsf@uwakimon.sk.tsukuba.ac.jp> <CABHGxq4NEBMVR8jndiEvcgsGkc_B0f-qcrs2sFjqaAdWH3n9sw@mail.gmail.com> <CAJMSLH6SdSUmvHsjmZBZP-g1graNuPV51vdwLzpPf7ipmz7+zA@mail.gmail.com> <CABHGxq7eCk9Pk1JtNrZuqK_8yv4bt7ftoWwyXqf5P+GKYQH=5w@mail.gmail.com> <87sins7mhy.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2b6XyFNsFhDbK+ktgWk0cE5Lzfv9OrhimBH8RyN78yzLQ@mail.gmail.com>
Bruno Raoult writes:

 > Could you precise again "which kind of application"?

One that reads the entire contents of each of several thousand files,
each of which is 4 million lines long.

 > A syscall is difficult to track, except when following them (which
 > is very difficult, but possible). Using buffered I/O is to avoid
 > syscalls.

Exactly my point.

 > uniq -c does it.

It does.  The problem is that if Jim merges 100 files 100 times
(that's 10,000 files) and then runs uniq -c on each of the 100 merged
files, he now has 100 files, each of which has a count for each line.
If the "real" line appears in two such files, then he gets two
counted lines for it.  That's what Jim meant by

    3 this      <- result of merge 1
    4 this      <- result of merge 2

Then he wants to merge the two files and get

    7 this

because there were 7 lines like that in all the different files.
(Actually he wants the count after the "real" text of the line, but
that's not a big deal.)  But he can't, because uniq doesn't know about
its own output format.  (You can use uniq's -f flag and sort's -k flag
to skip the count field when sorting and uniquifying, but you're not
quite there, because uniq won't add up the counts for identical lines
from the first-pass merges for you.)
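
To make the missing step concrete, here is a minimal Python sketch of a
second-pass merge that does add up the counts.  It is not something
posted in this thread, and the names parse and merge_counts are
hypothetical.  It assumes each input is uniq -c output (a count, one
space, then the line text) and that the first-pass files were sorted in
the C locale, so that Python's string ordering agrees with sort's.

```python
import heapq
from itertools import groupby


def parse(line):
    """Split one 'uniq -c' style line into (text, count).

    Assumes the layout is: optional padding, the count, one space, the text.
    Splitting on a single space keeps any leading spaces in the text itself.
    """
    count, text = line.rstrip("\n").lstrip().split(" ", 1)
    return text, int(count)


def merge_counts(*streams):
    """Merge streams of counted lines, summing counts for identical text.

    Each stream must already be sorted by text (assumption: C-locale order),
    with each text appearing at most once per stream, as uniq -c guarantees.
    Yields (text, total_count), i.e. the count comes after the text.
    """
    parsed = (map(parse, stream) for stream in streams)
    merged = heapq.merge(*parsed)  # lazy k-way merge, ordered by text
    for text, group in groupby(merged, key=lambda pair: pair[0]):
        yield text, sum(count for _, count in group)


if __name__ == "__main__":
    merge1 = ["      3 this\n"]
    merge2 = ["      4 this\n"]
    for text, total in merge_counts(merge1, merge2):
        print(text, total)
```

The streams can be open file objects as well as lists.  Note that
heapq.merge still keeps every input open at once, so this only
addresses adding up the counts, not any limit on the number of files
merged in one pass.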