Mailing List Archive
Re: [tlug] Limits on file numbers in sort -m
- Date: Thu, 29 May 2014 22:21:48 +0200
- From: Bruno Raoult <braoult@example.com>
- Subject: Re: [tlug] Limits on file numbers in sort -m
- References: <CABHGxq7jYkDDLkF8uzzNK8WeU+37t1wgpVhk6VD2HQKyEi7wBw@mail.gmail.com> <CAJMSLH618MfmhL9ufAOfLXxw52i4STpF8dsc_+xe-2GRB3JM8g@mail.gmail.com> <87bnui8sky.fsf@uwakimon.sk.tsukuba.ac.jp> <CABHGxq4NEBMVR8jndiEvcgsGkc_B0f-qcrs2sFjqaAdWH3n9sw@mail.gmail.com> <CAJMSLH6SdSUmvHsjmZBZP-g1graNuPV51vdwLzpPf7ipmz7+zA@mail.gmail.com> <CABHGxq7eCk9Pk1JtNrZuqK_8yv4bt7ftoWwyXqf5P+GKYQH=5w@mail.gmail.com> <87sins7mhy.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2b6XyFNsFhDbK+ktgWk0cE5Lzfv9OrhimBH8RyN78yzLQ@mail.gmail.com> <87d2ew76yd.fsf@uwakimon.sk.tsukuba.ac.jp>
On Thu, May 29, 2014 at 8:52 PM, Stephen J. Turnbull <stephen@example.com> wrote:
Bruno Raoult writes:
> Could you precise again "which kind of application"?
One that reads the entire contents of each of several thousand files,
each of which is 4 million lines long.
> A syscall is difficult to track, except when following them (which
> is very difficult, but possible). Using buffered I/O is to avoid
> syscalls.
Exactly my point.
> uniq -c does it.
It does. The problem is that if Jim merges 100 files 100 times
(that's 10,000 files) and then runs uniq -c on each of the 100 merged
files, he now has 100 files each of which has a count for each line.
If the "real" line in two such files is a duplicate, then he gets two
such lines. That's what Jim meant by
3 this <- result of merge 1
4 this <- result of merge 2
Then he wants to merge the two files and get
7 this
because there were 7 lines like that in all the different files.
(Actually he wants the count after the "real" text of the line, but
that's not a big deal.) But he can't, because uniq doesn't know about
its own output format. (You can use the -f flags to uniq and sort to
ignore the counts for sorting and uniquifying, but you're not quite
there because uniq won't add up the counts for identical lines from
the 1st pass merges for you.)

I keep your entire post on purpose...
So "uniq *" was able to read files, but "sort -m *" was not, right?
And a "uniq | sort | uniq" is not possible???
I am stupid, I don't understand the issue at all :-(, and I would like
to understand clearly, with output of commands if possible...
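To make the quoted point concrete: a second `uniq -c` pass treats "3 this" and "4 this" as two different lines, so it cannot add the counts. What follows is only a minimal sketch of one way to do that addition, not what anyone in the thread actually ran. The function name `sum_counts` is hypothetical, the input files are assumed to be in `uniq -c` format (leading blanks, a count, one blank, then the text), and the awk array holds one entry per distinct line in memory, which may be too much for very large inputs.

```shell
#!/bin/sh
# sum_counts FILE...
# Hypothetical helper: adds up the counts of identical lines found in
# several files that are already in `uniq -c` output format.
sum_counts() {
    awk '{
        n = $1                          # count written by uniq -c
        sub(/^[ \t]*[0-9]+[ \t]/, "")   # drop the count, keep the text
        total[$0] += n                  # add counts for identical text
    }
    END {
        for (text in total) printf "%d %s\n", total[text], text
    }' "$@" |
    LC_ALL=C sort -k2                   # order the result by text, not count
}
```

With one file containing "3 this" and "1 that" and a second file containing "4 this", `sum_counts file1 file2` prints "1 that" followed by "7 this".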
br.
--
2 + 2 = 5, for very large values of 2.
- Follow-Ups:
- Re: [tlug] Limits on file numbers in sort -m
- From: Travis Cardwell
- References:
- [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- Re: [tlug] Limits on file numbers in sort -m
- From: 黒鉄章
- Re: [tlug] Limits on file numbers in sort -m
- From: Stephen J. Turnbull
- Re: [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- Re: [tlug] Limits on file numbers in sort -m
- From: 黒鉄章
- Re: [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- Re: [tlug] Limits on file numbers in sort -m
- From: Stephen J. Turnbull
- Re: [tlug] Limits on file numbers in sort -m
- From: Bruno Raoult
- Re: [tlug] Limits on file numbers in sort -m
- From: Stephen J. Turnbull