TLUG Mailing List

Re: [tlug] Limits on file numbers in sort -m

On Thu, May 29, 2014 at 7:32 AM, Jim Breen <jimbreen@example.com> wrote:

> Regarding the count of occurrences you could pipe the "sort -m ...." into
> "uniq -c". I've always been annoyed by the format of uniq (a space-padded,
> fixed-width count as the first column) but if you can live with that you'll
> be getting to what you want quicker. The pipe to uniq will consume its
> input buffer very quickly so it's not going to be the case that all of the
> output of sort must stay in memory as long as the process is running. Also
> if duplicates are common, your final output file saved to disk will be
> usefully smaller.

In any case, the output from "uniq -c" is not what I want, and since I'd need
to reformat it, it's easier to use my own utility. It also gives me the option
of turning

this 3
this 4

into

this 7

which I can't do with "uniq -c".
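
For illustration only, a merge of this kind could be sketched in awk. This is
not the utility mentioned above; the function name sum_counts is hypothetical,
and the sketch assumes the input is already sorted by key and that each line
has the form "<key> <count>", with a space-separated count as the last field.

```shell
# Hypothetical helper: sum the trailing counts of adjacent lines sharing a key.
# Assumes sorted input where every line is "<key> <count>".
sum_counts() {
    awk '
    {
        count = $NF                 # last field is the count
        key = $0
        sub(/ +[^ ]+$/, "", key)    # strip the count; the key may contain spaces
        if (NR > 1 && key != prev) {
            print prev, total       # key changed: flush the previous group
            total = 0
        }
        prev = key
        total += count
    }
    END { if (NR > 0) print prev, total }
    '
}

printf 'that 1\nthis 3\nthis 4\n' | sum_counts
# prints:
# that 1
# this 7
```

Because it only remembers the previous key and a running total, it could sit
at the end of a "sort -m" pipeline without holding the whole stream in memory.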

If you pipe the output of sort -m into uniq -c, you will get one line only (your "this 7").

I don't understand how you get 2 lines with uniq -c.

br@lorien:/export/home/br$ cat x
a
a a
c
d
br@lorien:/export/home/br$ cat y
a
a a
b
c
br@lorien:/export/home/br$ sort -m x y | uniq -c
      2 a
      2 a a
      1 b
      2 c
      1 d
br@lorien:/export/home/br$ sort -m x y | uniq -c | sed -e 's/^[ ]*//' -e 's/ /,/'
2,a
2,a a
1,b
2,c
1,d

The last command produces CSV. If the filenames contain special characters, such as commas or double quotes, they need to be handled differently, but this may be enough if yours do not.
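
If commas or double quotes can occur, one possible approach is to quote the
text field in the usual CSV manner (wrap it in double quotes and double any
embedded quote). A minimal sketch, assuming "uniq -c" output of the form
"<padded count> <line>"; the function name uniq_c_to_csv is hypothetical:

```shell
# Hypothetical helper: convert "uniq -c" output to CSV with a quoted text field.
uniq_c_to_csv() {
    awk '
    {
        count = $1
        text = $0
        sub(/^ *[0-9]+ /, "", text)   # drop the padded count and one separator space
        gsub(/"/, "\"\"", text)       # double any embedded double quotes
        printf "%s,\"%s\"\n", count, text
    }
    '
}

printf 'a,b\na,b\nsay "hi"\n' | uniq -c | uniq_c_to_csv
# prints:
# 2,"a,b"
# 1,"say ""hi"""
```

This always quotes the second field, which is simpler than deciding per line
whether quoting is needed and is still accepted by common CSV readers.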

br.

--
2 + 2 = 5, for very large values of 2.