tlug.jp Mailing List Archive
Re: [tlug] "How to"
- Date: Mon, 12 May 2014 13:21:37 +0200
- From: Bruno Raoult <braoult@example.com>
- Subject: Re: [tlug] "How to"
- References: <CAJA1Y2bTWLWhb0tcuZyeJQDXtAXsGRdyUw_T_Ft7sZ_W6nXhLQ@mail.gmail.com> <CAKXLc7fOK94iWsRP7QkfjaqotYRXfgRSQRtbMeRCT80M4_-b1w@mail.gmail.com> <87sioffz3p.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2ZzArvOAstFK2tQE5yo_dgK4TE72GudigW_6XksB_v60Q@mail.gmail.com> <87oaz3fkfm.fsf@uwakimon.sk.tsukuba.ac.jp>
On Mon, May 12, 2014 at 10:52 AM, Stephen J. Turnbull <stephen@example.com> wrote:
> Bruno Raoult <braoult@example.com> writes:
> > On Mon, May 12, 2014 at 5:35 AM, Stephen J. Turnbull <stephen@example.com> wrote:
> > > > 1- You have 10,000 files, and you want to find
> > > > duplicates. Sometimes, 1 file changes, or you add/remove one, so
> > > > you want to find the changes quickly (let say daily). How?
> > >
> > > git init; git add .; git commit; while true; do git status; sleep 86400;
> > > done
> >
> > I am not sure to understand (or maybe my question was not
> > clear). Let say you have ./a/b/c/d/file1 and ./a/b/z/file2 in the
> > tree. They are binary the same files. My question was to find them.
>
> For two files in the same directory that have the same content but
> different names,
>
>     git cat-file tree `git cat-file commit HEAD | grep tree | cut -b 5-` \
>     | sort -f 3 | uniq -D -w 52
>
> (untested; probably requires GNU uniq). To handle recursion is
> (recursively ;-) left as an exercise for the reader.

If files are in the same dir, why using git?
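
As an illustration of the recursive case left as an exercise above, here is a minimal sketch, not a tested recipe from the thread. It swaps `git cat-file` for `git ls-tree -r HEAD`, which walks subdirectories, and groups entries by object ID with awk rather than GNU uniq. The function name `git_dups` is made up for this example. The sketch assumes a repository with at least one commit and file names free of tabs, newlines and other characters that git would quote.

```shell
# git_dups: list tracked files in HEAD that are byte-identical, at any depth.
# Output: one "<object-id> <path>" line per file, with a blank line after
# each group of duplicates. Run it from inside a git work tree.
git_dups() {
    # ls-tree -r prints "<mode> <type> <object-id>\t<path>" for every entry.
    git ls-tree -r HEAD |
    awk -F '\t' '
        {
            split($1, meta, " ")
            if (meta[2] != "blob") next      # ignore submodule entries
            id = meta[3]
            count[id]++
            paths[id] = paths[id] id " " $2 "\n"
        }
        END {
            # Identical content means identical object ID, so any ID seen
            # more than once names a group of duplicate files.
            for (id in count)
                if (count[id] > 1) printf "%s\n", paths[id]
        }
    '
}
```

Called as `git_dups` from the top of the work tree, this would report ./a/b/c/d/file1 and ./a/b/z/file2 in the same group if both were committed with identical content. The order of the groups is not specified.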

> > So we extracted the data, piped it, and saved in a file. Then? What
> > about the next day, when you want to refresh?
>
>     git ls-files --modified | xargs metadata-extractor-and-updater
>
> If you need to do this in real time, it's a difficult problem.

This was not in my initial question.
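
For the daily refresh, a hedged sketch of the file-selection step follows. `git ls-files --modified` on its own reports tracked files only, so files added since the last `git add` need `--others` as well. The function name `changed_files` is invented for this example, and `metadata-extractor-and-updater` in the quoted command is a placeholder, not a real program. File names with characters that git quotes (or with newlines) are not handled here.

```shell
# changed_files: print, one per line, the files in the current git work tree
# that need to be re-scanned:
#   --modified  tracked files whose content differs from the index
#   --others    files git has never seen (--exclude-standard honours .gitignore)
# A deleted file is also reported as modified, so paths that no longer exist
# are filtered out.
changed_files() {
    git ls-files --modified --others --exclude-standard |
    while IFS= read -r path; do
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

Run from the top of the work tree, the output can be fed to whatever tool refreshes the metadata, followed by `git add -A` and `git commit` so that the next run starts from a clean state.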

> Of course if (like Kalin) you're dealing with terabytes, this is still
> way slow (even if you can compare bytes on the order of once per CPU
> cycle, you're still talking about thousands of seconds). You really
> need to be able to ensure that files aren't changed behind your back,
> and some special handling for files >10GB would be needed. But for
> people dealing with files on the order of a CD or less, git should do
> the job quickly enough.

Changes "behind the back" are not an issue. You just want to find dups, from time to time. The second question about metadata is in fact the same.

You offered a solution (that I did not test) using git. I am sure readers will propose alternatives. And this was the target of the question: which solution would be the best for such a requirement?

Let me put it another way: you have your 10,000 pictures. You plug in your phone/camera and, as you are not sure whether the pics were already imported and you don't want to overwrite anything, you import them into "Pictures/new-yyyy-mm-dd". After that, you want to find the possible new dups (I already wrote that the first scan is a special case, and therefore already done).
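
One possible alternative to git for the import scenario above, sketched under assumptions: the first scan is saved as a checksum manifest, and each new import directory is hashed and looked up in it. The names `build_manifest` and `find_new_dups` and the manifest file are invented for this example. It relies on `sha256sum` from GNU coreutils and does not handle file names containing backslashes or newlines, which `sha256sum` escapes in its output.

```shell
# build_manifest DIR
# Print one "<sha256>  <path>" line for every regular file under DIR.
build_manifest() {
    find "$1" -type f -exec sha256sum {} +
}

# find_new_dups MANIFEST NEW_DIR
# MANIFEST is the saved output of build_manifest for the existing collection,
# taken before the import. For every file under NEW_DIR whose content is
# already known, print "<new path><TAB><existing path>".
find_new_dups() {
    build_manifest "$2" |
    awk '
        # First input (the manifest file): remember one path per checksum.
        NR == FNR { known[substr($0, 1, 64)] = substr($0, 67); next }
        # Second input (stdin): checksums of the freshly imported files.
        {
            sum = substr($0, 1, 64)
            if (sum in known)
                printf "%s\t%s\n", substr($0, 67), known[sum]
        }
    ' "$1" -
}
```

A typical run would be `build_manifest Pictures > pictures.sha256` once, then `find_new_dups pictures.sha256 Pictures/new-yyyy-mm-dd` after each import. Only the new files are hashed on later runs. The manifest has to predate the import, otherwise every new file matches itself. Lines for the files that are kept can be appended to the manifest afterwards.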
br.
--
2 + 2 = 5, for very large values of 2.