tlug.jp Mailing List Archive

Re: [tlug] distributed file systems
- Date: Mon, 15 Feb 2010 16:40:49 +0900
- From: Kalin KOZHUHAROV <me.kalin@example.com>
- Subject: Re: [tlug] distributed file systems
- References: <4d3714b51002141821r1b903f03j7a567122720e9c15@example.com> <20100215035353.GJ24817@example.com> <4d3714b51002142226p338664a8l34a9d918eea2d9bd@example.com>
On Mon, Feb 15, 2010 at 15:26, Sach Jobb <sach@example.com> wrote:
>> While mogilefs can be considered a "file system," it's not a file system
>> like one you'd put on a hard disk or an SD card. First, it's accessed
>
> Indeed, technically mogilefs is not a file system in the sense that we
> are used to. However, the general term for this sort of thing, so far
> as I can tell, seems to be "distributed file system." Even mogilefs
> refers to itself as a "distributed file system." In fact, I don't
> actually care much what it's called; I am just trying to use the same
> name that everyone else does.
>
>> My very strong recommendation is that you, for the moment at any
>> rate, drop the idea of using a distributed file system, and instead
>> of describing a proposed (and probably poor) solution to an unknown
>> problem, describe the problem itself. It's almost certain that a
>> "distributed file system" in the sense of something like NFS either
>> won't work well for you, or will be rather insecure, or both.
>
> Good point. In fact, if there is a reasonable way to do it that does
> not involve a distributed file system, or whatever you want to call
> it, I would be very interested in that as well.
>
> There is a web application, in fact a few applications, which take
> input from a large number of users. These are not system users, of
> course, but web users. Much of this data goes into a database, which
> is not a problem. There are also a lot of associated files, at least
> 95% of which are images, but there are also mp3s, flvs, and some
> miscellaneous text files. There may also be other stuff that I am not
> aware of.
>
> This is, thankfully, mostly stored in a single location with a
> directory structure based on dates. For example:
> ~/files/something/2010/2/10/something/something/myfile.png
>
> Mostly this is just write and forget, but there are some cases where
> there are deletes and edits (the text files).
>
> The updates have to be replicated to the other servers fairly quickly;
> i.e., I don't want to use cron or something triggering a script.
>
> NFS: I refuse to use NFS for anything under any circumstances. In
> fact, it's best not to talk about NFS in front of me. Think Happy
> Gilmore.

If you can limit all your users to submitting new data to (and later
deleting it from) one server, while still letting them read from ANY
server, it will simplify the problem a great deal: you'll have one
master read-write server and many read-only slave servers that sync
from the master.

If you don't need version control, you can use an rsync script
triggered by inotify (or dnotify) to replicate the changes. Or you can
run a cron job or an endless-loop script on each slave that does
`rsync -HavP --delete master:/dir1/ /dir1/`, with properly set up ssh
PKI authentication, of course.

BTW, how quickly is "fairly quickly"? A minute, 5 s, 1 s, less?

>> By the way, I'd lay greater than 50% odds that you'd be best served
>> by a version control system.
>
> That is interesting. The code itself is managed that way as part of a
> deployment process (we'll just add the remote servers into the same
> process). It seems a bit strange to me to use version control for
> something that doesn't have any versioning.
>
> How would you envision that working, exactly? Someone uploads a file,
> and it triggers a merge? I guess with git I could see how that might
> be possible, but wouldn't it be sexier if I just knew that every write
> was being written to the other servers, and there was something in
> place tracking this against the nodes?

If you cannot separate the master/slave transactions, then you might
use a version control system such as git or subversion: user-submitted
or user-deleted content triggers a commit hook, which notifies the rest
of the servers to update their local repositories. However, I'd suggest
simply running a cron job, or even an endless loop, on each server,
updating the local copy every minute or so.
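A minimal sketch of the inotify-triggered rsync idea, run on the master and pushing to the slaves (slaves cannot watch the master's disk, so the push has to originate there). It assumes inotify-tools and key-based ssh auth are already set up; the path /srv/files and the host names web2/web3 are hypothetical:

```shell
#!/bin/sh
# Sketch only: push changes from the master to every slave as they
# happen.  SRC and SLAVES below are hypothetical placeholders.
SRC="${SRC:-/srv/files}"
SLAVES="${SLAVES:-web2 web3}"

# Print the rsync invocation that mirrors $SRC to one slave host:
# -H keeps hard links, -a is archive mode, --delete propagates removals.
sync_cmd() {
    printf 'rsync -Ha --delete %s/ %s:%s/' "$SRC" "$1" "$SRC"
}

# With --watch, block until something under $SRC changes, then resync
# every slave.  close_write catches finished uploads; delete and
# moved_to catch the (rarer) removals and renames.
if [ "${1:-}" = "--watch" ]; then
    inotifywait -m -r -e close_write -e delete -e moved_to "$SRC" |
    while read -r _path _event _file; do
        for host in $SLAVES; do
            eval "$(sync_cmd "$host")"
        done
    done
fi
```

A real deployment would probably want to rate-limit the resyncs, since a burst of uploads would otherwise trigger one rsync per slave per event.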
If you have many (10+) servers, try to distribute the timing, although
both git and subversion (over apache) are fast enough in practice.

> Certainly some of you out there have had to deal with this sort of
> problem before, and I think it's an interesting subject. How have the
> rest of you addressed it?

I haven't seen exactly the same problem, but I use subversion for tasks
like that, almost never looking at the revision history, but taking
advantage of its efficient delta-transfer algorithm.

Cheers,
Kalin.
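The endless-loop updater with distributed timing that Kalin suggests might be sketched like this, run on each slave. It assumes a subversion working copy is already checked out at the hypothetical path /srv/files; `git pull` could be substituted for `svn update` with the same structure:

```shell
#!/bin/sh
# Sketch only: periodically update the local working copy, with a
# per-host start offset so 10+ slaves don't all hit the repository in
# the same second.  WC is a hypothetical placeholder.
WC="${WC:-/srv/files}"
PERIOD=60

# Pseudo-random offset in [0, PERIOD); the shell's PID is cheap and
# spreads the timing well enough across hosts.
jitter() {
    echo $(( $$ % PERIOD ))
}

if [ "${1:-}" = "--run" ]; then
    sleep "$(jitter)"
    while :; do
        svn update --quiet "$WC"
        sleep "$PERIOD"
    done
fi
```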
- Follow-Ups:
  - Re: [tlug] distributed file systems
    - From: Sach Jobb
- References:
  - [tlug] distributed file systems
    - From: Sach Jobb
  - Re: [tlug] distributed file systems
    - From: Curt Sampson
  - Re: [tlug] distributed file systems
    - From: Sach Jobb