Mailing List Archive
Re: [tlug] Database frontend in Linux
- Date: Mon, 01 Jun 2009 11:26:18 +0900
- From: Edward Middleton <emiddleton@example.com>
- Subject: Re: [tlug] Database frontend in Linux
- References: <mailman.1.1243047601.6031.tlug@example.com> <BAY108-W32CB26AA19B9A7AE47072DA2570@example.com> <20090530102723.GA7204@example.com> <d8fcc0800905310503r27865f5ex3672fe533c7e724a@example.com>
- User-agent: Thunderbird 2.0.0.21 (X11/20090323)
Josh Glover wrote:
> 2009/5/30 Christian Horn <chorn@example.com>
>> On Sat, May 23, 2009 at 06:05:13PM +0900, Raedwolf Summoner wrote:
>>
>>> Pardon my greenhorn status, Christian, but I'm afraid I don't
>>> understand the difference [between a search engine and a
>>> database].
>>>
>> A database would move or copy the data like soundfiles inside of it,
>> making the data harder to backup etc.
>>
>
> Yes, this is it in a nutshell. Databases, especially relational ones,
> are great for storing data that is related somehow. Search engines are
> better at dealing with data that is not itself related, but with
> related *metadata*.
>

I think structured vs unstructured data is the major difference.
Databases are better at finding things like "the title of songs on
album x". A search engine is better at finding "all things related
to x".

The other major difference is that databases are generally better for
closed-world systems (i.e. where there is a finite dataset and no
result means the thing doesn't exist). Search engines are better for
open-world situations like the web, where no result means "I don't
know".

An important distinction between a search engine and a database is
that a database returns facts[1], whereas a search engine returns what
appear to be relationships based on data mining (i.e. statistics). A
database result tells you what the database knows to be factually
correct; a search tells you what is statistically likely to be
relevant.

[snip]

> But there is another problem that is harder to solve, and that is
> relevance. PageRank (Google's algorithm for determining which results
> bubble up to the top for any given search) is all about relevance. [1]
> It cares a lot about how popular a document is, which is determined by
> static analysis such as building massive graphs that show how well
> linked-to a document is, and feedback loops that ensure that documents
> that are clicked on a lot for a given search term move up the result
> list. This is why I don't *have* to do anything more than the
> following search to get stuff about the Tokyo Linux Users Group:
>

The problem with PageRank is that it doesn't solve the difficult
problem of finding relevance; it solves the easier problem of finding
popularity. This makes it susceptible to SEO and Google bombing[2]. It
also means that unpopular but relevant topics aren't ranked highly.

Edward

1. The facts could be wrong, but they are explicitly stated.
2. http://en.wikipedia.org/wiki/Google_bomb
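
To make the database vs search engine contrast above concrete, here is
a rough sketch in Python. Everything in it is an assumption made up for
illustration: the sample albums, the table layout, and the helper names
titles_on_album and search do not come from the thread. The "search
engine" is only a toy that counts word occurrences, nothing like a real
ranking algorithm.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data, invented for this sketch.
SONGS = [
    ("Kind of Blue", "So What"),
    ("Kind of Blue", "Blue in Green"),
    ("Blue Train", "Moment's Notice"),
]


def titles_on_album(conn, album):
    """Database-style lookup: returns exactly the stored titles.

    An empty list is a definite answer (closed world): the database
    holds no songs for that album.
    """
    rows = conn.execute(
        "SELECT title FROM songs WHERE album = ? ORDER BY title", (album,)
    ).fetchall()
    return [title for (title,) in rows]


def search(documents, query):
    """Toy search: rank documents by how often the query words occur.

    The score is a statistical guess at relatedness, not a stated fact.
    """
    terms = query.lower().split()
    scored = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, doc))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (album TEXT, title TEXT)")
conn.executemany("INSERT INTO songs VALUES (?, ?)", SONGS)

# Structured question: "the title of songs on album x".
print(titles_on_album(conn, "Kind of Blue"))
# No rows means the database knows of no such album.
print(titles_on_album(conn, "Giant Steps"))

# Unstructured question: "all things related to x".
docs = [f"{album} {title}" for album, title in SONGS]
print(search(docs, "blue"))
```

The SQL query either finds the rows or it doesn't, while the search
returns an ordered list of guesses, and an empty result there only
means that nothing in this small collection matched.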