Advanced Linux Desktop Search -- MetaFS


MetaFS is still in the pre-alpha stage of development. That means not everything discussed here will work, or has even been written yet.

Background

So ... what's the problem, anyway?

Most modern computers use hierarchical filesystems for storing and retrieving data. This method of organization has been in wide and successful use for decades. However, the amount of information we store and access has grown by many orders of magnitude since filesystems were first created.

Pure hierarchies don't work anymore. Users often create immense folder structures for their information; however, such structures require a non-trivial amount of effort to maintain. Furthermore, users sometimes have difficulty remembering how they categorized a particular document, so they have to waste time looking for it. Some users may not create folder structures at all, instead preferring to save everything in one big dumping ground. They will have to search through this dumping ground repeatedly looking for lost information.

Manual searching (or even automated searching without an index) is a time-consuming and frustrating operation. This is especially true with more heterogeneous sets of information, or information that has been only loosely categorized.

But hasn't someone else solved this?

No. There have been other attempts at solving the organizational problem; however, they are either too narrowly focused (such as pure search tools) or not compatible with existing storage APIs.

Some, such as Google's Desktop Search, Apple's Spotlight, or GNOME's Beagle, focus primarily on searching and presenting search results to the user. Instead of building search into the filesystem, they provide it as a separate application. This approach is not helpful to power users, who often would prefer to work mostly within their super-customized xterms and shells. It is also limited in its ability to provide additional metadata (such as the title and artist of a song in MP3 format) to other applications. (Also, the only one of these projects that works on Linux is Beagle.)

Other projects, such as MIT's Haystack Project, throw out the hierarchical filesystem completely, preferring instead to use their own internal structure, which is then exposed to the user and other applications via a proprietary interface. However, in order to be used comprehensively (and thus effectively), these projects would require all applications to be refactored to take advantage of their APIs.

So why is MetaFS better?

MetaFS strikes a balance between these two extremes. It aims to build advanced functionality (such as search) into the filesystem in a way that is transparent to all applications (including a hacker's favorite shell). Users and developers may continue using the applications and APIs they know and love, and need only step into the realm of MetaFS when it suits them.

Details

MetaFS is an enhanced layer of functionality that sits atop the standard Linux filesystem. It provides the usual semantics one would expect from a UNIX filesystem -- inodes, directories, files, (sym)links, and such -- as well as additional information about the files themselves. This additional information comes in the form of either extended attributes, or "services" that appear as virtual files or directories.
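As a rough illustration of the extended-attribute half of this idea, the sketch below uses Python's standard `os.setxattr`, `os.listxattr`, and `os.getxattr` calls, which exist on Linux only. It is not MetaFS code: the helper names and the attribute name `user.metafs.artist` are assumptions made up for this example, and the helpers simply report failure where the filesystem does not support user attributes.

```python
import os
import tempfile


def read_attributes(path):
    """Return a file's extended attributes as a dict of name -> bytes.

    Returns an empty dict when the platform or filesystem lacks xattr support.
    """
    if not hasattr(os, "listxattr"):  # these calls are Linux-only
        return {}
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        return {}


def tag_file(path, name, value):
    """Try to attach one attribute; report whether the filesystem accepted it."""
    if not hasattr(os, "setxattr"):
        return False
    try:
        os.setxattr(path, name, value.encode("utf-8"))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".mp3") as song:
        # "user.metafs.artist" is a hypothetical attribute name.
        stored = tag_file(song.name, "user.metafs.artist", "The Beatles")
        print(stored, read_attributes(song.name))
```

Any application that already reads extended attributes could pick up such metadata without knowing where it came from, which is the kind of transparency described above.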

Plugins

MetaFS is strongly based on plugins, and is therefore extensible in almost every way imaginable. These plugins fall into a few general categories. MetaFS plugins can mix and match among these categories as appropriate; a plugin is not limited to being a single type.
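One way to picture a plugin that is not limited to a single type is sketched below. The two roles used here (supplying attributes and supplying services) are borrowed from the description of extended attributes and services earlier on this page; they are an assumption for illustration, not the actual MetaFS category list, and every class, method, and service name is hypothetical.

```python
class Plugin:
    """Base class: a plugin may override either hook, or both."""

    def attributes(self, path):
        """Return extra metadata for path as a dict (empty if none)."""
        return {}

    def services(self, path):
        """Return names of virtual entries this plugin exposes for path."""
        return []


class Mp3Plugin(Plugin):
    """Hypothetical plugin acting in two roles at once."""

    def attributes(self, path):
        if not path.endswith(".mp3"):
            return {}
        # A real plugin would parse the file's tags; this value is a placeholder.
        return {"type": "audio/mpeg"}

    def services(self, path):
        # "lyrics" is an invented service name, used only for illustration.
        return ["lyrics"] if path.endswith(".mp3") else []


class Registry:
    """Collects plugins and merges what each one reports for a path."""

    def __init__(self):
        self.plugins = []

    def register(self, plugin):
        self.plugins.append(plugin)

    def describe(self, path):
        attrs, services = {}, []
        for plugin in self.plugins:
            attrs.update(plugin.attributes(path))
            services.extend(plugin.services(path))
        return attrs, services


if __name__ == "__main__":
    registry = Registry()
    registry.register(Mp3Plugin())
    print(registry.describe("song.mp3"))
    print(registry.describe("notes.txt"))
```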

Searching and Services

MetaFS will also provide searching functionality through a service plugin. For example, suppose you want to search for all your Beatles MP3s. You need only create a text file containing your search parameters. Then, go into the search service of the file you just created. Inside the search service, you will find symbolic links to all your Beatles MP3s. Furthermore, if you buy and rip a new CD, the new MP3s will automatically appear in the search as they are added to your collection.
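The behavior described here can be approximated in ordinary user space. The sketch below is a simulation under stated assumptions, not the MetaFS search service: the `key=value` query format, the in-memory `catalog` of file metadata, and all function names are hypothetical. It rebuilds a directory of symbolic links from a saved query, and a second refresh picks up a newly added track.

```python
import os
import tempfile


def parse_query(text):
    """Parse lines of the form key=value into a dict (hypothetical format)."""
    query = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            query[key.strip().lower()] = value.strip().lower()
    return query


def matches(metadata, query):
    """True when every query field equals the file's metadata, ignoring case."""
    return all(metadata.get(k, "").lower() == v for k, v in query.items())


def refresh_results(results_dir, catalog, query):
    """Rebuild results_dir so it holds one symlink per matching file."""
    os.makedirs(results_dir, exist_ok=True)
    for name in os.listdir(results_dir):
        os.unlink(os.path.join(results_dir, name))  # drop stale links
    for path, metadata in catalog.items():
        if matches(metadata, query):
            os.symlink(path, os.path.join(results_dir, os.path.basename(path)))
    return sorted(os.listdir(results_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        catalog = {
            os.path.join(root, "help.mp3"): {"artist": "The Beatles"},
            os.path.join(root, "so_what.mp3"): {"artist": "Miles Davis"},
        }
        query = parse_query("artist=The Beatles")
        results = os.path.join(root, "results")
        print(refresh_results(results, catalog, query))
        # A newly ripped track shows up after the next refresh.
        catalog[os.path.join(root, "yesterday.mp3")] = {"artist": "The Beatles"}
        print(refresh_results(results, catalog, query))
```

In this simulation the refresh has to be triggered by hand; the point of building search into the filesystem is that the results would stay current without that step.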

Now, suppose you have a .tar.gz file, and you want to check what's inside before extracting it. Services should be nestable: the gz service decompresses, and the tar service looks inside the decompressed tarball.
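Nesting of this kind amounts to function composition: the output of one service becomes the input of the next. The sketch below imitates it with Python's standard `gzip` and `tarfile` modules. The function names `gz_service` and `tar_service` are stand-ins invented for this example, and the sample archive is built in memory so nothing is read from disk.

```python
import gzip
import io
import tarfile


def gz_service(data):
    """Stand-in for the gz service: return the decompressed bytes."""
    return gzip.decompress(data)


def tar_service(data):
    """Stand-in for the tar service: list members of an uncompressed tarball."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as archive:
        return archive.getnames()


def make_sample_tar_gz():
    """Build a tiny .tar.gz in memory so the example needs no files on disk."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as archive:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="docs/readme.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return gzip.compress(raw.getvalue())


if __name__ == "__main__":
    # Nesting: the tar stand-in only ever sees what the gz stand-in produced.
    print(tar_service(gz_service(make_sample_tar_gz())))
```

Keeping each stand-in ignorant of the other is what lets them be combined in any order that makes sense for the file at hand.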


