Department of Computer Science
 

Prof. Donald Kossmann, Jens-Peter Dittrich

Personal Data Spaces

Personal information is the set of files that exist on all your computers and network drives. It is the most valuable asset of your digital life. To manage that information, you have to struggle with a jungle of technologies such as different file formats and data processing solutions, e.g., email applications, text and word processors, spreadsheet managers, tools for backup, versioning, synchronization, and data sharing, search utilities, and many others. Unfortunately, each of these applications invents its own way of storing, managing, and searching information. As a consequence, today's users have a hard time finding the right information at the right time on their workstations. In the following, we present a typical problem that today's users face when managing personal information. After that, we sketch how that problem is solved by establishing Personal Data Spaces and Personal Data Space Management Systems (PDSMS). Finally, we introduce our prototype implementation of a PDSMS named iMeMex (integrated memex).

Example Problem

Consider a simple project management scenario. When managing multiple projects (e.g., industry projects, research projects, PhD students, lectures, and so on), you may decide to store the documents of big projects in a separate folder on your hard disk or a network drive. For smaller projects, you may decide to keep that information as attachments to email messages you exchanged with members of the project team. As small projects evolve, you may end up keeping some information in email messages and some information (possibly the same) in the file system.

Now let's assume that you are interested in the most recent version of some document (e.g., a technical annex) of project X.

With current technology, you would first have to work out where to look for the information of project X, i.e., which of the subsystems (file system or email server) contains that information. After that, you would use the search capabilities offered by that subsystem, i.e., the email client or the operating system, to find the versions of that document. If the document has been renamed, this task is quite challenging. Furthermore, the search capabilities offered by the email client and the file system may differ considerably. Finally, if the versions of the document are scattered across both the file system and the email server, manual work is needed to select the latest version of the document. In summary, as a user you must understand where your information is stored and how it is accessed on the different subsystems.
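The manual last step described above, selecting the latest version from results that come from two separate subsystems, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Version` record, the source labels, the file names, and the timestamps are all hypothetical, and the sketch assumes that each subsystem has already been searched separately and that modification times are comparable across subsystems.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Version:
    """One found copy of a document (hypothetical record layout)."""
    source: str         # hypothetical label: "filesystem" or "email"
    name: str           # file or attachment name as found in that subsystem
    modified: datetime  # last-modified time reported by that subsystem


def latest_version(*result_lists):
    """Merge the per-subsystem result lists and return the newest version."""
    merged = [v for results in result_lists for v in results]
    if not merged:
        return None
    return max(merged, key=lambda v: v.modified)


# Hypothetical search results, obtained separately from each subsystem.
fs_hits = [
    Version("filesystem", "annex_v1.doc", datetime(2005, 11, 2)),
    Version("filesystem", "annex_v2.doc", datetime(2005, 12, 14)),
]
mail_hits = [
    Version("email", "technical_annex_final.doc", datetime(2006, 1, 9)),
]

newest = latest_version(fs_hits, mail_hits)
print(newest.source, newest.name)  # email technical_annex_final.doc
```

Note that the sketch sidesteps the hard part: it assumes the user has already found every version in every subsystem, including renamed copies, which is exactly the work the scenario describes as manual.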

Problem Analysis

With current technology users have to struggle with a number of problems:

1. Multiple hierarchies
Each partition of any of your hard disks, as well as every email account, has a separate hierarchy of folders to organize your messages and files. Even though these hierarchies are typically similar, they are not connected. The same hierarchy information gets replicated for every subsystem. There is no unified hierarchy over the user's data.

2. System dependence: On which device is my data?
Copies of data are stored on different devices (e.g., laptop, desktop, iPod, cell phone). Whenever a user deletes a file from one of the machines, she must make sure that she has a backup copy of that file on at least one of her other machines. Otherwise the information is lost. If the user really intends to delete a file for good, then she must be aware of all devices that store that file. As a consequence, users must take care of replication and backup strategies manually, even though this should be a basic service provided by the operating system and IT infrastructure.

3. Application dependence: How do I access my data?
Data is also managed by different applications, which further complicates the life of users. As the example from the beginning illustrates, certain data can only be accessed by a limited number of applications, and the capabilities of the different applications with regard to search, query, and data processing differ. The example showed the difficulties users have when the search capabilities of the email client and the file system vary.
As a consequence, it is hard to query multiple subsystems using a single query, and a great deal of manual work is required to assemble the results.
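To make the idea of "a single query over multiple subsystems" concrete, here is a minimal Python sketch. It is not the iMeMex design. It is a generic adapter pattern, and the `Subsystem` interface, the `FolderSource` and `MailSource` classes, the `query_all` function, and all sample data are assumptions introduced purely for illustration.

```python
from typing import Protocol


class Subsystem(Protocol):
    """Hypothetical common interface that every data source is wrapped in."""
    name: str

    def search(self, keyword: str) -> list[str]: ...


class FolderSource:
    """Wraps a list of file paths; matches the keyword against the path."""

    def __init__(self, name, paths):
        self.name = name
        self._paths = paths

    def search(self, keyword):
        k = keyword.lower()
        return [p for p in self._paths if k in p.lower()]


class MailSource:
    """Wraps (subject, attachments) pairs; matches subject or attachment name."""

    def __init__(self, name, messages):
        self.name = name
        self._messages = messages

    def search(self, keyword):
        k = keyword.lower()
        hits = []
        for subject, attachments in self._messages:
            if k in subject.lower():
                # Subject matches: report every attachment of the message.
                hits.extend(attachments)
            else:
                hits.extend(a for a in attachments if k in a.lower())
        return hits


def query_all(subsystems, keyword):
    """Send one keyword query to every subsystem and tag hits by origin."""
    return [(s.name, hit) for s in subsystems for hit in s.search(keyword)]


# Hypothetical sample data.
disk = FolderSource("disk", ["/projects/X/annex.doc", "/projects/Y/budget.xls"])
mail = MailSource("mail", [("Project X: updated annex", ["annex_v2.doc"]),
                           ("Lunch?", [])])

results = query_all([disk, mail], "annex")
print(results)  # [('disk', '/projects/X/annex.doc'), ('mail', 'annex_v2.doc')]
```

Even in this toy form, the two sources interpret the same keyword differently (path match versus subject or attachment match), which mirrors the differing search capabilities described above.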

4. Loss of Data History and Lineage
Whenever a file is copied, history and lineage information gets lost.
For instance,

A Possible Solution

We argue that a new abstraction is required that overcomes the limitations of current file systems, database systems, and applications in general. We propose "Personal DataSpaces" as that abstraction. The term DataSpace has been adopted from a recent paper by Franklin et al. [1]. A Personal DataSpace has the following properties:

Table 1: DBMS vs. PDSMS

iMeMex System

We have implemented a prototype PDSMS named iMeMex (integrated memex) [2]. iMeMex provides a unified platform for personal information management that integrates well into current operating systems such as Linux, Mac OS X, and Windows. In contrast to other PIM approaches, iMeMex is not yet another application on top of the OS. To the user, it seems as if the OS itself were extended. It is also the first implementation of a PDSMS that we are aware of.


Some Technical Features of iMeMex

Outlook

We are currently finishing the implementation of the iDM data model. Moreover, we are working on the iMeMex Query Language (iQL). Other work focuses on implementing distributed DataSpaces as well as exciting new applications on top of iMeMex. We plan to make a publicly available version of the software (iMeMex 2.0) available on our web site http://www.imemex.org in summer 2006. If you are looking for a semester or master's thesis, please take a look at http://www.dbis.ethz.ch/education/Theses/pim or contact Jens Dittrich (jens.dittrich @ inf).


References

[1] Mike Franklin, Alon Halevy, David Maier. From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record, 34(4): 27-33, December 2005.

[2] Jens-Peter Dittrich, Marcos Antonio Vaz Salles, Donald Kossmann, Lukas Blunschi. iMeMex: Escapes from the Personal Information Jungle (Demo Paper). VLDB: 1306-1309, September 2005.

More Info

http://www.imemex.org (project homepage)

http://www.dbis.ethz.ch (Group homepage)

http://www.dbis.ethz.ch/education/Theses/pim (open topics for semester and master theses)

 


© 2012 ETH Zurich | Imprint | 10 February 2006