Wednesday, April 8, 2009

Introducing GSS

During my recent work-induced blog hiatus, I've been working on a new software system, called GSS. I've been more than enjoying the ride so far and since we have released the code as open-source, I thought discussing some of the experience I've gained might be interesting to others as well.

GSS is a network, er grid, er I mean cloud service, for providing access to a file system on a remote storage space. It is the name of both a service (currently in beta) for the Greek research and academic community and the open source software used for it, that can also be used by others for deploying such services. It is similar in some ways to services like iDrive, DropBox and drop.io, but it can also be regarded as a more high-level Amazon S3. Its purpose is to let the desktop computer's file system meet the cloud. The familiar file system metaphors of files and folders are used to store information in a remote storage space, that can be accessed from a variety of user and system interfaces, from any place in the world that has an Internet connection. All usual file manager operations are supported and users can share their files with selected other users or groups, or even make them public. Currently there are four user interfaces available, a web-based application, a desktop client, a WebDAV interface and an iPhone web application, in various stages of development. Underlying these user interfaces is a common REST-like API that can be used to extend the service in new, unanticipated ways.


The main focus of this service was to provide the users of the Greek research and academic community with a free, large storage space that can be used to store, access, backup and share their work, from as many computer systems as they want. Since the available user base is close to half a million (although the expected users of the service are projected to the low ten thousands), we needed a scalable system, that would be able to accommodate high network traffic and a high storage capacity at the same time. A Java Enterprise Edition server coupled with a GWT-based web client and a stateless architecture were our solution. In future posts I will describe the system architecture with all the gory details. The exposed virtual file system features file versioning, trash bin support, access control lists, tagging, full text search and more.

All of these features are presented through an API for third party developers to create scripts, applications or even full blown services that will fulfill their own particular needs or serve other niches. This API has a REST-like design and though it will probably fail a formal RESTful definition, it sports many of the advantages of such architectures:

  • system resources such as users, groups, files and folders are represented by URIs
  • GET, HEAD, POST, PUT and DELETE methods on resources have the expected semantics
  • HTTP caching is explicitly supported via Last-Modified, ETag & If-* headers
  • resource formats for everything besides files are simple JSON representations
  • only authenticated requests are allowed, except for public resources

Users are authenticated through the GRNET Shibboleth infrastructure. User passwords are never transmitted to the GSS service. Instead GSS-issued authentication tokens are used by both client and server to sign the API requests after the initial user login. SSL transport can provide even stronger privacy guarantees, but it is not required, nor enabled by default.

The GSS code base is GPL-licensed and therefore anyone can use it as a starting point to implement his own file storage service. We have yet to provide binary downloads, due to the various dependencies, but the build instructions should be enough to get someone started. We are always interested in source or documentation patches, of course (did I mention it's open source?). Most importantly, the REST API will ensure that clients developed for one such service can be reused for every other one.

I will have much more to say about the API in a future post. In the meantime you can peruse the code and documentation, or even try it out yourself. I'd be very interested in any comments you might have.

7 comments:

adamo said...

So what would it take to build my email storage on a GSS infrastructure?

past said...

A special backend that would use the GSS API.

adamo said...

So when do we start?

past said...

We always start kickoff meetings with drinks! Ergo, we need an RFC.

Diomidis Spinellis said...

Great! I'd love to use GSS as an off-site backup drive. I guess I'd need to do some work for encrypting the data and performing incremental updates, but that's why you gave us an API.

past said...

I know many people would like to see such a tool, myself included. It's such a shame a day has only 24 hours...

Anonymous said...
This comment has been removed by a blog administrator.
Creative Commons License Unless otherwise expressly stated, all original material in this weblog is licensed under a Creative Commons Attribution 3.0 License.