Past Midnight: April 2007

Saturday, April 21, 2007

Heresy! Web apps without SQL databases

Apple's Jens Alfke posted a thought-provoking piece the other day about data storage in the web world. In it he questions the conventional wisdom that SQL databases are the one and only solution for that problem space. While discussing the scalability problems faced by twitter.com (built on Ruby on Rails), he ponders using plain files as a storage format and playing some clever tricks to get more performance in a simple way. The comments on his post are also interesting and the debate is still going on.

Although I've been working on web applications with SQL databases for seven years, this is not the first time I've heard this heretical opinion. As one of his readers accurately points out, Paul Graham's Viaweb (now Yahoo! Store) was (still is?) also using flat files to store its data in a FreeBSD UFS file system. Many large-scale server applications (say, Directory Servers) use storage managers like Berkeley DB to keep their data, without having the SQL query engine or the relational model to depend upon. Also, highly transactional applications in the financial banking sector frequently use distributed in-memory object caches like Tangosol's (now Oracle's) Coherence as an ultra-fast data store, with an RDBMS as a backup.
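To make the storage-manager idea a bit more concrete, here is a minimal sketch of a key-value store that involves no SQL at all. It uses Python's standard dbm.dumb module purely as a stand-in for something like Berkeley DB; the save_profile and load_profile helpers and the "profiles" file name are made up for illustration and have nothing to do with how Viaweb or any directory server actually works.

```python
import dbm.dumb
import os
import tempfile


def save_profile(db_path, username, bio):
    """Store one record under its key; no schema, no query engine."""
    with dbm.dumb.open(db_path, "c") as db:
        db[username.encode("utf-8")] = bio.encode("utf-8")


def load_profile(db_path, username):
    """Fetch a record by key, or return None if the key is absent."""
    with dbm.dumb.open(db_path, "c") as db:
        raw = db.get(username.encode("utf-8"))
        return None if raw is None else raw.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical usage: the database lives in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "profiles")
        save_profile(path, "alice", "likes flat files")
        print(load_profile(path, "alice"))
```

The trade-off is the obvious one: lookups by key are trivial, but anything resembling an ad-hoc query has to be written by hand.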

There is a lesson to be learned here, and it is finely articulated in Poul-Henning Kamp's presentation on Varnish, a web accelerator. Poul-Henning, a veteran FreeBSD kernel developer, created a Squid-killer by, to use his own words, not fighting with the operating system. The architecture of his web proxy is closely aligned with the way a contemporary UNIX-like operating system works, thus avoiding pointless layers of abstraction and architectural mismatch. The same logic accounts for Coherence's performance gains: a good mapping of the data storage architecture (in-memory object maps) to the web application architecture (Java EE or a lightweight alternative). Just as Varnish avoids traditional file I/O by using the OS's virtual memory (directly mapping files into pages via mmap), Coherence avoids the database overhead (SQL query optimizers, table indexes, disk access, etc.) by always dealing with objects stored in memory.
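For the curious, the difference between explicit file I/O and memory mapping can be sketched in a few lines of Python. This is only an illustration of the general mmap technique, not Varnish's actual code (which is written in C); the function names and the sample file layout are my own assumptions.

```python
import mmap
import os
import tempfile


def read_with_file_io(path, offset, length):
    """Traditional approach: seek, then copy bytes into a user-space buffer."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_with_mmap(path, offset, length):
    """Map the file into the address space and slice it like ordinary memory.
    The kernel's virtual memory system decides which pages stay resident."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def make_sample_file():
    """Create a small throwaway file: a header, padding, then a payload."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"HEADER" + b"x" * 4096 + b"cached-object")
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        print(read_with_mmap(sample, 0, 6))
    finally:
        os.remove(sample)
```

Both functions return the same bytes; the point is that in the second one the program never manages a cache of its own and leaves paging decisions to the operating system.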

If for a particular web-based application problem we can devise a non-SQL storage solution that maps well to its use case requirements, then why suffer the (financial, support, technical) overhead of a SQL database? If there are no requirements for OLAP functionality or data warehousing, why bother? Remember the KISS principle: Keep It Simple, Stupid! Who would knowingly subject himself to the evils of the object-relational mismatch (the Vietnam of Computer Science), if there were another way to do it?

Admittedly, the great thing about SQL databases is that they let you query your data in ways that you may not have contemplated as necessary. In a system with constantly changing requirements for data views (or with requirements for ad-hoc, dynamic viewing) you may well have no other choice but to use a SQL DBMS. Perhaps object-oriented DBMSs can be a solution too, but I believe they are just starting to obtain OLAP and data warehousing capabilities and the performance may not be quite there yet.

Can it be an accident that Google uses Bigtable, a distributed storage system of their own devising for internal use? When distributed/federated databases have major drawbacks (namely cost and vendor lock-in), where do you go when you have to scale? Horizontal partitioning is a solution, but you'd better plan ahead.
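Horizontal partitioning is easy to sketch, which is part of its appeal. The toy example below spreads keys across a fixed number of shards by hashing them. Everything in it (the ShardedStore class, the shard_for function, the in-memory dicts standing in for separate servers) is hypothetical and only meant to show the idea.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard by hashing the key, so the same key always lands
    on the same shard for a given shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy partitioned store; each dict stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(4)
    store.put("alice", "hello")
    print(store.get("alice"))
```

The "plan ahead" part shows up as soon as you change the shard count: with this naive modulo scheme most keys map to a different shard afterwards, so the data has to be moved around.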

In such circumstances, it may be appropriate to consider even heretical solutions.

Update: For those who won't take the time to watch the whole Varnish presentation (hey, it's fun, really!) Kostas Kalevras reminds me that you can find the meat of the presentation in the architect's notes. Apart from Poul-Henning's witty comments, they contain pretty much all the important points made there.

Wednesday, April 11, 2007

On FreeBSD and Debian package management

About a year ago I was contemplating writing a graphical package management tool for FreeBSD. I had been using bpm on and off for a while, and although it was close enough to what I needed, the effort to extend it wore me out pretty soon. After all, how much GUI programming in plain C can a decent man endure?

Anyway, after picking a more workable solution on the programming language front, I had sketched out a GUI prototype and was considering the available options for the actual package updating code. My main interest was in binary packages and there were a few available command-line tools and libraries. None of the options seemed to work well enough though. Besides various communication issues between the two tiers of my application, what really drove me away from this pet project was the source-based nature of the ports framework. That and the tiny issue of fatherhood, to be precise. I've tried to explain it once, but what I really wanted was something like the Debian system, which I'd argue is a better fit for regular users.

Michel Talon, longtime FreeBSD user and developer, appears to have researched the issue more thoroughly while developing pkgupgrade, a tool that would have been another great option for my own project, had it been available last year. Pkgupgrade is written in Python, which would make it even yummier in my opinion. It's not Java, but it's close enough for my taste. Michel's comparison of the FreeBSD package system and the Debian one comes to the conclusion that binary package management systems are inherently better for casual users than source-based ones:

"But the main factor ensuring reproducibility and reliability of the apt system is working with binary packages. You can be sure at least of the existence of a binary package, and probably that it works, due to the severe testing in the Debian system. There is no guarantee in a source based system. Hence no package management system can be reliable, however sophisticated it is."

The above excerpt may sound a bit harsh, but if you read the whole paper you'll see that the author's arguments are pretty balanced. Now that I've had the chance to use Debian's apt system for a few months, I can say that I wholeheartedly agree with Michel. This is pretty much what I had in mind for FreeBSD last year and I still think this is the direction the project should follow.

The upside for FreeBSD is that I no longer have the time to work on my old Java-based graphical package management tool. Because, let's be honest here, a system tool in Java? How many rotten tomatoes would be coming my way?

Monday, April 9, 2007

Paul Graham on Startups

When I began this blog I made a deal with myself that I would not post more than once or twice a week, but lately I just can't help it. I could use the excuse that I'm on holiday, I suppose. It's the Orthodox Easter over here, and Greece is covered with the smell of roast lamb. It's a sunny weekend, people walk around at a snail's pace, birds singing, flowers blossoming, you know how it is. I just got inside for a minute to check on my e-mail (I know, I know) and I discovered pure gold: Paul Graham's essays have an RSS feed! I don't know how long it has been out there, but I just stumbled upon it, and man, did that make my day! That and the roast lamb, of course.

After subscribing, I found out there were a few recent essays that I had missed. The one I want to particularly mention is Why to Not Not Start a Startup. Since this is a topic I am very interested in, I read it without even blinking. That alone could explain the tears in my eyes, but there is more to it. I've been a Paul Graham fan for quite some time, even though I'm not into Lisp. His writing style is marvelous and his wealth of experience, invaluable. I would like to take this essay and stick it in the faces of various investors we've met over the years. Not that it would change anything, but it would make me feel better, at least. I had been planning to write about a few of his points myself, but since I could never hope to be so eloquent, go read them straight from The Man.

There were moments of discomfort, I must confess. Like when I read the following:

"If you don't think you're smart enough to start a startup doing something technically difficult, just write enterprise software. Enterprise software companies aren't technology companies, they're sales companies, and sales depends mostly on effort."

Ouch. I know it's true, but it always hurts to admit it.

There are other well-established observations in there, like the consulting-to-product business transformation:

"What you can do, if you have a family and want to start a startup, is start a consulting business you can then gradually turn into a product business. Empirically the chances of pulling that off seem very small. You're never going to produce Google this way. But at least you'll never be without an income."

The chances may be small indeed, but, hey, it worked for Joel.

Also on the well-known territory:

"In a good startup, you don't get told what to do very much. There may be one person whose job title is CEO, but till the company has about twelve people no one should be telling anyone what to do. That's too inefficient. Each person should just do what they need to without anyone telling them."

Yep, I can fondly remember why we started it all.

The Love of Programming

Jeremy Allison, of Samba fame, has posted a piece on the advice he would give a young engineer about to embark on a software development career. Really good stuff. If I were to summarize his post in three words, it would be "love, low-level, open-source". Hmm, that's more like five words, but who's counting anyway? I must confess that I am in total agreement with Jeremy about the necessary qualities he mentions. Particularly the first one, Love.

Occasionally I run across old friends from the university whom I haven't seen for a long time, and they will get to ask me what it is that I do these days. And every time I mumble something that contains the words 'software' or 'programming', they will go 'right, I liked doing that stuff as a student, but after graduating I swore I would never go through that again'. And I will give them a sympathetic smile. I really understand their frustration. Programming is hard. And not very rewarding.

Sure, you can make a decent living and in some rare cases you may even get rich. But how many people will understand your work? Say you design a great presentation framework for web applications, like Stripes. Will your mom ever comprehend its greatness? How would you describe the wonderful effects of annotations and convention-over-configuration? You can't. You might even say that it supports the extranodal lymphoplasmacytoid lymphoma and it wouldn't make a difference. She would still refer to you to her friends as 'plays with computers'. Lousy job description. Personally, I would prefer being referred to as 'orders other people'. That is an Executive Manager, in mom talk. Or 'gives interviews on TV'. That would be VP of Software Strategy And Stuff. But when developing software, you have no other way to endure the obscurity and the difficulty of the craft than to grow to love it.

There is a Zen-like sense in Programming, it seems. Knowledge shall set you free. And when searching for knowledge you have to go deep. You have to go low-level. Jeremy suggests that you should get acquainted with the way the system works deep inside. Amen to that. Processors, operating systems, protocols, that sort of thing. Joel Spolsky would agree, apparently. His advice to young students is to learn C, not just high-level computer languages. Steve Yegge has a similar suggestion, to learn Math. Algorithms, probability theory, statistics, etc. My take is you need a solid background in a wide spectrum of disciplines. Don't overspecialize.

The other thing Jeremy suggests is involvement in open-source projects. I guess it should be a no-brainer today, but it was true even ten years ago. Free access to the work of esteemed peers is something rather scarce in other disciplines. But the experience you can get working with the masters is simply invaluable. Not to mention the social skills you develop while interacting with others. After a while you can learn how to judge the probable outcome of a discussion, simply by the way the issue has been put on the table. And that way you can try to contain a fire before it burns down everything. Good social skills are a requirement for a good future manager. Communities can also help you make new friends and build a reputation. Things that can pave your way to a successful career. And maybe, just maybe, you'll avoid being referred to as 'plays with computers'. Sigh.

Saturday, April 7, 2007

The bcm copyright violation

I've read the whole thread on the bcm driver in OpenBSD violating GPL code. It was educational. And sad. For those who haven't heard about the issue, here is the executive summary:

A team of Linux developers began a clean-room implementation of a wireless network device driver for a Broadcom chipset. Their effort produced a chipset specification and a GPL-licensed Linux driver. An OpenBSD developer started implementing a BSD-licensed driver for OpenBSD based on their work. He apparently started with some code of his own and some from the GPL code base and kept rewriting function after function in his own way. Unfortunately, he started committing his work to the OpenBSD CVS tree before replacing every piece of the original code. This constitutes a copyright violation, as the Linux developers recently pointed out. The driver was removed from the OpenBSD tree in the midst of accusations between the two camps about proper community behavior.

This is not an unusual thing to happen. What was unusual, though, was the lack of cooperation between the two parties. Even though the core matter was quickly resolved (in an abrupt way, nevertheless), the accusations regarding the motives and the behavior of the other side went on for quite some time. This is the sort of thing we get to enjoy in the open-source ecosystem. If you are the kind of pervert that I am, of course. It's like a TV show where everyone keeps shouting, without actually hearing what the others are saying. Ah, the joy of mobs! And once again it comes down to the personalities of the people involved. Intolerance, stubbornness, old wounds, all contribute to a fire that keeps growing, burning down the bridges that people had built between their communities.

I can remember other cases where things were handled more professionally. A high-profile one was the JBoss and Geronimo clash. Fortunately, there were people with diplomacy skills on that one. Lawyers even. Hey, don't get me wrong, I know all the jokes about lawyers, but when you are talking about lawyeresque things, like copyright violations, it helps to actually know what you are talking about. Apparently, not many software developers do. Me neither. So, I'd suggest that in such circumstances the best strategy would be to listen first. Be conciliatory. Assure the other party that you are taking the issue seriously. Avoid any inflammatory vocabulary. Don't bend over, but don't strike back either. Remember, you are trying to uphold peace here, not win a war.

That is actually one of the things I like the most about the Apache and FreeBSD communities. The gentleness and the professionalism. There are always the bad apples, no doubt, but the overall impression is wildly positive. No drama queens. Not in the high ranks at least. The sad thing is that you don't get on the front page without some drama. You are instead painted as a 'nice guy'. Dependable but boring. The one that everyone wants to marry, but nobody wants to date. That's pretty much why I replaced my FreeBSD desktop with a Linux box at work. And why I develop on JBoss instead of Geronimo. It's a fame thing. Go with the flow.

However, at home, when no one is watching, I use a Mac. In case you didn't know, Macs are FreeBSD inside with the sexiest graphical interface on top. Which means, dependable and slutty. My kind of girl. I mean system. What's not to like?

Tuesday, April 3, 2007

Open source software in Greece

In a recent blog post, fellow open-source developer Dimitris Andreadis wondered about the sorry state of open-source software development in Greece. Dimitris has been working for JBoss, er Red Hat, for quite some time now and these days he is the Big Cheese of the JBoss Application Server. Therefore, his question is not of the naive kind. As a matter of fact I've been asking myself that same question these last few years and I'm about to tell you what I've come up with. But first some brief history.

My own open-source adventures as a developer began in the Christmas holidays of 2000, if my memory serves me correctly. With the University closed for the holidays, I spent some time during my vacation fiddling with FreeBSD on my laptop. As a Unix enthusiast I had switched a while back from Linux to FreeBSD, but I had been having a problem with the system's support for the Greek locale. The problem was that there was no support whatsoever. Being young and foolish, I sacrificed a few nights of partying with friends to tame the beast. And tamed it was. I've been submitting various patches ever since, to the FreeBSD project, the Eclipse project, the Apache Lucene project, the Gnokii project and a few others. Aside from one (recent) particular occasion, I've been doing it in my own spare time, without any sort of compensation for my work. Not to mention missing a few parties.

What was the motivation then, you say? Well, obviously, at that time and age I wasn't a world-acknowledged computer programming Giant, yet. Yeah, I still ain't. But the goal had been set. And a few pints of beer could not have stood in my way. I sought recognition from my peers and recognition I received. Not right away of course. It took a few years more than I had imagined, but eventually I ceased to be the frightened newbie and turned into a seasoned veteran, who would help others find their way around the system. Besides the numerous "thank you" notes, I was rewarded with experience. And as you've probably heard, experience does not grow on trees. You have to invest time and effort, in order to get it. You have to sacrifice stuff. You have to say no to party invitations. From pretty girls. More than once. It's cruel, I'm tellin' ya!

So, we have determined that seeking experience, peer recognition, career advancement and having an itch to scratch can lead an otherwise sane person to open-source software development. Pretend for a moment that you agree with me that there must be plenty of people seeking peer recognition, yada, yada. There is still the issue of effort and sacrifice. I would be tempted to concur with various other commenters in the aforementioned blog that in Greece we like to have it easy. But, let's put that aside for a moment and consider the "scratch an itch" issue. What if there is no itch? My friends who had Windows on their PCs did not have any locale issues. They regularly cursed at the blue screens of death of course, but that's not exactly an itch, it's more of a gangrene. You just can't scratch it. So you spend your time downloading pirated versions of insanely expensive software instead, fooling yourself into thinking that you are obtaining experience. The error being of course in the direction. Experience mainly comes from diving deep inside a problem space, not sailing along on the surface. If we eliminated software piracy in Greece, most people would not be able to afford many commercial applications on their PCs. They would have to settle for free and open-source equivalents, warts and all. And then they would make the greatest discovery of all: they would be able to try and fix them.

If I'm coming across as a bit disappointed, it's probably because it's getting really late and I'm feeling sleepy. Don't pay much attention to me; instead see how good things appear to be in an excellent study by professor and open-source developer Diomidis Spinellis. In his paper "Global Software Development in the FreeBSD Project" he presents a world map with markers on the cities where the project's international team of developers live. You can see there that Greece is represented with a couple of markers, whereas Spain is not. Neither is Portugal. Nor Ireland. Even Russia is not so hot, considering its size. And don't get me started on South America or Asia.

So, I'd say things aren't very rosy, but they are not that bad after all. It's just a matter of perspective. Glass half full or half empty? Dusk or dawn?

Now speaking of dawn, if you'll excuse me...