How to keep open source and GPL strong?
By Ewdison Then
Sunday the 9th, September 2001 at 08:07 PM (EDT)
The history of Unix is judged a tragedy by those who regret how a great operating system failed to establish itself as a unified standard. The fate of Unix is well known. Split by commercial vendors into a babel of incompatible versions, today the various Unixes compete only with each other for a piece of the dwindling Unix market.

Could history repeat itself with Linux? It is true that the GNU General Public License (GPL) gives Open Source developers the explicit right to mutate any GPL'd project, thus opening the door to a potentially injurious "code fork." But, in a seeming paradox, Open Source developers regard the right to fork as a mechanism that protects against the actual possibility of a fork. At the same time, the absence of that right in the Sun Community Source License (SCSL), used for Java, Solaris, and Jini, earns the SCSL derision as a feeble substitute for the more permissive GPL.

This poses a challenge for Linux implementors in the business world: they probably have to work hard to fight customer fears that GNU/Linux will fragment into a hundred incompatible versions because there's no single big corporation in charge. And here I come, saying isn't it wonderful that Open Source licenses guarantee everyone the right to do just that.

Sound contradictory? OK, here's the quick and dirty answer. The detailed one comes later:

  • Linux won't fork because the fork-er has to do too much work for no payoff: any worthwhile improvements will be absorbed into the main branch, while the rest of the fork will in turn be discarded or ignored as pointless.
  • The above phenomenon occurs with Linux, even though it hasn't with earlier projects, precisely because of the effect of Linux's source-code license.

NOTABLE PAST INSTANCES OF FORKING

1. Unix --> dozens of proprietary mutant corporate Unixes

If you've read up on Unix history, you know that Unix was a freak product of AT&T's Bell Labs division, around 1969. I'll omit most of the long story, but the most important fact to know is that AT&T was then operating under a Department of Justice anti-trust judgment (which expired around 1980) prohibiting it from entering the computer/software business. So, it could not legally sell Unix, but instead sold source-code licenses (and occasionally also the right to use the trademarked name "Unix") to (1) universities, such as U.C. Berkeley, and (2) companies such as IBM, Apple, DEC, Data General, SGI, SCO, HP, etc.

Those companies bought the right to make their own Unixes: IBM released AIX. Apple did A/UX. DEC did Ultrix, OSF/1, and Digital Unix (later renamed "Compaq Unix" and now "Compaq Tru64"). Data General did DG/UX, SGI did IRIX, HP did HP-UX, and SCO did XENIX, which eventually mutated into SCO UNIX. And we could cite others, but I'll spare you.

The point is that these were the jokers who ruined Unix. Every one of them marketed his mutant Unix as "Unix plus" -- everything the other guys have and more. Needing to create differentiators, they deliberately made their Unixes incompatible while giving lip service to "standards".

For customers, this was simply a mess, and Microsoft drove right through the disunity like a Sherman tank. It is the classic instance of forking that sticks in people's minds.

2. BSD --> FreeBSD, NetBSD, OpenBSD, BSD/OS, MachTen, NeXTStep (which has recently mutated into Apple Macintosh OS X Server), and SunOS (now called Solaris)

As I mentioned above, antitrust-limited AT&T, not being able to sell Unix itself, gave out very cheap Unix source-code licenses to universities including U.C. Berkeley. UCB took the lead in the academic world: Having access to the source code, they quickly realized that they could rewrite it to make it much better, and slowly did so. Their rewrite was dubbed "BSD" (Berkeley Software Distribution), and they were glad to share it with anyone who similarly held an AT&T Unix source license.

And their work was generally a great deal better than Bell Labs', partly because it benefitted from worldwide peer review in a very open-source-like fashion. Over quite a few years, they gradually replaced almost all of the AT&T work, without (at first) really intending to.

One fine day, grad student Keith Bostic came to the BSD lead developers, inspired by Richard M. Stallman's (remember him?) GNU Project, and suggested replacing BSD's remaining AT&T work to create a truly free BSD. Dreading the confrontation likely to result with AT&T, they tried to stall by assigning Bostic the difficult part of this task, rewriting some key BSD utilities. This back-fired when he promptly did exactly that. So, they grumbled but then completed the job, and tried to prevent AT&T from noticing what they had done.

AT&T did notice, panicked, and sued. That, too, is a long story best omitted. Under the stress of the lawsuit, freeware BSD split into three camps (FreeBSD, NetBSD, and OpenBSD). But there were also several proprietary branches, made possible because U.C. Berkeley's "BSD License" allowed creation of those: Sun Microsystems' SunOS, Tenon Intersystems' MachTen, BSDI's BSD/OS, and NeXT Computer's NeXTStep OS all came out for sale without public access to source, and were all based on the Berkeley BSD source code.

Note the distinction: If you write a program and release the source code under the GNU General Public License (GPL), other people who sell or otherwise release derived works that incorporate your work must release their source code under GPL conditions. The same is not true if you release your work under the BSD License: Anyone else can create a variant form of your work and refuse to release his source-code modifications. (In other words, he is allowed to create proprietary variants.)

A word about the three free BSD variants: All three were splinters from a now-dead project called 386BSD. All have talked about re-merging in order to save duplication of effort, but they now persist as separate projects because they've specialized: FreeBSD aims for the highest possible stability on Intel x86 CPUs, NetBSD tries to run on as many different CPU types as possible, and OpenBSD aims to have the tightest security possible. In other words, the 386BSD project remains forked because there are compelling reasons that make this a win for everyone.

Also, where possible, these three sister projects collaborate on tough tasks -- and they also collaborate with GNU/Linux programmers. Some of the best hardware drivers in the Linux kernel are actually BSD drivers. There's a high level of compatibility among the three BSDs and between them and GNU/Linux: Unlike the proprietary Unix vendors, BSD and GNU/Linux programmers have an incentive to eliminate incompatibility and support standards.

3. emacs --> Lucid emacs --> xemacs; --> other proprietary emacses, now mostly forgotten; --> GNU emacs

To call emacs a "text editor" is a bit like calling the QE2 a "boat". This program is nominally a simple text-handling program designed to process macros (thus the name, which stands for "editing macros"), but you can spend all your computing time inside it and accomplish just about anything that can be done on a computer.

It was written by legendary programmer Richard M. Stallman (remember him?), back in the days when everyone Stallman dealt with assumed you would share your source code, and did so. So, the only license for early versions of emacs was an implied "Do whatever you want".

Unfortunately for Stallman, what a number of companies wanted to do with his work was make proprietary branches of it. This came as a surprise and disappointment to him, and was one of the reasons he wrote the GNU GPL and started the GNU Project (to create a permanently free Unix-like OS called "GNU").

All but one of the proprietary emacs variants have now died: Lucid Corporation, before it dissolved, sold "Lucid Emacs" to Sun Microsystems, which decided to confusingly rename it "xemacs" (confusing because there's no special support for the X Window System that's not also in other emacses), and eventually gave it to some programmers who decided to re-release its source code as free software under Richard M. Stallman's GNU GPL.

Meanwhile, Stallman continued to maintain the original emacs as "GNU emacs", and placed all his new work under the GNU GPL specifically so that the Lucid debacle could not recur.

People often ask why the xemacs people do not re-merge their work with Stallman's version, to save duplication of effort. The obstacles are both personal and technical, and sometimes difficult to distinguish. First, Stallman is very much an autocrat. Perhaps only a truly stubborn and difficult person could have accomplished as much as he has, building an entire free-software world from nearly nothing. Also, the xemacs source code was written according to object-oriented principles, making it possible for multiple programmers to easily divide the responsibility. GNU emacs, by contrast, is in classic procedural code, and is quite likely so large that only a genius-level programmer could hope to maintain it. That one-in-a-billion programmer is, of course, Richard M. Stallman.

4. NCSA httpd --> Apache Web server

These days, the world's standard Web server package is the Apache package, maintained by the all-volunteer Apache Group. (That is not to say that they don't make money: When it comes to Web consulting, members of the Apache Group such as Brian Behlendorf have practically a license to print cash because of their well-earned fame.)

But, before there was an Apache, you ran either the University of Illinois at Urbana-Champaign National Center for Supercomputing Applications' "NCSA httpd" (Hypertext Transfer Protocol daemon) or the Geneva-based CERN center's "CERN httpd". The NCSA daemon was smaller and faster, while the CERN one was famous mostly for association with the creator of the Web, Tim Berners-Lee, who worked as a researcher at CERN.

CERN's httpd (later called "W3C httpd") was always public-domain software (i.e., nobody owned it). It's no longer maintained -- a dead project. It's unclear what NCSA httpd's license was originally, but when that project died (1996) its license was a "free for non-commercial usage only" one.

In any event, the story is that an on-line group of programmers who had been producing patches (modifications) for the NCSA httpd eventually decided that they'd produced their own variant in 1995, forking the code. "Apache" was originally just Brian Behlendorf's temporary code name for the project, but fellow developers then pointed out the name's appropriateness ("a-patchy" server = "apache"; get it?), and it stuck.

This is an instance of why and how open-source projects fork benignly, for good reason: Development at NCSA had stalled after the package's original creator, Rob McCool, left the Center. If that happened to a proprietary product, it would just die, leaving all its users in the lurch. However, because the product was so useful, the Apache Project forked the source code and kept driving it forward. It now dominates all Web servers, regardless of their marketing and development budgets.

5. gcc --> pgcc --> egcs --> gcc

Here's an odd one. Richard M. Stallman (remember him?) founded the GNU Project in 1984, and it produced the immensely important GNU C Compiler ("gcc"). gcc is designed to work on just about any remotely feasible computer, not just the Intel x86 series. So, it might just have been other priorities that delayed improved Intel support. Specifically, well into 1997, the best gcc could do for code optimization on Intel was to set the compiler for 486 chips. People pleaded with Stallman for Pentium optimization, but he stubbornly ignored them.

So, an ad-hoc Pentium Compiler Group (including participation from the same CYGNUS Corporation that was just bought by Red Hat Software, Inc.) first developed a very fast gcc-variant called "pgcc" (Pentium gcc), and then as a peace offering to Stallman developed "egcs" (Experimental GNU Compiler System), intended to be merged back into gcc.

For whatever reason, Stallman's Free Software Foundation (developers of the GNU Project) continued to act as if egcs didn't exist. So, GNU/Linux distributions began to emerge based on egcs, and the free-software world began to mostly ignore gcc.

This can be seen as a variant on the Apache experience. The ability to fork means that progress will not be impeded by a developer not wanting to move forward: Somebody else can, as gracefully as possible, assume the leadership role and (if necessary) fork the project.

However, this necessity was averted in the egcs case. In April 1999, the FSF re-merged egcs into the (would-be) main gcc branch, and handed over all future development to the egcs team, thereby resolving the conflict.

6. glibc --> Linux libc --> glibc

This is a nearly mirror-image case. Any Unix relies extremely heavily on a library of essential functions called the "C library". Richard M. Stallman's (remember him?) GNU Project wrote the GNU C Library, or glibc, starting in the 1980s. When Linus and his fellow programmers started work on the GNU/Linux system (using Linus' "Linux" kernel), they looked around for free-software C libraries, and chose Stallman's. However, they decided that Stallman's library (then at version 1-point-something) wasn't moving quickly enough, felt they could adapt it for the Linux kernel themselves, and so decided to fork off their own version, dubbed "Linux libc". Their effort continued through versions 2.x, 3.x, 4.x, and 5.x, but in 1997-98 they noticed something disconcerting: Stallman's glibc, although it was still in 1-point-something version numbers, had developed some amazing advantages. Its internal functions were version-labelled so that new versions could be added without breaking support for older applications, it did multiple language support better, and it supported multiple execution threads.

The GNU/Linux programmers decided that, even though their fork seemed a good idea at the time, it had been a strategic mistake. Adding all of Stallman's improvements to their mutant version would be possible, but it was easier just to re-standardize onto glibc. So, glibc 2.0 and above have been slowly adopted as the standard C Library by GNU/Linux distributions.

The version numbers were a minor problem: The GNU/Linux guys had already reached 5.4.47, while Stallman was just hitting 2.0. They probably pondered for about a millisecond asking Stallman to make his next version 6.0 for their benefit. Then they laughed, said "This is Stallman we're talking about, right?", and decided out-stubborning Richard was not a wise idea. So, the convention is that Linux libc version 6.0 is the same as glibc 2.0.

7. Sybase --> Microsoft SQL Server

Woody Allen has a saying that "The lion may lie down with the lamb, but the lamb won't get much sleep". Much the same can be said of companies that enter "industry alliances" with Microsoft Corporation. One of the several slow-learner corporations to make this mistake was Sybase Corporation, publisher of the Sybase Structured Query Language (SQL) database package for numerous Unixes and NetWare. As part of the alliance, Microsoft sold Sybase to its customers, relabelled as Microsoft SQL Server, and got access to Sybase's source code under non-disclosure agreement.

Then, predictably, Microsoft broke the alliance when it had learned all it could from Sybase, and reintroduced Microsoft SQL Server as its own product in competition with Sybase. I do not know if current MS SQL Server versions are rewritten from scratch or retain Sybase code under license terms, so this may not be a legitimate case of forking (let alone open source), but it's similar enough I thought I should mention it.

ANALYSIS: WHY OPEN-SOURCE FORKING IS BOTH RARE AND BENIGN

You, the reader, can fork any Open Source project at any time. This is absolutely not cause for alarm. Let's prove it: Get a copy of the current Linux kernel from ftp://ftp.kernel.org/. Rename it. Call it Fooware OS. Send out messages to everywhere you can think of, announcing that Fooware OS has splintered off from Linux, and great things are expected of it.

Wait for reactions. Wait some more. Listen to the clock ticking. Sort your lint collection. Open up the source code tree, think about what you might do with it, and wonder where you're going to find the time.

Well, that's a little unfair: You're probably not a programmer. Let's imagine that you are. You're a ninja programmer with mighty code-fu, a drive to succeed, and a disciplined team of programmer henchmen. So, you don't just listen to the clock tick, but get some really good work done. You improve the heck out of the kernel, in fact.

And then the Linux people smile broadly, and quite sincerely tell you "Thank you very much." Like effective programmers the world over, they know programming is difficult work. They are constructively lazy. That is, they're not proud, and are glad to use other people's work -- when that's allowed.

Oh, you forgot that your work was under the GPL, didn't you? By forking off and working on a GPL'ed work (the Linux kernel), you consented to issuing your improvements under the GPL also, for other people to freely use. So, you only thought you were creating Fooware OS; in fact, you were creating a better Linux.

That's why forking is uncommon in open-source code, and even more so in (specifically) GPL'ed code: The improvements one group makes in its would-be "fork" are freely available to the main community.

But, as we have seen from the mostly non-GPL examples above, forking is nonetheless not only always an option, but is a vital safety valve in case the existing developers (1) stop working on the project, or (2) decide to stand in the way of progress. The fact that this can occur is A Good Thing.

A third reason for forking also exists, and may hit the GNU/Linux community eventually: specialization. You may recall that this is what ultimately happened with the three free BSD variants -- although stress from the clash-of-the-titans AT&T v. U.C. Berkeley lawsuit arguably made that situation unique.

That is, somebody may eventually propose to the Linux kernel team some extension that's simply outside the scope of the project, and yet builds enough support behind it, and has enough reason for existing, that it proceeds anyway. In that case, Linux will fork -- and it will be a good thing, because then there will be two strong projects instead of one, each concentrating on an important niche that the other cannot fill.

If that happens, the forks would undoubtedly share code and information exactly as the BSD variants do, to prevent duplication of effort, and because it makes sense to do so. And the world will be richer for both the fork and the sharing.


  