Could history repeat itself with Linux? It is true that the GNU General
Public License (GPL) gives Open Source developers the explicit right to mutate
any GPL'd project, thus opening the door to a potentially injurious "code fork."
But, in a seeming paradox, Open Source developers regard the right to fork as a
mechanism that protects against the actual possibility of a fork. At the same
time, the absence of that right in the Sun Community Source License (SCSL),
used for Java, Solaris, and Jini, earns derision for the SCSL as a feeble
substitute for the more permissive GPL.
This poses a challenge for Linux implementors in the business world: they
probably have to work hard to fight customer fears that GNU/Linux will fragment
into a hundred incompatible versions because there's no single big corporation
in charge. And here I come, saying isn't it wonderful that Open Source licenses
guarantee everyone the right to do just that.
Sound contradictory? OK, here's the quick and dirty answer. The detailed one
comes later:
- Linux won't fork because the fork-er has to do too much work for no
payoff: any worthwhile improvements will be absorbed into the main branch
while the rest of the fork will be discarded or ignored as pointless.
- The above phenomenon occurs with Linux, even though it hasn't with earlier
projects, precisely because of the effect of Linux's source-code license.
NOTABLE PAST INSTANCES OF FORKING
1. Unix --> dozens of proprietary mutant corporate Unixes
If you've read up on Unix history, you know that Unix was a freak product of
AT&T's Bell Labs division, around 1969. I'll omit most of the long story, but
the most important fact to know is that AT&T was then operating under a
Department of Justice anti-trust judgment (which expired around 1980)
prohibiting it from entering the computer/software business. So, it could not
legally sell Unix, but instead sold source-code licenses (and occasionally also
the right to use the trademarked name "Unix") to (1) universities, such as U.C.
Berkeley, and (2) companies such as IBM, Apple, DEC, Data General, SGI, SCO, HP,
etc.
Those companies bought the right to make their own Unixes: IBM released AIX.
Apple did A/UX. DEC did Ultrix, OSF/1, and Digital Unix (later renamed "Compaq
Unix" and now "Compaq Tru64"). Data General did DG/UX, SGI did IRIX, HP did HP-UX,
and SCO did XENIX which eventually mutated into SCO UNIX. And we could cite
others, but I'll spare you.
The point is that these were the jokers who ruined Unix. Every one of them
marketed his mutant Unix as "Unix plus" -- everything the other guys have and
more. Needing to create differentiators, they deliberately made their Unixes
incompatible while giving lip service to "standards".
For customers, this was simply a mess, and Microsoft drove right through the
disunity like a Sherman tank. It is the classic instance of forking that
sticks in people's minds.
2. BSD --> FreeBSD, NetBSD, OpenBSD, BSD/OS, MachTen, NeXTStep (which has
recently mutated into Apple Macintosh OS X Server), and SunOS (now called
Solaris)
As I mentioned above, antitrust-limited AT&T, not being able to sell Unix
itself, gave out very cheap Unix source-code licenses to universities including
U.C. Berkeley. UCB took the lead in the academic world: Having access to the
source code, they quickly realized that they could rewrite it to make it much
better, and slowly did so. Their rewrite was dubbed "BSD" (Berkeley Software
Distribution), and they were glad to share it with anyone similarly having an
AT&T Unix source license.
And their work was generally a great deal better than Bell Labs', partly
because it benefitted from worldwide peer review in a very open-source-like
fashion. Over quite a few years, they gradually replaced almost all of the AT&T
work, without (at first) really intending to.
One fine day, grad student Keith Bostic came to the BSD lead developers,
inspired by Richard M. Stallman's (remember him?) GNU Project, and suggested
replacing BSD's remaining AT&T work to create a truly free BSD. Dreading the
confrontation likely to result with AT&T, they tried to stall by assigning
Bostic the difficult part of this task, rewriting some key BSD utilities. This
back-fired when he promptly did exactly that. So, they grumbled but then
completed the job, and tried to prevent AT&T from noticing what they had done.
AT&T did notice, panicked, and sued. That, too, is a long story best omitted.
Under the stress of the lawsuit, freeware BSD split into three camps (FreeBSD,
NetBSD, and OpenBSD). But there were also several proprietary branches, made
possible because U.C. Berkeley's "BSD License" allowed creation of those: Sun
Microsystems' SunOS, Tenon Intersystems' MachTen, BSDI's BSD/OS, and NeXT
Computer's NeXTStep OS all came out for sale without public access to
source, and were all based on the Berkeley BSD source code.
Note the distinction: If you write a program and release the source code
under the GNU General Public License (GPL), other people who sell or otherwise
release derived works that incorporate your work must release their source code
under GPL conditions. The same is not true if you release your work under the
BSD License: Anyone else can create a variant form of your work and refuse to
release his source-code modifications. (In other words, he is allowed to create
proprietary variants.)
A word about the three free BSD variants: All three were splinters from a
now-dead project called 386BSD. All have talked about re-merging in order to
save duplication of effort, but they now persist as separate projects
because they've specialized: FreeBSD aims for the highest possible stability on
Intel x86 CPUs, NetBSD tries to run on as many different CPU types as possible,
and OpenBSD aims to have the tightest security possible. In other words, the
386BSD project remains forked because there are compelling reasons that make
this a win for everyone.
Also, where possible, these three sister projects collaborate on tough tasks
-- and they also collaborate with GNU/Linux programmers. Some of the best
hardware drivers in the Linux kernel are actually BSD drivers. There's a high
level of compatibility among the three BSDs and between them and GNU/Linux:
Unlike the proprietary Unix vendors, BSD and GNU/Linux programmers have an
incentive to eliminate incompatibility and support standards.
3. emacs --> Lucid emacs --> xemacs
         --> other proprietary emacses, now mostly forgotten
         --> GNU emacs
To call emacs a "text editor" is a bit like calling the QE2 a "boat". This
program is nominally a simple text-handling program designed to process macros
(thus the name, which stands for "editing macros"), but you can spend all your
computing time inside it and accomplish just about anything that can be done on
a computer.
It was written by legendary programmer Richard M. Stallman (remember him?),
back in the days when everyone Stallman dealt with assumed you would share your
source code, and did so. So, the only license for early versions of emacs was an
implied "Do whatever you want".
Unfortunately for Stallman, what a number of companies wanted to do with his
work was make proprietary branches of it. This came as a surprise and
disappointment to him, and was one of the reasons he wrote the GNU GPL and
started the GNU Project (to create a permanently free Unix-like OS called
"GNU").
All but one of the proprietary emacs variants have now died: Lucid
Corporation, before it dissolved, sold "Lucid Emacs" to Sun Microsystems, which
decided to confusingly rename it "xemacs" (confusing because there's no special
support for the X Window System that's not also in other emacses), and
eventually gave it to some programmers who decided to re-release its source code
as free software under Richard M. Stallman's GNU GPL.
Meanwhile, Stallman continued to maintain the original emacs as "GNU emacs",
and placed all his new work under the GNU GPL specifically so that the Lucid
debacle could not recur.
People often ask why the xemacs people do not re-merge their work with
Stallman's version, to save duplication of effort. The obstacles are both
personal and technical, and sometimes difficult to distinguish. First, Stallman
is very much an autocrat. Perhaps only a truly stubborn and difficult person
could have accomplished as much as he has, building an entire free-software
world from nearly nothing. Also, the xemacs source code was written according to
object-oriented principles, making it possible for multiple programmers to
easily divide the responsibility. GNU emacs, by contrast, is in classic
procedural code, and is quite likely so large that only a genius-level
programmer could hope to maintain it. That one-in-a-billion programmer is, of
course, Richard M. Stallman.
4. NCSA httpd --> Apache Web server
These days, the world's standard Web server package is the Apache package,
maintained by the all-volunteer Apache Group. (That is not to say that they
don't make money: When it comes to Web consulting, members of the Apache Group
such as Brian Behlendorf have practically a license to print cash because of
their well-earned fame.)
But, before there was an Apache, you ran either the University of Illinois at
Urbana-Champaign National Center for Supercomputing Applications' "NCSA httpd" (HyperText
Transfer Protocol daemon) or the Geneva-based CERN center's "CERN httpd". The
NCSA daemon was smaller and faster, while the CERN one was famous mostly for
association with the creator of the Web, Tim Berners-Lee, who worked as a
researcher at CERN.
CERN's httpd (later called "W3C httpd") was always public-domain software
(i.e., nobody owned it). It's no longer maintained -- a dead project. It's
unclear what NCSA httpd's license was originally, but when that project died
(1996) its license was a "free for non-commercial usage only" one.
In any event, the story is that an on-line group of programmers who had been
producing patches (modifications) for the NCSA httpd eventually decided that
they'd produced their own variant in 1995, forking the code. "Apache" was
originally just Brian Behlendorf's temporary code name for the project, but
fellow developers then pointed out the name's appropriateness ("a-patchy" server
= "apache"; get it?), and it stuck.
This is an instance of why and how open-source projects fork
benignly, for good reason: Development at NCSA had stalled after the package's
original creator, Rob McCool, left the Center. If that happened to a proprietary
product, it would just die, leaving all its users in the lurch. However, because
the product was so useful, the Apache Project forked the source code and kept
driving it forward. It now dominates all Web servers, regardless of their
marketing and development budgets.
5. gcc --> pgcc --> egcs --> gcc
Here's an odd one. In 1984, Richard M. Stallman (remember him?) founded the
GNU Project, which produced the immensely important GNU C Compiler ("gcc"). gcc
is designed to work on just about any remotely feasible computer, not just the
Intel x86 series. So, it might just have been other priorities that delayed
improved Intel support. Specifically, well into 1997, the best gcc could do for
code optimization on Intel was to set the compiler for 486 chips. People pleaded
with Stallman for Pentium optimization, but he stubbornly ignored them.
So, an ad-hoc Pentium Compiler Group (including participation from the same
CYGNUS Corporation that was just bought by Red Hat Software, Inc.) first
developed a very fast gcc-variant called "pgcc" (Pentium gcc), and then as a
peace offering to Stallman developed "egcs" (Experimental GNU Compiler System),
intended to be merged back into gcc.
For whatever reason, Stallman's Free Software Foundation (developers of the
GNU Project) continued to act as if egcs didn't exist. So, GNU/Linux
distributions began to emerge based on egcs, and the free-software world began
to mostly ignore gcc.
This can be seen as a variant on the Apache experience. The ability to fork
means that progress will not be impeded by a developer not wanting to move
forward: Somebody else can, as gracefully as possible, assume the leadership
role and (if necessary) fork the project.
However, this necessity was averted in the egcs case. In April 1999, the FSF
re-merged egcs into the (would-be) main gcc branch, and handed over all future
development to the egcs team, thereby resolving the conflict.
6. glibc --> Linux libc --> glibc
This is a nearly mirror-image case. Any Unix relies extremely heavily on a
library of essential functions called the "C library". Richard M. Stallman's
(remember him?) GNU Project wrote the GNU C Library, or
glibc, starting in the 1980s. When Linus and his fellow programmers started work
on the GNU/Linux system (using Linus' "Linux" kernel), they looked around for
free-software C libraries, and chose Stallman's. However, they decided that
Stallman's library (then at version 1-point-something) wasn't moving quickly
enough, felt they could adapt it for the Linux kernel themselves, and so decided
to fork off their own version, dubbed "Linux libc". Their effort continued
through versions 2.x, 3.x, 4.x, and 5.x, but in 1997-98 they noticed something
disconcerting: Stallman's glibc, although it was still in 1-point-something
version numbers, had developed some amazing advantages. Its internal functions
were version-labelled so that new versions could be added without breaking
support for older applications, it did multiple language support better, and it
supported multiple execution threads.
The GNU/Linux programmers decided that, even though their fork seemed a good
idea at the time, it had been a strategic mistake. Adding all of Stallman's
improvements to their mutant version would be possible, but it was easier just
to re-standardize onto glibc. So, glibc 2.0 and above have been slowly adopted
as the standard C Library by GNU/Linux distributions.
The version numbers were a minor problem: The GNU/Linux guys had
already reached 5.4.47, while Stallman was just hitting 2.0. They probably
pondered for about a millisecond asking Stallman to make his next version 6.0
for their benefit. Then they laughed, said "This is Stallman we're talking
about, right?", and decided out-stubborning Richard was not a wise idea. So, the
convention is that Linux libc version 6.0 is the same as glibc 2.0.
7. Sybase --> Microsoft SQL Server
Woody Allen has a saying that "The lion may lie down with the lamb, but the
lamb won't get much sleep". Much the same can be said of companies that enter
"industry alliances" with Microsoft Corporation. One of the several slow-learner
corporations to make this mistake was Sybase Corporation, publisher of the
Sybase Structured Query Language (SQL) database package for numerous Unixes and
NetWare. As part of the alliance, Microsoft sold Sybase to its customers,
relabelled as Microsoft SQL Server, and got access to Sybase's source code under
non-disclosure agreement.
Then, predictably, Microsoft broke the alliance when it had learned all it
could from Sybase, and reintroduced Microsoft SQL Server as its own product in
competition with Sybase. I do not know if current MS SQL Server versions are
rewritten from scratch or retain Sybase code under license terms, so this may
not be a legitimate case of forking (let alone open source), but it's similar
enough I thought I should mention it.
ANALYSIS: WHY OPEN-SOURCE FORKING IS BOTH RARE AND BENIGN
You, the reader, can fork any Open Source project at any time. This is
absolutely not cause for alarm. Let's prove it: Get a copy of the current Linux
kernel from ftp://ftp.kernel.org/. Rename
it. Call it Fooware OS. Send out messages to everywhere you can think of,
announcing that Fooware OS has splintered off from Linux, and great things are
expected of it.
Wait for reactions. Wait some more. Listen to the clock ticking. Sort your
lint collection. Open up the source code tree, think about what you might do
with it, and wonder where you're going to find the time.
Well, that's a little unfair: You're probably not a programmer. Let's imagine
that you are. You're a ninja programmer with mighty code-fu, a drive to succeed,
and a disciplined team of programmer henchmen. So, you don't just listen to the
clock tick, but get some really good work done. You improve the heck out of the
kernel, in fact.
And then the Linux people smile broadly, and quite sincerely tell you "Thank
you very much." Like effective programmers the world over, they know programming
is difficult work. They are constructively lazy. That is, they're not proud, and
are glad to use other people's work -- when that's allowed.
Oh, you forgot that your work was under the GPL, didn't you? By forking off
and working on a GPL'ed work (the Linux kernel), you consented to issuing your
improvements under the GPL also, for other people to freely use. So, you only
thought you were creating Fooware OS; in fact, you were creating a better Linux.
That's why forking is uncommon in open-source code, and even more so in
(specifically) GPL'ed code: The improvements one group makes in its would-be
"fork" are freely available to the main community.
But, as we have seen from the mostly non-GPL examples above, forking is
nonetheless not only always an option, but also a vital safety valve in case the
existing developers (1) stop working on the project, or (2) decide to stand in
the way of progress. The fact that this can occur is A Good Thing.
A third reason for forking also exists, and may hit the GNU/Linux community
eventually: specialization. You may recall that this is what ultimately happened
with the three free BSD variants -- although stress from the clash-of-the-titans
AT&T v. U.C. Berkeley lawsuit arguably made that situation unique.
That is, somebody may eventually propose to the Linux kernel team some
extension that's simply outside the scope of the project, and yet that extension
may gain enough support, and have enough reason for existing, that it proceeds anyway.
In that case, Linux will fork -- and it will be a good thing, because
then there will be two strong projects instead of one, each concentrating on an
important niche that the other cannot fill.
If that happens, the forks would undoubtedly share code and information
exactly as the BSD variants do, to prevent duplication of effort, and because it
makes sense to do so. And the world will be richer for both the fork and the
sharing.