Part II: Choosing and installing the DB engine in your Linux box:
6. Databases for Linux
7. PostgreSQL
8. mySQL
9. Oracle
10. Which one should you use?
11. Next issue
There are several databases for Linux. Right now the most used databases are PostgreSQL, mySQL and Oracle. Others not so often used but compatible are mSQL, Informix and DB2. In this installment we will talk a little about each one, how to get them and which one is best tailored to your needs, assuming that you have already designed your application and outlined your database.
Usually a database is installed in your Linux box and runs as a daemon process to service requests such as inquiries and updates. Each database has its own daemons and they work differently from one flavor to another. For example, Oracle uses a daemon to monitor the database (called Server Manager) and another one for each database (called listener), PostgreSQL only has one daemon for all its databases (called postmaster). These services are just regular processes running on the server in a specific port, like the httpd daemon. Usually you must shut them down with utilities that come with the database; if you kill the running processes or shut down the server without stopping the services, you may damage the database files.
A database can be running on the same server as the application, or on a dedicated box. If the application is complex and the database is often updated, it'd be a good idea to separate the database and the application server to improve performance. If the application is simple you can install the database in the same server. From your code you can connect to the database regardless its location.
When you separate the application and the database servers, that's called a two-tier architecture. Other architectures are called n-tier where n denotes the quantity of independent layers that your system contains.
Databases save data files in different ways. For example, PostgreSQL and mySQL save them in a directory with the database name. Oracle requires you to create dedicated partitions for parts the database: the data, the programs and the C libraries. There are also utilities (shell or command line tools) with which you can do administration tasks from the console window. You'll see them in detail in each database overview.
Most databases require special groups and users to be created during the installation, in order to do all the administrative tasks, so you don't have to do them as root, for security purposes. Normally you will create a specific account for the database -database superuser- and login with that account whenever you want to maintain the database.
Also, let me tell you that mySQL and PostgreSQL are open source databases (under GPL), while Oracle and miniSQL (or mSQL) are not. You can download a single-user non-commercial version of Oracle 8i at the Oracle Technology Network (OTN). You must create an account, but it's free, and you can download mSQL for free if you register in their website.
And, before proceeding, the following structure is utilized for each product:
i) A brief history of...: this section tells you a little bit about
how the database was born
ii) Where to get it: links to web and ftp
sites where you can download the full version (for databases released under GPL)
or the personal version (for commercial databases)
iii) How to install
it: a summary of steps to follow before and during installation. Remember,
this does not replace the installation instructions that come with the database,
it's just an overview to let you know how it will be and what you will
need.
iv) Starting the daemon: this part explains how to start the
different background processes to make the database run. It also includes some
routines to automate the startup.
v) Creating a database: the basic
syntax to create a database
vi) Running the database shell: how to run
the database command prompt where you can type and execute commands to maintain,
update, inquiry and monitor the database and its elements (tables, rules,
sequences, etc.)
vii) The good and the bad side: good and bad things
in a few words.
viii) Comments: these are comments based on my own
experiences
Then, I finish this second part with a recommendation of which database you should use, why, and some recommended books. Enjoy-!
i) A brief history of PostgreSQL: PostgreSQL is a powerful database
engine that runs in almost all Unix and Linux flavors.
In 1986, at
University of Berkeley, California, Michael Stonebreaker started
developing an object relational database server -Postgres- with a team of C
programmers. Then, Andrew Yu and Jolly Chen took over the project
and added SQL capabilities. Yu and Chen left the University, but kept
maintaining and enhancing Postgres (helped by volunteers from different
countries), and then renamed it to PostgreSQL.
PostgreSQL is an open
source software and has gained lots of fans -like myself- from all over the
world.
ii) Where to get it: You can download it for free, from the PostgreSQL website or
from the official ftp site. The latest
release is 7.0.3 and they're working on the relelase 7.1, which
promises to have a lot of enhancements. If you're getting Postgres from a Linux
distribution and it's older version than 7.x, I highly recommend you to upgrade
it since it is much more stable and reliable.
iii) How to install
it: After downloading the compressed file (it usually comes in tar/gz
format) in a tmp directory, you can follow basic directions specified here. The tar/gz
file comes with a file called INSTALL -that contains detailed
instructions.
iv) Starting the daemon: As I mentioned before, you must start the Postgres daemon, called postmaster. The program that launches the daemon is called pg_ctl and is located under the Postgres bin directory.
If you followed the installation instructions, you should now have a Postgres superuser (su) account. After logging in as the Postgres su you must export the environment variables so the postmaster will know where to look for programs, libraries and data. After that, you can start the postmaster with the pg_ctl program by typing:
$ cd /usr/local/pgsql/bin
$ ./pg_ctl -D /usr/local/pgsql/data
start
In order to automate the exports and the postmaster startup, I'd suggest you to modify your .bash_profile script for the Postgres superuser.
The postmaster will start in the default port 5432. The -D option indicates where the database and configuration files reside. These files contain information about the default port, the current postmater pid, library locations, and more. Of course, start is the parameter to start the daemon. If you do a ps -ef, you will actually see two processes running: the pg_ctl and the postmaster.
For security reasons, you should never start the daemon as root.
This process will take care of all the database requests, even if you are connecting to different databases, but they all should be stored under /usr/local/pgsql/data.
If you ever need to shutdown the server or the database server, don't ever do a ps -ef and kill the running processes. Instead, from the Postgres superuser session, do
$ ./pg_ctl -D /usr/local/pgsql/data stop
This will safely shut down the services.
v) Creating a
database: Let's create a test database called 'testdb'. Log in as the
Postgres superuser and type:
$ cd /usr/local/pgsql/bin
$ createdb
testdb
CREATE DATABASE
then type
$ cd /usr/local/pgsql/data/base
$ ls -l
You will notice a new directory, called testdb. If you look under this directory, you will see all the files (those beginning with pg) that Postgres needs in order to manage the database you just created.
vi) Running the database shell: Now that you've already created
the database, let's create a table.
For database administration purposes, all
databases come with a shell, which can be called from the Linux console. From
this shell you can do all the administration and maintenance tasks. Postgres
comes with psql (very similar to the one Oracle has).
Log in as
Postgres superuser, and type
$ cd /usr/local/pgsql/bin
$ psql testdb
to connect to your database. The generic syntax would
be
$ psql <dbname>
When the db shell prompt is ready,
testdb=# create table contact (Code char(10) primary key,Last_Name char(50), First_Name char(50),Address char(100),Phone char(30));
(Don't forget the last semicolon to indicate that the command has ended. You can enter several lines -for easier reading- and execute the whole statement by typing g. You can press the up arrow to recall old commands so you don't have to type them again, in version 7.0 and higher)
This will create the table contact. To continue with the example from
the Part I, create the rest of the tables with the specified layouts.You can get
help for the shell commands by typing h, or by inquiring the Postgres manual
(you can download it from the postgres website
documentation section)
Go to another session and login as root. Then,
# cd /usr/local/pgsql/data/base/testdb
# ls -l | more
You will notice some new files. These files are the tables that you just created from the shell. Postgres stores the tables as files under the directory with the database's name, along with all the indexes, relationships, stored procedures, etc. For example, the table files are named with <table_name>, and the indexes and primary keys with <table_name>_pkey.
vii) The good side and the bad side of Postgres:
-The Good side:
includes production database features (transactions, triggers, views, foreign
keys, locking). Very solid and has lots of user groups.
-The Bad side: no
database replication, full text indexing, maximum of 8 Kb per row (they're
extending it to 32 Kb in the 7.1 version) and loses performance with many
concurrent connections.
viii) Comments: In my opinion, Postgres is the best open source relational database engine for developing solid web-based applications. It's fast, reliable, easy to configure, administer and maintain, and, as with any other open source software, it has worldwide support. It handles transactions, rules, triggers and even memory cursors.
I am currently developing all my web-based applications with Postgres 7.0.3.
I highly recommend it if you want to implement an application that involves
complex database operations such as transactions and masive global
updates.
i) A brief history of mySQL: There is not much information about how
mySQL was born. All I know is that Michael 'Monty' Wideneus was leaded a project
more than 10 years ago and he needed an application to interact with a database.
He tried miniSQL (mSQL) but after playing with it for a while he noticed that it
wasn't fast and flexible enough for his needs. He decided to make a new SQL
interface with the same API interface mSQL uses, but with source code that could
be modified by anyone. Other collaborators enhanced and improved on his work. It
hasn't been discovered why there is a 'my' before the SQL.
mySQL is an
open source database and has been installed on plenty of Linux web servers all
over the world. Notice that most web-hosting companies offer mySQL support. It's
becoming a standard in web-hosting services.
ii) Where to get it: You can download mySQL for free, from the
mySQL official website. The latest stable
version is 3.23. mySQL also comes with some Linux distributions (Mandrake, RedHat, Stormix, SuSe and TurboLinux) but versions may
vary.
iii) How to install it: You have different formats to choose
from while browsing or downloading the installation documentation in the documentation section of the
website. You should read these documents.
After the initial
installation, don't forget to apply the necessary patches (you can
find them in the website, along with the installation instructions).
Once the
installation is done, you should run a utility to configure the way mySQL will
run in your box. Run the configure utility:
$ ./configure
(Do a ./configure --help for syntax help). You will be able to
configure the port, the socket and the path to the mySQL libraries and
configuration files.
After that, you have to create grant tables that mySQL
uses internally, by executing the mysql_install_db utility.
iv) Starting the daemon: mySQL has only one daemon that services all the requests. This daemon can be started in several ways. If you want to start it manually, you can do so by typing:
$ ./safe_mysqld --user=<user> &
or simply
$ ./mysqld &
(safe_mysqld will try to determine first the best options to run mysqld). This will start the server with the parameters specified by the configure utility. mySQL usually runs in port 3306, but you can change that if you want to. You can test if the server is running with the mysqladmin utility
$ ./mysqladmin version
$ ./mysqladmin variables
If you want to start/stop the server automatically, you can use the mysql.server script
$ ./mysql.server start
$ ./mysql.server stop
and add it conveniently in your Linux startup files, or place it in your database superuser ./bash_profile script. To shutdown the server manually, type:
$ ./mysqladmin -u root shutdown
All these programs are usually in the mySQL installation /bin directory.
v) Creating a database: To create a database, you must run the mysqladmin utility. This utility does not only create databases, it also does other administrative tasks:
$ ./mysqladmin create testdb
will create a database called testdb. To get help in mysqladmin commands
$ ./mysqladmin --help
mySQL comes with other many utilities to maintain the database, you can take a look at them in the manual, Chapter 14. They're all scripts under the /bin directory.
mySQL usually stores its database files under /usr/local/mysql/data. You'll see there the database and log files.
vi) Running the database shell: the mysql command line tool is called -of course- mysql. You can run it by typing
$ mysql <database name>
When the shell prompt is ready,
testdb=# create table contact (Code char(10) primary key,Last_Name char(50), First_Name char(50),Address char(100),Phone char(30))g
('g' denotes the end of the command. It will execute the sentence that
you just typed. You can press the up/down arrow keys to recall old
commands).
To get help for the shell commands, simply type -? or
--help.
vii) The good side and the bad side of mySQL:
-The Good side:
excellent performance and reliability, very popular, worldwide support and lots
of user groups. Supports database replication.
-The Bad side: no
transactions, rules, views and subselects. Poor table locking mechanisms.
viii) Comments: mySQL is a simple and fast database engine, supports
large databases, is very popular in web-hosting companies and easy to maintain.
It's also open source and has a lot of user groups. However, it loses some
standard SQL features. For example, it does not have transaction capabilities
(commit/rollback), the table/record locking is quite different (it uses global
variables as flags. As a hacker, I don't like the idea of global
variables much) and is not 100% with SQL ANSI standards. I had to review
and modify all of my stored procedures for table creation, when I moved one of
my applications from mySQL to Oracle and PostgreSQL. It was a quick job, but
still I had to go script by script and check them line by line, the column
types, the index and primary key definition syntax, etc.
The transaction safe
table handler is included in the mySQL TODO list. There is a workaround if you
install the Berkeley DB (Berkeley
DataBase, from the BSD creators), which is an add-on to create BDB tables
within mySQL. I've never tried it, so I can't tell you how well it
works.
mySQL does have database replication, it maintains a secondary backup
database if the primary fails and the configuration is simple.
From my point
of view, if you want to develop an application that does not involve multiple
table updates or complex transactions, mySQL is your best choice, since it's
quite fast, easy to configure, stable and popular in web-hosting companies
-you'll have no problems finding a web-hosting with mySQL. However, if your
application requires multiple or simultaneous table operations, tricky reports
that need temporary workspace to do subqueries to get the final result, or
interaction with external or cross-platform databases, you should consider PostgreSQL or Oracle.
i) A brief history of Oracle: Back in 1977, Larry Ellison and
two partners founded Software Development Laboratories, and decided to
build a new relational database engine, inspired by an IBM white paper. This
project was called Oracle (meaning a source of wisdom) and they enhanced it with
SQL capabilities. Soon they changed the company name to what it is today, Oracle
Corp.
Oracle is currently one of the best databases for both business and
Internet applications. It was originally designed for Unix, DEC and IBM
mainframes, and then extended to OS/2, Linux and Windows NT environments.
They've also developed financial applications (Oracle Relational Financial
Applications) that are used by big corporations like Sony, and also a complete
development solution for e-business called Oracle Internet
Platform.
Oracle is not open source software, it is
expensive and mostly intended for big companies. It's extremely solid, reliable,
and incredibly stable on Unix based platforms.
ii) Where to get it: You can download a single user-single
server Oracle 8i for free, from the Oracle Technology Network (OTN). You
must register but the registration is free, they just send you invitations to
seminars and news about products from time to time. You can also buy a CD that
comes with an Administration book and additional documentation, from MacMillan Software.
iii)
How to install it: If you downloaded it from the OTN, it should come with a
HTML file containing installation instructions. Otherwise you can browse the OTN website for installation
instructions. If you bought the CD, the document is located in
/mnt/cdrom/doc/lin.8xx/index.htm (xx is the version, i/e mine is lin.815
for 8.1.5).
Oracle installation is very picky, read the installation
instructions carefully. I installed Oracle about 15 times when I started playing
with it!
Please remember that Oracle 8.x
requires:
- Kernel 2.2.x
- Any window manager that supports Motif v1.2 (gnome is ok)
- Glibc v2.1 or higher
- The Java Runtime lib (JRE 1.1.6 v5 or higher)
- From 300 to 900 Mb of free disk space (depending on the version)
- 128 Mb RAM (Oracle recommends 256
Mb)
Also, Oracle needs at least two mount points in your box: one for the programs and libraries, and the other for the database files.
* Attention RedHat users! There is a script that you have to run during the installation called runIns.sh, located under the /install directory. For RedHat users, this script won't work, you must execute another version of runIns.sh script which is located under a different directory (it varies in every version of 8.x). Please refer to your installation documentation for further information, or go to the OTN website, but be aware of this compatibility issue.
This is only a brief description of how to install Oracle. The documentation is quite extensive so I'd suggest you to read it from the book or print it from the web prior to the installation.
iv) Starting the daemon: You must start two daemons for Oracle. One of
them is called the Server Manager, which administers the main services; the
other one is called listener and satisfies all the database requests.
You
should have created a Oracle superuser account. After logging in as the Oracle
su you must export the environment
variables so the Server Manager will know where to look for programs,
libraries and data.
Then, you can start the Server Manager by typing:
$ dbstart
(the server manager will
start)
and you can start the 'listener' (or Net8 listener) by typing:
$ lsnrctl start
(the listener will
start)
The listener will start in the default port 1521. To see all the listener parameters
$ lsnrctl services and
$ lsnrctl
status
To automate the server manager and listener
startup, you can include the statements in the .bash_profile
script for the Oracle superuser.
If you like to start the daemon processes
from the root login, you can add this to the .bash_profile script for the
root user
su - oracle8i -c "dbstart"
su - oracle8i -c "lsnrctl
start"
(being oracle8i the Oracle superuser)
There are a lot of configuration files involved in the database warm and start up, it'd be tedious to list them all in this article. I can tell you the most important ones:
- oratab
- sqlnet.ora
- names.ora
- oraenv (for Bourne, or coraenv for C shells)
- initsid.ora
You should keep an eye on them.
To shutdown the server manager and the
listener, from the Oracle superuser session, type:
$ lsnrctl stop
(the listener will stop)
$
dbshut
(the server manager will stop)
This will
safely shut down the services.
v) Creating the database: To create a database, you will need to
use the GUI assistant called dbassist. First, logout your X-Window session and
login again as the Oracle superuser. Second, from your character based Oracle
superuser session, type:
$ dbassist
&
and switch to the X-Window session (it may take a while to start). You will
see a GUI assistant that will guide you through the database creation. It's
quite intuitive, all you have to do is follow the instructions.
Throughout
the creation process the assistant will ask you for the database type: OLTP (On
Line Transaction Process), DSS (Decision Support System) or Hybrid (takes
advantage of both worlds). I usually select Hybrid unless I'm making a very
specific application, so I recommend you to do the same. Then, simply follow the
rest of the instructions. It takes a while to create the database depending on
your hardware and free disk space, so be patient !
vi) Running the database shell: The database shell for Oracle is
called Sqlplus.
Make sure that the server manager and listener are started,
log in as the Oracle superuser and type
$ sqlplus <username>/<password>@<service
name>
When the db shell prompt is ready,
testdb=# create table contact (Code char(10) primary key,Last_Name char(50), First_Name char(50),Address char(100),Phone char(30))
Sqlplus doesn't have a command history, so by pressing the arrow keys will only make garbage on the screen ...
For further information about sqlplus, please go to the OTN website.
Oracle usually
stores the database files under a specific directory with the database name.
What Oracle recommends is two mount points, like:
/u01 for
Oracle programs and libraries
/u02 for Oracle database files
The database files would be located
under:
/u02/oracle/oradata/<database name>: control files for
database
/u02/oradata/<database name>: actual database files
You will notice files with .dbf extension under both directories. The ones under /oradata are the tables you create, plus all the indexes, serializers, relationships, stored procedures, etc.
vii) The good side and the bad side of Oracle:
-The Good side:
Excellent performance, reliability, scalability, robustness in any kind of
environment (Internet, intranet, extranet). Other Oracle products enhance the
database server and make it a complete development environment. Plenty of
documentation and user groups. I think it's the best database server out
there.
-The Bad side: Commercial and expensive (although it's worth the
price). Very tricky to configure and maintain (you must really know Oracle to
administer it).
viii) Comments: Oracle is a powerful, reliable, excellent fault-tolerance database engine. eBay, Amazon and other high traffic websites use it for their database needs, that shows you how solid it is. It handles transactions, triggers, cursors, database replication and distributed databases. From my point of view, Oracle is the best database for Unix and Sun architectures out there. Unfortunately, it's quite expensive, is not open source and is very hard to configure, maintain and get free support. To be an Oracle dba you have courses that last a year or so and cost good money. I highly recommend this database engine only if you are working for a large company that can afford to buy it and hire an Oracle dba. If you're not, you should consider PostgreSQL.
10. Which database should you use?:
Well, it all depends on what you want to do. If you are developing a complex application (for example, an accounting module or a distribution logistic system) I would recommend you to choose Postgres, or, if the company you are working for can afford it, Oracle. Postgres is a serious RDBMS (Relational DataBase Management System) and has almost all Oracle features, but is open source and free, and is always open to modifications if you want to customize the engine.
The key for complex web-based applications are the transaction handler, the locking procedures and the triggers or rules. In a hostile high traffic environment like the Internet, connection problems, possible attacks and power outages are daily situations, and the database must be solid and robust enough to maintain integrity and avoid any kind of data corruption.
Postgres' performance may go down with a considerable number of concurrent connections, but it'll always keep the database safe.
Oracle, on the other hand, is so solid that the performance as well as the data integrity will remain the same no matter how many concurrent connections you have. On the down side, it is pretty tricky to configure (flexibility comes with complexity), expensive, and requires -for a serious installation- good hardware. But the wonderful thing is... if you develop an application for Postgres, you can easily migrate it to Oracle with no major modifications (the only problem would be transferring the data from Postgres to Oracle, but that's another chapter...).
If you want to implement a system that doesn't require transactions,
but must have quick database response with many concurrent connections in small
or large databases, mySQL is your best call. It's open source so you can also
customize it to fit your needs, easy to install-configure-maintain, and stable
and fast in Internet environment with large databases. It's also safe since it
supports database replication. And, you won't have any problem getting a
web-hosting or dedicated server with mySQL support.
Another choice, if you can is to mix both databases (mySQL and Postgres). You
can design an application that uses the two databases
miniSQL is the original version of mySQL, I've never installed or used it, but it lost popularity and is not as enhanced as mySQL.
O'Reilly networks just released a new website called LAMP (Linux-Apache-Mysql-Perl|PHP|Python). It has everything related to the mentioned open source technologies, you can go there to find out more about these architectures.
To finish this second installment, I would suggest you to take a look at these interesting books to have a better idea when choosing your favorite database:
"Oracle 8: The Complete Reference" (George B. Koch,Kevin Loney) -
Osborne/McGraw-Hill
"Postgresql: Introduction and Concepts" (Bruce
Momjian) - Addison/Wesley
"Postgresql Programmer's Guide" (PostgreSQL
Development Team,Thomas Lochart) - iUniverse.com
"MySQL" (Paul DuBois)
- MacMillan Publishing Co.
"MySQL and mSQL" (Randy Jay Yarger,Tim
King,George Reese,Andy Oram) - O'Reilly & Associates
Or, you can go to phpbuilder.com and see an excellent and exhaustive comparison between mySQL and Postgres, by Tim Perdue.
Summary:
-Choosing the database engine carefully for your application is a must-do
before you start programming
-PostgreSQL and Oracle are the most recommended
databases engines for complex applications. PostgreSQL is free under GPL; Oracle
is not
-mySQL is the best choice for fast, reliable web-based applications
and has a lot of related websites and user groups.
The next issue will be 'Interfacing Perl with a Database (I)'. Part three of this series will go over DBD-DBI architectures, how to get and install them, and the code to connect, retrieve and save data to the database.
Marcelo "Taz" writes for the Linux.com Develop Section. He's an active member of the Florida Linux User Xchange (please visit www.flux.org), and, when not hacking, he likes to hang out with friends and travel with his soulmate Carina.
