Tuesday, January 24, 2012

How to Access a PostgreSQL Database from Any Language

If you’re a system administrator, chances are you use programs that interface with databases every day. One great advantage of open source software is that you can modify your applications’ code to customize it for your needs. If your application uses PostgreSQL on the back end, it’s not hard to access your database from a variety of languages. Here’s how to get started, whether your code is in C, C++, PHP, Tcl, Python, or Perl.
If you’re a complete neophyte when it comes to programming, this process probably isn’t for you. However, if you have at least basic knowledge of SQL and of at least one programming language, I’ll walk you past the first hurdles. You can also turn to the PostgreSQL manual for specific information. While you’re probably running Linux, the same program code should work on other operating systems, as long the same client interfaces are available there. My examples rely on Postgres 9.1, the latest stable release, being already installed and tested. My setup has the server running on a different machine from the client, but on the same LAN; in such a setup, you can easily connect to the server with the command pgsql -U postgres -h x.x.x.x.
Once you know your server accepts connections, make sure you have the rights to install software on the client machine. Also make sure that the database you’re working with is for testing only, so as not to accidentally corrupt anything crucial.

C Interface

C is a popular language when it comes to interfacing with a database. Many database systems are themselves written in C. The language offers speed and flexibility, so if you want to write a client interface, be it console-only or GUI-based, and you don’t want to hear about browsers, C might be your best choice.
The software you need to interface C with Postgres is named libpq. It comes bundled with the PostgreSQL source tree. If you installed the database system from binaries rather than from source, you can always get libpq separately, but keep in mind that you need the -dev package as well (or -devel, depending on your Linux distribution). On Debian and derivatives, the command to install it is # aptitude install libpq-dev. On Red Hat-based systems like CentOS, you can find libpq in the postgresqlxx-libs package, where xx is the major and minor version number. So if you want to interface with a PostgreSQL 9.1 database, you will need to install postgresql91-libs, or postgresql-devel if you’re running Fedora 16. The difference in the names is because RHEL/CentOS users use mostly the Postgres repository, while Fedora users have more up-to-date packages and need a third-party repository only if they’re after an older version. Long story short, any system that supports PostgreSQL will have libpq available, one way or another.
libpq is the API engine for not only C, but also C++, Perl, and Tcl. It gives you the basic functions to connect, query, and alter a database. Most of the functions that you’ll use frequently have names that start with “PQ,” such as PQconnectdb or PQerrormessage. For more examples, turn to the PostgreSQL documentation, or look in src/test/examples. In your C program files, include libpq-fe.h and add the appropriate linker flag, which is -lpq when compiling.
The first thing you’ll want to do is connect to an existing database. PQconnectdb() takes an argument in the form of const char *conninfo that contains the keywords you want, such as dbname=[database_name], and you use this pairing format with the other keywords as well. The most used ones are host, hostaddr (numeric form, to avoid useless DNS lookups), port, user, password, and sslmode. If you don’t provide any arguments, the default options will be used, which is probably not what you want. Suppose your database server has the address 192.168.0.101, the username is postgres, and the database name is testdb1. A first attempt to connect might look like this:
#include 
#include
int main(int argc,char argv[])
{
const char *conninfo;
PGconn *conn;
if (argc > 1)
conninfo = argc[1];
else
{
printf("Not enough arguments, exiting...");
return 1;
}
conn = PQconnectdb(conninfo);
/*Check to see how I did */
if(PQstatus(conn) = CONNECTION_OK)
printf("Connection succeeded.\n");
else
{
/*Do something to deal with the error*/
}
}
Save this code as testlibpq, compile it, and use it like this:
$ testlibpg "hostaddr=192.168.0.101 user=postgres dbname=testdb1"
If everything goes well, you should see “Connection succeeded.” on the screen. That’s gratifying, but so far not very practical. To get anything useful, you need to query the database – but since I started with connecting, I should tell you that in order to disconnect, you call PQfinish, which returns a void and takes *conn as the sole argument.
PQexec, the function for executing queries, returns a PGresult and takes *conn and const char *command as arguments. In our example, let’s just declare a PGresult variable and send a command to the server. I leave to you the part where you check whether the connection is there, and error checking.
PGresult *res;
res = PQexec(conn, "SELECT * FROM mydatabase");
PQclear(res);
Needless to say, the code presented here is nowhere near complete. Its purpose is to show you what the library offers, not to offer ready-to-use code. Nevertheless, with this code, res contains the result of the query, and you can parse it at will. PQresultStatus lets you know how the command went, and returns a status of the form PGRES_COMMAND_OK or PGRES_FATAL_ERROR. You can find a comprehensive list of the exec functions on the PostgreSQL project’s website.
Let’s move on to some more useful functions. For example, PQntuples, given res as an argument, returns the number of columns in the table as an int. It takes a PGresult object as an argument if the query’s status is PGRES_TUPLES_OK, and returns an integer value. PQnfields gives you the number of columns in each row. PQfname returns the column name associated with a number, while PQfnumber does the opposite. To get the value of a single cell, pass to PQgetvalue the usual PG result, along with the cell’s column and row numbers.
All this is good, but PQexec isn’t a genius when it comes to handling multiple SQL commands. Because it returns only one structure, if you have more than one command, it will take into consideration only the results from the last command. Another drawback is that PQexec waits for a command until it’s completed, so you must take extra care when it comes to blocking execution. You can use a series of functions if these shortcomings affect you, such as PQsendQuery and PQgetResult, and use them intelligently for async-friendly code.

C++ Interface

The official PostgreSQL client API for C++ is called libpqxx, but it doesn’t come bundled with the PostgreSQL source. Instead, you can install it from your distribution’s repository or just download it. The README file says you can use libpqxx on Linux, BSD, Solaris, Irix, HP-UX, AIX, and Windows, with Cygwin. You must install libpq before installing libpqxx, as the latter is based on the former.
In C++, the namespace is named pqxx, and you must include a header with that name. Here’s a simple connection and query:
#include 
#include

using namespace std;

int main()
{
pqxx::connection conn;
pqxx::work w(conn);
pqxx::result res = w.exec("SELECT 1");
w.commit();

cout << res[0][0].as() << endl;
}
This program will print 1 if everything goes as it should. Use and abuse the try/catch keywords to defend against connection errors. If you want to use argv[x], as I did in the C example, you can concatenate the string like this:
pqxx::result res = w.exec("SELECT" + w.quote(argv[1]));
Again, not very useful so far, so how do you actually access results that a query returns? Since res, which stores the data, is an array of tuples, you can use for loops to iterate through the elements:
for (int rownr=0; rownr < res.size(); ++rownr)
{
const result::tuple row = res[rownr];
for (int colnr=0; colnr < row.size(); ++colnr)
{
const result::field = row[colnr];
cout << field.c_str() << \t;
}
cout << endl;
}
Remember that I used using namespace std;; if you don't use it, alter the lines with cout and endl accordingly.
libpqxx offers a lot of options when it comes to string conversion. The most interesting functions are from_string and to_string; the first basically takes a string argument and a T & obj, which is an existing built-in type, and the second takes only the T & obj, converting it to a string. The complete documentation on this is a useful read.

Connections, Queries, and Transactions

For the rest of this section, I recommend you insert using namespace pqxx;, as I did when writing the code, in order to save some typing, so you won't have to type pqxx::.... With that in mind, let's open a connection to an existing database and perform a transaction and then a query. You will notice the code does about the same as the C code above.
I can use a constructor that takes only one argument, again a string, which will define the connection options (dbname, user, and so on). Since the default username is postgres, I won't include it in my string this time.
connection Conn("dbname=testdb1 hostaddr=192.168.0.101");
I now have an open connection, provided everything went well (again, take care of exceptions!). This, however, is not enough for executing queries – you need to open a transaction by using a transactor, which will save you hassle and time when a connection drops off on you. A transactor is a functor, and functors are handy when dealing with problems that create connections, when you need to know the state of said connection. You can't get that information with simple functions or function pointers, given the way the language is designed.
A functor is a function object, so you can treat it as a function that is state-aware. Let me explain. The operator() in C++ can be overloaded, so you can put any number of elements inside the parentheses. You could say that a functor is a class that defines operator(). If you want a function that adds two values (given as arguments), you would have to hard-code those values. Functors take care of the hardcode limitation, allowing you to use a constructor and add a new object that "overrides" the first values to be added, thus giving the developer more flexibility.
Now that you know what a functor is, let's see how to use the transactor.
// I already have a connection open, as above
class Transaction : public transactor<>
{
public:
void operator () (transaction<> & t)
{
t.exec ("INSERT INTO mytable VALUES(val1,val2)");

}

void on_abort (const string & msg)
{
cout << "Transaction failed with message: " << msg << endl;
}
};
//Inside main()
conn.perform (Transaction());
As you can see, coding queries in C++ isn't hard. You can define transactions and properly format the results you get from the queries (think stringstream), and the rest is easy to accomplish, using loops.

PHP Interface

PHP is widely used for database integration with web pages. The language offers a simple way for programmers to interface with databases.
Make sure you have PHP installed and its PostgreSQL module, which is named php5-pgsql on Debian and derivatives and php-pgsql if you're using CentOS 5. If you compile from source, use the --with-pgsql=[DIR] configure option, where DIR is the directory where Postgres lies. You can use phpinfo() on a web page to check whether PostgreSQL support is enabled and what options have what values. Note that the examples below work with PHP 5, and not necessarily with older versions.
It's more straightforward to open a connection, make a query, and print out the results using PHP than it is with C/C++:
xxx")
or die('Could not connect: ' . pg_last_error());
$query = 'SELECT * FROM mytable';
$result = pg_query($query) or die('Query failed: ' . pg_last_error());
echo "
\n";
while ($line = pg_fetch_array($result, null, PGSQL_ASSOC)) {
echo "\t
\n";
foreach ($line as $col_value) {
echo "\t\t


\n";
}
echo "\t

\n";
}
echo "

$col_value
\n";
//I printed the result in a nice table; thanks to the PHP manual
pg_free_result($result);
pg_close($dbconn);
?>
As you can see, connecting, making a query, and using the results are all a breeze. However, beware when you get user/password information interactively from users. A SQL injection attack might gain a malicious individual access to your database, so make sure you keep informed about security issues.
When working with arrays, and you will need them sooner or later, you can start with the pg_fetch_array() function and move from there:
//I presume I have a connection and a result/query that returns something suitable
$arr = pg_fetch_array($result);
echo $arr[0];
Look at other pg_fetch* functions and use one that fits the task you need to do. pg_num_fields and pg_num_rows can be useful when you want to alter the way you display the results from a query depending on their size.

Tcl Interface

The Tcl language is known for its string processing capabilities, so it might be a good choice if you want powerful parsing of query results. Another cool feature is security, because PL/Tcl (yes, we're talking about an integrated Tcl interface provided by PostgreSQL) has no means to access the inner workings of the database server. If, however, you do need more capabilities, you can look at PL/TclU, where U stands for untrusted. You need superuser access to the database to use it, and the usual extra care is recommended.
Because PL/Tcl is integrated, you don't need to install anything before you use it, but you do need to add to your code the line
CREATE EXTENSION pltcl #or pltclu
Since PL/Tcl is integrated into the Postgres system, it's more useful when it comes to functions and triggers rather than queries and transactions. Generally speaking, languages starting with PL/ are more useful for extending some capabilities instead of providing a complete database-interfacing environment. The general syntax to create your own functions takes the form
CREATE FUNCTION name (argtypes) RETURNS type AS $$
#here goes the function body
$$ LANGUAGE pltcl;
So, for instance, here's a simple add function:
CREATE FUNCTION tcl_add (integer, integer) RETURNS integer AS $$
return $1 + $2
$$ LANGUAGE pltcl;
If you use STRICT at the end of the line with the LANGUAGE instruction, you don't have to check for null input. If you don't want to use STRICT, use some if blocks and the argisnull keyword (the manual has more on this).
Using values from the database is easy. Say you have a table with computer specs and you want a function to see whether the hardware in question is obsolete and should be changed. You could use code like this:
CREATE TABLE machine (
name text,
cpuspeed integer #Mhz
memory integer #MB
hard_drive integer #GB
);
CREATE FUNCTION isObsolete (machine) RETURNS boolean AS $$
if {$1(cpuspeed) < 266 && $1(memory) < 256 && $1(hard_drive) < 2} {
return "t"
}
return "f"
$$ LANGUAGE pltcl;
As a small mnemonic, remember the string "spi" (server programming interface) for functions you can use to select, insert, and update objects in a Postgres database. spi functions, such as spi_exec, spi_prepare, spi_execp, and quote, are somewhat similar to PQ functions in libpq. spi_exec takes a SQL command and executes it, and returns the number of rows it processed. You can specify the max number of rows to be returned, and save the values of the selected columns inside Tcl variables, as follows.
spi_exec "SELECT count(*) AS cnt FROM mytable"
spi_prepare prepares and stores a query for later use. When you're ready to use it, you can do that with spi_execp. spi_prepare returns a query ID, so that spi_execp knows what to execute. Finally, quote is useful when single quotes and/or escape chars give you nothing but trouble. Here's an example:
"SELECT $myvar AS ret"
# myvar contains the string "won't"
# when expanding, the parser will give you a nice error message.
"SELECT '[ quote $val ]' AS ret" # this will do just fine

Python Interface

To use PL/Python, make sure you have the following line in the database code you want to work with:
CREATE EXTENSION plpythonu
The final "u" in the above statement stands for untrusted, as we saw with PL/Tcl. Since PostgreSQL 7.4 it's the only available variant. With it, your code can execute with the privileges of a DBA, which again means you must take extra steps to ensure everything is right. PL/Python supports Python 2 and 3, with 2 as the default, which is to say that the plpythonu extension refers to Python 2. If you need one or the other explicitly, use plpython2u or plpython3u.
Function creation in PL/Python has the exact same syntax as the Tcl example, except that you substitute pltcl with plpythonu. Since I don't want to be redundant, I won't repeat the code here. Of course, the function body is simply Python code, as you can see in this reiteration of the add function:
CREATE FUNCTION pyadd (x integer, y integer)
RETURNS integer
AS $$
return x + y
$$ LANGUAGE plpythonu;
Before you move on, I recommend you see the subchapter in the manual (42.3.1 and following) on data mapping between PostgreSQL and Python. I guarantee this will save you lots of headaches later.
Using Postgres with Python and even the function names will seem familiar if you read the Tcl section above. Here, the module is called plpy, it's imported automatically, and it offers two functions, execute and prepare:
myobj = plpy.execute("SELECT * FROM mytable", 10)
After you execute this statement, you can access myobj by indexes, as you'd do with any array-type variable. Just as in Tcl, plpy.execute can limit the number of rows it returns; I specified 10.
Some other functions you might find useful are easy to understand because they're named appropriately: plpy.debug(message), plpy.log(message), plpy.info(message), plpy.notice(message), plpy.warning(message), and plpy.error(message). Quoting functions, similar to those in the Tcl example, include quote_nullable(string) and quote_ident(string).

Perl Interface

Perl's string-processing capabilities are second to none, but it's also adept at handling complex query results. Remember to use CREATE EXTENSION plperl in your database code.
Creating a function with PL/Perl is just as simple as it would be with Tcl or Python; just replace the last line with $$ LANGUAGE plperl;. Here's a simple function that returns the bigger of two integers:
CREATE FUNCTION perlmax (integer, integer) RETURNS integer AS $$
if ($_[0] > $_[1]) { return $_[0]; }
return $_[1];
$$ LANGUAGE plperl;
If you choose to use STRICT, and I personally recommend it, the function will not get executed at all if it receives null input, and you'll have to write some conditional code to take care of that possibility:
CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
my ($x, $y) = @_;
if (not defined $x) {
return undef if not defined $y;
return $y;
}
return $x if not defined $y;
return $x if $x > $y;
return $y;
$$ LANGUAGE plperl;
Remember spi_exec from PL/Tcl? In PL/Perl it's called spi_exec_query and, you guessed it, it does the same thing with the same arguments, the limiter on the rows returned being optional. However, the manual warns us of shortcomings when having huge result sets, so the query I used as an example for Tcl becomes:
$myobject = spi_exec_query('SELECT * FROM mytable', 10);
#Let's get the value of row $i corresponding to mycolumn
$myrow = $myobject->{rows}[$i]->{mycolumn};
#Here's the total number of rows resulted from a query
$nrows = $myobject->{processed};
#What's the command's status?
$rp = $myobject->{status};
As you would expect, the same logging, error message, and quoting functionalities we've seen before are present here as well: elog(level, message) emits a log/warning/error message, quote_literal(string) and quote_ident(string) do what you'd expect, looks_like_number(string) returns true if the arguments seems like a string, and is_array_ref(arg) returns true if the argument can be treated as an array reference.
Had enough? I'll stop here. This should be enough to get you started and to give you an idea of how simple program interfaces to a PostgreSQL database can be.

No comments:

Post a Comment