2012-06-27

The "Cursed" NULL in postgres

Comparison to NULL

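In Postgres, NULL is treated as a special value that is not equal to any other value, not even to itself. Strictly speaking, the expression NULL = NULL yields NULL (unknown) rather than false, but a WHERE clause or a CASE WHEN condition treats that result the same way as false.
This can be verified with the following query: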

SELECT NULL


SELECT n
FROM unnest(ARRAY[NULL,1,2,3,4,5]) n
WHERE n = NULL

The query returns an empty set, because no element compares as equal to NULL, not even NULL itself.
If this experiment is not convincing enough, you can try this:

CASE NULL


SELECT
	n,
	CASE WHEN n = NULL THEN 'NULL' ELSE 'NOT NULL' END
FROM unnest(ARRAY[NULL,1,2,3,4,5]) n

For the NULL element this query yields an empty n labelled 'NOT NULL', which shows that the comparison n = NULL does not succeed even when n is NULL.

To test whether or not a value is NULL, you should use IS NULL or IS NOT NULL.

So if you replace n = NULL with n IS NULL in the previous two statements, you get the expected result:

SELECT NULL


SELECT
	n,
	CASE WHEN n IS NULL THEN 'NULL' ELSE 'NOT NULL' END
FROM unnest(ARRAY[NULL,1,2,3,4,5]) n
WHERE n IS NULL
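
If you don't have a Postgres instance at hand, the comparison rule itself can be tried with a minimal Python sketch. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in engine; it is not Postgres, but it applies the same rule for comparisons with NULL. The table name numbers is made up for the illustration.

```python
import sqlite3

# Assumption: SQLite is only a stand-in engine here, not Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany(
    "INSERT INTO numbers (n) VALUES (?)",
    [(None,), (1,), (2,), (3,), (4,), (5,)],
)

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# WHERE keeps only rows whose condition is true, so "= NULL" keeps nothing.
rows_eq = conn.execute("SELECT n FROM numbers WHERE n = NULL").fetchall()

# IS NULL is the predicate that actually finds the NULL row.
rows_is = conn.execute("SELECT n FROM numbers WHERE n IS NULL").fetchall()

print(null_equals_null, rows_eq, rows_is)
```

The first value printed is None, not False, which is why the comparison behaves like "false" in a filter without ever being false.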

NULL in Crosstab

In most cases, the special behaviour of NULL doesn’t hurt much, since we can always alter our expression to work around it. But if you use table functions to build a pivot table and there are NULLs among your column values, you will find that NULL is a cursed value that brings you a lot of trouble.

Postgres provides the tablefunc extension, which offers a series of functions named crosstab (crosstab, crosstab2, crosstab3 and so on). With these functions, you can convert a set of rows into a pivot table.

The crosstab function accepts two SQL queries. The first query should yield three columns: the row, the column and the value of the pivot table.
The second query should yield a series of values that define the columns of the pivot table.
crosstab groups the rows returned by the first query by their first column. Within each group, it maps every row to a pivot column by comparing the row's second column with the values generated by the second query. When the two are equal, the row's third column is placed in the pivot column defined by that value.
It is a very convenient feature of Postgres, and it works perfectly in most cases.

But in our case, we hit a problem: we had NULL values in the second column of the first query, which means we had a NULL among the pivot table columns!

And crosstab decides which column a value goes into by checking whether the two values are equal!

And NULL never equals NULL!

BAM!!!

As a result, the NULL column in the pivot table is always empty!
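
To make the effect concrete, here is a small Python sketch of the matching step as described above. It is a simplified model of the behaviour, not the actual tablefunc implementation, and the names sql_equal and pivot are made up for the illustration.

```python
def sql_equal(a, b):
    """SQL-style equality: comparing with NULL (None) gives unknown (None)."""
    if a is None or b is None:
        return None
    return a == b


def pivot(source_rows, categories):
    """Simplified model of the pivot step (not tablefunc's real code).

    source_rows: (row_name, category, value) tuples, like the 1st query.
    categories:  column keys, like the values from the 2nd query.
    """
    table = {}
    for row_name, category, value in source_rows:
        cells = table.setdefault(row_name, [None] * len(categories))
        for i, column_key in enumerate(categories):
            # Only a comparison that is definitely true places the value.
            if sql_equal(category, column_key) is True:
                cells[i] = value
    return table


# Hypothetical data: the first row has a NULL category.
rows = [("2012-Q1", None, 10), ("2012-Q1", 5, 20), ("2012-Q2", 5, 4)]
result = pivot(rows, [None, 5])
print(result)
```

The value 10 never lands anywhere: the first cell of every row stays empty, even though a NULL column was requested.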

CASE WHEN

To solve the problem, we can play a little trick in the first query: translate every NULL into a “normal” value.
Here is our first query. We want a pivot table with the period as the row axis, the rating as the column axis and the order volume as the content.

Original 1st query


SELECT
	Periods.period,
	Profiles.rating,
	SUM(Orders.volume)
FROM
	orders
		LEFT OUTER JOIN Periods ON (Periods.id = Orders.period_id)
		LEFT OUTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
	Periods.period,
	Profiles.rating
ORDER BY
	Periods.period,
	Profiles.rating

This query yields NULL in Profiles.rating, so we can translate NULL to 0 in the rating. To achieve this we can use a CASE WHEN expression.

Original 1st query with CASE WHEN


SELECT
	Periods.period,
	CASE WHEN Profiles.rating IS NULL THEN 0 ELSE Profiles.rating END,
	SUM(Orders.volume)
FROM
	orders
		LEFT OUTER JOIN Periods ON (Periods.id = Orders.period_id)
		LEFT OUTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
	Periods.period,
	Profiles.rating
ORDER BY
	Periods.period,
	Profiles.rating

COALESCE

The solution works fine. But personally I don’t like it, because it repeats the column expression and is not concise. Luckily, the value we have to deal with is the special value NULL, and Postgres provides a group of functions for dealing with NULL.

What we want is the COALESCE function, which accepts a list of values as arguments and returns the first one that is not NULL.
So we can simplify our statement with this super function:

Original 1st query with COALESCE


SELECT
	Periods.period,
	COALESCE(Profiles.rating, 0),
	SUM(Orders.volume)
FROM
	orders
		LEFT OUTER JOIN Periods ON (Periods.id = Orders.period_id)
		LEFT OUTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
	Periods.period,
	Profiles.rating
ORDER BY
	Periods.period,
	Profiles.rating

In this statement, if the rating is not NULL, COALESCE returns the actual value. If the rating is NULL, COALESCE moves on to the next non-NULL argument, which is 0.
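
Here is a runnable sketch of the same idea, again assuming SQLite through Python's sqlite3 module as a stand-in for Postgres. The sample data is invented, the period is stored directly in orders to keep the sketch short, and the alias rating_or_zero is my own naming.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the data below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, rating INTEGER);
    CREATE TABLE orders (period TEXT, profile_id INTEGER, volume INTEGER);
    INSERT INTO profiles VALUES (1, 5), (2, NULL);
    INSERT INTO orders VALUES
        ('2012-Q1', 1, 20),
        ('2012-Q1', 2, 7),
        ('2012-Q1', NULL, 3),
        ('2012-Q2', 1, 4);
""")

# A NULL rating (or a missing profile) is reported as rating 0.
query = """
    SELECT
        orders.period,
        COALESCE(profiles.rating, 0) AS rating_or_zero,
        SUM(orders.volume)
    FROM orders
        LEFT OUTER JOIN profiles ON (profiles.id = orders.profile_id)
    GROUP BY orders.period, rating_or_zero
    ORDER BY orders.period, rating_or_zero
"""
pivot_source = conn.execute(query).fetchall()
print(pivot_source)
```

Both the order whose profile has no rating and the order without a profile end up under rating 0, so every row now carries a category that an equality check can match.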

Besides COALESCE, there is another function called NULLIF, which might lead you down a totally wrong path, just as it did me.
According to the Postgres documentation, this function behaves in the opposite way to what you might expect:

The NULLIF function returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the COALESCE

[Postgres 9.2 Doc]
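
To see the difference between the two functions side by side, here is a tiny sketch, once more assuming SQLite via Python's sqlite3 module as a stand-in engine rather than Postgres itself:

```python
import sqlite3

# Assumption: SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")

# NULLIF turns a matching value INTO NULL; COALESCE replaces NULL WITH a value.
row = conn.execute(
    "SELECT NULLIF(0, 0), NULLIF(3, 0), COALESCE(NULL, 0), COALESCE(3, 0)"
).fetchone()
print(row)  # (None, 3, 0, 3)
```

So NULLIF(rating, 0) would create NULLs rather than remove them, which is exactly the wrong direction for the crosstab problem.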

2012-06-02

Install specific version of tool with HomeBrew

Homebrew is a convenient package manager for Mac users. For various reasons I prefer Homebrew to MacPorts.
Brew has a younger package repository, since it has a shorter history than MacPorts. A younger repository means fewer options, and sometimes it is hard to install an old-fashioned tool with brew.

Brew uses git to manage its formula repository, so you can browse it with git.
Typically, the repo is located at /usr/local. But since this path can be changed, it is safer to look it up via brew.
Brew calls the repo path the prefix, so you can get the path with brew --prefix.
You can use the following shell command to enter the brew repo:

cd $(brew --prefix)

Since brew loads formulas from the local repo, we need to make sure the repo is up to date before installing an app. We can use the following commands to update the brew repo:

# Update brew
brew update
# update with git
cd $(brew --prefix) && git pull --rebase

To install a specific version of an app, we need to check out the matching version of its formula. The brew versions command lists the versions together with their git revisions, so we can check out the one we want and then install the app:

brew versions postgresql
# Output:
# 9.1.3    git checkout e088818 /usr/local/Library/Formula/postgresql.rb
# 9.1.2    git checkout dfcc838 /usr/local/Library/Formula/postgresql.rb
# 9.1.1    git checkout 4ef8fb0 /usr/local/Library/Formula/postgresql.rb
# 9.0.4    git checkout 2accac4 /usr/local/Library/Formula/postgresql.rb
# 9.0.3    git checkout b782d9d /usr/local/Library/Formula/postgresql.rb
# 9.0.2    git checkout 2c3b88a /usr/local/Library/Formula/postgresql.rb
# 9.0.1    git checkout b7fab6c /usr/local/Library/Formula/postgresql.rb
# 9.0.0    git checkout 1168d8f /usr/local/Library/Formula/postgresql.rb
# 8.4.4    git checkout c32bea0 /usr/local/Library/Formula/postgresql.rb
# 8.4.3    git checkout 9b2ef7c /usr/local/Library/Formula/postgresql.rb
# 8.4.1    git checkout 0495cf5 /usr/local/Library/Formula/postgresql.rb
# 8.4.0    git checkout a82e823 /usr/local/Library/Formula/postgresql.rb
git checkout a82e823 /usr/local/Library/Formula/postgresql.rb
brew install postgresql
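
If you want to pick the revision for a given version from a script, the output above is easy to parse. The following Python sketch assumes the output keeps exactly the shape shown above (version, then "git checkout", then revision and formula path); the function name parse_versions is made up.

```python
def parse_versions(output):
    """Map each version to the git revision printed next to it."""
    revisions = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected shape: <version> git checkout <revision> <formula path>
        if len(fields) == 5 and fields[1:3] == ["git", "checkout"]:
            revisions[fields[0]] = fields[3]
    return revisions


# Two lines copied from the output above.
sample = """\
9.1.3    git checkout e088818 /usr/local/Library/Formula/postgresql.rb
8.4.0    git checkout a82e823 /usr/local/Library/Formula/postgresql.rb
"""
revisions = parse_versions(sample)
print(revisions["8.4.0"])  # a82e823
```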

If we cannot find the specific version that we want (such as Postgres 8.3.11), don’t be disappointed: we can try searching the versions repository.
Some old-fashioned tools that are not included in brew’s master repo might be provided in the versions repository.

Beginning with version 0.9, brew supports multiple repositories: users can run the brew tap command to register alternative repositories besides the master repo. There are quite a few interesting alternative repos, such as versions and games.
These official alternative repos can be found on GitHub.

The formulas in alternative repositories cannot be used directly, but luckily the official ones are included in the search results.


brew search postgresql
# Output:
# postgresql
# homebrew/versions/postgresql8    homebrew/versions/postgresql9

In the search result, we can see that two formulas are displayed with a path rather than just the formula name, which means these formulas live in an alternative repo.
The path to the formula follows the convention: <github username>/<repository name without "homebrew-">/<formula name>.
So homebrew/versions/postgresql8 means the file is located at https://raw.github.com/Homebrew/homebrew-versions/master/postgresql8.rb
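
The convention can be written down as a small Python sketch. The function name tap_formula_url is hypothetical, and the URL layout is simply taken from the example above, so treat it as an illustration of the naming rule rather than something brew itself exposes.

```python
def tap_formula_url(reference):
    """Build the raw formula URL from <user>/<repo>/<formula>.

    Assumption: the URL layout follows the example given in the text.
    """
    user, repo, formula = reference.split("/")
    # The repository name on GitHub carries the "homebrew-" prefix.
    return "https://raw.github.com/{}/homebrew-{}/master/{}.rb".format(
        user, repo, formula
    )


url = tap_formula_url("homebrew/versions/postgresql8")
print(url)
```

The sketch keeps the user name as typed, so it produces a lowercase homebrew where the URL above has Homebrew.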

To install it, we can install it directly or tap the repo first:


# Install directly
brew install homebrew/versions/postgresql8
# Tap
brew tap homebrew/versions
brew install postgresql8
