2012-06-27

The "Cursed" NULL in postgres

Comparison to NULL

In postgres, NULL is treat as a speical value, that which is not equal to any other value, which means the expression NULL = NULL yields false.
It can be verified by using the following query

SELECT NULL


SELECT n
FROM unnest(ARRAY(NULL,1,2,3,4,5)) n
WHERE n = NULL

The query returns empty set, because no element equals to NULL even NULL itself.
If you think this experiment is not convincing enough, then you can try this:

CASE NULL


SELECT
	n,
	CASE WHEN n = NULL THEN 'NULL' ELSE 'NOT NULL' END
FROM unnest(ARRAY(NULL,1,2,3,4,5)) n

This query should yield “ , NOT NULL” which means NULL does not equal to NULL.

To test whether or whether not a value equals to NULL, you should use IS NULL or IS NOT NULL

So if you replace the n = NULL with n IS NULL in previous 2 statements, you will get expected result:

SELECT NULL


SELECT
	n,
	CASE WHEN n IS NULL THEN 'NULL' ELSE 'NOT NULL' END
FROM unnest(ARRAY(NULL,1,2,3,4,5)) n
WHERE n IS NULL

NULL in Crosstab

In most cases, the NULL special character doesn’t hurt much, since we always can alter our expression to fix the problem. But if you’re using table functions to create pivot, and there is NULL in your columns, then you will find NULL is a cursed value, which brought a lot of trouble to you.

Postgres provide tablefunc extension, which can provide a series functions called “crosstab#”. And with these functions, you can convert a set of rows into a pivot table.

Function crosstab accept 2 sql queries. First query should yield 3 columns: row in pivot table, column in pivot table and value in pivot table.
Second query should yield a series of value which defines the columns of the pivot table.
crosstab function will group the rows yield by 1st query by 1st column. Then map each row in the group to a column by comparing the 2nd column in the row yield by 1st query with the value generated by 2nd query, if the value is equal then the 3rd column of the row yield by 1st query will be placed in the column defined by value of the 2nd query.
It if a very convenient feature provided by postgres, and it works perfect in most cases.

But in our case, we met a problem that we have NULL value in the 2nd column of 1st query, which means we have NULL value in pivot table columns!

And the crosstab decide which column the value should be placed to by comparing the whether the value is equal!

And NULL never equals to NULL!

BAM!!!

As a result, the column of NULL in the pivot table is always empty!

CASE WHEN

To solve the problem, we should play a little trick in the first query, we should translate all the NULL into a “normal” value.
Here is our 1st query, and we want to get a pivot table with period as row axis, rating as column axis and volume of the order as content.

Original 1st query


SELECT
	Periods.period,
	Profiles.rating,
	SUM(Orders.volume)
FROM
	orders
		LEFT OUTTER JOIN Periods ON (Periods.id = Orders.period_id)
		LEFT OUTTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
	Periods.period,
	Profiles.rating
ORDER BY
	Periods.period,
	Profiles.rating

And we will get NULL in Profiles.rating, so we can translate NULL as 0 in rating. To achieve this we can use CASE WHEN statement.

Original 1st query with CASE WHEN


SELECT
	Periods.period,
	CASE WHEN Profiles.rating IS NULL THEN 0 ELSE Profiles.rating,
	SUM(Orders.volume)
FROM
	orders
		LEFT OUTTER JOIN Periods ON (Periods.id = Orders.period_id)
		LEFT OUTTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
	Periods.period,
	Profiles.rating
ORDER BY
	Periods.period,
	Profiles.rating

COALESCE

The solution works fine. But in personal perspective, I don’t like it, because it repeat the statement and is not concise. But luckily, the value we should deal with the the special value NULL and postgres has provided a group of functions to deal with NULL.

What we want is the function COALESCE, which accept a group of value as arguments, and returns the first not null value.
So we can simplify our statement with this super function:

Original 1st query with COALESCE


SELECT
	Periods.period,
	COALESCE(Profiles.rating, 0)
	SUM(Orders.volume)
FROM
	orders
		LEFT OUTTER JOIN Periods ON (Periods.id = Orders.period_id)
		LEFT OUTTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
	Periods.period,
	Profiles.rating
ORDER BY
	Periods.period,
	Profiles.rating

In the statement, if the rating is not null, the COALESCE function will return the actual value if the rating is NULL, then the COALESCE will find the next not null value, which must be 0.

Besides COALESCE function, there is another function called NULLIF, which might mislead you to a totally wrong way just as what I had.
According to postgres document, the function might behave in a totally opposite way than you expected.

The NULLIF function returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the COALESCE

postgres9.2 Doc]

ThoughtWorkshop

Digital Bigs in my thought

The "Cursed" NULL in postgres

Comparison to NULL

NULL in Crosstab

CASE WHEN

COALESCE