Use Postgres Multiple Schema Database in Rails

Postgres provides an interesting feature called “schema” in addition to the usual database features: an extra layer between the database and its tables. With schemas, you can have tables with the same name in one database, as long as they live in different schemas.
To me, “schema” is not a good name! I think “table-space” or even “namespace” would be better. In fact, a number of people agree that schema is a poor name:

“Schema” is such a terrible name for this feature. When most people hear the term “schema” they think of a data definition of some sort. This is not what PostgreSQL schemas are. I’m sure the PostgreSQL devs had their reasons, but I really wish they would have named it more appropriately. “Namespaces” would have been apropos.

Anywho, the easiest way for me to describe PostgreSQL schemas (besides telling you that they are, indeed, namespaces for tables) is to relate them to the UNIX execution path. When you run a UNIX command without specifying its absolute path, your shell will work its way down the $PATH until it finds an executable of the same name.


And you can find more here

A popular practice is to use Postgres schemas for sub-domains. For example, suppose you are a BBS provider and you rent your BBS app to different organizations. Each organization wants its own BBS instance running independently; most importantly, its data should be stored in a separate space and be accessible from its own domain name. But for administration, you want them all to share the same backend management console.
In this case, the best way to solve the problem is to store the data owned by each subsystem in its own schema, and to store all the administration data in a single schema, or even in the public schema.
The same guy, Jerod, has a post describing how to build this kind of system in detail. There are plenty of other posts on the topic, which are easy to find by googling.
There is even a Ruby gem called apartment, from Brad Robertson, that supports this kind of system.

This idea looks fancy, but only if you are 10000% certain that the sub-systems will stay independent, without any collaboration, forever.
Otherwise, sooner or later you will find that the “fancy idea” has become a horrible idea.
As time goes by, more and more features will require collaboration between sub-systems, such as a unified authentication mechanism, so that a user can log in once and switch between systems easily. Or the administrator might ask for a unified statistics graph covering all sub-systems.
All these requirements involve cross-schema queries! To be honest, cross-schema queries can be painful in some cases!
And they bring trouble to every aspect of your system, such as data migration, test data generation, etc.

That’s exactly what happened in my current project!
My current project is a Rails 3 project. The codebase is brand new but is built on a legacy multiple-schema Postgres database. And for some reason, we must keep the multiple-schema design unchanged, while our goal is to unify the separate subsystems into a more closely collaborating system.

ActiveRecord in Rails doesn’t include native support for this fancy feature, which means you will run into problems during migration, or even when preparing test data with factory_girl.

Postgres allows you to locate a table in a specific schema with a fully qualified name, like this: <schema name>.<table name>.<column name>. The schema name is optional; when you omit it, Postgres searches for the table in a file-system-path-like order called the “search path”.
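To make the lookup order concrete, here is a toy Ruby model of how an unqualified table name could be resolved against a search path. It only illustrates the idea and is not how Postgres is implemented; the schema and table names (tenant_a, posts, users, admins) and the helper resolve_table are made up for this sketch.

```ruby
# Toy model: each schema maps to the table names it contains.
# All names here are hypothetical, chosen only for illustration.
SCHEMAS = {
  'tenant_a' => %w(posts users),
  'public'   => %w(users admins)
}

# Return "schema.table" for the first schema in the search path that
# contains the table, or nil when no schema on the path has it.
def resolve_table(table, search_path)
  schema = search_path.find { |name| SCHEMAS.fetch(name, []).include?(table) }
  schema && "#{schema}.#{table}"
end

puts resolve_table('users',  %w(tenant_a public))  # tenant_a.users
puts resolve_table('admins', %w(tenant_a public))  # public.admins
```

The first match wins, which is why two schemas can both hold a users table without clashing.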

And you can set and query current search path with Postgres SQL statements:

Query and Set search_path
SHOW search_path;
SET search_path TO <new_search_path>;

ActiveRecord doesn’t put the fully qualified schema name in front of the table name when it translates ARel into SQL statements, so we can only support a multiple-schema database through the search_path.

It is a very natural idea to use the following Ruby code to make ActiveRecord query different schemas:

Select Schema
def add_schema_to_search_path(schema)
  ActiveRecord::Base.connection.execute "SET search_path TO #{schema}, public;"
end

def restore_search_path
  ActiveRecord::Base.connection.execute "SET search_path TO public;"
end

These two methods work perfectly when querying the database. But sooner or later, you will run into big trouble when you try to write data into it.

In a db migration, or when using factory_girl to generate test fixtures, you might find that the data you insert into different schemas all ends up in the first non-public schema. Yet all the queries still work perfectly!

We found that this problem occurs when the following conditions are satisfied:

  1. The queries happen in a transaction.
  2. You insert data into multiple non-public schemas.
  3. You use the SET search_path TO SQL statement to switch between schemas rather than explicitly using fully qualified table names.

And the most interesting things are:

  1. All the SELECT queries are executed on the correct schemas.
  2. If you use SHOW search_path; to query the current search path, you get the correct value.
  3. All data is inserted into the first non-public schema that you actually wrote data into. This means that if you insert data into the public schema, nothing goes wrong; and if you switch to a non-public schema but don’t actually insert any rows, that schema isn’t affected either.

To solve this problem, I spent 2 nights and 2 days digging into the source code of the ActiveRecord gem and the pg gem (the Postgres database adapter).
I finally solved the problem by using the attribute on PostgreSQLAdapter.

Basically, instead of using the SQL queries, you should use PostgreSQLAdapter#schema_search_path and PostgreSQLAdapter#schema_search_path= to get and set the search path.
If you dig into the source code, you will find that the two methods do exactly the same thing as we did, except that the setter also assigns one more instance variable, @schema_search_path.

methods on PostgreSQLAdapter
# Sets the schema search path to a string of comma-separated schema names.
# Names beginning with $ have to be quoted (e.g. $user => '$user').
# See: http://www.postgresql.org/docs/current/static/ddl-schemas.html
#
# This should be not be called manually but set in database.yml.
def schema_search_path=(schema_csv)
  if schema_csv
    execute("SET search_path TO #{schema_csv}", 'SCHEMA')
    @schema_search_path = schema_csv
  end
end

# Returns the active schema search path.
def schema_search_path
  @schema_search_path ||= query('SHOW search_path', 'SCHEMA')[0][0]
end

The most interesting thing is that if you search for references to @schema_search_path, you will find it is only used as a local cache of the current search_path in the adapter. If it is nil, it is initialized with the result of the query SHOW search_path; and that value is then kept as the cache!
This implementation is buggy and causes the problems described above!

If we use a SQL query to set the search path rather than calling schema_search_path=, we don’t set @schema_search_path at the same time, so it remains nil. Then a transaction or another object in ActiveRecord calls schema_search_path to get the current search path. The first time, @schema_search_path is nil, so it is initialized with the result of SHOW search_path; after that it never changes, because the query is never executed again.
As a result, the schema switch succeeds the first time but fails from then on.
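The stale cache can be reproduced without a database. The following is a minimal sketch, assuming a stand-in class (FakeAdapter, a made-up name) whose getter and setter have the same shape as the adapter methods quoted above; @server_path plays the role of the value the server would report.

```ruby
# Stand-in for the adapter: `execute` plays the role of raw SQL sent to
# the server, and @server_path is what the server would report back.
class FakeAdapter
  def initialize
    @server_path = 'public'
    @schema_search_path = nil
  end

  # Raw SQL route: changes the server state, leaves the cache alone.
  def execute(sql)
    @server_path = $1 if sql =~ /\ASET search_path TO (.+?);?\z/
  end

  # Same shape as the adapter setter: run the SQL and update the cache.
  def schema_search_path=(schema_csv)
    if schema_csv
      execute("SET search_path TO #{schema_csv}")
      @schema_search_path = schema_csv
    end
  end

  # Same shape as the adapter getter: cached after the first read.
  def schema_search_path
    @schema_search_path ||= @server_path
  end
end

adapter = FakeAdapter.new
adapter.execute('SET search_path TO tenant_a, public;')
first_read = adapter.schema_search_path   # caches "tenant_a, public"
adapter.execute('SET search_path TO tenant_b, public;')
stale_read = adapter.schema_search_path   # still "tenant_a, public"

adapter.schema_search_path = 'tenant_b, public'
fresh_read = adapter.schema_search_path   # "tenant_b, public"
```

Only the setter keeps the cached value and the server in step; the raw SQL route leaves the first cached value in place.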

This means that, at this stage, the only correct way to change search_path is to use PostgreSQLAdapter#schema_search_path=. And PLEASE, PLEASE ignore the warning "This should be not be called manually but set in database.yml." in the source code! It is a really misleading message!

I understand that the current implementation is there for performance reasons, but caching the value is absolutely not a good idea when you cannot keep things in sync and the sync is critical in some cases.
I’m planning to fix this issue in the Rails codebase and send a pull request to the Rails maintainers. I hope they will accept the fix, or at least change the warning message.

Instead of using the outdated and mysterious PgTools mentioned in a lot of posts (I saw a lot of people mention this class, but I couldn’t find it anywhere, either on Google or on GitHub; it is really a mystery), I created a new utility module called MultiSchema.

You can use it as the utility class in the old-fashioned way:

Procedure usage
MultiSchema.with_in_schemas :except => :public do
  # Play around the data in one schema
end

Or you can use it in a DSL-like way:

DSL usage
class SomeMigration < ActiveRecord::Migration
  include MultiSchema

  def change
    with_in_schemas :except => :public do
      # Play around the data in one schema
    end
  end
end

The with_in_schemas method accepts both symbols and strings, and you can pass a single value, an array or a hash to it.

  • with_in_schemas iterates over all user schemas in the database.
  • with_in_schemas :only => %w(schema1 schema2) iterates over the given schemas only.
  • with_in_schemas :except => %w(schema1 schema2) iterates over all schemas except the given ones.
  • with_in_schemas :except => [:public] is equivalent to with_in_schemas :except => ['public']
  • with_in_schemas :only => [:public] is equivalent to with_in_schemas :only => :public and equivalent to with_in_schemas :public
  • with_in_schemas :except => [:public] is equivalent to with_in_schemas :except => :public
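For readers who want to see the core of such a helper, here is a rough sketch of the switching loop. It is not the MultiSchema module itself: each_schema and FakeConnection are hypothetical names, the list of schemas is passed in by the caller instead of being queried from the database, and the only assumption about the connection is that it responds to schema_search_path and schema_search_path=.

```ruby
# Hypothetical helper, not the MultiSchema module from this post.
# `connection` is anything that responds to schema_search_path and
# schema_search_path= (in Rails, ActiveRecord::Base.connection).
def each_schema(connection, schemas)
  original = connection.schema_search_path
  schemas.each do |schema|
    # Go through the adapter setter so its cached value stays in sync.
    connection.schema_search_path = "#{schema}, public"
    yield schema
  end
ensure
  # Restore the previous path even if the block raises.
  connection.schema_search_path = original if original
end

# Minimal stand-in connection so the sketch runs without a database.
FakeConnection = Struct.new(:schema_search_path)

connection = FakeConnection.new('public')
seen = []
each_schema(connection, %w(tenant_a tenant_b)) do |schema|
  seen << [schema, connection.schema_search_path]
end
```

Restoring the path in an ensure clause is a design choice of this sketch, so that a failing block does not leave the connection pointing at the wrong schema.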

Handle dynamic argument in dynamic language

Most dynamic languages don’t provide the function overloading mechanism that static languages do, which means you cannot define functions with the same name but different arguments and have the language dispatch the call according to the arguments.
So in dynamic languages you have to deal with overloading yourself.

Since you have to dispatch the call yourself, you can play some tricks while processing the arguments. The most common trick is syntactic sugar, which helps you simplify the code and improve readability. It is a very important feature when you’re building a DSL.

Syntactic sugar in arguments means that the caller can omit unnecessary or unimportant parameters in a specific context, and the function will try to infer and complete them. For example, a developer should be able to omit the password parameter when the system has an authentication mechanism enabled. These kinds of tricks are very common in dynamic language frameworks, such as jQuery.

Here is a list of some common syntactic sugar cases:

  • Set a default value for an omitted parameter. For the jQuery animation function .fadeIn([duration] [, callback]), you can omit the duration parameter, which is equivalent to providing a duration of 400.
  • Accept different types of value for the same parameter. Again with .fadeIn([duration] [, callback]), you can provide a number for duration, or the string “fast” or “slow”.
  • Accept a single value rather than a complex hash. For the jQuery function jQuery.ajax(settings), you can provide a URL string, which is equivalent to providing the hash {url: <url string>}.
  • Pass a single element instead of an array of elements.

After analyzing these cases, we find that all the syntactic sugar allows the user to provide exactly the same information in different formats. Since all the variants carry the same piece of information, it should be possible to unify them into one format! This means that, to support syntactic sugar, the major problem is to unify the type of the parameter.
Besides unifying the type, another important problem is handling null values, such as null and undefined in JavaScript, nil in Ruby, etc.
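In Ruby, a small part of this unification comes for free from Kernel#Array, which covers the "single element instead of an array" case and the nil case in one call. The sketch below uses a made-up helper name, normalize_names, purely for illustration.

```ruby
# Kernel#Array wraps a single value, passes arrays through unchanged,
# and turns nil into an empty array.
def normalize_names(value)
  Array(value).map { |name| name.to_s }
end

normalize_names(nil)                 # => []
normalize_names(:public)             # => ["public"]
normalize_names('public')            # => ["public"]
normalize_names([:public, 'other'])  # => ["public", "other"]
```

Note that Array() turns a hash into an array of key/value pairs, so hash arguments still need to be handled separately.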

Here is a piece of Ruby code in which I parse the arguments by unifying the parameter types:

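# Methods of the utility module; the enclosing module line is not shown here.
def apply_to_items(options = nil)
  # A bare value (or nil) is shorthand for {:only => value}
  options = unify_type(options, Hash) { |items| {:only => items} }

  # Wrap single values in arrays; a missing :only means "all items"
  options[:only] = unify_type(options[:only], Array) { |item| item.nil? ? list_all_items : [item] }
  options[:except] = unify_type(options[:except], Array) { |item| item.nil? ? [] : [item] }

  # Compare names as strings, so symbols and strings are interchangeable
  options[:only] = unify_array_item_type(options[:only], String) { |symbol| symbol.to_s }
  options[:except] = unify_array_item_type(options[:except], String) { |symbol| symbol.to_s }

  target_items = options[:only].reject { |item| options[:except].include? item }

  target_items.each do |item|
    yield item
  end
end

private

def list_all_items
  # return all items fetched from database or API
  # ...
end

# Return the input unchanged if it already has the wanted type;
# otherwise let the block convert it.
def unify_type(input, type)
  if input.is_a?(type)
    input
  else
    yield input
  end
end

# Not shown in the original listing; assumed to apply unify_type to each element.
def unify_array_item_type(items, type, &converter)
  items.map { |item| unify_type(item, type, &converter) }
end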

And here is the test for the code:

require 'spec_helper'
describe Module do
before do
extend Module
end
it "should populate all items" do
visited = []
apply_to_items do |item|
visited << item
end
visited.should =~ %w(public another_item)
end
describe "should populate the provided items" do
it "provide as string array" do
visited = []
apply_to_items(%w(another_item)) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "provide as symbol array" do
visited = []
apply_to_items([:another_item]) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "provide as string item" do
visited = []
apply_to_items('another_item') do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "provide as symbol item" do
visited = []
apply_to_items(:another_item) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "provide as string array in hash" do
visited = []
apply_to_items(:only => %w(another_item)) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "provide as symbol array in hash" do
visited = []
apply_to_items(:only => [:another_item]) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "provide as string item in hash" do
visited = []
apply_to_items(:only => 'public') do |item|
visited << item
end
visited.should =~ %w(public)
end
it "provide as symbol item in hash" do
visited = []
apply_to_items(:only => :public) do |item|
visited << item
end
visited.should =~ %w(public)
end
end
describe "should except the not used items" do
it "except as string item in hash" do
visited = []
apply_to_items(:except => 'public') do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "except as symbol item in hash" do
visited = []
apply_to_items(:except => :public) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "except as string array in hash" do
visited = []
apply_to_items(:except => %w(public)) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
it "except as symbol array in hash" do
visited = []
apply_to_items(:except => [:public]) do |item|
visited << item
end
visited.should =~ %w(another_item)
end
end
end

The algorithm in the previous code is language independent, so ideally it could be reused in any language, such as JavaScript or Python.

The "Cursed" NULL in postgres

Comparison to NULL

In Postgres, NULL is treated as a special value that is not equal to any other value, not even to itself: the expression NULL = NULL evaluates to NULL (unknown) rather than true.
This can be verified with the following query:

SELECT NULL
SELECT n
FROM unnest(ARRAY[NULL,1,2,3,4,5]) n
WHERE n = NULL

The query returns an empty set, because no element equals NULL, not even NULL itself.
If you think this experiment is not convincing enough, you can try this:

CASE NULL
SELECT
  n,
  CASE WHEN n = NULL THEN 'NULL' ELSE 'NOT NULL' END
FROM unnest(ARRAY[NULL,1,2,3,4,5]) n

For the NULL element, this query yields an empty n and “NOT NULL”, which means NULL does not equal NULL.

To test whether or not a value is NULL, you should use IS NULL or IS NOT NULL.

So if you replace n = NULL with n IS NULL in the previous 2 statements, you will get the expected result:

SELECT NULL
SELECT
  n,
  CASE WHEN n IS NULL THEN 'NULL' ELSE 'NOT NULL' END
FROM unnest(ARRAY[NULL,1,2,3,4,5]) n
WHERE n IS NULL
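The same behaviour can be mimicked in a few lines of Ruby. This is only a toy model of the comparison rules, with nil standing in for NULL; sql_eq and sql_where are made-up helper names, not part of any library.

```ruby
# Toy model of SQL comparison: nil stands in for NULL, and comparing
# anything with NULL gives NULL (nil) rather than true or false.
def sql_eq(left, right)
  return nil if left.nil? || right.nil?
  left == right
end

# WHERE keeps a row only when the condition is true; NULL is dropped.
def sql_where(rows)
  rows.select { |row| yield(row) == true }
end

values = [nil, 1, 2, 3]
equals_null = sql_where(values) { |n| sql_eq(n, nil) }  # => []
is_null     = sql_where(values) { |n| n.nil? }          # => [nil]
```

The first filter never keeps anything because its condition is never true, which mirrors the empty result of WHERE n = NULL.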

NULL in Crosstab

In most cases, the special value NULL doesn’t hurt much, since we can always alter our expression to work around it. But if you’re using table functions to create a pivot table, and there are NULLs in your columns, then you will find NULL is a cursed value that brings you a lot of trouble.

Postgres provides the tablefunc extension, which offers a family of crosstab functions. With these functions, you can convert a set of rows into a pivot table.

The crosstab function used here accepts 2 SQL queries. The first query should yield 3 columns: the row in the pivot table, the column in the pivot table and the value in the pivot table.
The second query should yield a series of values that define the columns of the pivot table.
crosstab groups the rows yielded by the 1st query by their 1st column. Then it maps each row in the group to a pivot column by comparing the row’s 2nd column with the values generated by the 2nd query; if the values are equal, the row’s 3rd column is placed in the column defined by that value.
It is a very convenient feature provided by Postgres, and it works perfectly in most cases.

But in our case, we had NULL values in the 2nd column of the 1st query, which means we had a NULL among the pivot table columns!

And crosstab decides which column a value should be placed in by checking whether the values are equal!

And NULL never equals NULL!

BAM!!!

As a result, the NULL column in the pivot table is always empty!
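The effect can be imitated with a toy pivot in Ruby. This is a simplified model of the matching rule described above, not the tablefunc implementation; toy_crosstab and sql_eq are made-up names, nil stands in for NULL, and the sample rows are invented. It also shows how replacing NULL with an ordinary value before pivoting lets the match succeed.

```ruby
# nil stands in for NULL; comparing with NULL never yields true.
def sql_eq(left, right)
  return nil if left.nil? || right.nil?
  left == right
end

# Toy pivot: rows are [row_name, category, value] triples. A value
# lands in a column only when its category compares as true.
def toy_crosstab(rows, categories)
  rows.group_by { |row_name, _, _| row_name }.map do |row_name, group|
    cells = categories.map do |category|
      match = group.find { |_, cat, _| sql_eq(cat, category) == true }
      match && match[2]
    end
    [row_name] + cells
  end
end

rows = [['2012-Q1', nil, 10], ['2012-Q1', 5, 20]]
pivot = toy_crosstab(rows, [nil, 5])
# pivot => [["2012-Q1", nil, 20]]; the NULL column stays empty

# Mapping NULL to an ordinary value first (here 0) fixes the match.
fixed_rows = rows.map { |name, cat, value| [name, cat.nil? ? 0 : cat, value] }
fixed = toy_crosstab(fixed_rows, [0, 5])
# fixed => [["2012-Q1", 10, 20]]
```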

CASE WHEN

To solve the problem, we play a little trick in the first query: we translate all the NULLs into a “normal” value.
Here is our 1st query. We want a pivot table with period as the row axis, rating as the column axis and order volume as the content.

Original 1st query
SELECT
  Periods.period,
  Profiles.rating,
  SUM(Orders.volume)
FROM
  orders
  LEFT OUTER JOIN Periods ON (Periods.id = Orders.period_id)
  LEFT OUTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
  Periods.period,
  Profiles.rating
ORDER BY
  Periods.period,
  Profiles.rating

This query yields NULL in Profiles.rating, so we can translate NULL into 0 for the rating. To achieve this, we can use a CASE WHEN expression.

Original 1st query with CASE WHEN
SELECT
  Periods.period,
  CASE WHEN Profiles.rating IS NULL THEN 0 ELSE Profiles.rating END,
  SUM(Orders.volume)
FROM
  orders
  LEFT OUTER JOIN Periods ON (Periods.id = Orders.period_id)
  LEFT OUTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
  Periods.period,
  Profiles.rating
ORDER BY
  Periods.period,
  Profiles.rating

COALESCE

The solution works fine. But personally I don’t like it, because it repeats the expression and is not concise. Luckily, the value we have to deal with is the special value NULL, and Postgres provides a group of functions for dealing with NULL.

What we want is the function COALESCE, which accepts a list of values as arguments and returns the first non-null value.
So we can simplify our statement with this super function:

Original 1st query with COALESCE
SELECT
  Periods.period,
  COALESCE(Profiles.rating, 0),
  SUM(Orders.volume)
FROM
  orders
  LEFT OUTER JOIN Periods ON (Periods.id = Orders.period_id)
  LEFT OUTER JOIN Profiles ON (Profiles.id = Orders.profile_id)
GROUP BY
  Periods.period,
  Profiles.rating
ORDER BY
  Periods.period,
  Profiles.rating

In this statement, if the rating is not NULL, COALESCE returns the actual value; if the rating is NULL, COALESCE moves on to the next non-null value, which must be 0.

Besides COALESCE, there is another function called NULLIF, which might lead you in a totally wrong direction, just as it did me.
According to the Postgres documentation, the function behaves in the opposite way to what you might expect:

The NULLIF function returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the COALESCE
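To keep the two functions apart, here is a toy Ruby model of what each one does, with nil standing in for NULL. The method names simply mirror the SQL functions; they are written for this sketch and are not part of any library.

```ruby
# nil stands in for NULL in this toy model.

# COALESCE: the first argument that is not NULL, or NULL if none is.
def coalesce(*values)
  values.find { |value| !value.nil? }
end

# NULLIF: NULL when both values are equal, otherwise the first value.
def nullif(value1, value2)
  value1 == value2 ? nil : value1
end

coalesce(nil, 0)  # => 0
coalesce(3, 0)    # => 3
nullif(0, 0)      # => nil
nullif(3, 0)      # => 3
```

So COALESCE(rating, 0) replaces NULL with 0, while NULLIF(rating, 0) goes the other way and turns 0 into NULL.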

A pitfall in jQuery form serialization

Today, I was surprised to get an empty string when I called the serialize method on a jQuery-wrapped form.
The HTML is written in Haml:

Html
%form#graph-option.horizontal-form
  %fieldset
    %label{ :for=>'start-date'} Start Date
    %select#start-date
    %label{ :for=>'end-date'} End Date
    %select#end-date
    %button#submit-option.btn.large-btn

And the script is written in coffee-script:

Script
$ ->
  $('#submit-option').click ->
    option = $('#graph-option').serialize()
    $.post '/dashboard/graph', option, (data) ->
      renderCharts data

When I executed the script, I got a 500 error. The reason was that option was empty.
I believed this must be caused by a super silly mistake, so I tried calling serialize on the forms on the Twitter Bootstrap website, and I still got an empty string!!!!

After half an hour of debugging, I realized that I had forgotten to assign a name to the input elements. According to the HTML specification, the browser uses the name of an element to identify which field a value belongs to.
So when the name is omitted, the serializeArray method in jQuery returns an empty array and, as a result, the serialize method returns an empty string.

In my experience, this problem is easy to spot if the page is written in an HTML-like format, such as ERB. But it is really hard to spot if the page is written in Haml, because in Haml the id is used much more often.
To fix this problem, we need to specify the name explicitly for each form element.

Here is the fixed haml code:

Html
%form#graph-option.horizontal-form
  %fieldset
    %label{ :for=>'start-date'} Start Date
    %select#start-date{ :name=>'start-date' }
    %label{ :for=>'end-date'} End Date
    %select#end-date{ :name=>'end-date' }
    %button#submit-option.btn.large-btn

Name trap in Rails

I’m a newbie to Rails, and my past few projects have all been Rails based, including MetaPaas, Recruiting On Rails and the current SFP.
I was amazed by the convenience of Rails, and also hurt by its “smartness”.
The power of Rails has been described in quite a lot of posts, so I want to share some failure experiences instead.
I have actually fallen into quite a number of pitfalls in Rails, and here is one of the most painful ones.

To make the problem easier to explain, I have simplified the scenario:
I have a model called “Candidate”, which holds a “Status” to store the status of the candidate, so I have code like this:

Candidate and Status
class Candidate < ActiveRecord::Base
  has_one :status

  # other definition
end

class Status < ActiveRecord::Base
  belongs_to :candidate

  # other definition
end

For some reason, I changed the relationship between Candidate and Status from one-to-one to one-to-many.
So I changed has_one to has_many:

Candidate and Status
class Candidate < ActiveRecord::Base
  has_many :status

  # other definition
end

class Status < ActiveRecord::Base
  belongs_to :candidate

  # other definition
end

I thought it was an easy modification, but the app failed to run, even though I had done the database migration.
It said Rails could not find a constant named “Statu”!

At first sight of this error message, I believed it was caused by a typo: I must have mistyped “Status” as “Statu”.
So I did a full-text search of the whole project for “Statu”, but I couldn’t find any.

This error message was quite weird to me, since I had no idea where the word “Statu” came from!
After spending half an hour on pointless attempts, I suddenly noticed that the word “status” ends with “s” and, according to Rails’ convention, Rails must think “status” is the plural form of “statu”. So, by convention again, it tries to find a class named “Statu”.
And we should use a plural noun as the name of a one-to-many association, since it holds an array rather than a single object.

So after changing status to statuses, the problem was solved.
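Where the word “Statu” comes from is easy to see with a deliberately naive model of the convention. The helper below, naive_class_name, is made up for illustration and is not Rails’ real inflector, which has many more rules; it only strips one trailing “s” and capitalizes the result.

```ruby
# Deliberately naive model of the naming convention, NOT Rails' real
# inflector: strip one trailing "s", then capitalize.
def naive_class_name(association_name)
  singular = association_name.to_s.sub(/s\z/, '')
  singular.capitalize
end

naive_class_name(:status)    # => "Statu"
naive_class_name(:comments)  # => "Comment"
```

A word that already ends in “s” in its singular form is exactly the special case that breaks this kind of rule.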

A convention-based system is powerful; a lot of magic just happens there. But the magic is also hard to debug when some special case breaks the convention’s assumptions.

Negative Index in Coffee-Script

CoffeeScript borrowed quite a lot of syntax patterns from both Ruby and Python, especially from Ruby.
So people like me usually tend to write CoffeeScript the Ruby way.

In Ruby, we can retrieve the elements of an array from the end by using a negative index, which means array[-1] returns the last element of the array. This syntactic sugar is really convenient and powerful, and saves us from code like array[array.length - 1].
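For reference, this is how the two spellings compare in plain Ruby; the variable names are chosen only for this example.

```ruby
array = (1..10).to_a

last_by_length   = array[array.length - 1]  # => 10
last_by_negative = array[-1]                # => 10
second_to_last   = array[-2]                # => 9
out_of_range     = array[-11]               # => nil
```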

But for some reason, CoffeeScript doesn’t inherit this syntax. To me, that was weird. But after analyzing the problem in detail, I found the reason.
CoffeeScript has a golden rule: “CoffeeScript is just JavaScript”. So all CoffeeScript must be compilable into JavaScript.

Let’s analyze how this CoffeeScript would be compiled into JavaScript:

Coffee Script
array = [1..10]
second = array[1]
last = array[-1] # pseudocode

Ideally, the previous code would be compiled as follows:

JavaScript
var array, last, second;
array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
second = array[1];
last = array[array.length - 1];

The negative index has to be handled specially, so the compiler has to check whether the index is negative. This translation seems easy but actually isn’t, since we can, and usually do, use a variable as the index.

Variable as index
array = [1..10]
index = 1
second = array[index]
index = -index
last = array[index]

In the previous code, the index is a variable, whose value cannot be checked at compile time, so the compiler would need to compile each array reference as follows:

Compile Result
var array, index, last, second;
array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
index = 1;
second = index >=0 ? array[index] : array[array.length + index];
index = -index;
last = index >=0 ? array[index] : array[array.length + index];

So every time we reference the array, we would need to check whether the index is negative. This check would cost performance on every array access, which is basically unacceptable.
That’s why CoffeeScript doesn’t support negative indexes.

Eigenclass in ruby

To me, “Eigenclass” is a weird name. Here is the definition of “Eigenclass” from wikipedia:

A hidden class associated with each specific instance of another class.

“Eigen” is a Dutch word that means “own” or “one’s own”. So “Eigenclass” means the class owned by the instance itself.

To open the eigenclass of an object, Ruby provides the following syntax:

Open Eigenclass
foo = Foo.new

class << foo
  # do something with the eigenclass of foo
end

In most cases, the purpose of opening an eigenclass is to define singleton methods on a specific object, so Ruby provides an easier way to define a singleton method on a specific instance:

Shorthand
foo = Foo.new

def foo.some_method
  # do something
end

A “static method” or “class method” is actually a singleton method of a specific class, so this statement is usually used to declare “class methods”.

Besides this simpler statement, we can also open the eigenclass of the class to achieve the same result.
We can write this:

Open eigenclass of the class
class Foo
  class << self
    # define class methods
  end

  # define instance methods
end

Since we’re in the class block, “self” refers to the Foo class itself. So we can use class << self; end to open the eigenclass of the class.
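Putting the pieces together, the short sketch below defines a class method through the eigenclass of the class and a singleton method on one instance, then inspects where each one lives. The method names build and greet are invented for this example.

```ruby
class Foo
  class << self
    # Defined on Foo's eigenclass, so it is callable as Foo.build.
    def build
      new
    end
  end
end

foo = Foo.new
other = Foo.new

# Singleton method: lives on foo's eigenclass only.
def foo.greet
  'hello'
end

foo.singleton_methods                        # => [:greet]
other.respond_to?(:greet)                    # => false
Foo.singleton_methods.include?(:build)       # => true
foo.singleton_class.instance_methods(false)  # => [:greet]
```

Object#singleton_class returns the eigenclass directly, which is handy for this kind of inspection.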

Space Pitfall in coffee-script

CoffeeScript has fixed quite a lot of pitfalls in JavaScript. But on the other hand, it has also introduced some pitfalls of its own; the most common one is the space.

Space in function declaration

Read the following code:

Show Message:Coffee
show = message ->
  console.log message

show "space pitfall"

This is quite a simple script, but it fails to run. And you might also feel confused about the error message: “message is not defined”.

What happened to the code? We did declare message as the argument of the function show. To reveal the answer, we should analyze the compiled JavaScript.
Here is the compiled code:

Show Message:JS
// Generated by CoffeeScript 1.3.1
var show;

show = message(function() {
  return console.log(message);
});

show("space pitfall");

Look at the function declaration: you will see it is not a function declaration, as we wanted, but a function call.
The reason is that we omitted the parentheses around the argument, so the CoffeeScript compiler interprets message -> as a call to a function named message, with an anonymous function as its parameter.

Solution
To fix this problem, wrap the parameter in parentheses, so that the compiler reads it as the parameter list of the function rather than as a function call.

Show Message:Fix
show = (message) ->
  console.log message

show "space pitfall"

Best Practice
To avoid this pitfall, my suggestion is to always write the parentheses around the parameters, even when there is only one.
The same goes for function calls, even though CoffeeScript allows you to omit the parentheses there, since you won’t be able to chain method calls if you omit them.
So never omit parentheses, unless you are very certain that there is no ambiguity and you won’t use method chaining.

The space in array index

Since CoffeeScript doesn’t support negative indexes, we have to compute the last index ourselves, like this:

Last Hero:Coffee
heros = ["Egeal Eye", "XMen", "American Captain", "IronMan"]
lastHero = heros[heros.length -1]
console.log lastHero

This piece of code also fails to run, and the error message is “property of object is not a function”.
Quite weird, right?
Let’s see what happens behind the scenes. Here is the compiled code:

Last Hero:JS
// Generated by CoffeeScript 1.3.1
var heros, lastHero;

heros = ["Egeal Eye", "XMen", "American Captain", "IronMan"];

lastHero = heros[heros.length(-1)];

console.log(lastHero);

Same problem: heros.length -1 is interpreted as heros.length(-1) instead of heros.length - 1.
To fix this problem, we should write the code in the following way:

Last Hero:Fix1
heros = ["Egeal Eye", "XMen", "American Captain", "IronMan"]
lastHero = heros[heros.length - 1]
console.log lastHero

Or

Last Hero:Fix2
heros = ["Egeal Eye", "XMen", "American Captain", "IronMan"]
lastHero = heros[heros.length-1]
console.log lastHero

Both solutions force the compiler to divide the components in the correct way.

Unfortunately, there is no way to avoid this problem entirely; the only thing you can do is to always be aware of the spaces in your expressions.

How to print multiple line string on bash

To display some pre-formatted text onto screen, we need the following 2 capabilities:

Construct Multiple Text

There are 2 ways to construct multiple line strings:

  • String literal

    String Literal
    text="
    First Line
    Second Line
    Third Line
    "
  • Use cat

    cat
    text=$(cat << EOF
    First Line
    Second Line
    Third Line
    EOF
    )

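When the variable is expanded without quotes, as in echo $text, the shell splits the text into words and all the line breaks are lost. So we should quote the expansion (echo "$text") or use printf instead of echo.
printf also supports backslash escapes, so we can use \n to print a new line on the screen.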

Dynamic Singleton Methods in Ruby

Today, I paired with Ma Wei to refactor a piece of existing code. We tried to eliminate some “static methods” (in fact, there are no real static methods in Ruby; I use this term for methods that depend only on their parameters rather than on any instance variables).

The code is like this:

Recruiter.rb
class Recruiter
  def approve!(candidates)
    Candidate.transaction do
      candidates.each do |candidate|
        candidate.status.approve!
      end
    end
  end

  def reject!(candidates)
    Candidate.transaction do
      candidates.each do |candidate|
        candidate.status.reject!
      end
    end
  end

  def revoke!(candidates)
    Candidate.transaction do
      candidates.each do |candidate|
        candidate.status.revoke!
      end
    end
  end

  # ...
  # Some other methods similar
end

As you can see, the class Recruiter is used only as a host for methods that manipulate an array of candidates, which is a strong code smell. So we decided to move these methods to the class they really belong to.

In Java or C#, the solution to this smell is quite obvious and could be called the “standard answer”:

  1. Mark all methods static.
  2. Create a new class named CandidateCollection.
  3. Change the type of candidates to CandidateCollection.
  4. Mark all methods non-static, and move them to the CandidateCollection class.
    If you use ReSharper or the IntelliJ enterprise version, the tool can even do this for you.

But in the Ruby world, or the dynamic language world in general, we don’t like to create so many classes, especially these “strongly typed collections”. I wished I could inject these domain-related methods into the array instance when necessary, which is what Ruby calls “singleton methods”.
To achieve this, I might need code like this:

Singleton Methods
def wrap_array(array)
  def array.approve!
    # ...
  end

  def array.reject!
    # ...
  end

  # ...
  # Some other methods similar
  array
end

With the help of this wrap_array method, we can dynamically inject the methods into the array like this:

call wrap_array
wrap_array(Candidate.scoped_by_id(candidate_ids)).approve!
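
To make the idea concrete, here is a minimal runnable sketch of a singleton method on a single array. The Candidate struct and the body of approve! are hypothetical stand-ins for the real model, which is not shown in this post:

```ruby
# Hypothetical stand-in for the real Candidate model.
Candidate = Struct.new(:name, :status)

# Defines approve! as a singleton method on the given array only.
def wrap_array(array)
  def array.approve!
    each { |candidate| candidate.status = :approved }
    self
  end
  array
end

candidates = wrap_array([Candidate.new("Ann", :pending), Candidate.new("Bob", :pending)])
candidates.approve!

puts candidates.map(&:status).inspect  # [:approved, :approved]
puts [].respond_to?(:approve!)         # false, other arrays are untouched
```

Only the wrapped instance gains the method, so the Array class itself stays clean.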

This is cool, but still not cool enough. We still have problems:

  1. All the business logic is included in the wrap method, which is hard to maintain.
  2. Where should we declare this wrap method? In class Array or in another “static class”?

The answer to the first question is easy: our solution is to encapsulate the logic in a module:

Module CandidateCollection
module CandidateCollection
  def approve!
    # ...
  end

  def reject!
    # ...
  end

  def revoke!
    # ...
  end

  # ...
  # Some other methods similar
end

By encapsulating the logic in a module, we can extract it into its own file, so the logic can be organized the way we want.
Now we need to solve the second problem and reuse the module we just created.

To achieve this, we wrote the following code:

Array.to_candidate_collection
class Array
  def to_candidate_collection
    class << self
      include CandidateCollection
    end
    self
  end
end

In this code, we reopened the class Array and defined a new method called to_candidate_collection, which injects the domain methods into a generic array.
So we can write the following code:

Call to_candidate_collection
Candidate.scoped_by_id(candidate_ids).to_candidate_collection.approve!

Now our refactoring was basically complete.

But soon we realized that this pattern is really powerful and should be easy to reuse. So we decided to move on.
We wanted to_candidate_collection to be more generic, so that we could dynamically inject any module, not just CandidateCollection.
So we wrote the following code:

dynamic_inject
class Array
  # The parameter is named mod because `module` is a reserved word in Ruby.
  def dynamic_inject(mod)
    class << self
      include mod
    end
    self
  end
end

So we can have the code like this:

call dynamic_inject
Candidate.scoped_by_id(candidate_ids).dynamic_inject(CandidateCollection).approve!

The code looks cool, but it failed to run.
The reason is that class << self opens the metaclass of the instance, which starts a new scope. The local variables of dynamic_inject, including the method's parameter, are no longer visible inside it.
To solve this problem, we need to flatten the scope by using a closure. So we modified the code as follows:

dynamic_inject version 2
class Array
  # The parameter is named mod because `module` is a reserved word in Ruby.
  def dynamic_inject(mod)
    metaclass = class << self; self; end
    metaclass.class_eval do
      include mod
    end
    self
  end
end

The statement metaclass = class << self; self; end is quite tricky: we use it to get the metaclass of the array instance.
Then we call class_eval on the metaclass with a block. The block is a closure, so it can still see the method's parameter and mix in the module we want.
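
The difference between the two scopes can be seen in a small standalone sketch. The Greeter module and both helper methods are hypothetical names used only for this illustration, and singleton_class (available since Ruby 1.9.2) is used here as a shorter way to get the metaclass:

```ruby
# Hypothetical module used only for this demonstration.
module Greeter
  def greet
    "hello from #{self.class}"
  end
end

# `class << target` starts a new scope, so the local variable mod is not visible.
def inject_with_scope_gate(target, mod)
  class << target
    include mod
  end
  target
end

# The block passed to class_eval is a closure, so mod is still visible.
def inject_with_closure(target, mod)
  target.singleton_class.class_eval do
    include mod
  end
  target
end

begin
  inject_with_scope_gate(Object.new, Greeter)
rescue NameError => e
  puts "scope gate: #{e.class}"                  # scope gate: NameError
end

puts inject_with_closure([], Greeter).greet      # hello from Array
```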

Now the code looks nice. We can dynamically inject any module into an “Array” instance.
Wait a minute, why only “Array”? We’d like to have this capability on any object!
OK, that’s easy: let’s move the method to the Kernel module, which is mixed into the Object class.

dynamic_inject version 3
module Kernel
  # The parameter is named mod because `module` is a reserved word in Ruby.
  def dynamic_inject(mod)
    metaclass = class << self; self; end
    metaclass.class_eval do
      include mod
    end
    self
  end
end

Now we can say the code looks beautiful.

NOTICE:
Have you noticed the self expression at the end of the dynamic_inject method, which serves as the return value?
This statement is quite important!
Without it, we get an “undefined method” error when calling Candidate.scoped_by_id(candidate_ids).dynamic_inject(CandidateCollection).approve!.
We spent almost an hour figuring out this mistake. It is a really stupid but expensive one!
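
A small sketch shows what happens without the trailing self. The Shouting module and the inject_without_self helper are hypothetical names for this illustration: class_eval returns the value of its block, and include returns the class it was called on, so the caller gets the metaclass back instead of the object.

```ruby
# Hypothetical module used only for this demonstration.
module Shouting
  def shout
    map { |word| word.to_s.upcase }
  end
end

# Same idea as dynamic_inject, but with the trailing `self` left out.
def inject_without_self(target, mod)
  target.singleton_class.class_eval { include mod }
end

words = %w[a b]
result = inject_without_self(words, Shouting)

puts result.respond_to?(:shout)  # false, result is the metaclass, not the array
puts words.shout.inspect         # ["A", "B"], the array itself was extended
```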


Instead of these tricky approaches, on Ruby 1.9+ it is fine to use the extend method to replace the tricky code.
The extend method is the official way to do the “dynamic inject” described above.
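
As a minimal sketch of the extend approach, assuming a CandidateCollection module whose approve! body is a hypothetical stand-in working on plain hashes rather than real Candidate records:

```ruby
# Stand-in for the CandidateCollection module; the method body is hypothetical.
module CandidateCollection
  def approve!
    each { |candidate| candidate[:status] = :approved }
    self
  end
end

# extend mixes the module into this one object and returns the object,
# so the call can be chained directly.
candidates = [{ name: "Ann", status: :pending }].extend(CandidateCollection)
candidates.approve!

puts candidates.first[:status]   # approved
puts [].respond_to?(:approve!)   # false
```

Because extend returns the receiver, there is no trailing self to forget.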