2013-02-22

Manage configuration the Rails way in node.js by using inheritance

Applications usually have to run in different environments. To manage the differences between them, we usually introduce the concept of environment-specific configuration.
A Rails application comes with 3 environments by default: the well-known development, test and production.
We use the environment variable RAILS_ENV to tell Rails which environment to load; if RAILS_ENV is not set, Rails loads the app in the development environment.

This approach is very convenient, so we want to apply it everywhere. But in node.js, Express doesn’t provide any configuration management, so we need to build the feature ourselves.

Environment management usually provides the following functionality:

It lets us provide some configuration values as defaults that are loaded in every environment; we usually call this set common.
It loads a specific configuration according to the environment variable, which overrides some of the common values where necessary.

Rails uses YAML to hold these configurations, which is concise but powerful enough for this purpose. YAML also provides an inheritance mechanism out of the box (anchors plus the << merge key), so you can reduce duplication by using inheritance.

Inheritance in Rails YAML Configuration


development: &defaults
  adapter: mysql
  encoding: utf8
  database: sample_app_development
  username: root
test:
  <<: *defaults
  database: sample_app_test
cucumber:
  <<: *defaults
  database: sample_app_cucumber
production:
  <<: *defaults
  database: sample_app_production
  username: sample_app
  password: secret_word
  host: ec2-10-18-1-115.us-west-2.compute.amazonaws.com
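To see what the anchor and merge keys resolve to, here is a minimal Ruby sketch using only the standard yaml library. The constant names and the database_config helper are hypothetical, and the YAML is a trimmed copy of the file above; it is an illustration, not how Rails itself loads database.yml.

```ruby
require 'yaml'

# Trimmed-down copy of the configuration above.
DATABASE_YML = <<~CONFIG
  development: &defaults
    adapter: mysql
    encoding: utf8
    database: sample_app_development
    username: root
  test:
    <<: *defaults
    database: sample_app_test
  production:
    <<: *defaults
    database: sample_app_production
    username: sample_app
CONFIG

# aliases: true allows the parser to resolve the &defaults anchor.
ALL_ENVIRONMENTS = YAML.safe_load(DATABASE_YML, aliases: true)

# Hypothetical helper: pick one environment, falling back to development.
def database_config(env = ENV['RAILS_ENV'])
  ALL_ENVIRONMENTS.fetch(env || 'development')
end

puts database_config('test')
```

The test environment ends up with the adapter, encoding and username of the defaults, while its own database value wins over the inherited one.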

In Express and node.js, if we follow the same approach, the natural counterpart of YAML is JSON, which JavaScript supports natively.
But to me JSON isn’t the best option. It has some disadvantages:

JSON syntax is not concise enough
Matching brackets and appending commas to line ends are distractions
Lack of flexibility

As an answer to these issues, I chose CoffeeScript instead of JSON.
CoffeeScript is concise. Like YAML, it uses indentation to indicate the nesting level. And it is executable, which gives the configuration a lot of flexibility, so we can even write the configuration in the form of a Domain Specific Language.

To do it, we need to solve 4 problems:

Allow developers to declare a default configuration.
Load a specific configuration in addition to the default one.
Let the specific configuration override values in the default one.
Keep the code concise, clean and easy to read.

Inspired by the YAML solution, I worked out my first solution:

Configuration in coffee script


_ = require('underscore')
config = {}
config['common'] =
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"
config['development'] = {}
config['test'] =
  database: "sample_app_test"
config['cucumber'] =
  database: "sample_app_cucumber"
config['production'] =
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"
# Export the defaults first, then let the selected environment override them
_.extend exports, config.common
specificConfig = config[process.env.NODE_ENV ? 'development']
if specificConfig?
  _.extend exports, specificConfig

YAML is a data-centric language, so its inheritance works more like mixing in another piece of data. So I use underscore to mix the specific configuration over the default one, which overrides the overlapping values.

But if we step outside the YAML box and think about JavaScript itself: JavaScript is a prototype-based language, which means it already provides an overriding mechanism natively. Each object inherits values from its prototype and can override them.
So I worked out the 2nd solution:

Prototype based Configuration


config = {}
config['common'] =
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"
config['development'] = {}
config['development'].__proto__ = config['common']
config['test'] =
  __proto__: config['common']
  database: "sample_app_test"
config['cucumber'] =
  __proto__: config['test']
  database: "sample_app_cucumber"
config['production'] =
  __proto__: config['common']
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"
# Normalize NODE_ENV, defaulting to development when it is not set
process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'
module.exports = config[process.env.NODE_ENV]

This approach works, but it looks kind of ugly. Since we’re using CoffeeScript, which provides syntactic sugar for classes and class inheritance, we can do better.
So we have the 3rd version:

Class based configuration


process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'
class Config
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"
class Config.development extends Config
class Config.test extends Config
  database: "sample_app_test"
class Config.cucumber extends Config
  database: "sample_app_cucumber"
class Config.production extends Config
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"
module.exports = new Config[process.env.NODE_ENV]()

Now the code looks clean, and we can take it a step further if necessary: split the configurations into separate files and require them by file name:

Class based configuration in separate files


# config/config.coffee
configName = process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'
SpecificConfig = require("./envs/#{configName}")
module.exports = new SpecificConfig()
# config/envs/common.coffee
class Common
  adapter: "mysql"
  encoding: "utf8"
  database: "sample_app_development"
  username: "root"
module.exports = Common
# config/envs/development.coffee
Common = require('./common')
class Development extends Common
module.exports = Development
# config/envs/test.coffee
Common = require('./common')  
class Test extends Common
  database: "sample_app_test"
module.exports = Test
# config/envs/cucumber.coffee
Test = require('./test')
class Cucumber extends Test
  database: "sample_app_cucumber"
module.exports = Cucumber
# config/envs/production.coffee
Common = require('./common')  
class Production extends Common
  database: "sample_app_production"
  username: "sample_app"
  password: "secret_word"
  host: "ec2-10-18-1-115.us-west-2.compute.amazonaws.com"
module.exports = Production

2012-08-15

Pitfall when returning a string via JSON in Rails

Today we ran into a weird problem when returning a string via JSON.

Here is the CoffeeScript code:

Front End


$.post serverUrl, data, (status) ->
	console.log status

And here is our controller:

Backend Action


def action
	# do some complex logic
	render json: "success"
end

The code looks perfect, but we found that the callback was never called! Checking the network traffic showed that the server did send its response, “success”, and yet the callback was not called!

After spending half an hour struggling with jQuery, we finally found the problem!

The reason is that success is not valid JSON! A JSON string must be wrapped in double quotes (""); otherwise the JSON parser treats it as a bare token such as true, false or null. Since success is none of those, parsing fails and jQuery never runs the success callback.

So to fix the problem, we need to change our action code:

Fixed Backend Action


def action
	# do some complex logic
	render json: '"success"'
end
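The difference between the two response bodies can be checked outside Rails with Ruby's standard json library. This is a minimal sketch; valid_json? is a hypothetical helper name. It also shows that calling to_json on the string produces the quoted form, which is an alternative to writing the quotes by hand.

```ruby
require 'json'

# Returns true when the text is a complete, valid JSON document.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

bare   = 'success'          # the body the first version of the action sent
quoted = 'success'.to_json  # the same string with JSON quoting applied

puts valid_json?(bare)    # the bare word is rejected by the parser
puts valid_json?(quoted)  # the quoted form parses back to a string
```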

This is really a pitfall, since the wrong code looks so natural!

2012-08-05

Pretty Singleton in RoR app

Thanks to Ruby’s powerful metaprogramming capability and Rails’ delegate syntax, we can easily write a graceful singleton class, one where the class itself works like an instance.

In a traditional language such as C#, we usually write singleton code like this:

Singleton in C#

class Foo
{
	// Singleton Declaration
	// (not readonly: the field is assigned lazily inside the getter)
	private static Foo instance;
	public static Foo Instance
	{
		get
		{
			if(instance == null)
			{
				instance = new Foo();
			}
			return instance;
		}
	}
	// Define instance behaviors
	// ...
}

The previous approach works fine, but the code that uses Foo gets kind of ugly. Every time we want to invoke the method Bar on Foo, we need to write Foo.Instance.Bar() rather than the more graceful Foo.Bar().
To solve this problem, we need to implement the class this way:

Class Delegation in C#


class Foo
{
	// Singleton Declaration
	// ...
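	// Define instance behaviors
	// NOTE: a real C# compiler rejects an instance member and a static member
	// with the same signature, so one of each pair would need another name;
	// the names are kept identical here only to show the intent.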
	public void Bar()
	{
		// Bar behaviors
		// ...
	}
	public static void Bar()
	{
		Instance.Bar();
	}
	public string Baz
	{
		get { /* Getter behavior */	}
		set { /* Setter behavior */	}
	}
	public static string Baz
	{
		get { return Instance.Baz;	}
		set { Instance.Baz = value;	}
	}
}

This approach simplifies the calling code but complicates the declaration. You can use tricks such as code snippets, or code generation technology such as text templates or CodeSmith, to generate the dull delegation code. But it is still not graceful at all.

If we write the same code in Ruby, things become much easier, thanks to Ruby’s powerful metaprogramming capability.

Singleton in Ruby

# foo.rb
class Foo
	extend ActiveSupport::Autoload
	autoload :Base
	include Base
	autoload :ClassMethods
	extend ClassMethods
end
# foo/base.rb
module Foo::Base
	# Define instance behaviors
	# ...
end
# foo/class_methods.rb
module Foo::ClassMethods
	# Singleton Declaration
	def instance
		@instance ||= new
	end
	delegate *Foo::Base.instance_methods, :to => :instance
end

So in the Ruby solution, a single statement, delegate *Foo::Base.instance_methods, :to => :instance, delegates all the methods defined in Base to the instance.
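The same idea can be tried without Rails. The sketch below swaps ActiveSupport's delegate for def_delegators from Ruby's standard forwardable library; the Behaviors module and its methods are hypothetical stand-ins for Foo::Base, so treat it as an approximation of the pattern rather than the code above.

```ruby
require 'forwardable'

# Instance behaviors, playing the role of Foo::Base.
module Behaviors
  def bar
    'bar called'
  end

  def increment
    @count = (@count || 0) + 1
  end
end

class Foo
  include Behaviors

  class << self
    extend Forwardable

    # Singleton declaration
    def instance
      @instance ||= new
    end

    # Forward every method of Behaviors from the class to the instance.
    def_delegators :instance, *Behaviors.instance_methods
  end
end

puts Foo.bar
puts Foo.instance.equal?(Foo.instance)
```

Because every class-level call goes through the one shared instance, state such as the counter in increment accumulates across calls made via Foo.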

Besides this solution, there is also a cheaper but still working alternative:

Singleton in Ruby using extend

# foo.rb
class Foo
	extend ActiveSupport::Autoload
	autoload :Base
	include Base
	extend Base
end
# foo/base.rb
module Foo::Base
	# Define instance behaviors
	# ...
end

The two approaches make the code behave slightly differently, but both work.
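One difference that follows from plain Ruby semantics can be seen in a small sketch. The Counter module and Cheap class are hypothetical names. With extend, the methods run on the class object itself, so their instance variables live on the class and no shared instance is involved.

```ruby
module Counter
  def increment
    @count = (@count || 0) + 1
  end
end

# The "cheaper" variant: the same module is both included and extended.
class Cheap
  include Counter
  extend Counter
end

Cheap.increment        # state is stored on the Cheap class object itself
object = Cheap.new
puts object.increment  # an instance starts from its own, separate state
```

So class-level calls and calls on an object created with new never see each other's state, whereas with the delegate version they share the single instance.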

2012-07-17

Use Postgres Multiple Schema Database in Rails

Postgres provides a very interesting feature called “Schema” in addition to the other “normal” database features, which adds an extra layer between the database and its tables. With schemas, you can have tables with the same name in one database, as long as they are in different schemas.
To me, “schema” is not a good name! I think “table-space” or even “namespace” would be better. In fact, a number of people agree that schema is not a good name:

“Schema” is such a terrible name for this feature. When most people hear the term “schema” they think of a data definition of some sort. This is not what PostgreSQL schemas are. I’m sure the PostgreSQL devs had their reasons, but I really wish they would have named it more appropriately. “Namespaces” would have been apropos.

Anywho, the easiest way for me to describe PostgreSQL schemas (besides telling you that they are, indeed, namespaces for tables) is to relate them to the UNIX execution path. When you run a UNIX command without specifying its absolute path, your shell will work its way down the $PATH until it finds an executable of the same name.

Jerod Santo, blog.jerodsanto.net/2011/07/building-multi-tenant-rails-apps-with-postgresql-schemas

And you can find more here

A popular practice is to use Postgres schemas for sub-domains. For example, say you’re a BBS provider and you rent your BBS app to different organizations. Each organization wants its own BBS instance running independently; most importantly, its data should be stored in a separate space and be accessible from its own domain name. But for administration, you want them all to share the same backend management console.
In this case, the best way to solve the problem is to store the data owned by each subsystem in its own schema, and to store all the administration data in a single schema or even in the public schema.
The same Jerod has a post describing in detail how to build this kind of system. There are a bunch of other posts on building systems like this, which are easy to find by googling.
There is even a Ruby gem called apartment, from Brad Robertson, that supports this kind of system.

This idea looks fancy, but only as long as you are 10000% certain that the sub-systems will stay independent forever, without any collaboration.
Otherwise, sooner or later you will find that the “fancy idea” has become a horrible idea.
As time goes by, more and more features may require collaboration between sub-systems, such as a unified authentication mechanism, so that users can log in once and switch between systems easily. Or an administrator might ask for a unified statistics graph covering all sub-systems.
All these requirements involve cross-schema queries! To be honest, cross-schema queries can be painful in some cases!
And they bring trouble to every aspect of your system, such as data migration, test data generation, etc.

That’s exactly what happened in my current project!
My current project is a Rails 3 project. The codebase is brand new, but it is built on a legacy multiple-schema Postgres database, and for some reason we must keep the multiple-schema design unchanged. Yet our goal is to unify the separate subsystems into a more closely collaborating system.

ActiveRecord in Rails doesn’t include native support for this fancy feature, which means you will run into problems during migrations, or even when preparing test data with factory_girl.

Postgres lets you locate a table in a specific schema with a fully qualified name like this: <schema name>.<table name>.<column name>. The schema name is optional; when you omit it, Postgres searches for the table in a file-system-path-like order called the “search path”.

You can query and set the current search path with SQL statements:

Query and Set search_path

SHOW search_path;
SET search_path TO <new_search_path>;

ActiveRecord doesn’t put the fully qualified schema name in front of the table name when it translates ARel into SQL statements, so we can only support a multiple-schema database through the search_path.

So it is a very natural idea to use the following Ruby code to make ActiveRecord query different schemas:

Select Schema


def add_schema_to_search_path(schema)
  ActiveRecord::Base.connection.execute "SET search_path TO #{schema}, public;"
end
def restore_search_path
  ActiveRecord::Base.connection.execute "SET search_path TO public;"
end

These two methods work perfectly when querying the database. But sooner or later, you will run into big trouble when you try to write data into it.

In a db migration, or when using factory_girl to generate test fixtures, you might find that the data you insert into different schemas all ends up in the first non-public schema. Yet all the queries still work perfectly!

We found that this problem occurs when the following conditions are all satisfied:

The queries happen in a transaction.
You insert data into multiple non-public schemas.
You use the SET search_path TO SQL statement to switch between schemas rather than explicitly using fully qualified table names.

And the most interesting things are:

All the SELECT queries are executed on the correct schemas.
If you use SHOW search_path; to query the current search path, you get the correct value.
All data is inserted into the first non-public schema that you actually wrote data into. This means that inserting data into the public schema won’t go wrong, and a non-public schema that you switched to without actually inserting any rows isn’t affected either.

To solve this problem, I spent 2 days and 2 nights digging into the source code of the ActiveRecord gem and the pg gem (the Postgres database adapter).
And finally I solved the problem by using the attribute on PostgreSQLAdapter.

Basically, instead of using the SQL query, you should use PostgreSQLAdapter#schema_search_path and PostgreSQLAdapter#schema_search_path= to get and set the search path.
And if you dig into the source code, you will find that the two methods do exactly the same thing as we did, except that they also assign one more instance variable, @schema_search_path.

methods on PostgreSQLAdapter


# Sets the schema search path to a string of comma-separated schema names.
# Names beginning with $ have to be quoted (e.g. $user => '$user').
# See: http://www.postgresql.org/docs/current/static/ddl-schemas.html
#
# This should be not be called manually but set in database.yml.
def schema_search_path=(schema_csv)
  if schema_csv
    execute("SET search_path TO #{schema_csv}", 'SCHEMA')
    @schema_search_path = schema_csv
  end
end
# Returns the active schema search path.
def schema_search_path
  @schema_search_path ||= query('SHOW search_path', 'SCHEMA')[0][0]
end

The most interesting thing is that if you search for references to @schema_search_path, you will find it is only used as a local cache of the current search_path in the adapter. When it is nil, it is initialized with the value from the query SHOW search_path; and that value is then kept as the cache!
This implementation is buggy and causes the problems described above!

If we use the SQL query to set the search path rather than calling schema_search_path=, @schema_search_path isn’t set at the same time, so it stays nil. Then a transaction or some other object in ActiveRecord calls schema_search_path to get the current search path. On that first call, @schema_search_path is nil, so it is initialized with the value from the query SHOW search_path; after that it never changes, since the query is never executed again.
As a result, the schema is switched successfully the first time, but the following switches fail.
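The stale cache can be reproduced with a toy model in plain Ruby. FakeAdapter below is a hypothetical stand-in that only imitates the memoization in the two adapter methods quoted above; it does not talk to a database and is not ActiveRecord code.

```ruby
# Toy stand-in for the adapter; only the caching behavior is modeled.
class FakeAdapter
  def initialize
    @server_search_path = 'public'  # what the database really uses
    @schema_search_path = nil       # the adapter's local cache
  end

  # Raw SQL path: changes the server state, never touches the cache.
  def execute(sql)
    match = sql.match(/\ASET search_path TO (.+?);?\z/)
    @server_search_path = match[1] if match
  end

  def schema_search_path=(schema_csv)
    return unless schema_csv
    execute("SET search_path TO #{schema_csv}")
    @schema_search_path = schema_csv
  end

  # Memoized once, like the adapter method quoted above.
  def schema_search_path
    @schema_search_path ||= @server_search_path
  end
end

adapter = FakeAdapter.new
adapter.execute('SET search_path TO tenant_a, public;')
adapter.schema_search_path                       # first read fills the cache
adapter.execute('SET search_path TO tenant_b, public;')
puts adapter.schema_search_path                  # stale: still tenant_a

adapter.schema_search_path = 'tenant_b, public'  # setter keeps both in sync
puts adapter.schema_search_path
```

After the second raw SET statement, the server and the cache disagree, and anything that trusts the cached value keeps working against the first schema.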

This means that, at the current stage, the only correct way to change search_path is PostgreSQLAdapter#schema_search_path=, and PLEASE PLEASE ignore the warning "This should be not be called manually but set in database.yml." in the source code! It is a really misleading message!
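A small helper built on the two adapter methods might look like the sketch below. with_schema and StubConnection are hypothetical names; the stub exists only so the sketch runs without a database, and in a Rails app the connection argument would presumably be ActiveRecord::Base.connection.

```ruby
# Runs the block with the given schema in front of the search path,
# then restores the previous path. `connection` is expected to respond
# to schema_search_path and schema_search_path=.
def with_schema(connection, schema)
  previous = connection.schema_search_path
  connection.schema_search_path = "#{schema}, public"
  yield
ensure
  connection.schema_search_path = previous
end

# Minimal stand-in so the sketch runs without a database.
StubConnection = Struct.new(:schema_search_path)

connection = StubConnection.new('public')
with_schema(connection, 'tenant_a') do
  puts connection.schema_search_path
end
puts connection.schema_search_path
```

Going through the accessor pair for both the switch and the restore keeps the adapter's cached value and the real search path in step, and the ensure clause restores the previous path even when the block raises.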

I understand that the current implementation is there for performance reasons, but caching the value is absolutely not a good idea when you cannot keep things in sync and the sync is critical in some cases.
I’m planning to fix this issue in the Rails codebase and send a pull request to the Rails maintainers. I hope they will accept the fix, or at least change the warning message.

Instead of using the outdated and mysterious PgTools mentioned in a lot of posts (I saw a lot of people mention this class, but I cannot find it anywhere, on Google or on GitHub; it is really a mystery), I created a new utility module called MultiSchema.

You can use it as a utility class in the old-fashioned way:

Procedure usage

MultiSchema.with_in_schemas :except => :public do
  # Play around the data in one schema
end

Or you can use it in a DSL-like way:

DSL usage

class SomeMigration < ActiveRecord::Migration
  include MultiSchema
  def change
    with_in_schemas :except => :public do
      # Play around the data in one schema
    end
  end
end

The with_in_schemas method accepts both symbols and strings, and you can pass a single value, an array or a hash to it.

with_in_schemas without arguments covers all user schemas in the database.
with_in_schemas :only => %w(schema1 schema2) covers only the given schemas.
with_in_schemas :except => %w(schema1 schema2) covers all schemas except the given ones.
with_in_schemas :except => [:public] is equivalent to with_in_schemas :except => ['public']
with_in_schemas :only => [:public] is equivalent to with_in_schemas :only => :public and equivalent to with_in_schemas :public
with_in_schemas :except => [:public] is equivalent to with_in_schemas :except => :public
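The equivalences listed above can be expressed as a small argument-normalizing function. This is only a sketch of how such rules could be implemented, not the actual MultiSchema source; resolve_schemas and the all_schemas parameter are hypothetical.

```ruby
# Hypothetical helper: turn the argument of with_in_schemas into a list
# of schema names. `all_schemas` stands for the user schemas in the database.
def resolve_schemas(all_schemas, selector = nil)
  to_names = ->(value) { Array(value).map(&:to_s) }

  case selector
  when nil
    all_schemas
  when Hash
    selected = selector.key?(:only) ? to_names.call(selector[:only]) : all_schemas
    selected - to_names.call(selector[:except])
  else
    to_names.call(selector) # a bare value or array behaves like :only
  end
end

all = %w[public tenant_a tenant_b]
p resolve_schemas(all, :except => :public)
p resolve_schemas(all, :only => [:public])
```

Converting everything to strings up front is what makes symbols and strings, and single values and arrays, interchangeable.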

2012-06-02

Name trap in Rails

I’m a newbie to Rails, and my past few projects have all been Rails based, including MetaPaas, Recruiting On Rails and the current SFP.
I was amazed by the convenience of Rails, and also hurt by its “smartness”.
The power of Rails has been described in quite a lot of posts, so I want to share some experiences of failure instead.
I have actually fallen into quite a number of pitfalls in Rails, and here is one of the most painful ones.

To make the problem easier to explain, I have simplified the scenario:
I have a model called “Candidate”, which has a “Status” to store the status of the candidate, so I have code like this:

Candidate and Status

class Candidate < ActiveRecord::Base
	has_one :status
	# other definition
end
class Status < ActiveRecord::Base
	belongs_to :candidate
	# other definition
end

For some reason, I changed the relationship between Candidate and Status from one-to-one to one-to-many.
So I changed has_one to has_many:

Candidate and Status

class Candidate < ActiveRecord::Base
	has_many :status
	# other definition
end
class Status < ActiveRecord::Base
	belongs_to :candidate
	# other definition
end

I thought it was an easy modification, but the app failed to run, even after I had done the database migration.
It said Rails could not find a constant named “Statu”!

At first sight of this error message, I believed it was caused by a typo: I must have mistyped “Status” as “Statu”.
So I did a full-text search of the whole project for “Statu”, but I could not find any.

This error message was quite weird to me, since I had no idea where the word “Statu” came from!
After spending half an hour on pointless attempts, I suddenly noticed that the word “status” ends with “s”, so according to Rails’ convention, Rails must think “status” is the plural form of “statu”. Following the convention again, it then tries to find a class named “Statu”.
And we should use the plural form of the noun as the name of a one-to-many field, since it holds an array rather than a single object.

So after changing status to statuses, the problem was solved.
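How a name like “Statu” can come out of the convention is easier to see with a toy model. The rule table below is a deliberately naive two-rule stand-in written for this illustration; it is not ActiveSupport's real inflection table, and naive_singularize and guessed_class_name are hypothetical names.

```ruby
# Toy rule table: a special case for words ending in "statuses"/"aliases",
# then the generic "drop the trailing s" rule.
SINGULAR_RULES = [
  [/(alias|status)es\z/i, '\1'],
  [/s\z/i, '']
].freeze

def naive_singularize(word)
  SINGULAR_RULES.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word.match?(pattern)
  end
  word
end

# Guess the model class name from a has_many association name.
def guessed_class_name(association)
  naive_singularize(association.to_s).capitalize
end

puts guessed_class_name(:status)
puts guessed_class_name(:statuses)
```

With only these rules, the association name status falls through to the generic rule and loses its final s, while statuses hits the special case and maps back to the real class name.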

A convention-based system is powerful; a lot of magic just happens there. But that magic is also hard to debug when some special case breaks the convention’s assumptions.

ThoughtWorkshop

Digital Bigs in my thought
