Anyone familiar with Rails is very likely used to the following code and will not be surprised by it:
ActiveRecord
class User < ActiveRecord::Base
end

first_user = User.find(0)
But the code is not as simple as it looks, especially for those coming from the Java or C# world. Here the class User inherits the class method find from its parent class ActiveRecord::Base (if you are in doubt or interested in how this works, see the post Ruby Class Inheritance).
If you write the following code, it should work fine:
Simple Class
class Base
  def self.foo
    bar_result = new.bar
    "foo #{bar_result}"
  end

  def bar
    'bar'
  end
end

class Derived < Base
end

Base.new.bar.should == 'bar'
Derived.new.bar.should == 'bar'
Base.foo.should == "foo bar"
Derived.foo.should == "foo bar"
In Ruby’s world, most of the time you can replace inheritance with a module mixin. So we try to refactor the code as follows:
Extract to Module
module Base
  def self.foo
    bar_result = new.bar
    "foo #{bar_result}"
  end

  def bar
    'bar'
  end
end

class Derived
  include Base
end
If we run the tests again, the second test fails:
Test
Derived.new.bar.should == 'bar'  # Passed
Derived.foo.should == 'foo bar'  # Failed
The test fails because the method ‘foo’ is not defined! This is interesting: if we inherit from a class, the class methods of the base class are available on the subclass; but if we include a module, the class methods of the module are not available on the host class!
As we discussed before (Ruby Class Inheritance), mixing in a module is equivalent to inserting an anonymous class holding the module’s instance methods into the ancestor chain of the host class. Methods defined with def self.foo live on the module’s own singleton class, so they are not part of what gets inserted.
So is there any way to make all the tests pass with the module approach? The answer is yes, absolutely, but we need a little trick to make it happen:
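One common way to do this is sketched below: move the class-level methods into a nested module and extend the host class with it from the included hook. The nested module name ClassMethods is only a convention assumed here, and the checks use plain puts instead of RSpec matchers so the snippet runs on its own.

```ruby
module Base
  # Ruby calls this hook whenever Base is included; `host` is the including class.
  def self.included(host)
    host.extend(ClassMethods)
  end

  # Methods defined here become class methods of the host class.
  module ClassMethods
    def foo
      bar_result = new.bar
      "foo #{bar_result}"
    end
  end

  def bar
    'bar'
  end
end

class Derived
  include Base
end

puts Derived.new.bar # prints "bar"
puts Derived.foo     # prints "foo bar"
```

With this in place, include Base brings in both the instance method bar and the class method foo.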
Nokogiri is a really popular XML and HTML library for Ruby. People love Nokogiri not just because it is powerful and fast; most importantly, it is flexible and convenient. Nokogiri works perfectly in most respects, but there is a big pitfall when handling XML namespaces!
I met a super weird issue when processing XML returned by the Google Data API. The API returns the following XML document:
I instantiated a Nokogiri::XML DOM from the XML document and then tried to query the DOM with XPath: xml_dom.xpath '//entry':
Query DOM
# Faraday.get returns a response object; Nokogiri needs its body
xml_dom = Nokogiri::XML(Faraday.get(api_url).body)
entries = xml_dom.xpath '//entry'
I expected entries to be an array with 4 elements, but it was actually an empty array. After a few tries, I found that the query yields an empty array whenever I use an element name in the query.
Try Xpath Queries
xml_dom.xpath '.'        # returns document
xml_dom.xpath '//.'      # returns all element nodes
xml_dom.xpath '/feed'    # returns empty array
xml_dom.xpath '//entry'  # returns empty array
xml_dom.xpath '//media:group', 'media' => 'http://search.yahoo.com/mrss/' # returns the 4 media:group nodes
It is super weird.
After half an hour of fighting against Nokogiri, I began to realize that it must be related to the namespace. I found an attribute on the root element of the document, xmlns="http://www.w3.org/2005/Atom", which means that all elements without an explicit namespace prefix in the XML DOM are in the namespace http://www.w3.org/2005/Atom by default.
And the XPath query is namespace sensitive! In XPath 1.0 an unprefixed name only matches elements that are in no namespace, so the query requires the qualified name rather than the local name. This means we should query the DOM with the code: xml_dom.xpath '//atom:entry', 'atom' => 'http://www.w3.org/2005/Atom'.
So in a sentence: an XPath query in Nokogiri doesn’t apply the document’s default namespace, so when querying a DOM that has a default namespace, we need to specify the namespace explicitly in the XPath query. It is really a hidden requirement and is very likely to be overlooked by developers!
So if there are no naming collisions, it is recommended to avoid this kind of “silly” issue by removing the namespaces from the DOM. The Nokogiri::XML::Document class provides the Nokogiri::XML::Document#remove_namespaces! method for this.
I find the behavior of the keyword def in Ruby really confusing! At least, really confusing to me! In most cases, we use def in a class context, where it defines an instance method on that class.
Use def in class
class Foo
  def foo
    :foo
  end

  $context = self
end

Foo.new.foo.should == :foo
$context.should == Foo
Besides the typical usage, we can also use def in a block.
Use def in class_eval block
class Foo; end

Foo.class_eval do
  def foo
    :foo
  end

  $context = self
end

Foo.new.foo.should == :foo
$context.should == Foo
The previous piece of code works as if we had reopened the class Foo and added a new method to it. That is also not hard to understand.
What really surprised me is the following code:
Use def in instance_eval block
class Foo; end

Foo.instance_eval do
  def foo
    :foo
  end

  $context = self
end

Foo.foo.should == :foo  # Method foo goes into the Foo class itself rather than Foo's instance!
$context.should == Foo
Here we can see that the method foo goes onto the Foo class itself, rather than onto Foo’s instances! But $context is still the Foo class!
So in a word, calling def foo in an instance_eval block is equivalent to calling ‘def self.foo’ in a class_eval block, even though self in both blocks is the class itself. So we can figure out that the keyword def works differently from the methods define_method and define_singleton_method: it doesn’t depend on self, but the two methods do!
To me this is kind of hard to understand, and confusing. And I think it is not a good design! Ruby is different from languages like Java or C#: Ruby uses methods to take the place of what are keywords in other languages, such as public, protected and private. In most languages these are keywords, but in Ruby they are actually methods defined on Module. This design is good, because it enables developers to extend the set of “keywords” they can use! But at the same time, it blurs the boundary between customizable methods and predefined keywords, so people don’t pay much attention to the difference between the two. So it is important to keep the behavior of methods and keywords consistent. But def breaks that consistency, so it is confusing!
Look at the following code:
def vs define_method
definition_block = Proc.new do
  def foo
    :foo
  end

  define_method :bar do
    :bar
  end
end

class A; end
class B; end

A.class_eval &definition_block
B.instance_eval &definition_block
Comparing class A and class B, we find that they are different, even though they are defined with exactly the same block!
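To make the difference concrete, the sketch below repeats the definitions and then checks where each method ended up. The checks use plain respond_to? instead of RSpec matchers, so the snippet runs on its own.

```ruby
definition_block = Proc.new do
  def foo
    :foo
  end

  define_method :bar do
    :bar
  end
end

class A; end
class B; end

A.class_eval(&definition_block)
B.instance_eval(&definition_block)

# class_eval: both foo and bar become instance methods of A.
p A.new.respond_to?(:foo)  # => true
p A.new.respond_to?(:bar)  # => true
p A.respond_to?(:foo)      # => false

# instance_eval: def puts foo on B itself (its singleton class),
# while define_method still defines an instance method of B.
p B.respond_to?(:foo)      # => true
p B.new.respond_to?(:foo)  # => false
p B.new.respond_to?(:bar)  # => true
```

So only def changes its target between the two calls; define_method follows self, which is the class in both cases.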
A few days ago, I posted a blog entry about the Ruby inheritance hierarchy. When I discussed the topic with Yang Lin, he mentioned a crazy but interesting idea: introducing prototype-based OO into Ruby. Lin suggested that one possible approach is to use clone. But I’m not familiar with the clone mechanism in Ruby, so I tried another approach. Thanks to Ruby’s super powerful meta-programming mechanism, I can forward unknown messages to the prototype by using method_missing. And I encapsulated the code in a module, so every instance that extends the module obtains this capability.
Prototype Module
module Prototype
  def inherit_from(prototype)
    @prototype = prototype
    self
  end

  def create_child
    Object.new.extend(Prototype).inherit_from(self)
  end

  def respond_to?(msg_id, priv = false)
    return true if super

    if @prototype
      @prototype.respond_to?(msg_id, priv)
    else
      false
    end
  end

  def method_missing(symbol, *args, &block)
    if @prototype
      @prototype.send(symbol, *args, &block)
    else
      super
    end
  end

  def self.new
    Object.new.extend(Prototype)
  end
end
If I have the following code:
Prototype Inheritance
a = Object.new

def a.foo
  'foo'
end

b = Object.new
b.extend(Prototype).inherit_from(a)

c = b.create_child

p b.foo # => 'foo'
p c.foo # => 'foo'
So b.foo and c.foo both yield ‘foo’.
And I can override the parent implementation by defining a method with the same name:
Prototype Overrides
def a.bar
  'bar'
end

def c.bar
  'c.bar'
end

p a.bar # => 'bar'
p b.bar # => 'bar'
p c.bar # => 'c.bar'
So I add a new singleton method bar on a, and b automatically inherits it, and I override bar on object c.
In conclusion, we’re able to introduce prototype-based inheritance into Ruby by using Ruby’s powerful meta-programming mechanism. This implementation is only a proof of concept, so its performance is not great. But we can try to improve the performance by consolidating the lookup into a dynamically defined method: the child object queries the parent the first time, and if the invocation succeeds it consolidates the behavior into a method, to avoid going through method_missing every time.
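The consolidation idea could look roughly like the sketch below. It is only one possible reading of it: the module name CachingPrototype is hypothetical, and the cached method is a forwarder to the prototype rather than a copy of the parent’s method body, so later changes on the prototype stay visible.

```ruby
module CachingPrototype
  def inherit_from(prototype)
    @prototype = prototype
    self
  end

  def respond_to_missing?(name, include_private = false)
    (!@prototype.nil? && @prototype.respond_to?(name, include_private)) || super
  end

  def method_missing(name, *args, &block)
    if @prototype && @prototype.respond_to?(name)
      # First call: define a forwarding singleton method so that later
      # calls no longer go through method_missing.
      define_singleton_method(name) do |*inner_args, &inner_block|
        @prototype.public_send(name, *inner_args, &inner_block)
      end
      public_send(name, *args, &block)
    else
      super
    end
  end
end

a = Object.new
def a.foo
  'foo'
end

b = Object.new.extend(CachingPrototype).inherit_from(a)

p b.singleton_methods(false).include?(:foo) # => false
p b.foo                                     # => "foo"
p b.singleton_methods(false).include?(:foo) # => true
```

After the first call, foo exists directly on b, so subsequent calls are ordinary method dispatch.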
I just realized I have misunderstood Ruby “class methods” for quite a long time!
Here is a piece of code:
Instance Methods Inheritance
class A
  def foo
    'foo'
  end
end

class B < A
end

a = A.new
b = B.new

a.foo.should == 'foo'
b.foo.should == 'foo'
The previous piece of code demonstrates the typical inheritance mechanism in almost every class-based OO language (there are a few exceptions that use prototype inheritance, such as JavaScript, though whether JavaScript is an OO language is a mystery of its own XD). In most common OO languages, this is what inheritance is about! But in Ruby, things are not that simple, thanks to Ruby’s eigen-class (aka singleton class or metaclass).
In Ruby, I just found that a derived class inherits not just the instance methods but also the class methods! It was kind of a surprise to me!
Class Methods Inheritance
class A
  def self.bar
    'bar'
  end
end

class B < A
end

A.bar.should == 'bar'
B.bar.should == 'bar'
Most people who know Java or C# or even C++ won’t be surprised by A.bar.should == 'bar', but you might be surprised by B.bar.should == 'bar', like I was.
To me, bar is declared on class A, and B inherits from A, so I can call a method declared on class A on class B! It is amazing!
This works because in Ruby a “class method” is actually an instance method of the class’s eigen-class, and def self.foo is just syntactic sugar. So we can rewrite the code as:
Rewritten Class Methods Inheritance
class A
end

class B < A
end

class << A
  def bar
    'bar'
  end
end

A.bar.should == 'bar'
B.bar.should == 'bar'
If we call A’s eigen-class AA and B’s eigen-class BB, then we find that BB.superclass == AA.
BB and AA
class A; end
class B < A; end

AA = class << A; self; end
BB = class << B; self; end

B.superclass.should == A
BB.superclass.should == AA
And we know A is actually an instance of AA, and B is an instance of BB, so obviously we can call on B the instance methods defined on AA. That’s the reason why class methods in Ruby can be inherited!
But there is some inconsistency in Ruby: AA is the superclass of BB, but you won’t find AA in BB’s ancestors! In fact, on the Ruby version I used, BB.ancestors yields something similar to [Class, Module, Object, Kernel, BasicObject] if no module has been injected into Class, Module or Object.
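The ancestors list above is what older Ruby versions print. On Ruby 2.1 and later, ancestors called on a singleton class also lists the singleton classes, which makes the relationship easier to inspect. A small check, using Object#singleton_class instead of the class << A; self; end idiom:

```ruby
class A
  def self.bar
    'bar'
  end
end

class B < A; end

aa = A.singleton_class
bb = B.singleton_class

p bb.superclass == aa        # => true
p aa.instance_methods(false) # => [:bar]
p bb.ancestors.include?(aa)  # => true on Ruby 2.1 and later
p B.bar                      # => "bar"
```

So bar is an ordinary instance method of A’s eigen-class, and B finds it by walking up from its own eigen-class.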
This design is weird to me, and kind of hard to understand, so for quite a long time I didn’t even know that class methods in Ruby can be inherited! I drew a graph to show the relationships among the classes. In the graph I use <class:A> to indicate the eigen-class of A. A line with an empty triangle represents inheritance, and an arrow line represents instantiation. The graph is not complete: I omitted some unimportant classes, and I use a dotted line to indicate that something is missing on the line.
Today we met a weird problem when returning a string via JSON.
Here is the CoffeeScript code:
Front End
$.post serverUrl, data, (status) ->
  console.log status
And here is our controller:
Backend Action
def action
  # do some complex logic
  render json: "success"
end
The code looks perfect, but we found that the callback is never called! When we checked the network traffic, we found that the server does send its response “success”, but the callback is still not called!
After spending half an hour struggling against jQuery, we finally found the problem!
The reason is that success is not valid JSON! A JSON string must be enclosed in double quotes; otherwise the JSON parser treats it as a bare token, like true, false or null, and fails on the unknown word.
So to fix the problem, we need to change our action code:
Fixed Backend Action
def action
  # do some complex logic
  render json: '"success"'
end
This is really a pitfall, since the wrong code looks so natural!
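The difference can be reproduced outside Rails with the json library bundled with recent Ruby versions. The sketch below is only an illustration of the parsing rule, not the controller code itself; in the action, the same idea would be to pass an already serialized value (for example 'success'.to_json) or a hash, on the assumption that render json: sends a String through unchanged.

```ruby
require 'json'

# A bare word is not valid JSON, so parsing it fails.
begin
  JSON.parse('success')
rescue JSON::ParserError
  puts 'bare success is rejected'
end

# Serializing the string adds the required double quotes.
body = 'success'.to_json
puts body          # prints "success" (with the double quotes)
p JSON.parse(body) # => "success"

# Wrapping the value in an object avoids the question altogether.
puts({ status: 'success' }.to_json) # prints {"status":"success"}
```

Returning an object also gives the client a place to find extra fields later without changing the response shape.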
Thanks to Ruby’s powerful meta-programming capability and Rails’ delegate syntax, we can easily write a graceful singleton class that makes the class work like an instance.
In a traditional language such as C#, we usually write singleton code like this:
Singleton in C#
class Foo
{
    // Singleton Declaration
    // (not readonly: the field is assigned lazily in the getter)
    private static Foo instance;

    public static Foo Instance
    {
        get
        {
            if (instance == null)
            {
                instance = new Foo();
            }
            return instance;
        }
    }

    // Define instance behaviors
    // ...
}
The previous approach works fine, but the code that uses Foo will be kind of ugly. Every time we want to invoke the method Bar on Foo, we need to write Foo.Instance.Bar() rather than the more graceful Foo.Bar(). To solve this problem we need to implement the class in this way:
Class Delegation in C#
class Foo
{
    // Singleton Declaration
    // ...

    // Define instance behaviors
    // Note: C# rejects a static and an instance member with the same
    // signature, so real code needs a different name for one of each pair.
    public void Bar()
    {
        // Bar behaviors
        // ...
    }

    public static void Bar()
    {
        Instance.Bar();
    }

    public string Baz
    {
        get { /* Getter behavior */ }
        set { /* Setter behavior */ }
    }

    public static string Baz
    {
        get { return Instance.Baz; }
        set { Instance.Baz = value; }
    }
}
This approach simplifies the calling code but complicates the declaration. You can use tricks such as code snippets, or code generation tools such as Text Template or CodeSmith, to generate the dull delegation code. But it is still not graceful at all.
If we write the same code in Ruby, things become much easier, thanks to Ruby’s powerful meta-programming capability.
So in the Ruby solution we just use one statement, delegate *Foo::Base.instance_methods, :to => :instance, to delegate all the methods defined in Base to the instance.
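The delegate statement comes from Rails (ActiveSupport). To show the idea without that dependency, here is a plain-Ruby sketch that defines one forwarding class method per instance method in Base. The method names bar and baz mirror the C# example, and the class-level instance method is an assumption about how the singleton is exposed.

```ruby
class Foo
  # Instance behaviors live in a module so that they can be enumerated.
  module Base
    def bar
      'bar'
    end

    def baz
      @baz
    end

    def baz=(value)
      @baz = value
    end
  end

  include Base

  # Lazily created single instance.
  def self.instance
    @instance ||= new
  end

  # Plain-Ruby stand-in for
  #   delegate *Foo::Base.instance_methods, :to => :instance
  Base.instance_methods.each do |name|
    define_singleton_method(name) do |*args, &block|
      instance.public_send(name, *args, &block)
    end
  end
end

Foo.baz = 'hello'
p Foo.bar          # => "bar"
p Foo.baz          # => "hello"
p Foo.instance.baz # => "hello"
```

Callers can now write Foo.bar instead of Foo.instance.bar, and any method added to Base before the loop runs is forwarded automatically.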
Besides this solution, there is also another, cheaper but working solution:
Singleton in Ruby
# foo.rb
class Foo
  # Plain Ruby's autoload needs the file path as its second argument
  autoload :Base, 'foo/base'

  include Base
  extend Base
end

# foo/base.rb
module Foo::Base
  # Define instance behaviors
  # ...
end
The two approaches make the code behave slightly differently, but both work.
I’m working on a project that needs some complicated HTML snippets for tests, which cannot easily be generated with a factory. So I put these snippets into fixture files.
RSpec provides a very convenient DSL keyword, let, which allows us to define something for a test and cache it within the same test. I wanted a similar keyword for my HTML fixtures. To achieve this, I decided to extend the DSL.
So I created a module that contains the new DSL I want to have:
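A module of this kind might look like the sketch below. The method name html_page and the fixture directory are hypothetical, chosen only for illustration; the one thing it relies on is that the host it is extended into provides let, as RSpec example groups do.

```ruby
# Hypothetical DSL module: `html_page` and the fixture directory are
# illustrative names, not taken from the original project.
module HtmlPages
  FIXTURE_DIR = File.join('spec', 'fixtures', 'html')

  # Declares a let-style helper called `name` that returns the content of
  # <dir>/<name>.html. Relies on the host providing `let`.
  def html_page(name, dir = FIXTURE_DIR)
    let(name) { File.read(File.join(dir, "#{name}.html")) }
  end
end

# Usage inside a spec, once the module has been loaded into RSpec:
#   describe 'login page' do
#     html_page :login
#     it('has a form') { login.should include('<form') }
#   end
```

Because the helper is built on let, the file is read lazily and at most once per example.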
Put this file under spec/support; by default, spec_helper.rb will require it for you. Then we need to tell RSpec to load the DSL into the test cases.
Load DSL
RSpec.configure do |config|
  # ...
  config.extend HtmlPages
  # ...
end
By telling config to extend the module, our DSL is loaded as class methods of RSpec::Core::ExampleGroup, where let is defined.
HINT: RSpec config has another way to extend the DSL, by calling config.include. The DSL methods are then injected into the example group instances, so they can be used inside the test cases. That’s how runtime DSLs like FactoryGirl work.
Postgres provides a very interesting feature called “Schema” in addition to the other “normal” database features. It adds an extra layer between the database and its tables, so with schemas you can have tables with the same name in one database, as long as they are in different schemas. To me “schema” is not a good name! I think “table-space” or even “namespace” would be better. In fact, a number of people agree that schema is not a good name:
“Schema” is such a terrible name for this feature. When most people hear the term “schema” they think of a data definition of some sort. This is not what PostgreSQL schemas are. I’m sure the PostgreSQL devs had their reasons, but I really wish they would have named it more appropriately. “Namespaces” would have been apropos.
Anywho, the easiest way for me to describe PostgreSQL schemas (besides telling you that they are, indeed, namespaces for tables) is to relate them to the UNIX execution path. When you run a UNIX command without specifying its absolute path, your shell will work its way down the $PATH until it finds an executable of the same name.
A popular practice is to use Postgres schemas for sub-domains. For example, say you’re a BBS provider and you rent your BBS app to different organizations. Each organization wants its own BBS instance running independently; most importantly, its data should be stored in a separate space and be accessible from its own domain name. But for administration, you want them all to share the same backend management console. In this case, the best way to solve the problem is to store the data owned by each subsystem in its own schema, and to store all the administration data in a single schema or even in the public schema. The same author, Jerod, has a post that describes in detail how to build this kind of system. There are a bunch of posts describing how to build a system like this, which can easily be found by googling. And there is even a Ruby gem called apartment, from Brad Robertson, to support this kind of system.
This idea looks fancy, but only as long as you are 10000% certain that the sub-systems will stay independent, without any collaboration, forever. Otherwise, sooner or later you will find that the “fancy idea” has become a horrible idea. As time goes by, there can be more and more features that require collaboration between sub-systems, such as a unified authentication mechanism, so that users can log in once and switch between systems easily. Or administrators might ask for a unified statistics graph across all sub-systems. All these requirements involve cross-schema queries! To be honest, cross-schema queries can be painful in some cases, and they bring trouble to every aspect of your system, such as data migration, test data generation, etc.
That’s exactly what happened in my current project! It is a Rails 3 project; the codebase is brand new, but it is built on a legacy multiple-schema Postgres database. And for some reason, we must keep the multiple-schema design unchanged. But our goal is to unify the separate subsystems into one more closely collaborating system.
ActiveRecord in Rails doesn’t include native support for this fancy feature, which means you will meet problems during migrations, or even when preparing test data with factory-girl.
Postgres allows you to address a table in a specific schema with a fully qualified name like <schema name>.<table name>.<column name>. The schema name is optional; when you omit it, Postgres searches for the table along a file-system-path-like list called the “search path”.
And you can query and set the current search path with Postgres SQL statements:
Query and Set search_path
SHOW search_path;
SET search_path TO <new_search_path>;
Since ActiveRecord doesn’t put the fully qualified schema name in front of the table name when it translates ARel into SQL statements, we can only support a multiple-schema database through search_path.
So it is a very natural idea to use the following Ruby code to make ActiveRecord query different schemas:
Select Schema
def add_schema_to_search_path(schema)
  ActiveRecord::Base.connection.execute "SET search_path TO #{schema}, public;"
end

def restore_search_path
  ActiveRecord::Base.connection.execute "SET search_path TO public;"
end
These two methods work perfectly when querying the database. But sooner or later, you will run into big trouble when you try to write data into the database.
In a db migration, or when using factory_girl to generate test fixtures, you might find that the data you insert into different schemas all ends up in the first non-public schema. But all the queries still work perfectly!
We found that this problem occurs when all of the following conditions are satisfied:
- The queries happen in a transaction.
- You insert data into multiple non-public schemas.
- You use the SET search_path TO SQL statement to switch between schemas rather than explicitly using fully qualified table names.
And the most interesting things are:
- All the SELECT queries are executed against the correct schemas.
- If you use SHOW search_path; to query the current search path, you get the correct value.
- All data is inserted into the first non-public schema that you actually wrote data into. This means that if you insert data into the public schema, nothing goes wrong; and if you switch to a non-public schema but don’t insert any rows, later inserts are not affected either.
To solve this problem, I spent 2 nights and 2 days digging into the source code of the ActiveRecord gem and the pg gem (the Postgres database adapter). Finally I solved the problem by using the accessor on PostgreSQLAdapter.
Basically, instead of using the SQL query, you should use PostgreSQLAdapter#schema_search_path and PostgreSQLAdapter#schema_search_path= to get and set the search path. If you dig into the source code, you will find that the two methods do exactly the same thing as we did, except that the setter also assigns one more instance variable, @schema_search_path.
methods on PostgreSQLAdapter
# Sets the schema search path to a string of comma-separated schema names.
# Names beginning with $ have to be quoted (e.g. $user => '$user').
The most interesting thing is that if you search for references to @schema_search_path, you will find it is only used as a local cache of the current search_path in the adapter. If it is nil, it is initialized with the value from the query SHOW search_path; and that value is then kept as the cache! This implementation is buggy and causes the problems described before!
If we use the SQL query to set the search path rather than calling schema_search_path=, we don’t set @schema_search_path at the same time, so the value stays nil. Then a transaction or some other object in ActiveRecord calls schema_search_path to get the current search path. The first time, the variable @schema_search_path is nil, so it is initialized with the value from the query SHOW search_path; after that it never changes, since the query is never executed again. As a result, the schema switch succeeds the first time but fails afterwards.
Which means that, at the current stage, if you want to change search_path, the only correct way is to use PostgreSQLAdapter#schema_search_path=, and PLEASE PLEASE ignore the warning "This should be not be called manually but set in database.yml." in the source code! It is really a misleading message!
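As a minimal sketch of that advice, the helper below switches the search path through the adapter accessors and restores it afterwards. The helper name with_search_path and the schema name tenant_a are hypothetical; connection is assumed to be the object returned by ActiveRecord::Base.connection on the PostgreSQL adapter.

```ruby
# Hypothetical helper: switches the search path through the adapter's
# accessor (so its cached value stays in sync) and restores it afterwards.
def with_search_path(connection, schema)
  original = connection.schema_search_path
  begin
    connection.schema_search_path = "#{schema}, public"
    yield
  ensure
    connection.schema_search_path = original
  end
end

# Usage inside a Rails app (not run here):
#   with_search_path(ActiveRecord::Base.connection, 'tenant_a') do
#     # read and write data in tenant_a
#   end
```

Restoring in ensure keeps the connection usable even when the block raises, which matters inside migrations and test setup.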
I understand that the current implementation is there for performance, but caching the value is absolutely not a good idea when you cannot keep things in sync and the sync is critical in some cases. I’m planning to fix this issue in the Rails codebase and send a pull request to the Rails maintainers. I hope they will accept the fix, or at least change the warning message.
And instead of using the outdated and mysterious PgTools mentioned in a lot of posts (I saw a lot of people mention this class, but I cannot find it anywhere, either on Google or on GitHub; it is really a mystery), I created a new utility module called MultiSchema.
You can use it as the utility class in the old-fashioned way:
Procedure usage
MultiSchema.with_in_schemas :except => :public do
  # Play around the data in one schema
end
Or you can use it in a DSL-like way:
DSL usage
class SomeMigration < ActiveRecord::Migration
  include MultiSchema

  def change
    with_in_schemas :except => :public do
      # Play around the data in one schema
    end
  end
end
The with_in_schemas method accepts both symbols and strings, and you can pass it a single value, an array or a hash.
- with_in_schemas yields all user schemas in the database.
- with_in_schemas :only => %w(schema1 schema2) populates all the given schemas.
- with_in_schemas :except => %w(schema1 schema2) populates all schemas except the given ones.
- with_in_schemas :except => [:public] is equivalent to with_in_schemas :except => ['public'].
- with_in_schemas :only => [:public] is equivalent to with_in_schemas :only => :public and to with_in_schemas :public.
- with_in_schemas :except => [:public] is equivalent to with_in_schemas :except => :public.
I’m a newbie to Rails, and my past few projects have all been Rails based, including MetaPaas, Recruiting On Rails and the current SFP. I was amazed by the convenience of Rails, and also hurt by its “smartness”. The power of Rails has been described in quite a lot of posts, so I want to share some failure experiences instead. Actually I have fallen into quite a number of pitfalls in Rails, and here is one of the most painful ones.
To make the problem easier to explain, I simplified the scenario: I have a model called “Candidate”, which holds a “Status” to store the status of the candidate, so I have code like this:
Candidate and Status
class Candidate < ActiveRecord::Base
  has_one :status
  # other definition
end

class Status < ActiveRecord::Base
  belongs_to :candidate
  # other definition
end
For some reason, I changed the relationship between Candidate and Status from one-to-one to one-to-many. So I changed has_one to has_many:
Candidate and Status
class Candidate < ActiveRecord::Base
  has_many :status
  # other definition
end

class Status < ActiveRecord::Base
  belongs_to :candidate
  # other definition
end
I thought it was an easy modification, but the app failed to run, even though I had done the database migration. It said Rails could not find a constant named “Statu”!
At first sight, I believed the error was caused by a typo: I must have mistyped “Status” as “Statu”. So I did a full-text search of the whole project for “Statu”, but I could not find any.
This error message was quite weird to me, since I had no idea where the word “Statu” came from! After spending half an hour on pointless attempts, I suddenly noticed that the word “status” ends with “s”, so according to Rails’ convention, Rails must think “status” is the plural form of “statu”. Following the convention again, it tries to find a class named “Statu”. We should use a plural noun as the name of a one-to-many association, since it holds an array rather than a single object.
So after changing status to statuses, the problem was solved.
A convention-based system is powerful; a lot of magic just happens there. But the magic is also hard to debug when some special case breaks the convention’s assumptions.