2013-01-30

Implement sqrt with basic algebra operators

It is a very basic question, but to solve it in a time limited environment, require solid knowledge about algorithm, and could use these knowledges flexibly.
I found the problem of myself is that I know it, but I cannot use it as flexible as my hand.

The problem description:

Calculate the square root of a given number N
The N could be a decimal, such as 6.25 or 0.01
The implementation only allow to use basic algebra operators like +, -, *, /, <, >, etc.
Advanced functions like Math.sqrt is not allowed

As a TDDer, I’m used to write a simple test structure that is gonna used to express the test cases:

Code Skeleton


def assert(expected, actual)
  if expected == actual
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def sqrt(n)
end

After this, we can begin to write our 1st test case, which is the simplest scenario that I can imagine:
We assume:

n must be an integer
n must have a integer square root

1st test case


def assert(expected, actual)
  if expected == actual
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def sqrt(n)
end
# Start from the easiest case, the integer square root.
assert 3, sqrt(9)

Run the code, if everything goes right, we will get a failed message as expected. Then we gonna introduce our first implementation to fix the failed test case:

With the 2 additional assumptions in 1st test case, we can easily figure out a simple solution: Simply linear search the integers between the root between 1 and n.

1st implementation


def assert(expected, actual)
  if expected == actual
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def sqrt(n)
  (1..n).each do |x|
    return x if x * x == n
  end
end
# Start from the easiest case, the integer square root.
assert 3, sqrt(9)

So far so good. But there are 2 magic integers that related to the sqrt, one is 1 and another is 0.
And it seems our function cannot handle all of them correctly, so I wanna improve my algorithm to enable it deals with special numbers: 0, 1.
So I added 2 test cases, and improved the implementation:

sqrt of 0 and 1


def assert(expected, actual)
  if expected == actual
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def sqrt(n)
  return 0 if n == 0
  (1..n).each do |x|
    return x if x * x == n
  end
end
# Start from the easiest case, the integer square root.
assert 3, sqrt(9)
# 2 corner cases
assert 1, sqrt(1)
assert 0, sqrt(0)

Now everything looks good, except the performance.
The time complexity of this algorithm is O(n), which is bad. I expected the algorithm complexity could close to O(1). At least it should be O(log n)

How could we improve the performance?
I had ever thought that it is safe to shrink the range to (1..2/n), but in fact it doesn’t really help to improve the performance of this algorithm, it is still O(n) after the update.
And it causes problems when dealing with the number 1, so I prefers to keep it as is.

So what we did in the sqrt function now it kind of a search, we search the number match the condition between 1 and n.
Obviously that 1..n is a ascending series, and mapping x -> x*x has positive differential coefficient.
So it is possible for use to use variant binary search replace the linear search, which reduce the time complexity from O(n) to O(log n)

Binary Search


def assert(expected, actual)
  if expected == actual
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def binary_search(goal, start, stop)
  mid = (stop - start) / 2 + start
  mid_square = mid * mid
  if mid_square == goal
    return mid
  elsif mid_square > goal
    return binary_search(goal, start, mid)
  else
    return binary_search(goal, mid, stop)
  end
end
def sqrt(n)
  return 0 if n == 0
  binary_search(n, 1, n)
end
# Start from the easiest case, the integer square root.
assert 3, sqrt(9)
# 2 corner cases
assert 1, sqrt(1)
assert 0, sqrt(0)
# 2 normal cases
assert 5, sqrt(25)
assert 9, sqrt(81)

After implemented the binary search algorithm, we found a very interesting phenomenon: We didn’t restrict n to integer, and it seems it get some capability to dealing with float number?!
So I tried to add 2 float number test cases:

Float number test cases


def assert(expected, actual)
  if expected == actual
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def binary_search(goal, start, stop)
  mid = (stop - start) / 2 + start
  mid_square = mid * mid
  if mid_square == goal
    return mid
  elsif mid_square > goal
    return binary_search(goal, start, mid)
  else
    return binary_search(goal, mid, stop)
  end
end
def sqrt(n)
  return 0 if n == 0
  binary_search(n, 1, n)
end
# Start from the easiest case, the integer square root.
assert 3, sqrt(9)
# 2 corner cases
assert 1, sqrt(1)
assert 0, sqrt(0)
# 2 normal cases
assert 5, sqrt(25)
assert 9, sqrt(81)
# float number
assert 2.5, sqrt(6.25)
assert 1.5, sqrt(2.25)

Amazing, our code works fine!
But I believe it is tricky, since both 2.5 and 1.5 is the number stand on the right center between 2 near-by integers. And it fails dealing with generic float number.
The problem we met is call stack overflow. Binary search algorithm failed to hit the exactly accurate number that we expected.
To solve the problem, we can use a small enough range to replace the accurate equality comparison.
We introduce a const EPSILON to describe the accuracy of the calculation.

Adjust precision


EPSILON = 100 * Float.const_get(:EPSILON)
def assert(expected, actual)
  if (expected - actual) < EPSILON
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def binary_search(goal, start, stop)
  mid = (stop - start) / 2 + start
  mid_square = mid * mid
  if (mid_square - goal).abs < EPSILON
    return mid
  elsif mid_square > goal
    return binary_search(goal, start, mid)
  else
    return binary_search(goal, mid, stop)
  end
end
def sqrt(n)
  return 0 if n == 0
  binary_search(n, 1, n)
end
# Start from the easiest case, the integer square root.
assert 3, sqrt(9)
# 2 corner cases
assert 1, sqrt(1)
assert 0, sqrt(0)
# 2 normal cases
assert 5, sqrt(25)
assert 9, sqrt(81)
# float number
assert 2.5, sqrt(6.25)
assert 1.5, sqrt(2.25)
# float numbers not at 2^n
assert 3.3, sqrt(10.89)
assert 7.7, sqrt(59.29)

Now it looks our code can calculate the square root of most of the numbers that larger than 1. But it fails to calculate the square root of number less than 1.
The reason of the failure is because x x < x when x < 1 but x x > 1 when x > 1, which means we should search in different range for the numbers > 1 and numbers < 1.

float numbers <1


EPSILON = 100 * Float.const_get(:EPSILON)
def assert(expected, actual)
  if (expected - actual).abs < (EPSILON * 10)
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def binary_search(goal, start, stop)
  mid = (stop - start) / 2 + start
  mid_square = mid * mid
  if (mid_square - goal).abs < EPSILON
    return mid
  elsif mid_square > goal
    return binary_search(goal, start, mid)
  else
    return binary_search(goal, mid, stop)
  end
end
def sqrt(n)
  return 0 if n == 0
  if n == 1
    return 1
  elsif n > 1
    return binary_search(n, 1, n)
  else
    return binary_search(n, n, 1)
  end
end
puts "Start from the easiest case, the integer square root."
assert 3, sqrt(9)
puts "2 corner cases"
assert 1, sqrt(1)
assert 0, sqrt(0)
puts "2 normal cases"
assert 5, sqrt(25)
assert 9, sqrt(81)
puts "float number"
assert 2.5, sqrt(6.25)
assert 1.5, sqrt(2.25)
puts "float numbers not at 2^n"
assert 3.3, sqrt(10.89)
assert 7.7, sqrt(59.29)
puts "float number < 1"
assert 0.1, sqrt(0.01)
assert 0.02, sqrt(0.0004)

So now the algorithm is pretty much as what we want, but we still found it raise call stack overflow exception sometimes. And it wastes too much iterations on unnecessary precision.
So I think maybe we can make the algorithm unlimitedly close to O(1) by sacrificing some precision of the result.
So we set up a limit for the maximum iterations that we can take during the calculation. When the limit is reached, we break out the iteration, and return a less accurate number.

It is really useful when calculating the irrational square root, which is has unlimited digits, and there is no accurate solution to it.

irrational square root


EPSILON = 10 * Float.const_get(:EPSILON)
DEPTH_LIMIT = 100
def assert(expected, actual)
  if (expected - actual).abs < (EPSILON * 10)
    puts "Passed"
  else
    puts "Failed"
    p expected
    p actual
  end
end
def binary_search(goal, start, stop, depth)
  mid = (stop - start) / 2 + start
  mid_square = mid * mid
  if (mid_square - goal).abs < EPSILON
    return mid
  else
    return mid if depth >= DEPTH_LIMIT
    if mid_square > goal
      return binary_search(goal, start, mid, depth + 1)
    else
      return binary_search(goal, mid, stop, depth + 1)
    end
  end
end
def sqrt(n)
  return 0 if n == 0
  n = n.to_f
  if n == 1
    return 1
  elsif n > 1
    return binary_search(n, 1, n, 0)
  else
    return binary_search(n, n, 1, 0)
  end
end
puts "Start from the easiest case, the integer square root."
assert 3, sqrt(9)
puts "2 corner cases"
assert 1, sqrt(1)
assert 0, sqrt(0)
puts "2 normal cases"
assert 5, sqrt(25)
assert 9, sqrt(81)
puts "float number"
assert 2.5, sqrt(6.25)
assert 1.5, sqrt(2.25)
puts "float numbers not at 2^n"
assert 3.3, sqrt(10.89)
assert 7.7, sqrt(59.29)
puts "float number < 1"
assert 0.1, sqrt(0.01)
assert 0.02, sqrt(0.0004)
puts "irrational root"
assert 1.414213562373095, sqrt(2)
assert 1.732050807568877, sqrt(3)

So besides of the binary search approach, we can calculate the square root with Newton’s method, which calculates the result with a iterative equation.
Newton’s method has the limitation in precision, but has a good performance. It is said that one of the ancestors of FPS game Quake uses it in the game engine to get a good performance with limited computing power.
Here is a easy-understood document explain how it works.

2013-01-30

Programming►node.js

Programming

Pitfall in node crypto and base64 encoding

Today, we found there is a huge pitfall in node.js crypto module! Decipher has potential problem when processing Base64 encoding.

We’re building RESTful web service based on Node.js, which talks to some other services implemented with Ruby.

Ruby

In ruby, we use the default Base64 class to handle Base64 encoding.

Base64#encode64 has a very interesting feature:
It add line break (\n) to output every 60 characters. This format make the output look pretty and be friendly for human reading:

Ruby Base64 Block

MSwyLDMsNCw1LDYsNyw4LDksMTAsMTEsMTIsMTMsMTQsMTUsMTYsMTcsMTgs
MTksMjAsMjEsMjIsMjMsMjQsMjUsMjYsMjcsMjgsMjksMzAsMzEsMzIsMzMs
MzQsMzUsMzYsMzcsMzgsMzksNDAsNDEsNDIsNDMsNDQsNDUsNDYsNDcsNDgs
NDksNTAsNTEsNTIsNTMsNTQsNTUsNTYsNTcsNTgsNTksNjAsNjEsNjIsNjMs
NjQsNjUsNjYsNjcsNjgsNjksNzAsNzEsNzIsNzMsNzQsNzUsNzYsNzcsNzgs
NzksODAsODEsODIsODMsODQsODUsODYsODcsODgsODksOTAsOTEsOTIsOTMs
OTQsOTUsOTYsOTcsOTgsOTksMTAw

The Base64#decode64 class ignores the line break (\n) when parsing the base64 encoded data, so the line break won’t pollute the data.

Node.js

Node.js take Base64 as one of the 5 standard encodings (ascii, utf8, base64, binary, hex). Ideally the data or string can be transcoded between these 4 encodings without data loss.

The Buffer class is the simplest way to transcode the data:

Base64 Encoder in Node.js


Base64 =
  encode64: (text) ->
    new Buffer(text, 'utf8').toString('base64')
  decode64: (base64) ->
    new Buffer(base64. 'base64').toString('utf8')

Although encode64 function in node.js won’t add line break to the output, but the decode64 function does ignore the line break when parsing the data. It keeps the consistent behavior with ruby Base64 class, so we can use this decode64 function to decode the data from ruby.

Since base64 is one of the standard encodings, and some of the node.js API does allow set encoding for input and output. So ideally, we can complete the base64 encoding and decoding during processing the data.
It seems Node.js is more convenient comparing to Ruby when dealing with Base64.

e.g. We can combine reading file and base64 encoding the content into one operation by setting the encoding to readFileSync API.

Write and Read string as Base64


fs = require('fs')
fileName = './binary.dat' # this file contains binary data
base64 = fs.readFileSync(fileName, 'base64') # file content has been base64 encoded

It looks like we can always use this trick to avoid manually base64 encoding and decoding when the API has encoding parameter! But actually it is not true! There is a BIG pitfall here!

In our real case, we uses crypto module to decrypt the the JSON document that encrypted and base64 encoded by Ruby:

Base64 Deocde and Decrypt


crypto = require('crypto')
parse = (data, algorithm, key, iv) ->
  decipher = crypto.createDecipheriv(algorithm, key, iv)
  decrypted = decipher.update(data, 'base64', 'utf8') # Set input encoding to 'base64' to ask API to base64 decode the input before decryption
  decrypted += dechiper.final('utf8')
  JSON.parse(decrypted)

Manually Base64 Decoding


crypto = require('crypto')
parse = (data, algorithm, key, iv) ->
  decipher = crypto.createDecipheriv(algorithm, key, iv)
  binary = new Buffer(data,'base64') # Manually Base64 Decode
  decrypted = decipher.update(binary, 'binary', 'utf8') # Set input encoding to 'binary'
  decrypted += dechiper.final('utf8')
  JSON.parse(decrypted)

The previous 2 implementations are very similar except the second one base64 decoded the data manually by using Buffer. Ideally they should be equivalent in behavior. But in fact, they are NOT equivalent!

The previous implementation throws “TypeError: DecipherFinal fail”.
And the reason is that the shortcut way doesn’t ignore the line break, but Buffer does!!! So in the previous implementation, the data is polluted by the line break!

Conclusion

Be careful, when you try to ask the API to base64 decode the data by setting the encoding argument to ‘base64’. It has inconsistent behavior comparing to Buffer class.

I’m not sure whether it is a node.js bug, or it is as is by design. But it is indeed a pitfall that hides so deep. And usually is extremely hard to figure out. Since encrypted binary is hard to human to read, and debugging between 2 languages are also kind of hard!

ThoughtWorkshop

Digital Bigs in my thought