Collection Filtering Algorithm with Bit Operation - Part.1

Background

Collection must be the most commonly used data model in programming. And there are a number of data structure that represents collection in memory, in which Array and Liked List must be the most well known ones.

When dealing with collection, filtering must be one of the most widely used operation being used. Such as find item(s) satisfies specific criteria from the collection. Although filtering is used a lot, but it isn’t actually a simple or easy operation to apply, especially dealing with tremendous data volume.

In fact, there are a number of derived topics about collection filtering, which is impossible to cover them all in the article. This article will focus on bit operation, probably a not widely known but really interesting topic to introduce.

Problem: Find the missing integer

Problem
Suppose there is a collection of integers L.
Name the collection that contains integers from 1 to N as G
Then L contains all the elements of G except one number x.
Items in L are in random order.
Find x.

TIP: This problem be too easy to experienced developers and algorithm hackers. But it is a good opening words to the following 2 problems in this series. Also it reveals the core concept of some other advanced technologies, such as CRC or bloom filter.

When got this problem, the very intuitive(means do it manually by human being) idea is to compare the elements in L and G, find each element in L and remove it from G, then the only left item in G is x.

Here is the code:

1
2
3
4
5
6
7
8
9
10
# @param Integer[] L
# @param Integer N
# @return Integer x
def find_missing_x(L, N)
G = [*(1..N)] # Create G with all integers from 1 to N
L.each { |i| G.delete(i) } # Traverse L, and remove each item in L from G
return G.last # Return the only rest item from G
end

Well this is a work but brute algorithm, which is not efficient in both time and space:

  • Instantiate G means it has space complexity N
  • Find and remove i from G means it has time complexity at O(N*Log(N))

Yes, the algorithm can be optimized, with bit operation. In fact the problem can be resolved with time complexity O(N) and space complexity O(1).

Exclusive-Or Operation

Exclusive-Or(Or XOR for short, written as ^ in this article) is a basic bit operation, here is an Venn Diagram explains it:

XOR

XOR is an important bit operation because it has following interesting features:

  1. A ^ B = B ^ A (aka Commutativity)
  2. (A ^ B) ^ C = A ^ (B ^ C) (aka Associativity)
  3. A ^ A = 0
  4. A ^ 0 = A

Commutativity and Associativity ensure that XOR can be used despite of order. The 3rd and 4th feature establishes a critical position for XOR in cryptology and encoding.

Solution: with XOR

By making use of XOR, previous problem can ben resolved in a much more graceful way:

  • Given y is the xor result of all items in G
  • And z is the xor result of all items in L
  • Then x = y ^ z
1
2
3
4
5
6
7
8
9
10
11
12
# @param Integer[] L
# @param Integer N
# @return Integer x
def find_missing_x(L, N)
x = 0 # Set x to 0, since A ^ 0 = A
(1..N).each { |i| x ^= i } # Calculate the xor result for G
L.each { |i| x ^= i } # Keep xor all elements in L
return x # Return the only rest item from G
end

Let’s prove it mathematically:

1
2
3
4
5
6
7
Given x'= y ^ z
Then x' = (n1 ^ n2 ^ n3 ^ ... ^ x ^ ... ^ nN) ^ (n1 ^ n2 ^ n3 ^ ... ^ nN)
Since ^ obeys commutativity and associativity
Then = (n1 ^ n1) ^ (n2 ^ n2) ^ ... ^ (nN ^ nN) ^ x
As n ^ n = 0 and n ^ 0 = n
Then x'= 0 ^ 0 ^ ... ^ 0 ^ x
= x

Actually the code can be more concise by using reduce, with 2-lines of code

1
2
3
4
5
6
7
8
# @param Integer[] L
# @param Integer N
# @return Integer x
def find_missing_x(L, N)
x = (1..N).reduce(&:^) # Calculate the xor result for G
L.reduce(x, &:^) # Keep xor all elements in L and return
end

Or C version:

1
2
3
4
5
6
7
8
9
int find_missing_x(long N, long* L) {
long x = N;
for(long i = 0; i < N - 1; i++) { // L has size of N-1
x ^= i ^ L[i];
}
return x;
}

Extending problems

So far the problem has been gracefully resolved. But it is yet the end, here is a couple of extending questions:

  1. If G is collection with random but unique numbers (Not continues integer from 1 to N). How to write the code?
  2. Based on 1, if there are 2 numbers are missing? How to find them both?
  3. Based on 1, but L contains the members from G except xfor exactly 2 times. That means L might contain 0 or 1 x, but exactly 2 for any other members from G. How to find x?
  4. Based on 1, but G isn’t a collection of integer, but a collection of web urls. How to find the one missing url?
  5. Based on 4, G is a collection of url, L is subset of G (might contains fewer members other than exactly N - 1). How can we know a randomly picked member from G is absolutely not contained by L. Given the tolerance to mistake on saying a member from G is in L but it doesn’t.

These problems will be explained in the following posts.

Awesome geek game Untrusted - the Continuing Adventures of Dr. Eval

Untrusted is an awesome game brought to us by Alex Nisnevich and Greg Shuflin. Untrusted is an unique puzzle game designed for geeks and developers. The reason I said it is designed for geeks and developer is because to solve the puzzles in the game, you need to know or even to write some javascript.

Here is how the creators describe their baby:

Untrusted —or— the Continuing Adventures of Dr. Eval is an exciting Meta-Javascript Adventure Game wherein you guide the dashing, steadfast Dr. Eval through a mysterious MACHINE CONTINUUM, wherein, using only his trusty computer and the TURING-COMPLETE power of Javascript, he must literally ALTER HIS REALITY in order to find his freedom! You must literally edit and re-execute the very Javascript running the game in your browser to save Dr. Eval from this dark and confusing reality!

The description is a little bit hard to understand if you don’t touch the game. I’ll try to translate it a little bit:

In the game, to clear a level, you need to move your avatar to the exit. And just like other normal acarde game, you can control the your avatar, Dr. Eval using arrow keys. The intresting part of this game is that basically, you will always run into the a dead end if you just move Dr.Eval around without doing anything else. Luckily, you will be able to access the source code that creates the world and the rule in the world. To save yourself from the dead end, you need to change part the world/rule by hacking the source code.

The game isn’t that hard if you have some coding experience and is familiar with javascript concepts. And learn from the passed level is very important, since you might find either useful hints or even code to solve your current problem.

NOTE Besides the puzzle and the code, the music of each level is also great! 8Bit music in different style! Really awesome!

Insterested? Here is the port lead to the game: Untrusted

Hints and Walkthrough

I attached my hints and solutions below, is case if you run into trouble.

I have extract the walkthrough into another post.

Here is the link to the walkthrough: Walkthrough