Background
Collection must be the most commonly used data model in programming. And there are a number of data structure that represents collection in memory, in which Array
and Liked List
must be the most well known ones.
When dealing with collection, filtering
must be one of the most widely used operation being used. Such as find item(s) satisfies specific criteria from the collection. Although filtering is used a lot, but it isn’t actually a simple or easy operation to apply, especially dealing with tremendous data volume.
In fact, there are a number of derived topics about collection filtering, which is impossible to cover them all in the article. This article will focus on bit operation
, probably a not widely known but really interesting topic to introduce.
Problem: Find the missing integer
Problem
Suppose there is a collection of integersL
.
Name the collection that contains integers from 1 toN
asG
ThenL
contains all the elements ofG
except one numberx
.
Items inL
are in random order.
Findx
.
TIP: This problem be too easy to experienced developers and algorithm hackers. But it is a good opening words to the following 2 problems in this series. Also it reveals the core concept of some other advanced technologies, such as CRC
or bloom filter
.
When got this problem, the very intuitive(means do it manually by human being) idea is to compare the elements in L
and G
, find each element in L
and remove it from G
, then the only left item in G
is x
.
Here is the code:
|
|
Well this is a work but brute algorithm, which is not efficient in both time and space:
- Instantiate
G
means it has space complexityN
- Find and remove
i
fromG
means it has time complexity atO(N*Log(N))
Yes, the algorithm can be optimized, with bit operation. In fact the problem can be resolved with time complexity O(N)
and space complexity O(1)
.
Exclusive-Or
Operation
Exclusive-Or
(Or XOR
for short, written as ^
in this article) is a basic bit operation, here is an Venn Diagram explains it:
XOR
is an important bit operation because it has following interesting features:
- A ^ B = B ^ A (aka Commutativity)
- (A ^ B) ^ C = A ^ (B ^ C) (aka Associativity)
- A ^ A = 0
- A ^ 0 = A
Commutativity and Associativity ensure that XOR
can be used despite of order. The 3rd and 4th feature establishes a critical position for XOR
in cryptology
and encoding
.
Solution: with XOR
By making use of XOR
, previous problem can ben resolved in a much more graceful way:
- Given
y
is thexor
result of all items inG
- And
z
is thexor
result of all items inL
- Then
x = y ^ z
|
|
Let’s prove it mathematically:
Actually the code can be more concise by using reduce
, with 2-lines of code
|
|
Or C
version:
Extending problems
So far the problem has been gracefully resolved. But it is yet the end, here is a couple of extending questions:
- If
G
is collection with random but unique numbers (Not continues integer from 1 toN
). How to write the code? - Based on
1
, if there are 2 numbers are missing? How to find them both? - Based on
1
, butL
contains the members fromG
exceptx
for exactly 2 times. That meansL
might contain 0 or 1x
, but exactly 2 for any other members fromG
. How to findx
? - Based on
1
, butG
isn’t a collection of integer, but a collection of web urls. How to find the one missing url? - Based on
4
,G
is a collection of url,L
is subset ofG
(might contains fewer members other than exactlyN
- 1). How can we know a randomly picked member fromG
is absolutely not contained byL
. Given the tolerance to mistake on saying a member fromG
is inL
but it doesn’t.
These problems will be explained in the following posts.