## Background

Collection must be the most commonly used data model in programming. And there are a number of data structure that represents collection in memory, in which `Array`

and `Liked List`

must be the most well known ones.

When dealing with collection, `filtering`

must be one of the most widely used operation being used. Such as find item(s) satisfies specific criteria from the collection. Although filtering is used a lot, but it isn’t actually a simple or easy operation to apply, especially dealing with tremendous data volume.

In fact, there are a number of derived topics about collection filtering, which is impossible to cover them all in the article. This article will focus on `bit operation`

, probably a not widely known but really interesting topic to introduce.

## Problem: Find the missing integer

Problem

Suppose there is a collection of integers`L`

.

Name the collection that contains integers from 1 to`N`

as`G`

Then`L`

contains all the elements of`G`

except one number`x`

.

Items in`L`

are in random order.

Find`x`

.

**TIP: ** This problem be too easy to experienced developers and algorithm hackers. But it is a good opening words to the following 2 problems in this series. Also it reveals the core concept of some other advanced technologies, such as `CRC`

or `bloom filter`

.

When got this problem, the very intuitive(means do it manually by human being) idea is to compare the elements in `L`

and `G`

, find each element in `L`

and remove it from `G`

, then the only left item in `G`

is `x`

.

Here is the code:

1 | # @param Integer[] L |

Well this is a work but brute algorithm, which is not efficient in both time and space:

- Instantiate
`G`

means it has space complexity`N`

- Find and remove
`i`

from`G`

means it has time complexity at`O(N*Log(N))`

Yes, the algorithm can be optimized, with bit operation. In fact the problem can be resolved with time complexity `O(N)`

and space complexity `O(1)`

.

`Exclusive-Or`

Operation

`Exclusive-Or`

(Or `XOR`

for short, written as `^`

in this article) is a basic bit operation, here is an Venn Diagram explains it:

`XOR`

is an important bit operation because it has following interesting features:

- A ^ B = B ^ A (aka
**Commutativity**) - (A ^ B) ^ C = A ^ (B ^ C) (aka
**Associativity**) - A ^ A = 0
- A ^ 0 = A

Commutativity and Associativity ensure that `XOR`

can be used despite of order. The 3rd and 4th feature establishes a critical position for `XOR`

in `cryptology`

and `encoding`

.

## Solution: with `XOR`

By making use of `XOR`

, previous problem can ben resolved in a much more graceful way:

- Given
`y`

is the`xor`

result of all items in`G`

- And
`z`

is the`xor`

result of all items in`L`

- Then
`x = y ^ z`

1 | # @param Integer[] L |

Let’s prove it mathematically:

1 | Given x'= y ^ z |

Actually the code can be more concise by using `reduce`

, with 2-lines of code

1 | # @param Integer[] L |

Or `C`

version:

1 | int find_missing_x(long N, long* L) { |

## Extending problems

So far the problem has been gracefully resolved. But it is yet the end, here is a couple of extending questions:

- If
`G`

is collection with random but unique numbers (Not continues integer from 1 to`N`

). How to write the code? - Based on
`1`

, if there are 2 numbers are missing? How to find them both? - Based on
`1`

, but`L`

contains the members from`G`

except`x`

for exactly 2 times. That means`L`

might contain 0 or 1`x`

, but exactly 2 for any other members from`G`

. How to find`x`

? - Based on
`1`

, but`G`

isn’t a collection of integer, but a collection of web urls. How to find the one missing url? - Based on
`4`

,`G`

is a collection of url,`L`

is subset of`G`

(might contains fewer members other than exactly`N`

- 1). How can we know a randomly picked member from`G`

is absolutely not contained by`L`

. Given the tolerance to mistake on saying a member from`G`

is in`L`

but it doesn’t.

These problems will be explained in the following posts.