Saturday, June 10, 2017

Applied computer science. Algorithms complexity analysis. Part 1

Algorithms complexity and big-O notation

In computer science, the analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to execute them. Most algorithms are designed to work with inputs of arbitrary length. Usually, the efficiency or running time of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity). (Wikipedia)

Definition. The time complexity of an algorithm is an upper bound on the number of elementary operations it performs, as a function of the input size.

Example. Consider following code:
for (int i=0; i<n; i++) c++;
Let's count the number of elementary operations:
  • 1 for "i=0"
  • n+1 for "i<n"
  • 2n for "i++" (i++ is equivalent to i = i + 1, which consists of two elementary operations)
  • 2n for "c++"

So the complexity of this code is T(n) = 2 + 5n

One more example before we go to Big-O notation.

 
int i = 0;
while (i < n){
  for(int j=0; j<n; j++) c++;
  i++;
}


Let's count the complexity:
  • i = 0 is executed once.
  • i < n in the while loop is executed n+1 times.
  • i++ — n times.
  • The for-loop is started n times, and each time:
    • j = 0 is executed once
    • j < n — n+1 times
    • j++ — n times
    • c++ — n times

So in total the complexity is T(n) = 1 + (n+1) + 2n + n(1 + (n+1) + 2n + 2n) = 2 + 5n + 5n²
For real code samples the complexity may be quite a complex function! :-)
And it is easy to make a mistake in the calculation and, for instance, forget a summand.

Moreover, there are many reasons not to bother calculating the precise form of the complexity function:
  • Some operations are more expensive than others; the complexity function doesn't account for that.
  • The cost of an operation may differ between CPUs.
  • The cost of an operation may differ between execution contexts (CPU caching, multithreading context switches, ...).
  • Some costs are hidden when code is written in a high-level language (memory allocation, garbage collection, just-in-time compilation, etc.).
That's why engineers usually use a so-called asymptotic complexity estimate instead of the complexity function itself.

To make a long story short — drop all summands except the one with the fastest growth rate, then drop all constant factors. The result is called an asymptotic complexity estimate.

When estimating complexity, engineers usually use Big-O notation, written in the form O(n²) or Θ(n log n). The strict formal definitions of this notation are below:

Definition 1. We say "g(n) = O(f(n))"  ⇔  ∃c>0 ∃n₀: ∀n>n₀ → g(n) < c·f(n)

Definition 2. We say "g(n) = Θ(f(n))"  ⇔  g(n) = O(f(n)) and f(n) = O(g(n))

(Θ is the Greek capital letter Theta.)

Examples.

  • n = O(n)
  • 2n+5 = O(n)
  • n = O(n²)
  • 2n²+20n+1 = O(n²)
  • n³ = Θ(n³)
  • n³ + n² log n = Θ(n³)
  • 5n² + 3n + log n + 1000 = Θ(n²)

Treating these definitions informally: f = O(g) means "f grows no faster than g (ignoring constant factors)",
while f = Θ(g) means "f grows at the same rate as g (ignoring constant factors)".

As you can see, Θ is a more precise and therefore preferable estimate than O. Yet people very often use the O-symbol in the sense of the Θ-symbol, especially in informal conversation and in writing (perhaps because not many people know how to type the Θ-symbol). So, if you want to be a wet blanket, you can correct everybody who uses O-notation where Θ-notation is meant. ;-)
