The Insightful Troll

Rants and Ruminations.

Shortening URLs

How do we design a system that takes a long URL such as https://www.geeksforgeeks.org/count-sum-of-digits-in-numbers-from-1-to-n and converts it into a short, 6-character URL? Assume the URLs are stored in a database and every URL has an associated integer id.

One important requirement is that the long URL must be uniquely recoverable from the short URL, so we need a bijective function between the two.

One simple solution is hashing: use a hash function to convert the long string into a short string. But hashing can produce collisions (two long URLs mapping to the same short URL), and we need a unique short URL for every long URL so that we can get the long URL back.
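To make the collision problem concrete, here is a minimal sketch of the hashing approach, assuming SHA-256 as the hash and a plain Ruby Hash standing in for the database. `HashShortener`, `shorten`, and `expand` are hypothetical names, not part of this post's design.

```ruby
require "digest"

# Hypothetical hash-based shortener: the first 6 hex characters of a SHA-256 digest.
# Two different long URLs can share those 6 characters, so a clash must be
# detected instead of silently overwriting the earlier mapping.
class HashShortener
  def initialize
    @store = {} # short key => long URL (stand-in for a database table)
  end

  def shorten(long_url)
    key = Digest::SHA256.hexdigest(long_url)[0, 6]
    existing = @store[key]
    if existing && existing != long_url
      raise "collision: #{key} already maps to #{existing}"
    end
    @store[key] = long_url
    key
  end

  # Returns nil when the key is unknown.
  def expand(key)
    @store[key]
  end
end
```

Truncating to 6 hex characters leaves only 16^6 = 16,777,216 keys, and by the birthday bound the chance of at least one collision reaches about one in two after only around 4,800 URLs, so the collision branch is far from theoretical.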

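A better solution is to use the integer id stored in the database and convert it to a character string that is at most 6 characters long. This is essentially a base conversion problem: the input is an id of up to 10 decimal digits, and the output is a string of at most 6 characters. Six characters are enough because, with the 62-character alphabet described below, there are 62^6 = 56,800,235,584 possible strings, more than the largest 10-digit number (9,999,999,999).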
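To see how many characters a particular id needs, one can count base-62 places. The helper below is a hypothetical sketch (`base62_length` is not from the original code) that uses only integer arithmetic.

```ruby
# Smallest number of base-62 characters needed to write a non-negative id.
def base62_length(id)
  raise ArgumentError, "id must be non-negative" if id.negative?

  len = 1
  # Grow the length until 62**len exceeds the id.
  len += 1 while id >= 62**len
  len
end

puts base62_length(9_999_999_999) # prints 6 (largest 10-digit id)
```

For example, 12345 needs only 3 characters, which matches the "3d7" in the session further down.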

An important observation concerns which characters to use. We restrict the short string to alphanumeric characters, which can appear in a URL without any escaping. Each character is one of the following:

  1. A lower-case letter [‘a’ to ‘z’], 26 characters
  2. An upper-case letter [‘A’ to ‘Z’], 26 characters
  3. A digit [‘0’ to ‘9’], 10 characters

That gives 26 + 26 + 10 = 62 possible characters in total.
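The three ranges can be combined into a single alphabet. This sketch assumes the digits-first ordering used by the Base62 class in this post, which differs from the order of the list; any fixed ordering works as long as the encoder and decoder share it. `ALPHABET` and `DIGIT_VALUE` are illustrative names.

```ruby
# Build the 62-character alphabet from the three ranges:
# digits first, then lower case, then upper case.
ALPHABET = [*"0".."9", *"a".."z", *"A".."Z"].freeze

# Reverse lookup table: character => digit value.
DIGIT_VALUE = ALPHABET.each_with_index.to_h.freeze

puts ALPHABET.size # prints 62
```

A Hash lookup such as `DIGIT_VALUE["d"]` (13) avoids the linear scan that `Array#index` performs for every character while decoding.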

So the task is to convert a decimal number to a base-62 number.
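Concretely, converting to base 62 means dividing by 62 repeatedly and collecting the remainders. The hypothetical helper below returns digit values rather than characters so the arithmetic stays visible. For 12345 the steps are 12345 = 199 × 62 + 7, then 199 = 3 × 62 + 13, then 3 = 0 × 62 + 3.

```ruby
# Base-62 digit values of a non-negative integer, most significant first.
def base62_digits(n)
  raise ArgumentError, "n must be non-negative" if n.negative?

  digits = []
  loop do
    n, rem = n.divmod(62)  # quotient carries on, remainder is the next digit
    digits.unshift(rem)    # remainders come out least significant first
    break if n.zero?
  end
  digits
end

p base62_digits(12_345) # prints [3, 13, 7]
```

Mapping 3, 13, 7 through an alphabet that starts with 0-9 and then a-z gives "3", "d", "7".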

To get the original long URL back, we convert the short string from base 62 to decimal, which gives the URL's id, and then look that id up in the database.
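The whole round trip can be sketched with an in-memory Hash standing in for the database table. `UrlStore` and its methods are hypothetical. Ids start at 1 to mimic a typical auto-increment key (an assumption), and the encoder and decoder are inlined so the snippet runs on its own.

```ruby
# Hypothetical in-memory stand-in for the URL table: integer id => long URL.
class UrlStore
  ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

  def initialize
    @rows = {}
    @next_id = 1 # assumption: ids start at 1, like an auto-increment key
  end

  # Insert the long URL, then encode its new id as the short code.
  def shorten(long_url)
    id = @next_id
    @next_id += 1
    @rows[id] = long_url
    encode(id)
  end

  # Base 62 -> decimal gives the id; the id gives the row (nil if absent).
  def expand(code)
    @rows[decode(code)]
  end

  private

  def encode(n)
    chars = []
    loop do
      n, rem = n.divmod(62)
      chars.unshift(ALPHABET[rem])
      break if n.zero?
    end
    chars.join
  end

  def decode(code)
    code.each_char.reduce(0) do |acc, c|
      digit = ALPHABET.index(c) or raise ArgumentError, "invalid character #{c.inspect}"
      acc * 62 + digit
    end
  end
end
```

Only the base-62 form of the id appears in the short URL; the decimal id stays internal to the database.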

Here is some Ruby code that implements a Base62 encoder and decoder:

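class Base62
  # Digits, then lower case, then upper case: 62 characters in all.
  CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".chars.freeze
  BASE = 62

  # Integer id -> base-62 string.
  def self.encode(value)
    # A negative value would otherwise index CHARS from the end and return garbage.
    unless value.is_a?(Integer) && value >= 0
      raise ArgumentError, "value must be a non-negative Integer"
    end

    s = []
    # Repeated division by 62: each remainder is the next digit,
    # least significant first, so the digits are reversed at the end.
    while value >= BASE
      value, rem = value.divmod(BASE)
      s << CHARS[rem]
    end
    s << CHARS[value]
    s.reverse.join
  end

  # Base-62 string -> integer id.
  def self.decode(str)
    total = 0
    # Horner's method: shift the running total one base-62 place per character.
    str.each_char do |c|
      digit = CHARS.index(c)
      raise ArgumentError, "invalid Base62 character: #{c.inspect}" if digit.nil?
      total = total * BASE + digit
    end
    total
  end
end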

And here is an example:

irb(main):001:0> n=12345
=> 12345
irb(main):002:0> shorturl = Base62::encode(n)
=> "3d7"
irb(main):003:0> t = Base62::decode(shorturl)
=> 12345
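The same conversion can also be written more compactly with Ruby's built-in `Integer#digits`, which takes a base and returns digits least significant first. This is an alternative sketch, not the author's code; `DIGITS62`, `compact_encode`, and `compact_decode` are illustrative names. It also round-trips a few boundary values, including the largest 6-character code.

```ruby
DIGITS62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Integer#digits(62) yields base-62 digits least significant first,
# so reverse them before mapping to characters.
# (Integer#digits raises Math::DomainError for negative numbers.)
def compact_encode(n)
  n.digits(62).reverse.map { |d| DIGITS62[d] }.join
end

# Fold characters back into an integer, most significant first.
def compact_decode(s)
  s.each_char.reduce(0) do |acc, c|
    d = DIGITS62.index(c) or raise ArgumentError, "invalid character #{c.inspect}"
    acc * 62 + d
  end
end

# Round-trip a few boundary values.
[0, 61, 62, 12_345, 62**6 - 1].each do |n|
  raise "round trip failed for #{n}" unless compact_decode(compact_encode(n)) == n
end

puts compact_encode(62**6 - 1) # prints ZZZZZZ
```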

Stay tuned for a simple personal URL manager coming shortly…
