Saturday 30 May 2015

Data Structure and Algorithm - Trie and Palindrome (Google Interview Question)

1.Problem Description


2. Analysis
Initially I was only thinking about the case given the example list - all the elements/strings have the same length. A simple solution is to use hash map.
    - Use the string as the key and the frequency of it appearing in the list as the value.
    - Inset the first element and set the frequency as 1.
    - From the second string on, try to find its reverse string in the hash map.
        * If found, then there is palindrome and increment the counter by the frequency.
        * Insert the string into the hash map
            ~ If already in the hasp map, then increment the frequency by 1
            ~ Otherwise insert and set the frequency as 1

Unfortunately this algorithm won't work for strings with varying length, for instance {a, aa, aaa, aaaa}, - the elements can have different length. In this example {a, aa, aaa, aaaa} all the combinations are palindrome and in total it has 6 combinations. The hash map solution does not work any more because all the strings are different keys.

Let's examine what properties they share.
The strings have the same length. It is quite simple. The reverse of the second string should be equal to the first string.
The strings have different length. Assume two string with size L1, L2, and L1 > l2. Then after reverse one of them. Then the first L2 should be same and the rest of L1-l2 should be palindrome as well. In the above example
    - "a" and "aaaa", the reverse of  "aaaa" is "aaaa".
        * The red part is the same and blue part is palindrome as well. 

In summary for any two given string s1 and s2 with length of L1 and L2, to decide if s1s2 is a palindrome follows, (s2' is the reverse string of s2)
    - Case 1: L1 = L2, then  s1 = s2'
    - Case 2: L1 >  l2  then s1.substr(0, L2) = s2' and s1.substr(L2 + 1, L1 - L2) is palindrome too.
    - Case 3: L1 < L2, then s1 = s2'.substr(0, L1) and s2.substr(L1 + 1, L2 - L1) is palindreom too.
3. Solution
Some contributors mentioned using Trie structure/algorithm to solve this issue. But none of them proposed how the data structure looks like exactly and how to do the searching/traversing and insertion operations.

Data Structure
In this case I assume all the strings given are made of letters only and are either in upper case or in lower case. Based on my original Trie data structure and  the fact that there might be duplicates of strings in the list according to the problem description, the data structure is modifies as,

constexpr unsigned char NUMBER_OF_LETTERS = 26
struct DictionaryTrie {
char oneLetter;
        size_t frequency;
DictionaryTrie* children[NUMBER_OF_LETTERS];
DictionaryTrie() {
str = NULL;
for (int i = 0; i != NUMBER_OF_LETTERS; ++i) {
children[i] = NULL;
}
}
};
Each node only takes one letter, "oneLetter" as data and records how many times this string appears in the list as "frequency".

How to insert a string into Trie?
When a string is inserted into Trie, only the last letter's "frequency" set to indicate how many times this string appears in the list. For instance,
    - "abc" inserted into a empty Trie as,
        * root->(a, 0)->(b, 0)->(c, 1)
    - "abcd" inserted into this Trie
        * root->(a, 0)->(b, 0)->(c, 1)->(d, 1)
    - "abc" inserted into the Trie again
        * root->(a, 0)->(b, 0)->(c, 2)->(d, 1)
    - "abef" inserted into the Trie
        * root->(a, 0)->(b, 0)->(c, 2)->(d, 1)
                                         |->(e, 0)->(f, 1)

Pseudo-code
So far we understand how the string is inserted into the Trie and how to decide if the combination of two strings (Section 2) is a palindrome. Here is the pseudo-code:
1. Initialize counter as 0
2. Insert the first string of the list into an empty Trie
3. Take the next string S and its reverse string S'
4. Traverse the Trie with S'
    4.1 If found a node with frequency > 0, then check S'  (check Section 2)
        - Increment counter by frequency and go to Step 5, if reach the end of S' too.
                                                                                          : Case 1 in Section 2
        - Check if the rest of S' is palindrome, if does not reach the end of S'
                                                                                          : Case 2 in Section 2
            * If yes, then increment counter by frequency and go to Step 4
            * If no, then go to Step 4
    4.2 If reach the end of S', record node X                      : Case 3 in Section 2
        - 4.2.1 Continue traverse the children of X
            * If found a node Y with frequency > 0, then
                ~ If between X and Y is palindrome,  increment counter by frequency and go to Step 4.2.1
                ~ If between X and Y is not palindrome, go to Step 4.2.1
            * Reach the end of Trie, go to Step 5
5. Insert S into Trie
    - If not exist, insert it and set the last letter's frequency as 1
    - If exist. increment the last letter's frequency by 1
6. Repeat Step 3,4 and 5 until exhaust the list

4. Example
Here is an example to show how the algorithm works.
    - The black color is the Trie structure
    - The red color is the newly inserted new string (tracing back to the root node) : Step 5
    - The green color is to detect the palindrome: Step 4
        * The reverse string is to compare with the Trie before inserting it
        * For instance in Step 5: "aabb" is to compare with the Trie in Step 4
    - The blue color is indicate the result
    - Case 3: 2, 3, 4
    - Case 1: 6 @ (a, 1) of level 1 (Trie at 5)
    - Case 2: 6 @ (a, 1) of level 2 (Trie at 5)
       
Example 1

5. Complexity
I don't think O(nk) is achievable and the worst case will be (n*(k^2)). Because there is a special case like this {a, aa, aaa, aaaa, aaaaa}.
When the algorithm does with "aaaaa", it will have to decide if "aaaa", "aaa", "aa" and "a" are palindrome. This operation is quite expensive as O(k^2). 

No comments:

Post a Comment