652. Find Duplicate Subtrees

how to serialization

序列化: root + "," + s(root.left) + "," + s(root.right)

xx

xxx

因為對子樹做操作, so use postorder

postorder dfs, serialization with String

Can anyone explain how the first solution is O(n^2)? My understanding is that it pretty much does the same thing as second solution. Arent both O(n) - n being number of nodes in the tree?

Edit:

I understand now. Since strings are bring used, the concatenation takes O(n) time in each recursive call taking overall complexity to O(n2). https://stackoverflow.com/questions/15400508/string-concatenation-complexity-in-c-and-java

雖然 DFS 是 O(n), 但字串串接時用 String, 所以串接會是 O(n), 所以是 O(n^2)

use StringBuilder 應該就沒字串串接 O(n) 這問題, 但問題還是在於 hash , 如果字太長了 , hash 也會變成 從 O(1) -> O(n), (這題數量可以到

  • The number of the nodes in the tree will be in the range [1, 10^4]

), 所以是 dfs * String hash = O(n^2)

@shiyuluu is right. Concatenating in string isn't a main problem here. The problem is building hash from all seeing nodes fast.

If you want to build a hash from a string, you have to go through all the letters of that string. Each letter of this string is one node. Therefore, the first solution has a complexity of n ^ 2.

T: O(n^2)

S: O(n..larger)

class Solution {
    
    public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
        Map<String, Integer> map = new HashMap<>();
        List<TreeNode> res = new ArrayList<>();
        serialization(root, map, res);
        return res;
    }
    private String serialization(TreeNode root, Map<String, Integer> map, List<TreeNode> res) {
        if (root == null) {
            return "#";
        }
        String key = root.val + "," + serialization(root.left, map, res) + "," + serialization(root.right, map, res);
        map.put(key, map.getOrDefault(key,0)+1);
        if (map.get(key) == 2) {
            res.add(root);
        }
        return key;
    }
}

postorder dfs + serialization + int id (識別符

Map<String key, global number> , when add new key, global number++

Map<global number, count

T: O(n)

S: O(n..larger)

We can improve this to O(n) by replacing full serializations with serial ids instead.

關鍵是 private int serialization, serialization 回來的是 int, 這樣就可以避免用 string 來做

serialization

所以雖然這也還是用 String, Integer , Map<String, Integer>

但因為 String key 會很短了, root.val + "int id" + "int id"

所以整個 serialization 過程就不會很久

class Solution {
    
    public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
        Map<String, Integer> countMap = new HashMap<>();
        Map<String, Integer> idMap = new HashMap<>();

        List<TreeNode> res = new ArrayList<>();
        serialization(root, idMap, countMap, res);
        return res;
    }
    // the num of nodes can be 10^4, so serialization string may be too long, it casues hashmap confliction
    // the goal is reduce serialization string to a "short id"
    private int serialization(TreeNode root, Map<String, Integer> idMap, Map<String, Integer> countMap, List<TreeNode> res) {
        if (root == null) {
            return -1;
        }
        String key = root.val + "," + 
            serialization(root.left, idMap, countMap, res) + "," + 
            serialization(root.right, idMap, countMap, res);
        
        if (!idMap.containsKey(key)) {
            // when there is new key to add, idMap.size() will increase, so it's a unique number
            // idMap.size() be the short id, or use a global variable is also ok
            idMap.put(key, idMap.size()); 
            
            countMap.put(key, 1); // countMap part is just like solution1's map
        } else {
            countMap.put(key, countMap.get(key)+1);
            if (countMap.get(key) == 2) {
                res.add(root);
            }
        }
        return idMap.get(key); // return uid
    }

}

these 3 line can to one line

//int uid = uidMap.computeIfAbsent(serial, x-> t++); // 如果是有這個 key, 用原本的值, 沒有才回傳 t++加玩的結果
        
int uid = serialToUidMap.getOrDefault(serial, t);
if (uid == t) t++;
serialToUidMap.put(serial, uid);

for 不同的 key(很短的 serial), 給一個新的 uid, 最後回傳的也是 uid, 這樣序列化整個的結果會很短

一樣的 key 還是用原本的 value(uid)

不過要注意 uidMap.computeIfAbsent(serial, x-> t++)
這個 t 要全域變數

class Solution {
    int t;
    public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
        t = 1;
        Map<String, Integer> serializeToUidMap = new HashMap<>(); // <string key, uid>
        Map<Integer, Integer> uidToCountMap = new HashMap<>(); // <uid, count>
        List<TreeNode> res = new ArrayList<>();
        
        serialize(root, serializeToUidMap, uidToCountMap, res);
        return res;
    }
    
    private int serialize(TreeNode node, Map<String, Integer> serializeToUidMap,
                         Map<Integer, Integer> uidToCountMap, List<TreeNode> res) {
        if (node == null) {
            return 0;
        }
        String serializedStr = node.val + "," +
            serialize(node.left, serializeToUidMap, uidToCountMap, res) + "," +
            serialize(node.right, serializeToUidMap, uidToCountMap, res);
        
        int uid = serializeToUidMap.computeIfAbsent(serializedStr, k -> t++);
        
        uidToCountMap.put(uid, uidToCountMap.getOrDefault(uid, 0) + 1);
        if (uidToCountMap.get(uid) == 2) {
            res.add(node);
        }
        return uid;
    }
}

Last updated