0187. Repeated DNA Sequences

https://leetcode.com/problems/repeated-dna-sequences

Description

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

  • For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

**Input:** s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
**Output:** ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

**Input:** s = "AAAAAAAAAAAAA"
**Output:** ["AAAAAAAAAA"]

Constraints:

  • 1 <= s.length <= 105

  • s[i] is either 'A', 'C', 'G', or 'T'.

ac

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        Set<Integer> repeat = new HashSet<Integer>();
        Set<Integer> repeatMore = new HashSet<Integer>();

        // dictionary for  A C G T
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);


        int val = 0;
        for (int i = 0; i < s.length(); i++) {
            val <<= 2; // move 2 digits left, make room for new letter
            val |= map.get(s.charAt(i));
            val &= 0xfffff; // window technique: any bit larger than 20 is set to 0

            if (i < 9) continue;

            if (!repeat.add(val) && repeatMore.add(val)) {
                res.add(s.substring(i-9, i+1));
            }
        }

        return res;
    }
}
/**
1. 10-letter-long means every 10 letter in the sequence
2. [important] storing 10 letters is too space consuming to implement, so make it smaller, "compress" it to a 4-bit int
3. store 4-bit int in one hashset, when repeat, add to result. when repeat more than twice, ignore it.
4. window technique: val &= 0xfffff;  any bit larger than 20 is set to 0
5. set.add(val) can be used to replace `if (set.contains(val)) do sth else set.add()` because it will return boolean result. 

A  00
C  01
G  10
T  11
**/

Last updated