# 0187. Repeated DNA Sequences

<https://leetcode.com/problems/repeated-dna-sequences>

## Description

The **DNA sequence** is composed of a series of nucleotides abbreviated as `'A'`, `'C'`, `'G'`, and `'T'`.

* For example, `"ACGAATTCCG"` is a **DNA sequence**.

When studying **DNA**, it is useful to identify repeated sequences within the DNA.

Given a string `s` that represents a **DNA sequence**, return all the **`10`-letter-long** sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in **any order**.

**Example 1:**

```
**Input:** s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
**Output:** ["AAAAACCCCC","CCCCCAAAAA"]
```

**Example 2:**

```
**Input:** s = "AAAAAAAAAAAAA"
**Output:** ["AAAAAAAAAA"]
```

**Constraints:**

* `1 <= s.length <= 105`
* `s[i]` is either `'A'`, `'C'`, `'G'`, or `'T'`.

## ac

```java
class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        Set<Integer> repeat = new HashSet<Integer>();
        Set<Integer> repeatMore = new HashSet<Integer>();

        // dictionary for  A C G T
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);


        int val = 0;
        for (int i = 0; i < s.length(); i++) {
            val <<= 2; // move 2 digits left, make room for new letter
            val |= map.get(s.charAt(i));
            val &= 0xfffff; // window technique: any bit larger than 20 is set to 0

            if (i < 9) continue;

            if (!repeat.add(val) && repeatMore.add(val)) {
                res.add(s.substring(i-9, i+1));
            }
        }

        return res;
    }
}
/**
1. 10-letter-long means every 10 letter in the sequence
2. [important] storing 10 letters is too space consuming to implement, so make it smaller, "compress" it to a 4-bit int
3. store 4-bit int in one hashset, when repeat, add to result. when repeat more than twice, ignore it.
4. window technique: val &= 0xfffff;  any bit larger than 20 is set to 0
5. set.add(val) can be used to replace `if (set.contains(val)) do sth else set.add()` because it will return boolean result. 

A  00
C  01
G  10
T  11
**/
```
