The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.
For example, "ACGAATTCCG" is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
**Input:** s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
**Output:** ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
**Input:** s = "AAAAAAAAAAAAA"
**Output:** ["AAAAAAAAAA"]
Constraints:
1 <= s.length <= 105
s[i] is either 'A', 'C', 'G', or 'T'.
ac
classSolution {publicList<String> findRepeatedDnaSequences(String s) {List<String> res =newArrayList<String>();Set<Integer> repeat =newHashSet<Integer>();Set<Integer> repeatMore =newHashSet<Integer>();// dictionary for A C G TMap<Character,Integer> map =newHashMap<Character,Integer>();map.put('A',0);map.put('C',1);map.put('G',2);map.put('T',3);int val =0;for (int i =0; i <s.length(); i++) { val <<=2; // move 2 digits left, make room for new letter val |=map.get(s.charAt(i)); val &=0xfffff; // window technique: any bit larger than 20 is set to 0if (i <9) continue;if (!repeat.add(val) &&repeatMore.add(val)) {res.add(s.substring(i-9, i+1)); } }return res; }}/**1. 10-letter-long means every 10 letter in the sequence2. [important] storing 10 letters is too space consuming to implement, so make it smaller, "compress" it to a 4-bit int3. store 4-bit int in one hashset, when repeat, add to result. when repeat more than twice, ignore it.4. window technique: val &= 0xfffff; any bit larger than 20 is set to 05. set.add(val) can be used to replace `if (set.contains(val)) do sth else set.add()` because it will return boolean result. A 00C 01G 10T 11**/