Repeated DNA Sequences

Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> seen = new HashSet<>(), res = new HashSet<>();
        for (int i = 0; i < s.length() - 9; i++) {
            String dna = s.substring(i, i + 10); //Need to be i + 10 here because the last index one won't be added  
            if (!seen.add(dna)) res.add(dna);
        }
        return new ArrayList<>(res);
    }
}

Analysis

Pretty straightforward solution using two HashSet
The first set is to check the duplicated, the second set is to guarantee there is no duplicates in our result
At the end, convert set to list and return it

results matching ""

    No results matching ""