How is this tool different from other DNA reports?
When a genome is sequenced, the resulting reads are usually mapped to a reference genome. For human DNA, for example, the GRCh38 reference is often used. Many of the reads normally remain unmapped. This may be because the DNA is not human (e.g. bacterial) or because it differs so much from the reference that no confident mapping could be found.
This tool searches the unmapped data that was present in a sample. This otherwise unused DNA data can reveal large mutations that arose after birth, called somatic mutations. Standard DNA analyzers tend to focus on the human genome you were born with, which does not change during life. (They will look, for example, for the risk of developing cancer later in life.)
In other words, this tool looks at the other DNA that was also present in the sample. That DNA could come from bacteria or viruses, or it could reflect large mutations that were not part of the original genome. Such mutations may be one-off events that developed after birth.
Is this tool free?
Yes, for the time being this tool is free to use. We are still testing at the moment. The results are not guaranteed.
Is privacy guaranteed?
Yes: we do not store any of your genetic data. The uploaded DNA is analyzed and then immediately deleted. We do store the data that is associated with your account (like email or billing history). This can also be deleted on request.
What is a Whole Genome Sequence (WGS)?
A Whole Genome Sequence is the result of sequencing ALL DNA in a sample.
To sequence DNA, usually all the DNA in a sample is chemically split into small pieces. These pieces are sequenced individually and stored in a data file. The pieces are then put together and matched against a reference. The result represents the DNA present in the sample.
Many WGS providers focus on the inherited human genome. However, this is not the only DNA present in the sample! The sample will also contain DNA from other organisms, like viruses and bacteria. It will also contain somatic mutations.
What is a somatic mutation?
We are all born with a fixed set of DNA, and all cells in our bodies carry the same DNA.
However, sometimes the DNA in a cell is damaged. This is called a somatic mutation and can happen, for instance, through a virus, radiation, chemicals or even spontaneously. Often a mutated cell does not work well and dies. The dead cell is then usually transported out of the body through the bloodstream. If, however, the cell does not die, the somatic mutation is passed on to its daughter cells. Sometimes, but not always, a somatic mutation leads to cancer.
How do I get a WGS sequence file?
For this, you need to get your DNA sequenced. Your local clinic may be able to do that and there are many companies offering sequencing services. Make sure you get a Whole Genome Sequence.
A “partial” sequence, like those produced by many ancestry services, is not sufficient for this tool, as such services usually do not sequence all DNA in a sample.
What sample should I take?
The sample you take determines what DNA you will find. For example:
- A blood sample may contain somatic mutations
- A blood sample may contain bacterial, viral and fungal DNA
- A stool sample will contain your intestinal microbiome
- A saliva sample will contain oral bacteria
How does the search algorithm work for classifying reads?
The reads are classified against a library of bacterial, viral and fungal genomes from NCBI. Each read is matched against this library as closely as possible. The algorithm is designed to be efficient and runs in parallel for speed.
Many reads cannot be matched, for instance because they contain sequencing-quality errors or because they match more than one organism. (A lot of DNA is actually shared between species.) Reads that cannot be matched to a single organism are not classified by this tool. Human reads are also ignored.
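The rule that a read is only classified when it points to exactly one organism can be illustrated with a small sketch. This is not the tool's actual algorithm: the k-mer index, the function names `build_index` and `classify_read`, the k-mer length and the tiny two-organism library are all assumptions made for illustration.

```python
from collections import defaultdict

K = 8  # k-mer length (assumption; chosen small so the toy example works)


def build_index(library, k=K):
    """Map every k-mer to the set of organisms whose sequence contains it."""
    index = defaultdict(set)
    for organism, seq in library.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(organism)
    return index


def classify_read(read, index, k=K):
    """Return the one organism that all matched k-mers agree on, else None."""
    if "N" in read:
        # Treat an uncalled base as a quality error and skip the read.
        return None
    candidates = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue  # this k-mer is not in the library at all
        candidates = set(hits) if candidates is None else candidates & hits
    if candidates is not None and len(candidates) == 1:
        return next(iter(candidates))
    return None  # no match, or the read fits more than one organism


# Hypothetical library: the two sequences share the stretch CGGTTCAGGCT.
library = {
    "org_a": "ACGTACGGTTCAGGCTAACG",
    "org_b": "TTGACCGGTTCAGGCTTTAC",
}
index = build_index(library)
print(classify_read("ACGTACGGTTCA", index))  # unique to org_a
print(classify_read("CGGTTCAGGCT", index))   # shared stretch, not classified
```

A read taken from the shared stretch matches both organisms, so the sketch leaves it unclassified, which mirrors the behaviour described above.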
A BAM file typically contains a lot of reads. The tool starts classifying reads as soon as the upload of the file starts. Usually, after a small percentage of the file has been uploaded, the list of detected organisms no longer changes much.
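As a rough sketch of how a running result can settle while the upload is still in progress, the snippet below tallies classified reads batch by batch and records when the set of detected organisms stops changing. The function name `stream_tally`, the batch size and the stability rule are assumptions for illustration, not a description of how the tool decides this.

```python
from collections import Counter


def stream_tally(labels, batch_size=1000, stable_batches=3):
    """Tally labels as they arrive.

    Returns the tally and the number of reads seen when the set of detected
    organisms had stayed unchanged for `stable_batches` consecutive batches
    (None if that never happened).
    """
    tally = Counter()
    previous = set()
    unchanged = 0
    stable_at = None
    seen = 0
    for label in labels:
        seen += 1
        if label is not None:  # None means the read was not classified
            tally[label] += 1
        if seen % batch_size == 0:
            current = set(tally)
            unchanged = unchanged + 1 if current == previous else 0
            previous = current
            if stable_at is None and unchanged >= stable_batches:
                stable_at = seen
    return tally, stable_at


# Hypothetical stream of per-read classification results.
labels = ["a", "b", "a", None, "b", "a", "a", "b"]
tally, stable_at = stream_tally(labels, batch_size=2, stable_batches=2)
print(tally, stable_at)
```

Because the function consumes an iterator, it can be fed reads as they arrive rather than after the whole file is available.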
How does the search algorithm work for finding somatic mutations?
For this search, the tool first looks only at reads that are human but were not classified as human in the BAM file. This can happen, for example, if a read is human but does not match the reference genome closely enough. A read that covers a large somatic mutation could be left unclassified in this way.
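To make the starting point concrete: in the SAM/BAM format, a read that the aligner could not place carries the flag bit 0x4 (segment unmapped). The sketch below picks such reads out of SAM text lines. It assumes that "not classified as human" corresponds to this flag, works on the text (SAM) form rather than binary BAM to stay self-contained, and uses the hypothetical name `unmapped_reads`.

```python
UNMAPPED = 0x4  # SAM flag bit: segment unmapped


def unmapped_reads(sam_lines):
    """Yield (read name, sequence) for every alignment line flagged unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & UNMAPPED:
            yield fields[0], fields[9]


# Hypothetical records: r1 is mapped, r2 is unmapped.
sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
for name, seq in unmapped_reads(sam):
    print(name, seq)
```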
In many cases, the unmapped read can still be matched against a human reference genome. If so, the rest of the gene is reconstructed de novo from the BAM reads. The reconstructed gene may turn out to differ from the reference. If it does, this is marked as a mutation. No assumptions are made about what the mutation is or what it could cause.
The de novo reconstruction takes a lot of processing power and time, and needs all the reads of a file to be uploaded. For this reason, this tool is limited to genes that are listed on this website (mainly known cancer-related genes).
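The final step, comparing a reconstructed gene with the reference and marking any difference, can be sketched with Python's standard `difflib` module. This only illustrates the idea of reporting differences without interpreting them. The function name `find_differences` and the toy sequences are assumptions, and a general-purpose text differ stands in for a real sequence aligner.

```python
from difflib import SequenceMatcher


def find_differences(reference, reconstructed):
    """List every stretch where the reconstructed sequence departs from the reference.

    Each entry is (kind, reference position, reference bases, reconstructed bases),
    where kind is 'replace', 'delete' or 'insert'. No interpretation is attached.
    """
    matcher = SequenceMatcher(None, reference, reconstructed, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, i1, reference[i1:i2], reconstructed[j1:j2]))
    return differences


# Hypothetical sequences differing in a single base.
for difference in find_differences("ACGTACGT", "ACGTTCGT"):
    print(difference)
```

An empty list means the reconstructed gene matches the reference, so nothing would be marked.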
