Translate DNA sequences to human or formal languages

Hi all,
I am Nahuel, a biotechnologist writing from Argentina. I am fascinated by the power of this platform. Really, guys, I think you are shaping the future here and now.
I have an idea to develop (a project to propose) that I want to share here briefly, and I would welcome help from anyone who wants to be part of the project. I am bad at programming. I am learning, but I need help with the coding from whoever wants to get involved.
My idea is to search for coherent messages in the human genome. To get there, I think we need to write several programs that tackle the goal in steps. First, we would have to create a program capable of finding coherence between the nucleotides of a DNA sequence (e.g. AACTGGTACC) and the characters of different alphabets (coherence meaning that they form real words). This is where my contribution from the biological sciences comes in. The way to do it, following nature, is by triplets: the program should find the correlation between nucleotide triplets and the characters of a human alphabet that, when used to decode the genome (DNA nucleotide sequences), yields the largest possible number of coherent words.
I have chosen triplets because nature works in nucleotide triplets: each triplet of nucleotides codes for an amino acid. Only 4 nucleotides (each named with a letter: A, C, T, G) encode all genomes. If, for example, the nucleotide triplet AAC codes for amino acid 1 and the triplet ATG codes for amino acid 2, then the genomic sequence AACAACATG codes for the protein formed by the amino acid sequence 112. In nature there are only 4 nucleotides (A, C, T, G) and 20 amino acids (really 22, but I do not want to go into details here), which make up all the diversity of living beings that we know. Combining the 4 nucleotides in triplets gives 4x4x4 = 64 triplets that code for the 20 (22) amino acids, so obviously some are redundant: more than one triplet codes for the same amino acid.
Interestingly, the human alphabet that most closely resembles this scheme is Hebrew, which has exactly 22 characters. So we could start by looking for correlations between DNA sequences and the Hebrew alphabet, that is, code a program that finds the best correlation between the 64 DNA triplets and the characters of the Hebrew alphabet, such that searching human DNA sequences yields words.
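For reference, the triplet redundancy described above can be checked in a few lines of Python. This is a minimal sketch that hard-codes the standard genetic code (NCBI translation table 1, written in the conventional T, C, A, G codon order); the names `CODE` and `translate` are only illustrative. With the real code, the example sequence AACAACATG reads as Asn-Asn-Met, and of the 64 triplets, 61 code for the 20 standard amino acids while 3 are stop signals.

```python
from collections import Counter
from itertools import product

# NCBI translation table 1 (standard genetic code), codons ordered T, C, A, G
# with the first base varying slowest. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4 x 4 x 4 = 64 triplets, in the same order as AMINO_ACIDS.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def translate(seq: str) -> str:
    """Read a DNA sequence three letters at a time (frame 0) and map each codon."""
    seq = seq.upper()
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    counts = Counter(AMINO_ACIDS)
    print(len(CODONS))                               # → 64 possible triplets
    print(len([aa for aa in counts if aa != "*"]))   # → 20 distinct amino acids
    print(counts["*"])                               # → 3 stop codons
    print(counts["L"], counts["M"])                  # → 6 1 (Leu has 6 codons, Met 1)
    print(translate("AACAACATG"))                    # → NNM (Asn, Asn, Met)
```

The two extra amino acids behind the "really 22" remark, selenocysteine and pyrrolysine, do not appear in this table: they are incorporated by reinterpreting the UGA and UAG stop codons in special sequence contexts.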
At least two programs are needed (maybe more, I suspect): one to assign DNA triplets to characters of the alphabet, and another to look up words (and perhaps messages) in DNA sequences once the best triplet-to-character translation has been found.
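The two programs could start out as something like the following minimal Python sketch. Everything in it is an illustrative assumption: the names (`decode`, `find_words`, `score`, `hill_climb`), the toy two-word lexicon, the placeholder sequence, and the choice of simple hill climbing as the search for program 1. It assumes sequences contain only A, C, G and T, and it ignores Hebrew final letter forms, so a real lexicon would need those normalized first.

```python
import random

# Hypothetical target alphabet: the 22 Hebrew letters (final forms omitted).
HEBREW = "אבגדהוזחטיכלמנסעפצקרשת"
BASES = "ACGT"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]  # 64 triplets


def random_mapping(rng: random.Random) -> dict[str, str]:
    """Starting point for program 1: give each of the 64 codons a random letter."""
    return {codon: rng.choice(HEBREW) for codon in CODONS}


def decode(seq: str, mapping: dict[str, str], frame: int = 0) -> str:
    """Program 2, step 1: read triplets starting at `frame` (0, 1 or 2) and map them.
    Raises KeyError on characters other than A, C, G, T (e.g. N)."""
    seq = seq.upper()
    return "".join(mapping[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def find_words(text: str, lexicon: set[str]) -> list[tuple[int, str]]:
    """Program 2, step 2: every (position, word) where a lexicon word occurs,
    including overlapping occurrences."""
    hits = []
    for word in lexicon:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + 1)
    return sorted(hits)


def score(mapping: dict[str, str], seqs: list[str], lexicon: set[str]) -> int:
    """Total word hits over all sequences and all three forward reading frames."""
    return sum(
        len(find_words(decode(s, mapping, f), lexicon)) for s in seqs for f in range(3)
    )


def hill_climb(seqs: list[str], lexicon: set[str], steps: int = 2000, seed: int = 0):
    """Program 1 (one possible heuristic): change one codon's letter at a time,
    keeping the change whenever the score does not get worse."""
    rng = random.Random(seed)
    best = random_mapping(rng)
    best_score = score(best, seqs, lexicon)
    for _ in range(steps):
        candidate = dict(best)
        candidate[rng.choice(CODONS)] = rng.choice(HEBREW)
        candidate_score = score(candidate, seqs, lexicon)
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


if __name__ == "__main__":
    toy_seqs = ["AACTGGTACCAACAACATG"]  # placeholder; real input would be curated regions
    toy_lexicon = {"אור", "ברא"}          # hypothetical lexicon ("light", "created")
    mapping, hits = hill_climb(toy_seqs, toy_lexicon, steps=500)
    print("word hits:", hits)
    print(decode(toy_seqs[0], mapping))
```

Hill climbing only finds a local optimum, so in practice one would run it from several seeds and compare the best scores.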

Is there a hidden message in the human genome? Why would there be one? I honestly don't know, but if there is, it would be very interesting. The human genome has 3.2 billion nucleotides; I think we will find something. But it does not make sense to search the entire genome. To start the project, I can provide short nucleotide sequences with the greatest possible biological relevance (I will not expand on this here, but there are many different regions and types of sequences to search, and I would choose the most relevant ones).
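Because the genome is so long, a useful baseline before looking for "something" is how many words would appear by pure chance. The following is a back-of-the-envelope Python sketch; it assumes (unrealistically) that each of the 22 letters is equally likely at every triplet position and independent of its neighbours, and it counts only the three forward reading frames of one strand. The function name `expected_hits` is illustrative.

```python
GENOME_NT = 3.2e9  # genome size quoted in the post
ALPHABET = 22      # letters in the hypothetical target alphabet (Hebrew)
FRAMES = 3         # forward reading frames on one strand


def expected_hits(word_len: int, genome_nt: float = GENOME_NT,
                  alphabet: int = ALPHABET, frames: int = FRAMES) -> float:
    """Expected chance occurrences of one specific word of `word_len` letters,
    assuming every letter is equally likely and independent of its neighbours."""
    letters_per_frame = genome_nt / 3              # one letter per triplet
    positions = letters_per_frame - word_len + 1   # start positions in one frame
    return frames * positions / alphabet ** word_len


for n in range(3, 9):
    print(f"{n}-letter word: about {expected_hits(n):,.1f} chance occurrences")
```

Under these assumptions even a specific 6-letter word is expected roughly 28 times, so a useful test for any candidate mapping is whether it finds more words in real sequences than in shuffled sequences of the same composition.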

The second thing would be to try to translate the genetic code into some formal language (programming or mathematics), since DNA is a kind of programming language: a sequence read by a Turing machine (the cell) to carry out specific functions. But I will stop here, because otherwise this gets very long. I hope someone is interested in my proposal and will help me, so let's start having fun with this. Living organisms, and the nature of DNA itself, are amazingly complex to have happened on their own. If we are an alien experiment, as many good scientists believe, maybe there is some kind of hidden message in DNA. Please do not think that I believe we are an alien experiment (I just do not understand how life happened), but this is an interesting approach to that idea. A big greeting to all.


Can you please clarify something for me? I'm a bit confused.
Do you want to do this because it could make finding DNA sequences easier, or because you think there is going to be some divine revelation at the end of this?


Hi Felix, thanks for asking about this project. The idea is to find interesting patterns in DNA sequences, either in relation to a formal language (math or a programming language) or a spoken human language, such as Hebrew. Why Hebrew? Because, interestingly, it has 22 characters, and the genetic code also works with 22 characters (amino acids): DNA sequences, composed of 4 letters (ACTG) grouped in triplets, encode the information for each of the 22 amino acids that exist in nature. Since 4x4x4 = 64, there are 64 possible triplets to combine with the 22 amino acids; this is how nature works. So it would be relatively simple to develop a program that finds a relationship between DNA triplets and the characters of the Hebrew alphabet, and then to check whether the DNA sequences of the human genome form coherent messages. Why? Why not? We cannot assume we will find something, but we cannot discard it either. I hope I have answered your question; if you have any other comments, do not hesitate to ask. And if you know how to program and are interested in the project, it would be great if you joined it.
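Whether this is "relatively simple" depends mostly on how many triplet-to-letter assignments there are to choose from. A short Python sketch of that count, assuming each of the 64 triplets maps to exactly one of the 22 letters (several triplets may share a letter, as in the genetic code):

```python
from math import comb, log10

CODONS = 64   # 4 x 4 x 4 nucleotide triplets
LETTERS = 22  # characters in the Hebrew alphabet

# Each codon may be assigned any of the 22 letters independently.
all_mappings = LETTERS ** CODONS

# Mappings that use every letter at least once (surjections), via inclusion-exclusion:
# sum over k of (-1)^k * C(22, k) * (22 - k)^64
onto_mappings = sum(
    (-1) ** k * comb(LETTERS, k) * (LETTERS - k) ** CODONS for k in range(LETTERS + 1)
)

print(f"all codon-to-letter mappings: about 10^{log10(all_mappings):.1f}")
print(f"mappings using all 22 letters: about 10^{log10(onto_mappings):.1f}")
```

Both counts are on the order of 10^85, far too many to try one by one, so the program that searches for the best mapping would need a heuristic search (for example hill climbing or simulated annealing) rather than brute force.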
best
N

Hi, thanks for your interest. Do you want to help with this? I hope so, and I hope you know how to code, which is what I really need here.
Regarding your questions, you will find my answers in capital letters:

Would it be fair to say that there are maybe hundreds of amino acids in nature, and that only about 20 are needed to make all the proteins found in the human body (found so far), with 9 of the 20 being called essential because they have to come from the diet?
OK. NOT REALLY RELEVANT. NATURE NEEDS 22.

Why try to link that to human languages?
WHY NOT?
Why Hebrew? Because it has 22 signs?
YES, BUT IT CAN POTENTIALLY MATCH OTHER LANGUAGES (EVEN FORMAL ONES)

OK. But where exactly are the 22 amino acids among the, maybe, hundreds found in nature (the debate remains open on whether more than 22 can be genetically produced)? Not in humans, as we can only make 11.
NATURE NEEDS 22

Why include the stop codons to precisely get to 22? Only 20 are in proteins.
YOU ARE MIXING CONCEPTS

I am neither a chemist nor a biologist, but something seems strange here. That does not diminish the interest of using CODEX to match amino acids found in proteins to nucleotide sequences, or the other way around. In fact, CODEX can do that pretty well.
THERE ARE ALREADY REALLY GOOD PROGRAMS WORKING IN THAT DIRECTION. MOREOVER, THAT DIRECTION IS TOO CLASSIC AND BORING TO BE OF INTEREST TO ME. I WOULD BE HAPPY TO KNOW IF YOU WANT TO TRY CODEX IN THE DIRECTION OF MY IDEA.
BEST
N.