C assignment help | C++ assignment help | Algorithm assignment help – ECS30: Homework #6

C, C++, and algorithm assignment help: this assignment uses DNA-related concepts to test C string handling and file handling.
1 Changelog
2 General information
3 Programming exercise
A working executable for the following problem is available on my CSIF account, in directory
/home/jporquet/ecs30/hw6.
You will also find the necessary genetic code (codeoflife.txt) in the same directory.
3.1 DNA sequencer
The biology department at UC Davis is looking for an application that can decode sequences of DNA,
by locating genes and transcribing the sequence of corresponding proteins.
Genes are substrings of DNA which code for proteins and carry the heritable information from our
parents. Genes start with the sequence of three letters ATG, called the start codon, and end with one of
the three sequences TGA, TAA, or TAG, called stop codons. The stretch of sequence between the start
codon and any of the stop codons is a potential gene.
Each codon codes for an amino acid represented by a letter of the alphabet. There is a total of 20 amino
acids. Strung together, amino acids form proteins. A substring of a DNA sequence is a translatable
sequence if:
it has a length that is a multiple of three,
it starts with a start codon and ends with a stop codon,
it can be translated into an amino acid sequence.
For example, the DNA sequence AATTAAGATGGGGCTCTAAAAT contains such a translatable sequence,
starting at the 8th position and of length 12 (ATGGGGCTCTAA), thus consisting of 4 codons. This
sequence can be translated using a codon table into the three-letter amino acid sequence MGL.
Note that the start codon codes for amino acid M, while the stop codons don't code for any amino acid.
On the other hand, the DNA sequence AATGAATCTAGT does not contain a translatable sequence: its only
start codon is never followed by a stop codon in the same reading frame.
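As a rough sketch of the search described above (not the reference program), one way to locate the longest translatable sequence is to try every start codon and walk forward three letters at a time until the first stop codon in that reading frame. The names is_stop_codon and longest_translatable are hypothetical, the input is assumed to be upper-case already, and ending each candidate at the first in-frame stop codon is an assumption. The third condition (every codon can be translated) is left to the codon table lookup.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the three characters at p spell one of the stop codons. */
int is_stop_codon(const char *p)
{
    return strncmp(p, "TGA", 3) == 0 ||
           strncmp(p, "TAA", 3) == 0 ||
           strncmp(p, "TAG", 3) == 0;
}

/*
 * Hypothetical helper: find the longest translatable sequence in dna,
 * which is assumed to be upper-case already. On success, store the
 * 0-based start offset in *start and return the length in characters
 * (start and stop codons included). Return 0 if there is none, in
 * which case *start is left untouched.
 */
size_t longest_translatable(const char *dna, size_t *start)
{
    size_t n = strlen(dna);
    size_t best_len = 0;

    for (size_t i = 0; i + 3 <= n; i++) {
        if (strncmp(dna + i, "ATG", 3) != 0)
            continue;
        /* Walk codon by codon in the frame set by this start codon. */
        for (size_t j = i + 3; j + 3 <= n; j += 3) {
            if (is_stop_codon(dna + j)) {
                size_t len = j + 3 - i;
                if (len > best_len) {
                    best_len = len;
                    *start = i;
                }
                break; /* assumption: the first in-frame stop ends the candidate */
            }
        }
    }
    return best_len;
}
```

For AATTAAGATGGGGCTCTAAAAT this yields a start offset of 7 (the 8th position) and a length of 12, while AATGAATCTAGT yields 0.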
Write a program, dna_translate.c, that takes two command-line arguments: an input file name,
containing DNA sequences, and an output file name, in which you will store the translated protein
sequences. For each sequence, the program should identify the longest possible translatable subsequence,
if one exists, and translate it into a protein using the codon table given in the file
codeoflife.txt. See the example below.
$ cat codeoflife.txt
I ATT
I ATC
I ATA

R CGT
x TAA
x TAG
x TGA
$ cat dna_seqs.txt
aaATttaTggattagcaagcag
ACGATGATGATGGGGCCCTAATAGTGATAAAAAACT
AAAATAATTTGGA
ATGAAATGGTAGATGAAACCCGGGATATGATAG
$ ./dna_translate dna_seqs.txt prot_seqs.txt
$ cat prot_seqs.txt
MD
MMMGP
none
MKPGI
$
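The session above suggests two small parsing steps: each codon table line holds an amino acid letter followed by a three-letter codon, and the DNA sequences may mix upper and lower case. A minimal sketch of both, assuming lines are read with fgets() and using the hypothetical names parse_codon_line and normalize_sequence, could look like this:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of the codon table, e.g. "I ATT".
 * Stores the amino acid letter in *amino and the codon in codon[4].
 * Returns 1 on success, 0 if the line does not match the format.
 */
int parse_codon_line(const char *line, char *amino, char codon[4])
{
    /* " %c" skips leading whitespace; "%3s" reads at most three letters. */
    if (sscanf(line, " %c %3s", amino, codon) != 2)
        return 0;
    return strlen(codon) == 3;
}

/*
 * Hypothetical helper: strip the trailing newline left by fgets() and
 * convert the sequence to upper case in place, so that later codon
 * comparisons only need to handle A, C, G and T.
 */
void normalize_sequence(char *seq)
{
    seq[strcspn(seq, "\r\n")] = '\0';
    for (char *p = seq; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
}
```

Normalizing the case once, right after reading, keeps the later strncmp() calls simple. Converting each character on the fly during the search would work as well.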
Here is a list of requirements, assumptions and hints:
This program shall contain no global variables.
All the dynamically allocated memory must be properly freed by the end of the
program.
The translated sequences, in the output file, must be in the same order as the DNA sequences.
If no translatable sequence is found, none should be written to the output file.
We assume that the maximum number of characters a DNA sequence can contain is 256.
We assume that the DNA sequence file contains only proper sequences (i.e., strings over {A, C, G,
T, a, c, g, t}).
You are expected to use a linked list to represent the codon table (as read from file
codeoflife.txt).
You are expected to use a linked list to represent the list of DNA sequences (as read from the input
file).
You will probably need to split the problem into a few principal functions, such as:
A function that builds the linked list of codons, as read from codeoflife.txt.
A function that builds the linked list of DNA sequences, as read from the input file.
You will probably need to think about the order of insertion, so that the resulting protein
sequences are output in the same order as the DNA sequences.
A function that iterates through all the DNA sequences and, for each, finds the longest
translatable sequence and writes the corresponding protein sequence to the
output file (or none if no translatable sequence was found).
Two functions that iterate through the two linked lists and free every dynamically allocated
item, along with any dynamically allocated objects it might contain.
List of some important libc functions that are used in the reference program: fopen(), fgets(),
fprintf(), fclose(), sscanf(), strncpy(), strncmp(), etc.
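To illustrate the hints about insertion order and freeing, here is a minimal sketch of a linked list of DNA sequences that appends at the tail, so a traversal visits the sequences in the order they were read. The type and function names (seq_node, seq_list, seq_list_append, seq_list_free) and the MAX_SEQ_LEN constant are hypothetical. Storing the text in a fixed-size array inside each node is one possible design that relies on the 256-character assumption above.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SEQ_LEN 256 /* maximum sequence length assumed by the handout */

struct seq_node {
    char seq[MAX_SEQ_LEN + 2]; /* +2: room for '\n' and '\0' from fgets() */
    struct seq_node *next;
};

struct seq_list {
    struct seq_node *head;
    struct seq_node *tail; /* lets us append in O(1) and keep file order */
};

/* Append a copy of text at the end of the list. Returns 0, or -1 on
 * allocation failure. */
int seq_list_append(struct seq_list *list, const char *text)
{
    struct seq_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;

    /* strncpy() does not always terminate, so terminate explicitly. */
    strncpy(node->seq, text, sizeof node->seq - 1);
    node->seq[sizeof node->seq - 1] = '\0';
    node->next = NULL;

    if (list->tail == NULL)
        list->head = node;       /* first node in an empty list */
    else
        list->tail->next = node; /* link after the current last node */
    list->tail = node;
    return 0;
}

/* Free every node. The next pointer is saved before each free(). */
void seq_list_free(struct seq_list *list)
{
    struct seq_node *node = list->head;
    while (node != NULL) {
        struct seq_node *next = node->next;
        free(node);
        node = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

Inserting at the head instead would be simpler, but the list would then hold the sequences in reverse order and would have to be reversed or traversed backwards before writing the output. The codon table list could follow the same pattern with a letter and a codon in each node.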
