Python - Match nucleotide position to sequence from FASTA file
I have a list of positions:

    chr1 1000
    chr2 2000
    chr3 4000

and I would like to be able to transform each position into the nucleotide found there, given a custom FASTA file, such as:

    chr1 1000 a
    chr2 2000 t
    chr3 4000 g

Is there an existing tool written in Python that can do this job?
Given a FASTA file chromosomes.fasta:

    >chr1
    gattaca
    >chr2
    attacga
    >chr3
    gccaacg

and a positions file positions.txt:

    chr1 3
    chr2 4
    chr3 5

you can use the following code:
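    from Bio import SeqIO

    # Index every FASTA record by its ID (chr1, chr2, ...).
    record_dict = SeqIO.to_dict(SeqIO.parse('chromosomes.fasta', 'fasta'))

    # Read one "chromosome position" pair per non-empty line.
    chromosome_positions = {}
    with open('positions.txt') as f:
        for line in f:
            if line.strip():
                chromosome, position = line.split()
                chromosome_positions[chromosome] = int(position)

    # Look up the base at each position (zero-based index).
    for chromosome in chromosome_positions:
        seq = record_dict[chromosome]
        position = chromosome_positions[chromosome]
        base = seq[position]
        print(chromosome, position, base)

which outputs (in file order on Python 3.7+, where dictionaries preserve insertion order):

    chr1 3 t
    chr2 4 c
    chr3 5 c

Note that Python uses zero-based indexing, so position 5 in positions.txt gives the sixth base in the corresponding sequence. Also note that chromosome_positions holds a single position per chromosome, so a later line for the same chromosome overwrites an earlier one.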
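Since many genomic position lists count from 1 rather than 0, it may help to make the indexing convention explicit. The following is a minimal sketch, not part of the original answer: it avoids Biopython so that it runs with the standard library alone, and the names `read_fasta`, `lookup_bases` and the `one_based` flag are hypothetical helpers introduced for illustration. It also keeps every line of the positions list, so the same chromosome can appear more than once.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dict.

    Hypothetical helper: the record name is the first word of the header.
    """
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            parts[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines; collect them all.
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def lookup_bases(sequences, position_lines, one_based=True):
    """Return (chromosome, position, base) for each 'chromosome position' line.

    Assumption: with one_based=True, position 1 is the first base.
    """
    results = []
    for line in position_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chromosome, position = fields[0], int(fields[1])
        index = position - 1 if one_based else position
        seq = sequences[chromosome]
        if not 0 <= index < len(seq):
            raise IndexError(
                "position %d is outside %s (length %d)"
                % (position, chromosome, len(seq))
            )
        results.append((chromosome, position, seq[index]))
    return results


# Same example data as above, plus a second position on chr1.
fasta_lines = [">chr1", "gattaca", ">chr2", "attacga", ">chr3", "gccaacg"]
position_lines = ["chr1 3", "chr2 4", "chr3 5", "chr1 7"]

sequences = read_fasta(fasta_lines)
for chromosome, position, base in lookup_bases(sequences, position_lines):
    print(chromosome, position, base)
```

To read from the files used above, pass open file objects instead of lists, for example `read_fasta(open('chromosomes.fasta'))`. Passing `one_based=False` reproduces the zero-based behaviour of the code in the answer.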