SQL Server - Implementation of K-mer/n-gram in SQL
I want to implement a k-mer/n-gram algorithm in SQL Server (https://en.wikipedia.org/wiki/n-gram). My database holds millions of protein sequences, and for each one I want to find its k-mers as an array.
For example, with the sequence atataggtcgt and k = 5, the result should be:
1 | atata
2 | tatag
3 | atagg
4 | taggt
5 | aggtc
6 | ggtcg
7 | gtcgt
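As a quick check on the expected output: with 1-based positions (matching the numbering above), a sequence s has one k-mer starting at every position where a full window of k characters still fits. In SQL Server terms, kmer_i is SUBSTRING(s, i, k).

$$\mathrm{kmer}_i(s) = s_i\, s_{i+1} \cdots s_{i+k-1}, \qquad i = 1, \dots, |s| - k + 1$$

For atataggtcgt, |s| = 11 and k = 5, so there are 11 - 5 + 1 = 7 k-mers, matching the seven rows listed. A sequence shorter than k has no k-mers.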
Thanks for your attention.
With respect to https://en.wikipedia.org/wiki/n-gram, k (or n) is a variable, so a user-defined function that takes k (or n) as an input parameter is the best solution.
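IF OBJECT_ID('dbo.ngram', 'IF') IS NOT NULL DROP FUNCTION dbo.ngram;
GO
-- Inline table-valued function: returns every substring of length @n in @s.
CREATE FUNCTION dbo.ngram(@s nvarchar(max), @n int)
RETURNS TABLE
AS RETURN
WITH grams AS (
    -- Anchor: the first n-gram; p holds the start position of the next one.
    SELECT 2 AS p, LEFT(@s, @n) AS g
    WHERE LEN(@s) >= @n
    UNION ALL
    -- Recursive step: take the n-gram starting at p while it still fits in @s.
    SELECT p + 1, SUBSTRING(@s, p, @n)
    FROM grams
    WHERE p + @n - 1 <= LEN(@s)
)
SELECT g FROM grams;
GO
WITH t AS (
    SELECT s FROM (VALUES ('atcgaaggtcgt'), ('at')) AS v(s)
)
SELECT s, g
FROM t
OUTER APPLY dbo.ngram(s, 2)
-- Recursion stops at 100 levels by default; long protein sequences need more.
OPTION (MAXRECURSION 0);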
I think this query will work for you.
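A hedged alternative sketch, not the answer's method: the recursive CTE in dbo.ngram needs one recursion level per k-mer, and SQL Server stops a recursive CTE after 100 levels unless a MAXRECURSION hint is given, which typical protein lengths exceed. A numbers (tally) table avoids recursion entirely. The function name dbo.kmer_tally and its position/kmer columns are hypothetical; the STRING_AGG step assumes SQL Server 2017 or later.

```sql
IF OBJECT_ID('dbo.kmer_tally', 'IF') IS NOT NULL DROP FUNCTION dbo.kmer_tally;
GO
-- Hypothetical helper: one row per k-mer of @s, numbered from 1, without recursion.
CREATE FUNCTION dbo.kmer_tally(@s nvarchar(max), @k int)
RETURNS TABLE
AS RETURN
WITH
    -- 10 rows
    e1(n) AS (SELECT n FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)),
    -- 10^4 rows
    e4(n) AS (SELECT 1 FROM e1 AS a CROSS JOIN e1 AS b CROSS JOIN e1 AS c CROSS JOIN e1 AS d),
    -- Positions 1..(LEN(@s) - @k + 1); TOP stops early, so the 10^8 cross join is never fully built.
    tally(i) AS (
        SELECT TOP (CASE WHEN @k > 0 AND LEN(@s) >= @k THEN LEN(@s) - @k + 1 ELSE 0 END)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
        FROM e4 AS a CROSS JOIN e4 AS b
        ORDER BY i
    )
SELECT CAST(i AS int) AS position, SUBSTRING(@s, i, @k) AS kmer
FROM tally;
GO
-- Usage: collapse each sequence's k-mers into one ordered, comma-separated "array".
WITH t AS (
    SELECT s FROM (VALUES ('atataggtcgt'), ('at')) AS v(s)
)
SELECT t.s,
       STRING_AGG(k.kmer, N',') WITHIN GROUP (ORDER BY k.position) AS kmers
FROM t
CROSS APPLY dbo.kmer_tally(t.s, 5) AS k
GROUP BY t.s;
-- atataggtcgt | atata,tatag,atagg,taggt,aggtc,ggtcg,gtcgt
```

SQL Server has no array type, so the delimited string is only one stand-in; keeping one row per k-mer (the CROSS APPLY without GROUP BY) is often easier to index and join across millions of sequences. CROSS APPLY drops sequences shorter than k, while OUTER APPLY would keep them with a NULL list. To use a real table, replace the VALUES list with the table that holds the sequences.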