sql server - Implementation of k-mer/n-gram in SQL


I want to implement the k-mer/n-gram algorithm in SQL Server (https://en.wikipedia.org/wiki/n-gram). My database holds millions of protein sequences, and I want to find the array of k-mers for each of them.

As an example, for the sequence atataggtcgt with k=5 the result is:

1 | atata
2 | tatag
3 | atagg
4 | taggt
5 | aggtc
6 | ggtcg
7 | gtcgt

Thanks for your attention.

According to https://en.wikipedia.org/wiki/n-gram, k (or n) is a variable, so a user-defined function that takes k (or n) as an input parameter is the best solution.

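    -- Drop the inline table-valued function if it already exists.
    if object_id('dbo.ngram', 'IF') is not null
        drop function dbo.ngram;
    go

    -- Returns every substring of length @k found in @s.
    create function dbo.ngram(@s nvarchar(max), @k int)
    returns table
    as
    return
        with grams as (
            -- Anchor: the first k-mer; p is the start position of the next one.
            select 2 as p, left(@s, @k) as g
            where len(@s) >= @k
            union all
            -- Recursive step: take the k-mer starting at p while it still fits in @s.
            select p + 1, substring(@s, p, @k)
            from grams
            where len(@s) > p - 2 + @k
        )
        select g from grams;
    go

    -- Example: 2-mers of two test strings; 'at' yields the single 2-mer 'at'.
    with t as (
        select s from (values ('atcgaaggtcgt'), ('at')) t(s)
    )
    select s, g
    from t
    outer apply dbo.ngram(s, 2);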

I think this query will work for you.
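
One caveat with the recursive function above: SQL Server stops a recursive CTE after 100 recursion levels by default, so a sequence that produces more than about a hundred k-mers fails unless the calling query ends with OPTION (MAXRECURSION 0). As a possible alternative, here is a minimal sketch that generates the start positions from a numbers (tally) CTE instead of recursing. The function name dbo.kmers_tally, the CTE names e1, e4 and nums, and the column names pos and g are made up for this illustration and are not part of the answer above.

```sql
-- Hypothetical alternative (illustrative names): k-mers via a numbers CTE, no recursion.
create function dbo.kmers_tally(@s nvarchar(max), @k int)
returns table
as
return
    with e1(n) as (   -- 10 rows
        select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n)
    ),
    e4(n) as (        -- 10 * 10 * 10 * 10 = 10,000 rows
        select 1 from e1 a cross join e1 b cross join e1 c cross join e1 d
    ),
    nums as (         -- start positions 1 .. len(@s) - @k + 1
        select top (case when @k >= 1 and len(@s) >= @k
                         then len(@s) - @k + 1 else 0 end)
               row_number() over (order by (select null)) as pos
        from e4 a cross join e4 b   -- upper bound of 100,000,000 positions
        order by pos
    )
    -- One row per start position: the k-mer that begins there.
    select pos, substring(@s, pos, @k) as g
    from nums;
go

-- Usage: the 5-mers of the sequence from the question, plus a string shorter than k.
select t.s, k.pos, k.g
from (values ('atataggtcgt'), ('at')) as t(s)
outer apply dbo.kmers_tally(t.s, 5) as k
order by t.s, k.pos;
```

For the 11-character sequence from the question this should give 11 - 5 + 1 = 7 rows, matching the list in the question, and a single row with NULL k-mer columns for 'at' because of the outer apply. Whether this is faster than the recursive version on millions of sequences is something to measure on your own data.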

