a simple R programs for short seq assmebling

Given sequence S = { ATC, CCA, CAG, TCC, AGT }, use R to perform overlap assemble( greedy approach)  of the given sequences. We ca nuse R to approach this problems:

pseudocode for Greedy approach (suboptimal solution)

Define overlap ( si, sj ) as the length of the longest prefix of sj that matches a suffix of si.

1. Calculate pairwise overlap of strings
2. Merge a pair with maximum overlap
3. Repeat 1. – 3. until there is only one string

R codes:

#///////////////////////////////////////////////////////////////////#
#//  Overlap alignment for SSP                                      #
#//                                                                 #
#// Pseudocode for Greedy approach (suboptimal solution):           #
#//        Input sequences S={s1,s2,...,sn}                         #
#//        while(S!={})                                             #
#//            Calculte pairwise overlap of strings                 #
#//            Merge a pair with maximum overlap                    #
#//        endwhile                                                 #
#//                                                                 #
#// Program author: Yupeng Cun                                      #
#// Create data:  20,Apr.,2011                                      #
#// Latest update:                                                  #
#///////////////////////////////////////////////////////////////////#

##Calculte pairwise overlap of strings
overlap <- function(a,b)
{
a = toupper(a) ;
b = toupper(b)
aa = unlist(strsplit(a,split=””))
bb = unlist(strsplit(b,split=””))

n=length(aa)
m=length(bb)
maxOverlap=0

for (i in 1: m)
{
statA= n-m+i
posA = substr(a, statA,n)
preB = substr(b, 1, m-i+1)
if(preB==posA )   {   maxOverlap=m-i+1;    break      }
}
maxOverlap
return (maxOverlap)
}

#Calculte the maximum overlap
maxOverlap <- function(aligned, seqs)
{
n=length(seqs)

if(n==1)
{
tMax=overlap(aligned, seqs[1])
tMax.index=1
}
else
{
tMax=overlap(aligned, seqs[1])
tMax.index=1

for(i in 2:n)
{
temp=overlap(aligned,seqs[i])
temp.index=i
if(temp> tMax)
{
tMax = temp
tMax.index = temp.index
}
}
}
tMax
res= list(oMax=tMax,oMax.index=tMax.index)
return (res)
}

#Merge a pair with maximum overlap
merge <- function(a, s)
{
oMax=overlap(a,s)
ss=unlist(strsplit(s,split=””))
m=length(ss)
addString = substr(s,oMax+1,m)
a=paste(a, addString,sep=””)
return (a)
}

# Overlap alignment for SSP
assemble <- function(seqs)
{
#seqs<- c(“ATC”,”CCA”,”TCC”,”CAG”,”AGT”)
#assemble the sequence segments,seqs, to one sequence.
seqs = toupper(seqs)
Strs=  seqs
nSeqs = length(seqs)
if(nSeqs<2)    stop(“the string in seqs must be more than two!\n”)

asemSeqs = seqs[1] # put the first string to the alliagined sequence
seqs = seqs[-1]

i=1
nSeqs = nSeqs-1
for (i in 1 : nSeqs)
{
cat(” The “,i,”-th step of Greedy overlap alignment:\n “, asemSeqs,”\n”)

oM=maxOverlap(asemSeqs,seqs)
oMax=oM$oMax
oMax.index=oM$oMax.index
cat(“\naligned: “,asemSeqs,” -> “,seqs[oMax.index],”\t”, “maxoverlap is: “,oMax, “\n”)

oMaxString = seqs[oMax.index]
asemSeqs = merge(asemSeqs, oMaxString)

seqs = seqs[-oMax.index]
if(is.null(seqs))break
}

cat(“\n the Greedy overlap alignment for SSP for {“,Strs,” }is:\n”)
#cat(“\n\n”,asemSeqs,”\n”)
return (asemSeqs)
}

#### Rung the programs

seqs<- c(“ATC”,”CCA”,”TCC”,”CAG”,”AGT”)
assemble(seqs)

the results is :

> assemble(seqs)
The  1 -th step of Greedy overlap alignment:
ATC

aligned:  ATC  ->  TCC      maxoverlap is:  2
The  2 -th step of Greedy overlap alignment:
ATCC

aligned:  ATCC  ->  CCA      maxoverlap is:  2
The  3 -th step of Greedy overlap alignment:
ATCCA

aligned:  ATCCA  ->  CAG      maxoverlap is:  2
The  4 -th step of Greedy overlap alignment:
ATCCAG

aligned:  ATCCAG  ->  AGT      maxoverlap is:  2

the Greedy overlap alignment for SSP for { ATC CCA TCC CAG AGT  }is:
[1] “ATCCAGT”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this:
search previous next tag category expand menu location phone mail time cart zoom edit close