1. BWT and SA
Given a string S, a sorted suffix array SA,
BWT[i] = S[SA[i] - 1]
2. Backward search
2-1. Rank Function
rank() sometimes is named with occurrence()
rank() counts the number of an alphabet up to a certain position.
In practice, we uses sampled cumulative alphabet counters.
Given a pattern P, a global alphabet counter C, a low suffix array index l, a high suffix array index h,
for (i = len(P) - 1; h > l && i >= 0; --i) {
c = P[i]
l = C[c] + rank(l - 1, c) + 1
h = C[c] + rank(h, c)
}
l, h usually implemented as a class named with Interval in practical applications
struct Interval {
int64_t low;
int64_t high;
}
2-2. Select Function
select() sometimes is named with locate(), or psi()
select() gives the location of a pattern in the original string S.
The value of suffix array is the position of the pattern in S.
Given a suffix array SA, a low suffix array index l, a high suffix array index h,
for (suffix_array_index = l; suffix_array_index <= h; ++suffix_array_index) {
print(SA[suffix_array_index])
}
To get SA[suffix_array_index] from a BWT we need an algorithm as follows.
get_SA(suffix_array_index, BWT) {
suffix_array_index = -1;
do {
c = BWT[suffix_array_index];
suffix_array_index = C[c] + rank(suffix_array_index, c);
++index_in_S;
} while(suffix_array_index > 0);
return index_in_S;
}
댓글