| LCS solution : concept Given two strings, A and B : • LEN is the potential substring length, starting from the length of B and decrementing to 1 • For each value of LEN, try all possible starting points B_STRT • Test whether the substring of B starting at B_STRT for length LEN is within A • Stop at first substring found • Return A starting point, B starting point and substring length LEN B-STRT B = "SQL" 3 1 "SQL" 2 1 "SQ" 2 2 "QL" 1 1 "S" 1 2 "Q" 1 3 "L"
| SQL implementation: imperfections • Assumes the lateral clauses generate rows in order of decreasing length • Compare to the reply "Another recursive variant", where Iudith Mentzel specifically orders the rows before returning the first one. • Starts with length of B, even when A is shorter.
| Was SQL the right choice? For • Can be done using hierarchical queries • Uses "fast dual" so no additional data access Against • Assumes "procedural" implementation: process in order, stop at first hit. • It's just a calculation with no data access other than the input strings • How will it work as part of the Pattern Recognition solution?
| Pattern Recognition: concept 1. Get lcs of A and B: starting points, length of lcs 2. Add lcs length of left sides & right sides; 3. Continue recursively until no match; 4. Multiply cumulative length by 2, divide by sum of lengths of A and B. A_STRT B_STRT LEN cumul level 1 level 2 level 3 4 4 2 2 OU aFbOUcRd iFjOUkRl A_STRT B_STRT LEN cumul level 1 level 2 level 3 4 4 2 2 OU aFbOUcRd iFjOUkRl 2 2 1 3 F aFb iFj A_STRT B_STRT LEN cumul level 1 level 2 level 3 4 4 2 2 OU aFbOUcRd iFjOUkRl 2 2 1 3 F aFb iFj 0 3 a i 0 3 b j A_STRT B_STRT LEN cumul level 1 level 2 level 3 4 4 2 2 OU aFbOUcRd iFjOUkRl 2 2 1 3 F aFb iFj 0 3 a i 0 3 b j 2 2 1 4 R cRd kRl A_STRT B_STRT LEN cumul level 1 level 2 level 3 4 4 2 2 OU aFbOUcRd iFjOUkRl 2 2 1 3 F aFb iFj 0 3 a i 0 3 b j 2 2 1 4 R cRd kRl 0 4 c k 0 4 d l
| b) Gestalt solution in PL/SQL, lcs uses SQL create or replace function gestalt( l in varchar2, r in varchar2 ) return number is function lcs ( l in varchar2, r in varchar2 ) return number is ll varchar2(4000); rl varchar2(4000); lr varchar2(4000); rr varchar2(4000); begin for rec in (<SELECT lcs data>) loop ll := substr(l, 1, rec.l_strt - 1); rl := substr(r, 1, rec.r_strt - 1); lr := substr(l, rec.l_strt + rec.len); rr := substr(r, rec.r_strt + rec.len); return rec.len + lcs(ll, rl) + lcs(lr, rr); end loop; return 0; end lcs; begin return 2 * lcs(l, r) / ( length(l) + length(r) ); end gestalt; FOR loop not correct here, use SELECT INTO as in Iudith Mentzel's reply
| c) Gestalt solution in PL/SQL, including for lcs function lcs ( l in varchar2, r in varchar2 ) return number is l_start pls_integer := 0; r_start pls_integer := 0; lr_length pls_integer := least(length(l),length(r)); ll varchar2(128); rl varchar2(128); lr varchar2(128); rr varchar2(128); begin if l is null or r is null then return 0; end if; <<len_loop>> loop l_start := 1; loop r_start:=instr(r,substr(l,l_start,lr_length)); exit len_loop when r_start > 0; exit when l_start = length(l)-lr_length+1; l_start := l_start + 1; end loop; exit when lr_length = 1; lr_length := lr_length - 1; end loop; if r_start = 0 then return 0; else <snipped> return lr_length + lcs(ll, rl) + lcs(lr, rr); end if; end lcs;
| Solution c) advantages • No row generation, no fake data access (DUAL) • Starting length is lesser of two string lengths • Procedural order: • Test lengths from longest to shortest • Test starting point from the left • Stop at first find • Minimal context switches • Result: performance
| Further Reading Ask TOM Question https://asktom.oracle.com/pls/apex/asktom.search? tag=longest-common-substring-ratcliff-obershelp- similarity-algorithm On Cursors, Context Switches, and Mistakes https://blogs.oracle.com/oraclemagazine/on- cursors-context-switches-and-mistakes