来自两个以上字符串的最长公共子字符串 – C.
我需要从C中的一组文件名计算最长的公共子串.
确切地说,我有一个std :: std :: strings的列表(或QT等价,也很好) char const *x[] = {"FirstFileWord.xls","SecondFileBlue.xls","ThirdFileWhite.xls","ForthFileGreen.xls"}; std::list<std::string> files(x,x + sizeof(x) / sizeof(*x)); 我需要计算所有字符串的n个不同的最长公共子串,在这种情况下,例如对于n = 2 "File" and ".xls" 如果我可以计算最长的公共子序列,我可以将其删除并再次运行算法以获得第二长的,所以基本上归结为: 是否有一个(引用?)实现来计算std :: list的std :: strings的LCS? 这不是一个好的答案,但是我有一个肮脏的解决方案 – 在QUrls的QList上蛮力,只从中获取最后一个“/”之后的部分.我想用“正确的”代码替换它. (我发现了http://www.icir.org/christian/libstree/ – 这会有很大的帮助,但是我无法在我的机器上编译它.有人可能会使用它吗?) QString SubstringMatching::getMatchPattern(QList<QUrl> urls) { QString a; int foundPosition = -1; int foundLength = -1; for (int i=urls.first().toString().lastIndexOf("/")+1; i<urls.first().toString().length(); i++) { bool hit=true; int xj; for (int j=0; j<urls.first().toString().length()-i+1; j++ ) // try to match from position i up to the end of the string :: test character at pos. (i+j) { if (!hit) break; QString firstString = urls.first().toString().right( urls.first().toString().length()-i ).left( j ); // this needs to match all k strings //qDebug() << "SEARCH " << firstString; for (int k=1; k<urls.length(); k++) // test all other strings,k = test string number { if (!hit) break; //qDebug() << " IN " << urls.at(k).toString().right(urls.at(k).toString().length() - urls.at(k).toString().lastIndexOf("/")+1); //qDebug() << " RES " << urls.at(k).toString().indexOf(firstString,urls.at(k).toString().lastIndexOf("/")+1); if (urls.at(k).toString().indexOf(firstString,urls.at(k).toString().lastIndexOf("/")+1)<0) { xj = j; //qDebug() << "HIT LENGTH " << xj-1 << " : " << firstString; hit = false; } } } if (hit) xj = urls.first().toString().length()-i+1; // hit up to the end of the string if ((xj-2)>foundLength) // have longer match than existing,j=1 is match length { foundPosition = i; // at the current position foundLength = xj-1; //qDebug() << "Found at " << i << " length " << foundLength; } } a = urls.first().toString().right( urls.first().toString().length()-foundPosition ).left( foundLength ); //qDebug() << a; return a; } 解决方法
如果你说后缀树太重量级或其他不切实际,以下
相当简单的蛮力方法可能适合您的应用. 我假设不同的子串应该是非重叠的并从中挑选出来 即使有这些假设,也不需要包含一个独特的集合 算法如下: > Q是目标长度配额. >如果ss不是字符串的常见子字符串,则接下来. >返回结果. 这是一个实现解决方案并用它来演示的程序 #include <list> #include <map> #include <string> #include <iostream> #include <algorithm> using namespace std; // Get a non-const iterator to the shortest string in a list list<string>::iterator shortest_of(list<string> & strings) { auto where = strings.end(); size_t min_len = size_t(-1); for (auto i = strings.begin(); i != strings.end(); ++i) { if (i->size() < min_len) { where = i; min_len = i->size(); } } return where; } // Say whether a string is a common substring of a list of strings bool is_common_substring_of( string const & candidate,list<string> const & strings) { for (string const & s : strings) { if (s.find(candidate) == string::npos) { return false; } } return true; } /* Get a multimap whose keys are the at-most `quota` greatest lengths of common substrings of the list of strings `strings`,each key multi-mapped to the set of common substrings of that length. */ multimap<size_t,string> n_longest_common_substring_sets(list<string> & strings,unsigned quota) { size_t nlengths = 0; multimap<size_t,string> results; if (quota == 0) { return results; } auto shortest_i = shortest_of(strings); if (shortest_i == strings.end()) { return results; } string shortest = *shortest_i; strings.erase(shortest_i); for ( size_t start = 0; start < shortest.size();) { size_t skip = 1; for (size_t len = shortest.size(); len > 0; --len) { string subs = shortest.substr(start,len); if (!is_common_substring_of(subs,strings)) { continue; } auto i = results.lower_bound(subs.size()); for ( ;i != results.end() && i->second.find(subs) == string::npos; ++i) {} if (i != results.end()) { continue; } for (i = results.begin(); i != results.end() && i->first < subs.size(); ) { if (subs.find(i->second) != string::npos) { i = results.erase(i); } else { ++i; } } auto hint = results.lower_bound(subs.size()); bool new_len = hint == results.end() || hint->first != subs.size(); if (new_len && nlengths == quota) { size_t min_len = results.begin()->first; if (min_len > subs.size()) { continue; } results.erase(min_len); --nlengths; } nlengths += new_len; results.emplace_hint(hint,subs.size(),subs); len = 1; skip = subs.size(); } start += skip; } return results; } // Testing ... int main() { list<string> strings{ "OfBitWordFirstFileWordZ.xls","SecondZWordBitWordOfFileBlue.xls","ThirdFileZBitWordWhiteOfWord.xls","WordFourthWordFileBitGreenZOf.xls"}; auto results = n_longest_common_substring_sets(strings,4); for (auto const & val : results) { cout << "length: " << val.first << ",substring: " << val.second << endl; } return 0; } 输出: length: 1,substring: Z length: 2,substring: Of length: 3,substring: Bit length: 4,substring: .xls length: 4,substring: File length: 4,substring: Word (用gcc 4.8.1构建) (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |