Using iSPARQL to compare values with similarity measures
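I have a question for you.

I want to write a query that, given a string such as "Londn" and a similarity function such as Lev, retrieves similar values by comparing the string against the rdfs:label predicate in DBpedia. For example, I would like to get the value "London" in the output. Can I use iSPARQL, or is there some way to do the same thing in SPARQL?

Solution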
Short version: you can do this in pure SPARQL
You can use a query like this to find cities whose names are similar to "Londn" and order them by (one measure of) similarity. The rest of the answer explains how this works:

    select ?city ?percent where {
      ?city a dbpedia-owl:City ;
            rdfs:label ?label .
      filter langMatches( lang(?label), 'en' )
      # ?match holds the characters of "Londn" found, in order, in the label
      bind( replace( concat( 'x', str(?label) ),
                     "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$",
                     '$1$2$3$4$5' ) as ?match )
      # fraction of the label that was matched
      bind( xsd:float(strlen(?match))/strlen(str(?label)) as ?percent )
    }
    order by desc(?percent)
    limit 100

SPARQL results

    city                                   percent
    ----------------------------------------------
    http://dbpedia.org/resource/London     0.833333
    http://dbpedia.org/resource/Bonn       0.75
    http://dbpedia.org/resource/Loudi      0.6
    http://dbpedia.org/resource/Ladnu      0.6
    http://dbpedia.org/resource/Lonar      0.6
    http://dbpedia.org/resource/Longnan    0.571429
    http://dbpedia.org/resource/Longyan    0.571429
    http://dbpedia.org/resource/Luoding    0.571429
    http://dbpedia.org/resource/Lodhran    0.571429
    http://dbpedia.org/resource/Lom%C3%A9  0.5
    http://dbpedia.org/resource/Andong     0.5

Computing string similarity metrics
SPARQL has nothing built in for computing string matching distances, but you can do some of this with SPARQL's regular expression replacement mechanism. Say you want to match the sequence "cat" in some strings. You can use a query like this to find out how much of the sequence "cat" is present in a given string:

    select ?string ?match where {
      values ?string { "cart" "concatenate" "hat" "pot" "hop" }
      bind( replace( ?string, "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$", "$1$2$3" ) as ?match )
    }

    -------------------------
    | string        | match |
    =========================
    | "cart"        | "cat" |
    | "concatenate" | "cat" |
    | "hat"         | "at"  |
    | "pot"         | "t"   |
    | "hop"         | ""    |
    -------------------------

By examining the lengths of string and match, you should be able to compute a few different similarity metrics. Here is a more complicated example that uses the "Londn" input you mentioned. The percent column is the fraction of the string that matched the input.

    select ?input ?string (strlen(?match)/strlen(?string) as ?percent) where {
      values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                       "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }

      values (?input ?pattern ?replacement) {
        ("cat"   "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"                              "$1$2$3")
        ("Londn" "^[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
      }

      bind( replace( ?string, ?pattern, ?replacement) as ?match )
    }
    order by ?pattern desc(?percent)

    --------------------------------------------------------
    | input   | string        | percent                    |
    ========================================================
    | "Londn" | "Londn"       | 1.0                        |
    | "Londn" | "London"      | 0.833333333333333333333333 |
    | "Londn" | "Lando"       | 0.6                        |
    | "Londn" | "London Fog"  | 0.5                        |
    | "Londn" | "Land Ho!"    | 0.375                      |
    | "Londn" | "concatenate" | 0.272727272727272727272727 |
    | "Londn" | "port"        | 0.25                       |
    | "Londn" | "catnap"      | 0.166666666666666666666666 |
    | "Londn" | "cat"         | 0.0                        |
    | "Londn" | "chart"       | 0.0                        |
    | "Londn" | "chat"        | 0.0                        |
    | "Londn" | "hat"         | 0.0                        |
    | "Londn" | "part"        | 0.0                        |
    | "cat"   | "cat"         | 1.0                        |
    | "cat"   | "chat"        | 0.75                       |
    | "cat"   | "hat"         | 0.666666666666666666666666 |
    | "cat"   | "chart"       | 0.6                        |
    | "cat"   | "part"        | 0.5                        |
    | "cat"   | "catnap"      | 0.5                        |
    | "cat"   | "concatenate" | 0.272727272727272727272727 |
    | "cat"   | "port"        | 0.25                       |
    | "cat"   | "Lando"       | 0.2                        |
    | "cat"   | "Land Ho!"    | 0.125                      |
    | "cat"   | "Londn"       | 0.0                        |
    | "cat"   | "London"      | 0.0                        |
    | "cat"   | "London Fog"  | 0.0                        |
    --------------------------------------------------------

Update

The code above works in Apache Jena but fails in Virtuoso, because the pattern can match the empty string. For example, the following query fails with an error on DBpedia's endpoint (which is backed by Virtuoso):

    select (replace( "foo", ".*", "x" ) as ?bar) where {}
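This surprised me, but the specification for replace says that it is based on XPath fn:replace, and the documentation for fn:replace says that an error (err:FORX0003) is raised if the pattern matches a zero-length string.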
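Whether a given pattern can match the empty string is easy to check locally before sending a query to an endpoint. The following is a minimal sketch in Python, not part of the original answer. The helper name `can_match_empty` is made up for illustration, and Python's `re` dialect is not identical to the XPath regular expression dialect, although the two should agree for patterns as simple as these.

```python
import re


def can_match_empty(pattern: str) -> bool:
    """Return True if `pattern` finds a match in the empty string."""
    # re.search is unanchored, so it looks for a match anywhere in the
    # input; for the empty string that can only be a zero-length match.
    return re.search(pattern, "") is not None


# The pattern from the failing example and the "cat" pattern used earlier.
simple = ".*"
cat_pattern = "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"

print(can_match_empty(simple))
print(can_match_empty(cat_pattern))
```

Both patterns consist entirely of optional parts, so both can match the empty string.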
However, we can work around this problem by adding a character to the beginning of both the pattern and the string:

    select ?input ?string (strlen(?match)/strlen(?string) as ?percent) where {
      values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                       "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }

      values (?input ?pattern ?replacement) {
        ("cat"   "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"                              "$1$2$3")
        ("Londn" "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
      }

      # the leading 'x' on the string pairs with the leading 'x' in each pattern
      bind( replace( concat('x', ?string), ?pattern, ?replacement) as ?match )
    }
    order by ?pattern desc(?percent)

    --------------------------------------------------------
    | input   | string        | percent                    |
    ========================================================
    | "Londn" | "Londn"       | 1.0                        |
    | "Londn" | "London"      | 0.833333333333333333333333 |
    | "Londn" | "Lando"       | 0.6                        |
    | "Londn" | "London Fog"  | 0.5                        |
    | "Londn" | "Land Ho!"    | 0.375                      |
    | "Londn" | "concatenate" | 0.272727272727272727272727 |
    | "Londn" | "port"        | 0.25                       |
    | "Londn" | "catnap"      | 0.166666666666666666666666 |
    | "Londn" | "cat"         | 0.0                        |
    | "Londn" | "chart"       | 0.0                        |
    | "Londn" | "chat"        | 0.0                        |
    | "Londn" | "hat"         | 0.0                        |
    | "Londn" | "part"        | 0.0                        |
    | "cat"   | "cat"         | 1.0                        |
    | "cat"   | "chat"        | 0.75                       |
    | "cat"   | "hat"         | 0.666666666666666666666666 |
    | "cat"   | "chart"       | 0.6                        |
    | "cat"   | "part"        | 0.5                        |
    | "cat"   | "catnap"      | 0.5                        |
    | "cat"   | "concatenate" | 0.272727272727272727272727 |
    | "cat"   | "port"        | 0.25                       |
    | "cat"   | "Lando"       | 0.2                        |
    | "cat"   | "Land Ho!"    | 0.125                      |
    | "cat"   | "Londn"       | 0.0                        |
    | "cat"   | "London"      | 0.0                        |
    | "cat"   | "London Fog"  | 0.0                        |
    --------------------------------------------------------
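The patterns above were written by hand for "cat" and "Londn". The sketch below, in Python, shows one way to generate the same pattern and replacement strings for another input, and to reproduce the percent column locally. The function names `build_pattern`, `matched_part` and `percent` are made up for illustration and are not part of the original answer. The sketch assumes the input contains only ASCII letters and digits, so nothing needs escaping inside the character classes, and it uses Python's `re` module as a stand-in for the SPARQL replace function.

```python
import re
from typing import Tuple


def _plain(text: str) -> bool:
    """True if text is non-empty and made of ASCII letters and digits only."""
    return text.isascii() and text.isalnum()


def build_pattern(needle: str, prefix: str = "x") -> Tuple[str, str]:
    """Build a (pattern, replacement) pair in the style of the queries above.

    Assumption: needle (and prefix, if not empty) contain only ASCII letters
    and digits, so no escaping is needed inside the character classes.
    """
    if not _plain(needle) or not (prefix == "" or _plain(prefix)):
        raise ValueError("this sketch supports ASCII letters and digits only")
    parts = ["^" + prefix]
    for i, ch in enumerate(needle):
        # Skip characters that are not among the remaining needle characters,
        # then optionally capture the next needle character.
        parts.append("[^" + needle[i:] + "]*")
        parts.append("([" + ch + "]?)")
    parts.append(".*$")
    replacement = "".join("$%d" % n for n in range(1, len(needle) + 1))
    return "".join(parts), replacement


def matched_part(needle: str, label: str, prefix: str = "x") -> str:
    """Return the needle characters that the pattern captures from label."""
    pattern, _ = build_pattern(needle, prefix)
    # re.DOTALL lets '.' cover newlines, so the pattern always matches here;
    # the SPARQL queries above do not set an equivalent flag.
    m = re.match(pattern, prefix + label, re.DOTALL)
    return "".join(m.groups())


def percent(needle: str, label: str) -> float:
    """Fraction of label covered by the captured characters (0.0 if empty)."""
    if not label:
        return 0.0
    return len(matched_part(needle, label)) / len(label)


if __name__ == "__main__":
    print(build_pattern("Londn"))
    for label in ["London", "Lando", "London Fog"]:
        print(label, percent("Londn", label))
```

Calling `build_pattern` with `prefix=""` gives the unprefixed form that works in Apache Jena. Generating the strings this way keeps the pattern and the replacement consistent when the input changes.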