php – Check whether sentences share the same words
tb_content (left) and tb_word (right):
=====================================   ================================
|id|sentence |sentence_id|content_id|   |id|word|sentence_id|content_id|
=====================================   ================================
| 1|sentence1|     0     |    1     |   | 1| a  |     0     |    1     |
| 2|sentence2|     1     |    1     |   | 2| b  |     0     |    1     |
| 3|sentence5|     0     |    2     |   | 3| c  |     1     |    1     |
| 4|sentence6|     1     |    2     |   | 4| a  |     1     |    1     |
| 5|sentence7|     2     |    2     |   | 5| e  |     1     |    1     |
=====================================   | 6| f  |     0     |    2     |
                                        | 7| g  |     1     |    2     |
                                        | 8| h  |     1     |    2     |
                                        | 9| i  |     1     |    2     |
                                        |10| f  |     2     |    2     |
                                        |11| h  |     2     |    2     |
                                        |12| f  |     2     |    2     |
                                        ================================

I need to check, within each content_id, whether each sentence contains words that the other sentences of the same content_id also contain.

For example: take content_id = 1. Its sentences are sentence1 and sentence2. From tb_word we can see that sentence1 and sentence2 both contain the word a. If the number of occurrences of a across the two sentences is >= 2, then a is the result, so that is what must be printed for that pair.

First, I made the function Total to count the number of sentences each content_id has:

    $total = array();
    $sql = mysql_query('select content_id, count(*) as RowAmount from tb_content Group By content_id') or die(mysql_error());
    while ($row = mysql_fetch_array($sql)) {
        $total[] = $row['RowAmount'];
    }
    return $total;

From this function I get the value of $total, and from it I need to check the similarity of words (from tb_word) between every possible pair of sentences:

    foreach ($total as $content_id => $totals) {
        for ($x = 0; $x <= ($totals - 1); $x++) {
            for ($y = 0; $y <= ($totals - 1); $y++) {
                $shared = getShared($x, $y);
            }
        }
    }

The function getShared is:

    function getShared($x, $y) {
        $token = array();
        $shared = array();
        $i = 0;
        if ($x == $y) {
            // Same sentence: take all of its words.
            $query = mysql_query("SELECT word FROM `tb_word` WHERE sentence_id ='$x' ");
            while ($row = mysql_fetch_array($query)) {
                $shared[$i] = $row['word'];
                $i++;
            }
        } else {
            // Different sentences: keep words that occur at least twice across both.
            $query = mysql_query("SELECT word, count(word) as jml FROM `tb_word` WHERE sentence_id ='$x' OR sentence_id ='$y' GROUP BY word ");
            while ($row = mysql_fetch_array($query)) {
                $jml = $row['jml'];
                $token[$i] = $row['word'];
                if ($jml >= 2) {
                    $shared[$i] = $token[$i];
                }
                $i++;
            }
        }
        return $shared;
    }

But the result I get is still wrong. The results are still mixed across different content_id values, whereas they must also be grouped by content_id. Sorry for my bad English and my poor explanation. Correct me if I'm wrong, and please help me. Thanks :)

Solution
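This can actually be done by the DBMS itself, in a single query built in two steps. First, you do a self-join to prepare the combinations of sentences within the same content: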

    SELECT a.content_id,
           a.sentence_id AS sentence_id_1,
           b.sentence_id AS sentence_id_2
    FROM tb_content AS a
    JOIN tb_content AS b
      ON (a.content_id = b.content_id
          AND a.sentence_id <= b.sentence_id)

The "<=" keeps same-sentence pairs such as "1-1" or "2-2", but avoids bidirectional duplicates such as "1-2" and "2-1". Next, you can join the result above with the words and count the occurrences. Like this:

    SELECT s.content_id,
           s.sentence_id_1,
           s.sentence_id_2,
           c.word,
           Count(*) AS jml
    FROM (SELECT a.content_id,
                 a.sentence_id AS sentence_id_1,
                 b.sentence_id AS sentence_id_2
          FROM tb_content AS a
          JOIN tb_content AS b
            ON (a.content_id = b.content_id
                AND a.sentence_id <= b.sentence_id)) AS s
    JOIN tb_word AS c
      ON (s.content_id = c.content_id
          AND (c.sentence_id = s.sentence_id_1
               OR c.sentence_id = s.sentence_id_2))
    GROUP BY s.content_id, s.sentence_id_1, s.sentence_id_2, c.word
    HAVING Count(*) >= 2;

The result of the query above gives you the content, sentences 1 and 2, the word, and the number of occurrences (2 or more). All you need now is to collect the result into an array, which I assume you already know how to do.

Please tell me if I have missed your goal.
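The last step, collecting the rows into an array grouped by content_id, could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name groupShared() is hypothetical, the rows are hard-coded samples in the shape the query returns (content_id, sentence_id_1, sentence_id_2, word, jml), and in real code they would come from a fetch loop over the query result instead.

```php
<?php
// Hypothetical sample rows in the shape the query returns.
// In real code these would be fetched from the database.
$rows = [
    ['content_id' => 1, 'sentence_id_1' => 0, 'sentence_id_2' => 1, 'word' => 'a', 'jml' => 2],
    ['content_id' => 2, 'sentence_id_1' => 0, 'sentence_id_2' => 2, 'word' => 'f', 'jml' => 3],
    ['content_id' => 2, 'sentence_id_1' => 1, 'sentence_id_2' => 2, 'word' => 'h', 'jml' => 2],
];

// groupShared() is a hypothetical helper, not part of the original code.
// It nests the words as $shared[content_id]["s1-s2"] = [word, ...],
// so results from different content_id values can no longer mix.
function groupShared(array $rows): array
{
    $shared = [];
    foreach ($rows as $row) {
        $pair = $row['sentence_id_1'] . '-' . $row['sentence_id_2'];
        $shared[$row['content_id']][$pair][] = $row['word'];
    }
    return $shared;
}

$shared = groupShared($rows);
print_r($shared);
```

Keying the outer array by content_id and the inner one by the sentence pair keeps each content's results separate, which is the grouping the question asks for.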