
Simple sentence parser in Java

Published: 2020-12-15 04:51:58 | Category: Java | Source: compiled from the web
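Is there any simple way to create a sentence parser in plain Java, without adding any libs or jars?

The parser should not just split on the whitespace between words. It should be smarter: handle punctuation such as . ! ? , and recognize where a sentence ends, and so on.

After parsing, only the real words should be stored in a DB or a file, not any special characters.

Thank you very much in advance :)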

Solution

You may want to start by looking at the BreakIterator class.

From the JavaDoc:

The BreakIterator class implements methods for finding the location of boundaries in text. Instances of BreakIterator maintain a current position and scan over text returning the index of characters where boundaries occur. Internally, BreakIterator scans text using a CharacterIterator, and is thus able to scan text held by any object implementing that protocol. A StringCharacterIterator is used to scan String objects passed to setText.

You use the factory methods provided by this class to create instances of various types of break iterators. In particular, use getWordInstance, getLineInstance, getSentenceInstance, and getCharacterInstance to create BreakIterators that perform word, line, sentence, and character boundary analysis respectively. A single BreakIterator can work only on one unit (word, line, sentence, and so on). You must use a different iterator for each unit boundary analysis you wish to perform.

Line boundary analysis determines where a text string can be broken when line-wrapping. The mechanism correctly handles punctuation and hyphenated words.
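
As an illustration (not part of the JavaDoc), line boundary analysis could be used for a simple greedy word wrap. This is a minimal sketch: the class name LineWrapper, the method wrap, and the width parameter are made up for this example, and a chunk longer than the width is simply left on a line of its own.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

class LineWrapper {
    // Greedy wrap: keep appending chunks between legal break positions
    // until the next chunk would push the line past the given width.
    static List<String> wrap(String text, int width) {
        BreakIterator it = BreakIterator.getLineInstance();
        it.setText(text);
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // A chunk usually ends with the whitespace that follows the word.
            String chunk = text.substring(start, end);
            if (line.length() > 0 && line.length() + chunk.trim().length() > width) {
                lines.add(line.toString().trim());
                line.setLength(0);
            }
            line.append(chunk);
        }
        if (line.length() > 0) {
            lines.add(line.toString().trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String l : wrap("The quick brown fox", 10)) {
            System.out.println(l);
        }
    }
}
```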

Sentence boundary analysis allows selection with correct interpretation of periods within numbers and abbreviations, and trailing punctuation marks such as quotation marks and parentheses.
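
For the question asked here, sentence boundary analysis is the piece that recognizes where a sentence ends. The following is a minimal sketch (not part of the JavaDoc) using only the JDK. The class name SentenceSplitter and the method splitSentences are made up for this example, and how well abbreviations are handled may depend on the JDK version and locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Returns the sentences found in the text, with surrounding whitespace trimmed.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Each pair (start, end) of consecutive boundaries delimits one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Is it simple? Yes, it is! No libraries are needed.";
        for (String s : splitSentences(text, Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Each sentence returned this way still contains its punctuation. Stripping it down to plain words is a job for the word iterator.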

Word boundary analysis is used by search and replace functions, as well as within text editing applications that allow the user to select words with a double click. Word selection provides correct interpretation of punctuation marks within and following words. Characters that are not part of a word, such as symbols or punctuation marks, have word-breaks on both sides.
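
Because punctuation and whitespace come back as separate tokens, the requirement to keep "only the real words" can be met by filtering the tokens. The sketch below (not part of the JavaDoc) assumes a "real word" is any token that starts with a letter or digit. WordExtractor and extractWords are hypothetical names, and writing the result to a DB or file is left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordExtractor {
    // Collects the tokens that look like words and drops punctuation and whitespace.
    static List<String> extractWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Assumption: a token is a word if it starts with a letter or digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("Hello, world! Is this simple?", Locale.US));
    }
}
```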

Character boundary analysis allows users to interact with characters as they expect to, for example, when moving the cursor through a text string. Character boundary analysis provides correct navigation through character strings, regardless of how the character is stored. For example, an accented character might be stored as a base character and a diacritical mark. What users consider to be a character can differ between languages.
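
To make the accented-character case concrete, here is a small sketch (not part of the JavaDoc) that counts user-perceived characters. The class name GraphemeCounter and the method countUserCharacters are made up for this example.

```java
import java.text.BreakIterator;

class GraphemeCounter {
    // Counts character boundaries, so a base letter plus a combining mark counts once.
    static int countUserCharacters(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 (combining acute accent), then "a".
        String text = "e\u0301a";
        System.out.println("chars: " + text.length());
        System.out.println("user-perceived characters: " + countUserCharacters(text));
    }
}
```

Here String.length() reports three chars, while the character iterator treats the letter and its combining accent as a single unit.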

BreakIterator is intended for use with natural languages only. Do not use this class to tokenize a programming language.

See the demo: BreakIteratorDemo.java

(Editor: 李大同)
