加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 编程开发 > Java > 正文

Nutch:用Java调用,而不是命令行?

发布时间:2020-12-15 00:43:55 所属栏目:Java 来源:网络整理
导读:我是否很厚或者是否真的无法以编程方式通过某些 Java代码调用Apache Nutch?关于如何执行此操作的文档(或指南或教程)在哪里?谷歌让我失望了.所以我实际上尝试了Bing. (是的,我知道,可悲.)想法?提前致谢. (另外,如果Nutch是一个废话,那么用Java编写的任何其
我是否很厚或者是否真的无法以编程方式通过某些 Java代码调用Apache Nutch?关于如何执行此操作的文档(或指南或教程)在哪里?谷歌让我失望了.所以我实际上尝试了Bing. (是的,我知道,可悲.)想法?提前致谢.

(另外,如果Nutch是一个废话,那么用Java编写的任何其他爬行器在互联网规模上都可以用实际文档证明是可靠的吗?)

解决方法

如果您查看bin / nutch脚本,您将看到它调用与您的命令对应的Java类:
# figure out which class to run
if [ "$COMMAND" = "crawl" ] ; then
  CLASS=org.apache.nutch.crawl.Crawl
elif [ "$COMMAND" = "inject" ] ; then
  CLASS=org.apache.nutch.crawl.Injector
elif [ "$COMMAND" = "generate" ] ; then
  CLASS=org.apache.nutch.crawl.Generator
elif [ "$COMMAND" = "freegen" ] ; then
  CLASS=org.apache.nutch.tools.FreeGenerator
elif [ "$COMMAND" = "fetch" ] ; then
  CLASS=org.apache.nutch.fetcher.Fetcher
elif [ "$COMMAND" = "fetch2" ] ; then
  CLASS=org.apache.nutch.fetcher.Fetcher2
elif [ "$COMMAND" = "parse" ] ; then
  CLASS=org.apache.nutch.parse.ParseSegment
elif [ "$COMMAND" = "readdb" ] ; then
  CLASS=org.apache.nutch.crawl.CrawlDbReader
elif [ "$COMMAND" = "convdb" ] ; then
  CLASS=org.apache.nutch.tools.compat.CrawlDbConverter
elif [ "$COMMAND" = "mergedb" ] ; then
  CLASS=org.apache.nutch.crawl.CrawlDbMerger
elif [ "$COMMAND" = "readlinkdb" ] ; then
  CLASS=org.apache.nutch.crawl.LinkDbReader
elif [ "$COMMAND" = "readseg" ] ; then
  CLASS=org.apache.nutch.segment.SegmentReader
elif [ "$COMMAND" = "segread" ] ; then
  echo "[DEPRECATED] Command 'segread' is deprecated,use 'readseg' instead."
  CLASS=org.apache.nutch.segment.SegmentReader
elif [ "$COMMAND" = "mergesegs" ] ; then
  CLASS=org.apache.nutch.segment.SegmentMerger
elif [ "$COMMAND" = "updatedb" ] ; then
  CLASS=org.apache.nutch.crawl.CrawlDb
elif [ "$COMMAND" = "invertlinks" ] ; then
  CLASS=org.apache.nutch.crawl.LinkDb
elif [ "$COMMAND" = "mergelinkdb" ] ; then
  CLASS=org.apache.nutch.crawl.LinkDbMerger
elif [ "$COMMAND" = "index" ] ; then
  CLASS=org.apache.nutch.indexer.Indexer
elif [ "$COMMAND" = "solrindex" ] ; then
  CLASS=org.apache.nutch.indexer.solr.SolrIndexer
elif [ "$COMMAND" = "dedup" ] ; then
  CLASS=org.apache.nutch.indexer.DeleteDuplicates
elif [ "$COMMAND" = "solrdedup" ] ; then
  CLASS=org.apache.nutch.indexer.solr.SolrDeleteDuplicates
elif [ "$COMMAND" = "merge" ] ; then
  CLASS=org.apache.nutch.indexer.IndexMerger
elif [ "$COMMAND" = "plugin" ] ; then
  CLASS=org.apache.nutch.plugin.PluginRepository
elif [ "$COMMAND" = "server" ] ; then
  CLASS='org.apache.nutch.searcher.DistributedSearch$Server'
else
  CLASS=$COMMAND
fi

# run it
exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS -classpath "$CLASSPATH" $CLASS "$@"

从那以后,只有查看API docs以及必要时这些类的源代码的问题.

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读