
java – Too many open files (Selenium PhantomJSDriver)

Published: 2020-12-14 16:47:02 | Category: Java | Source: compiled from the web
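It seems that resources are not being cleaned up in my embedded Selenium/PhantomJSDriver driver. Running the client synchronously results in millions of open files and eventually a "Too many open files" exception.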

Here is some lsof output, collected about one minute into the program's run:

$lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
    1221966 12180
      34790 29773
      31260 12138
      20955 8414
      17940 10343
      16665 32332
       9512 27713
       7275 19226
       5496 7153
       5040 14065

$lsof -p 12180 | awk '{ print $2; }' | uniq -c | sort -rn | head
    2859 12180
       1 PID

$lsof -p 12180 -Fn | sort -rn | uniq -c | sort -rn | head
    1124 npipe
     536 nanon_inode
       4 nsocket
       3 n/opt/jdk/jdk1.8.0_60/jre/lib/jce.jar
       3 n/opt/jdk/jdk1.8.0_60/jre/lib/charsets.jar
       3 n/dev/urandom
       3 n/dev/random
       3 n/dev/pts/20
       2 n/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
       2 n/usr/share/java/jayatana.jar

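I don't understand why the result set is so much smaller when I pass the -p flag to lsof. Either way, most of the entries appear to be pipes and anon_inodes.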
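To watch the descriptor count from inside the JVM instead of polling lsof, a minimal sketch (Linux only, since it assumes procfs) can read the symlinks under /proc/self/fd and group them by kind, mirroring the `lsof -p <pid> -Fn | sort | uniq -c` breakdown above. The names FdCensus, kindOf and openDescriptorKinds are hypothetical. As an inference not confirmed in the original thread: on Linux with JDK 8, each java.nio.channels.Selector holds one epoll descriptor (shown as anon_inode:[eventpoll]) plus a two-descriptor wakeup pipe, so the roughly 2:1 pipe to anon_inode ratio above is consistent with leaked Netty event loops, such as those of proxy servers that were started but never stopped.

```scala
import java.io.File
import java.nio.file.Files
import scala.util.Try

object FdCensus {
  // Entries in /proc/self/fd are symlinks; targets look like "pipe:[123]",
  // "anon_inode:[eventpoll]", "socket:[456]", or an absolute file path.
  def kindOf(target: String): String =
    if (target.startsWith("/")) "file" else target.takeWhile(_ != ':')

  // Counts this process's open descriptors by kind (Linux/procfs assumption).
  def openDescriptorKinds(fdDir: File = new File("/proc/self/fd")): Map[String, Int] = {
    val entries = Option(fdDir.listFiles()).getOrElse(Array.empty[File])
    entries.toList
      // A descriptor may close between listing and readlink, so failures are skipped.
      .flatMap(f => Try(Files.readSymbolicLink(f.toPath).toString).toOption)
      .groupBy(kindOf)
      .map { case (kind, targets) => kind -> targets.size }
  }
}
```

Logging FdCensus.openDescriptorKinds() after each follow() call would show whether the "pipe" and "anon_inode" counts keep climbing from one client to the next.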

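The client is very simple, at about 100 lines, and calls driver.close() and driver.quit() when it is done. I have tried caching and reusing clients, but that did not reduce the number of open files.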

case class HeadlessClient(
    country: String,
    userAgent: String,
    inheritSessionId: Option[Int] = None
) {
  protected var numberOfRequests: Int = 0
  protected val proxySessionId: Int = inheritSessionId.getOrElse(new Random().nextInt(Integer.MAX_VALUE))
  protected val address = InetAddress.getByName("proxy.domain.com")
  protected val host = address.getHostAddress
  protected val login: String = HeadlessClient.username + proxySessionId
  protected val windowSize = new org.openqa.selenium.Dimension(375,667)

  protected val (mobProxy,seleniumProxy) = {

    val proxy = new BrowserMobProxyServer()
    proxy.setTrustAllServers(true)
    proxy.setChainedProxy(new InetSocketAddress(host,HeadlessClient.port))
    proxy.chainedProxyAuthorization(login,HeadlessClient.password,AuthType.BASIC)
    proxy.addLastHttpFilterFactory(new HttpFiltersSourceAdapter() {
      override def filterRequest(originalRequest: HttpRequest): HttpFilters = {
        new HttpFiltersAdapter(originalRequest) {
          override def proxyToServerRequest(httpObject: HttpObject): io.netty.handler.codec.http.HttpResponse = {
            httpObject match {
              case req: HttpRequest => req.headers().remove(HttpHeaders.Names.VIA)
              case _ =>
            }
            null
          }
        }
      }
    })
    proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT,CaptureType.RESPONSE_CONTENT)
    proxy.start(0)
    val seleniumProxy = ClientUtil.createSeleniumProxy(proxy)
    (proxy,seleniumProxy)
  }

  protected val driver: PhantomJSDriver = {
    val capabilities: DesiredCapabilities = DesiredCapabilities.chrome()
    val cliArgsCap = new util.ArrayList[String]
    cliArgsCap.add("--webdriver-loglevel=NONE")
    cliArgsCap.add("--ignore-ssl-errors=yes")
    cliArgsCap.add("--load-images=no")

    capabilities.setCapability(CapabilityType.PROXY,seleniumProxy)
    capabilities.setCapability("phantomjs.page.customHeaders.Referer","")
    capabilities.setCapability("phantomjs.page.settings.userAgent",userAgent)
    capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,cliArgsCap)

    new PhantomJSDriver(capabilities)
  }

  driver.executePhantomJS(
    """
      |var navigation = [];
      |
      |this.onNavigationRequested = function(url,type,willNavigate,main) {
      |  navigation.push(url)
      |  console.log('Trying to navigate to: ' + url);
      |}
      |
      |this.onResourceRequested = function(request,net) {
      |    console.log("Requesting " + request.url);
      |    if (! (navigation.indexOf(request.url) > -1)) {
      |        console.log("Aborting " + request.url)
      |        net.abort();
      |    }
      |};
    """.stripMargin
  )

  driver.manage().window().setSize(windowSize)

  def follow(url: String)(implicit ec: ExecutionContext): List[HarEntry] = {
    try{
      Await.result(Future{
        mobProxy.newHar(url)
        driver.get(url)
        val entries = mobProxy.getHar.getLog.getEntries.asScala.toList
        shutdown()
        entries
      },45.seconds)
    } catch {
      case e: Exception =>
        try {
          shutdown()
        } catch {
          case shutdown: Exception =>
            throw new Exception(s"Error ${shutdown.getMessage} cleaning up after Exception: ${e.getMessage}")
        }

        throw e
    }
  }

  def shutdown() = {
    driver.close()
    driver.quit()
  }
}

I have tried several versions of Selenium in case there was a bugfix. build.sbt:

libraryDependencies += "org.seleniumhq.selenium" % "selenium-java"   % "3.0.1"
libraryDependencies += "net.lightbody.bmp" % "browsermob-core" % "2.1.2"

I have also tried PhantomJS 2.0.1 and 2.1.1:

$phantomjs --version
  2.0.1-development

$phantomjs --version
  2.1.1

Is this a PhantomJS problem or a Selenium problem? Or is my client using the API incorrectly?

Solution

The resource usage was caused by BrowserMob. To shut the proxy down and release its resources, stop() must be called on it.

For this client, that means changing the shutdown method to:

def shutdown() = {
  mobProxy.stop()
  driver.close()
  driver.quit()
}

There is also an abort() method, which terminates the proxy server immediately instead of waiting for traffic to stop.
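One caveat with the revised shutdown(): if mobProxy.stop() throws, driver.close() and driver.quit() never run, so the PhantomJS side can still leak (and the reverse holds if close() fails on an already closed window). A minimal sketch, using a hypothetical helper named Cleanup.runAll (not part of Selenium or BrowserMob), runs every step regardless, rethrows the first failure, and attaches later ones as suppressed exceptions.

```scala
import scala.util.control.NonFatal

object Cleanup {
  // Hypothetical helper: runs every cleanup step in order, even if an earlier one throws.
  // The first failure is rethrown after all steps have run; later failures are
  // attached to it via addSuppressed so none of them is silently lost.
  def runAll(steps: (() => Unit)*): Unit = {
    var firstFailure: Throwable = null
    steps.foreach { step =>
      try step()
      catch {
        case NonFatal(e) =>
          if (firstFailure == null) firstFailure = e
          else firstFailure.addSuppressed(e)
      }
    }
    if (firstFailure != null) throw firstFailure
  }
}

// Possible wiring inside HeadlessClient (not compiled here; mobProxy and driver
// come from the question's code, and mobProxy.abort() could replace stop()):
// def shutdown(): Unit = Cleanup.runAll(
//   () => mobProxy.stop(),
//   () => driver.close(),
//   () => driver.quit()
// )
```

Keeping each step as its own closure means the order stays explicit and a failure in one resource cannot prevent the others from being released.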

(Editor: 李大同)
