unix - How do I refresh an online website mirror created with `wget --mirror`?

Published: 2020-12-15 18:36:09 | Category: Security | Source: compiled from the web

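A month ago I used `wget --mirror` to create a mirror of our public website, for temporary use during an upcoming scheduled maintenance window. Our primary website runs HTML, PHP and MySQL, but the mirror only needs to be HTML, with no dynamic content, PHP or database.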

The following command creates a simple online mirror of our website:

wget --mirror http://www.example.org/

Note that the Wget manual says --mirror is "currently equivalent to -r -N -l inf --no-remove-listing" (the human-readable equivalent is `--recursive --timestamping --level=inf --no-remove-listing`).
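For readers who prefer to see the expansion spelled out, here is a minimal sketch of the long-form command. The variable name `mirror_opts` is only illustrative, and the command is printed rather than executed so it can be inspected first.

```shell
#!/bin/sh
# Long-form options that the manual quote above gives as the expansion of --mirror.
mirror_opts="--recursive --timestamping --level=inf --no-remove-listing"

# Print the full command instead of running it.
# $mirror_opts is left unquoted on purpose so that it splits into separate options.
echo wget $mirror_opts http://www.example.org/
```

Dropping the `echo` would run the download itself.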

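A month has now passed and much of the website content has changed. I would like wget to check all of the pages and download any that have changed. However, this does not work.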

My question:

Short of deleting the directory and re-running the mirror, what do I need to do to refresh the mirror of the website?

The top-level file at http://www.example.org/index.html has not changed, but many other files have.

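I thought all I needed to do was re-run `wget --mirror`, because --mirror implies the flags --recursive ("specify recursive download") and --timestamping ("don't re-retrieve files unless newer than local"). I thought this would check all of the pages and retrieve only the files that are newer than my local copies. Am I wrong?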

However, wget does not recurse through the website on the second attempt. `wget --mirror` checks http://www.example.org/index.html, notices that this page has not changed, and then stops.

--2010-06-29 10:14:07--  http://www.example.org/
Resolving www.example.org (www.example.org)... 10.10.6.100
Connecting to www.example.org (www.example.org)|10.10.6.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Server file no newer than local file "www.example.org/index.html" -- not retrieving.

Loading robots.txt; please ignore errors.
--2010-06-29 10:14:08--  http://www.example.org/robots.txt
Connecting to www.example.org (www.example.org)|10.10.6.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 136 [text/plain]
Saving to: “www.example.org/robots.txt”

     0K                                                       100% 6.48M=0s
2010-06-29 10:14:08 (6.48 MB/s) - "www.example.org/robots.txt" saved [136/136]

--2010-06-29 10:14:08--  http://www.example.org/news/gallery/image-01.gif
Reusing existing connection to www.example.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 40741 (40K) [image/gif]
Server file no newer than local file "www.example.org/news/gallery/image-01.gif" -- not retrieving.

FINISHED --2010-06-29 10:14:08--
Downloaded: 1 files, 136 in 0s (6.48 MB/s)
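To see at a glance how far a run like the one above got, the log can be summarised. The sketch below assumes the log was written to a file with wget's `-o` option and that its lines look like the ones shown here; `count_mirror_results` is a hypothetical helper name, not part of wget.

```shell
#!/bin/sh
# count_mirror_results LOGFILE
# Hypothetical helper: counts files wget skipped as unchanged and files it saved.
# Assumes log lines shaped like the output above.
count_mirror_results() {
    log=$1
    # Lines such as: Server file no newer than local file "..." -- not retrieving.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    skipped=$(grep -c -- '-- not retrieving' "$log" || true)
    # Lines such as: ... - "www.example.org/robots.txt" saved [136/136]
    saved=$(grep -c ' saved \[' "$log" || true)
    echo "skipped=$skipped saved=$saved"
}
```

A possible usage would be `wget --mirror -o mirror.log http://www.example.org/` followed by `count_mirror_results mirror.log`. A very small "skipped" count on a large site suggests that wget never followed the links below the top-level page.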

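The following workaround seems to work for now. It forcibly removes /index.html, which forces wget to check all of the child links again. However, shouldn't wget check all of the child links automatically?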

rm www.example.org/index.html && wget --mirror http://www.example.org/
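A slightly more cautious variant of this workaround moves the index file aside instead of deleting it, so the mirror still has a front page if the download fails. This is only a sketch under stated assumptions: `refresh_mirror`, the `.bak` suffix and the `WGET` override variable are all hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# refresh_mirror URL INDEX_FILE
# Hypothetical helper: same idea as the rm-based workaround, but keeps a backup.
refresh_mirror() {
    url=$1
    index=$2
    backup="$index.bak"

    # Move the top-level page aside so that wget has to fetch it again.
    if [ -f "$index" ]; then
        mv "$index" "$backup" || return 1
    fi

    # WGET is an assumed override that lets the command be swapped out for a dry run.
    if "${WGET:-wget}" --mirror "$url"; then
        rm -f "$backup"
    else
        status=$?
        if [ ! -f "$index" ] && [ -f "$backup" ]; then
            # No new front page was written: put the old one back.
            mv "$backup" "$index"
        else
            rm -f "$backup"
        fi
        return "$status"
    fi
}
```

It would be called as `refresh_mirror http://www.example.org/ www.example.org/index.html`. Like the original workaround, it does not explain why wget stops after the unchanged top-level page; it only sidesteps it.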

(Editor: Li Datong)
