Introduction

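ArchiveBox is a powerful, self-hosted internet archiving solution written in Python. It is cross-platform and runs on Linux, macOS, and Windows.

It lets you collect, save, and view sites you want to keep for offline use. ArchiveBox can currently be set up as a command-line tool, a desktop application, or a web interface, and it can save a static snapshot of any site you choose, including text, images, PDFs, and even video.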

GitHub: https://github.com/ArchiveBox/ArchiveBox/

Official site: https://archivebox.io/

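Prerequisites

ArchiveBox is not meant to be run as root (and running pip as root is discouraged), so first create a regular account with sudo privileges and switch to it: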

adduser archivebox && usermod -aG sudo archivebox && su archivebox

Installation

One-line install

curl -sSL 'https://get.archivebox.io' | sh

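Manual install

This guide uses Ubuntu as the example; for other systems, see the official manual installation documentation. Docker is still the better option.

Install dependencies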

sudo apt install python3 python3-pip python3-distutils git wget curl youtube-dl
sudo apt install chromium-browser

Install archivebox

python3 -m pip install --upgrade archivebox

Warnings

WARNING: The script sqlformat is installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script pygmentize is installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script normalizer is installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script django-admin is installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts ipython and ipython3 are installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script dateparser-download is installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script archivebox is installed in '/home/allen/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

Solution:

Run this command:

echo 'export PATH=/home/allen/.local/bin:$PATH' >>~/.bashrc

The path after export PATH= is the directory named in the yellow warning messages. Replace /home/allen/.local/bin with the path shown in your own warnings.

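Then reload your shell configuration (for example with source ~/.bashrc, or by opening a new terminal) and run the install again: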

python3 -m pip install --upgrade archivebox
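
The echo command above appends a new line to ~/.bashrc every time it is run, so repeating it leaves duplicate entries. As a minimal sketch, assuming you want the edit to be repeatable, the helper below appends the export line only when it is not already present. The function name add_to_path_once is hypothetical and not part of pip or ArchiveBox.

```shell
# Append an "export PATH=..." line to a shell rc file, but only once.
# $1 = directory to put on PATH, $2 = rc file to edit (e.g. ~/.bashrc)
add_to_path_once() {
    # Build the exact line; \$PATH stays literal so it expands at login time.
    line="export PATH=\"$1:\$PATH\""
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -e "$line" "$2" 2>/dev/null || printf '%s\n' "$line" >> "$2"
}
```

It could be called as add_to_path_once "$HOME/.local/bin" "$HOME/.bashrc", substituting the directory from your own warning output.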

Running

Initialize:

mkdir -p /home/allen/data && cd /home/allen/data
archivebox init
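
The start script later in this post treats the presence of ArchiveBox.conf as the sign that a directory has been initialized. The sketch below wraps the same check in a small function so it can be reused before running init. The function name is_archivebox_dir is hypothetical, and relying on ArchiveBox.conf alone is an assumption carried over from that script.

```shell
# Return 0 if the given directory looks like an initialized ArchiveBox
# collection, judged only by the presence of ArchiveBox.conf (assumption).
# $1 = path to the data directory
is_archivebox_dir() {
    [ -f "$1/ArchiveBox.conf" ]
}
```

One way to use it: is_archivebox_dir /home/allen/data || (cd /home/allen/data && archivebox init).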

Create an admin account:

archivebox manage createsuperuser

The password I chose was too simple, which triggered a warning in red.

Start the server:

archivebox server 0.0.0.0:8000

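Open port 8000 of the server in a browser; the page should load normally.

Click ADD at the top and enter the URLs you want to archive.

Wait for the capture to run.

After a while, the snapshot shows up as successfully archived.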
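
URLs can also be queued from the command line instead of the web form, since archivebox add reads a list of URLs from standard input. As a sketch, assuming you keep your links in a plain text file, the helper below tidies that list before handing it over. The function name clean_url_list and the file name urls.txt are hypothetical.

```shell
# Print the http(s) URLs from a text file, one per line:
# trims surrounding whitespace, drops blank lines and '#' comments,
# and removes duplicates while keeping the original order.
# $1 = path to the URL list
clean_url_list() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
        | grep -E '^https?://' \
        | awk '!seen[$0]++'
}
```

From inside the data directory, the cleaned list could then be piped in with clean_url_list urls.txt | archivebox add.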

Extras

Reverse proxy

A simple Nginx configuration:

server {
    listen 80;
    listen [::]:80;
    server_name archivebox.yydnas.cn;
    index index.php index.html index.htm;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header REMOTE-HOST $remote_addr;
    }
}
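
The domain and port in the configuration above are specific to this setup. As a rough sketch, assuming you would rather generate the server block than edit it by hand, the function below prints a trimmed-down version of it for a given domain and port. The name render_nginx_conf and the example domain are hypothetical, and the output is a starting point rather than a hardened configuration.

```shell
# Print an Nginx server block that proxies a domain to a local ArchiveBox port.
# $1 = domain name, $2 = local port (defaults to 8000)
render_nginx_conf() {
    domain="$1"
    port="${2:-8000}"
    # Unquoted heredoc: ${domain} and ${port} expand here, while the
    # backslash-escaped \$variables are emitted literally for Nginx.
    cat <<EOF
server {
    listen 80;
    listen [::]:80;
    server_name ${domain};

    location / {
        proxy_pass http://localhost:${port};
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}
```

For example, render_nginx_conf archive.example.org 8000 prints a block you can save into your Nginx configuration directory. Running sudo nginx -t before reloading Nginx checks the result for syntax errors.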

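Running in the background

By default the program runs in the foreground of the terminal. The simplest way to send it to the background is: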

nohup archivebox server 0.0.0.0:8000 &> /dev/null &

Alternatively, create a script named start-archivebox.sh in your archivebox directory with the following content:

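#!/bin/bash

# Exit early if an ArchiveBox server is already running.
if pgrep -f "archivebox server" > /dev/null; then
    # echo "archivebox is running"
    exit 1
fi

ABPath=/home/allen/data  # replace with your ArchiveBox data directory
ABPort=8000

# Only start if the directory holds an initialized collection.
if [ -f "${ABPath}/ArchiveBox.conf" ]; then
    cd "${ABPath}" || exit 2
    nohup archivebox server "0.0.0.0:${ABPort}" &> /dev/null &
    exit 0
fi

# No ArchiveBox.conf found: nothing was started.
exit 2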

Run it with: bash start-archivebox.sh

The start script is adapted from a Zhihu article titled 开源的私人档案馆ArchiveBox简介,及二段补强.
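
Matching on the process list can misfire if another process happens to have "archivebox server" in its command line. A PID file is a different technique from the one the script uses. The sketch below records the PID of the background process and checks that exact process on the next start. The function names pid_is_alive and start_once, and the file name archivebox.pid, are hypothetical.

```shell
# Return 0 if the PID stored in the given file belongs to a live process.
# $1 = path to the PID file
pid_is_alive() {
    [ -f "$1" ] || return 1
    pid="$(cat "$1")"
    # kill -0 sends no signal; it only tests whether the process exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Start a command in the background unless the PID file points at a live
# process. Returns 1 without starting anything if it is already running.
# $1 = path to the PID file, remaining arguments = command to run
start_once() {
    pidfile="$1"
    shift
    if pid_is_alive "$pidfile"; then
        return 1
    fi
    nohup "$@" > /dev/null 2>&1 &
    # $! is the PID of the background job just launched.
    echo "$!" > "$pidfile"
}
```

From inside the data directory, this could be used as start_once archivebox.pid archivebox server 0.0.0.0:8000.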

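Final notes

This covers only the most basic installation; for more ways to use it, see the ArchiveBox Usage documentation.

The program does not appear to offer a language setting, and the interface is in English by default. There are few interface elements, though, so everyday use should not be a problem.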