
python – Hashing multiple files

Published: 2020-12-20 13:03:39 | Category: Python | Source: compiled from the web

Problem Specification:

Given a directory, I want to iterate through the directory and its non-hidden sub-directories,
and add a whirlpool hash to the names of the non-hidden files.
If the script is re-run, it should replace an old hash with a new one.

<filename>.<extension>   ==>   <filename>.<a-whirlpool-hash>.<extension>

<filename>.<old-hash>.<extension>   ==>   <filename>.<new-hash>.<extension>

Question:

a) How would you do this?

b) Out of all the methods available to you, what makes your method most suitable?

Verdict:

Thanks all, I have chosen SeigeX's answer for its speed and portability.
It is empirically quicker than the other bash variants,
and it worked without alteration on my Mac OS X machine.

Solution

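Updated to fix:
1. File names containing '[' or ']' (actually, any character now; see the comments)
2. Handling of md5sum output when hashing files with a backslash or newline in their name
3. Hash-checking algorithm moved into a function for modularity
4. Hash-checking logic refactored to remove double negatives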
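
Item 2 refers to a quirk of GNU md5sum: when a file name contains a backslash or newline, the output line starts with a backslash and the name is escaped. The sketch below assumes GNU coreutils; `hash_of` and `raw_field` are illustrative names that do not appear in the answer. It shows that a `read` without `-r` drops the marker from the hash field, while `read -r` keeps it.

```bash
#!/bin/bash
# Sketch only: hash_of and raw_field are illustrative names.

# read without -r treats backslash as an escape character, so a leading
# "\" in md5sum's output is dropped from the hash field.
hash_of() {
    local hash junk
    read hash junk < <(md5sum "$1")
    printf '%s\n' "$hash"
}

# read -r keeps backslashes, so the marker (if any) stays in the field.
raw_field() {
    local hash junk
    read -r hash junk < <(md5sum "$1")
    printf '%s\n' "$hash"
}

tmp=$(mktemp -d)
plain="$tmp/plain.txt"
odd="$tmp/with"$'\n'"newline.txt"   # file name containing a newline
printf 'foo\n' > "$plain"
printf 'foo\n' > "$odd"

echo "plain, read:    $(hash_of "$plain")"
echo "odd,   read:    $(hash_of "$odd")"
echo "odd,   read -r: $(raw_field "$odd")"
rm -rf "$tmp"
```

Both files have the same content, so the first two lines should show the same 32-digit hash. On an md5sum that does not escape file names, the third line would simply match the second.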

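#!/bin/bash
# Require exactly one argument, and it must be a directory.
if (($# != 1)) || ! [[ -d "$1" ]]; then
    echo "Usage: $0 /path/to/directory"
    exit 1
fi

# Print $1 with its last dot-separated component removed if that component
# looks like a hash (32 or more hex digits); otherwise print $1 unchanged.
is_hash() {
 md5=${1##*.} # strip prefix
 [[ "$md5" == *[^[:xdigit:]]* || ${#md5} -lt 32 ]] && echo "$1" || echo "${1%.*}"
}

# Read NUL-delimited paths so names with spaces or newlines survive intact.
while IFS= read -r -d $'\0' file; do
    # No -r here: read drops the leading backslash that GNU md5sum emits
    # for names containing a backslash or newline.
    read hash junk < <(md5sum "$file")
    basename="${file##*/}"
    dirname="${file%/*}"
    pre_ext="${basename%.*}"           # everything before the last dot
    ext="${basename:${#pre_ext}}"      # the last dot and what follows it

    # File already hashed?
    pre_ext=$(is_hash "$pre_ext")
    ext=$(is_hash "$ext")

    mv "$file" "${dirname}/${pre_ext}.${hash}${ext}" 2> /dev/null

# Prune hidden paths; print the remaining regular files NUL-terminated.
done < <(find "$1" -path "*/.*" -prune -o \( -type f -print0 \))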
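
To see how the parameter expansions split and rebuild a name, the string handling can be pulled out of the loop and run on its own. This is a sketch rather than part of the answer: `new_name` is a hypothetical helper, `is_hash` is copied from the script (with `md5` made local), and the hash is passed in as an argument instead of being computed.

```bash
#!/bin/bash
# is_hash as in the script above: strip a trailing ".<32+ hex digits>" if present.
is_hash() {
    local md5=${1##*.}
    [[ "$md5" == *[^[:xdigit:]]* || ${#md5} -lt 32 ]] && echo "$1" || echo "${1%.*}"
}

# new_name <basename> <hash>   (hypothetical helper, not in the answer)
# Prints the name the script would rename <basename> to.
new_name() {
    local basename=$1 hash=$2
    local pre_ext="${basename%.*}"          # everything before the last dot
    local ext="${basename:${#pre_ext}}"     # the last dot and what follows
    pre_ext=$(is_hash "$pre_ext")           # drops an old hash before the extension
    ext=$(is_hash "$ext")                   # drops an old hash used as the extension
    printf '%s\n' "${pre_ext}.${hash}${ext}"
}

new=d41d8cd98f00b204e9800998ecf8427e
old=5236b1ab46088005ed3554940390c8a7
for name in f "g.$old.ext" "h.$new" "i.ext1.$old.ext2" j.ext1.ext2 "g.with[or].ext"; do
    printf '%s  ==>  %s\n' "$name" "$(new_name "$name" "$new")"
done
```

The sample names are taken from the test tree, so the printed mappings can be compared with the result listing.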

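So far, this code has the following advantages over the other entries:

- It is fully compliant with Bash 2.0.2 and later
- No superfluous calls to other binaries such as sed or grep; built-in parameter expansion is used instead
- Uses process substitution for 'find' rather than a pipe, so the loop does not run in a subshell
- Takes the directory to process as an argument and sanity-checks it
- Uses $() rather than backticks for command substitution; the latter is not recommended
- Works with files with spaces
- Works with files with newlines
- Works with files with multiple extensions
- Works with files with no extension
- Does not traverse hidden directories
- Does not skip pre-hashed files; it recalculates the hash as per the spec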
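
The point about process substitution matters once the loop has to set variables that the rest of the script reads. A minimal sketch, assuming bash with its default settings (the `lastpipe` option off); the function names are made up, and a fixed NUL-delimited list stands in for the output of find:

```bash
#!/bin/bash
# Emit three NUL-terminated names, one with a space and one with a newline.
names() { printf 'a\0b c\0d\ne\0'; }

# Pipe: the while loop runs in a subshell, so its changes to n are lost.
count_with_pipe() {
    local n=0 name
    names | while IFS= read -r -d '' name; do
        n=$((n + 1))
    done
    echo "$n"
}

# Process substitution: the loop runs in the current shell, so n survives.
count_with_procsub() {
    local n=0 name
    while IFS= read -r -d '' name; do
        n=$((n + 1))
    done < <(names)
    echo "$n"
}

echo "pipe:    $(count_with_pipe)"
echo "procsub: $(count_with_procsub)"
```

The piped version reports 0 and the process-substitution version reports 3, even though both loops read the same three names.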

Test tree

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f
|       |-- g.5236b1ab46088005ed3554940390c8a7.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.5236b1ab46088005ed3554940390c8a7.ext2
|       `-- j.ext1.ext2
|-- c.ext^Mnewline
|   |-- f
|   `-- g.with[or].ext
`-- f^Jnewline.ext

4 directories, 9 files
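
For anyone who wants to reproduce the run, a tree like the one above can be built with a short script. This is a sketch under stated assumptions: `make_test_tree` is a made-up helper, `^M` and `^J` in the listing are taken to be a carriage return and a newline, every file is empty except `f^Jnewline.ext`, and that file is assumed to contain `foo` plus a newline (inferred from its hash in the result listing, not stated in the answer).

```bash
#!/bin/bash
# make_test_tree <root>   (hypothetical helper, not part of the answer)
make_test_tree() {
    local root=$1 name
    local old=5236b1ab46088005ed3554940390c8a7     # stale hash from the listing
    local empty=d41d8cd98f00b204e9800998ecf8427e   # md5 of an empty file
    local crdir="$root/c.ext"$'\r'"newline"        # ^M assumed to be a carriage return

    mkdir -p "$root/.hidden_dir" "$root/b/c.d" "$crdir"
    : > "$root/.hidden_dir/foo"
    for name in f "g.$old.ext" "h.$empty" "i.ext1.$old.ext2" j.ext1.ext2; do
        : > "$root/b/c.d/$name"
    done
    : > "$crdir/f"
    : > "$crdir/g.with[or].ext"
    # ^J assumed to be a newline; the content is an assumption (see lead-in).
    printf 'foo\n' > "$root/f"$'\n'"newline.ext"
}

root="$(mktemp -d)/a"
make_test_tree "$root"
# Count NUL terminators rather than lines, since one name contains a newline.
printf 'regular files: '
find "$root" -type f -print0 | tr -cd '\0' | wc -c
```

Running the hashing script on the resulting directory should then give a tree comparable to the result listing.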

Result

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f.d41d8cd98f00b204e9800998ecf8427e
|       |-- g.d41d8cd98f00b204e9800998ecf8427e.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|       `-- j.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|-- c.ext^Mnewline
|   |-- f.d41d8cd98f00b204e9800998ecf8427e
|   `-- g.with[or].d41d8cd98f00b204e9800998ecf8427e.ext
`-- f^Jnewline.d3b07384d113edec49eaa6238ad5ff00.ext

4 directories, 9 files

