Split a text file into smaller files by size (Windows)
Published: 2020-12-14 04:25:32 Category: Windows Source: compiled from the web
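Occasionally a log (.txt) file gets created that is too large to open (5 GB). I need a solution that splits it into smaller, readable chunks that can be opened in WordPad. This is on Windows Server 2008 R2.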
The solution needs to be a batch file, PowerShell, or something similar. Ideally the limit should be hard-coded so that each text file contains no more than 999 MB and never stops in the middle of a line.

I found a solution close to what I need, which sometimes works (it splits by line count), at https://gallery.technet.microsoft.com/scriptcenter/PowerShell-Split-large-log-6f2c4da0:

    #############################################
    # Split a log/text file into smaller chunks #
    #############################################

    # WARNING: This will take a long while with extremely large files and uses lots of memory to stage the file

    # Set the baseline counters
    # Set the line counter to 0
    $linecount = 0

    # Set the file counter to 1. This is used for the naming of the log files
    $filenumber = 1

    # Prompt user for the path
    $sourcefilename = Read-Host "What is the full path and name of the log file to split? (e.g. D:\mylogfiles\mylog.txt)"

    # Prompt user for the destination folder to create the chunk files
    $destinationfolderpath = Read-Host "What is the path where you want to extract the content? (e.g. d:\yourpath)"

    Write-Host "Please wait while the line count is calculated. This may take a while. No really, it could take a long time."

    # Find the current line count to present to the user before asking the new line count for chunk files
    Get-Content $sourcefilename | Measure-Object | ForEach-Object { $sourcelinecount = $_.Count }

    # Tell the user how large the current file is
    Write-Host "Your current file size is $sourcelinecount lines long"

    # Prompt user for the size of the new chunk files
    $destinationfilesize = Read-Host "How many lines will be in each new split file?"

    # The new size is a string, so we convert to integer and up
    # Set the upper boundary (maximum line count to write to each file)
    $maxsize = [int]$destinationfilesize

    Write-Host File is $sourcefilename - destination is $destinationfolderpath - new file line count will be $destinationfilesize

    # The process reads each line of the source file, writes it to the target log file and increments the line counter.
    # When it reaches 100000 (approximately 50 MB of text data)
    $content = get-content $sourcefilename | % {
        Add-Content $destinationfolderpath\splitlog$filenumber.txt "$_"
        $linecount ++
        If ($linecount -eq $maxsize) {
            $filenumber++
            $linecount = 0
        }
    }

    # Clean up after your pet
    [gc]::collect()
    [gc]::WaitForPendingFinalizers()

However, when I run this, I get many errors in PowerShell similar to:

    Add-Content : The process cannot access the file 'C:\Desktop\splitlog1.txt' because it is being used by another process...

So I am asking for help either fixing the code above or creating a different, better solution.

Solution
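OK, I took up the challenge. Here is a function that should work for you. It splits text files by line, putting as many complete input lines as possible into each output file without exceeding the given size in bytes.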
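The packing rule described above can be sketched in isolation. The helper below is hypothetical and not part of the answer: `Get-LineChunk`, its parameters, and the UTF-8 default are all assumptions made for illustration. It works on an in-memory array of lines rather than on streams, and it ignores any byte-order mark a real writer might emit. It checks the size before adding a line, so a chunk goes over the limit only when a single line is larger than the limit on its own.

```powershell
# Hypothetical helper, for illustration only: greedily groups lines into chunks
# whose encoded size (including line terminators) stays within $MaxBytes.
function Get-LineChunk {
    param(
        [string[]]$Line,
        [long]$MaxBytes,
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8
    )

    # Bytes used by the line terminator that WriteLine() would append
    $newlineBytes = $Encoding.GetByteCount([Environment]::NewLine)

    $chunks = New-Object System.Collections.ArrayList
    $current = New-Object System.Collections.ArrayList
    $currentBytes = 0L

    foreach ($l in $Line) {
        $lineBytes = $Encoding.GetByteCount($l) + $newlineBytes

        # Start a new chunk first if this line would push the current one over the limit.
        # An empty chunk always accepts the line, even when the line alone is too big.
        if ($current.Count -gt 0 -and ($currentBytes + $lineBytes) -gt $MaxBytes) {
            $null = $chunks.Add($current.ToArray())
            $current = New-Object System.Collections.ArrayList
            $currentBytes = 0L
        }

        $null = $current.Add($l)
        $currentBytes += $lineBytes
    }

    if ($current.Count -gt 0) {
        $null = $chunks.Add($current.ToArray())
    }

    # The leading comma returns the list as one object instead of unrolling it
    ,$chunks
}
```

Calling `Get-LineChunk -Line $lines -MaxBytes 512KB` on two lines of 1 MB each yields two chunks of one line each, which is the oversize case described in the note that follows.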
Note: the output file size limit can't be strictly enforced. Example: the input file contains two very long strings, 1 MB each. If you try to split this file into 512 KB chunks, the resulting files will be 1 MB each.

Function Split-FileByLine:

    <#
    .Synopsis
        Split text file(s) by lines, put into each output file as many complete lines of input as possible without exceeding size bytes.

    .Description
        Split text file(s) by lines, put into each output file as many complete lines of input as possible without exceeding size bytes.
        Note that the output file size limit can't be strictly enforced. Example: input file contains two very long strings, 1MB each.
        If you try to split this file into 512KB chunks, the resulting files will be 1MB each.

        Split files will have the original file's name, followed by the "_part_" string and a counter. Example:
        Original file: large.log
        Split files: large_part_0.log, large_part_1.log, large_part_2.log, etc.

    .Parameter FileName
        Array of strings, mandatory. Filename(s) to split.

    .Parameter OutPath
        String. Folder where the split files will be stored. Defaults to the current location. Will be created if it does not exist.

    .Parameter MaxFileSize
        Long, mandatory. Maximum output file size. When the output file reaches this size, a new file will be created.
        You can use PowerShell's multipliers: KB, MB, GB, TB, PB

    .Parameter Encoding
        String. If not specified, the script will use the system's current ANSI code page to read the files.
        You can get other valid encodings for your system in the PowerShell console like this:

        [System.Text.Encoding]::GetEncodings()

        Example:

        Unicode (UTF-7): utf-7
        Unicode (UTF-8): utf-8
        Western European (Windows): Windows-1252

    .Example
        Split-FileByLine -FileName '.\large.log' -OutPath '.\splitted' -MaxFileSize 100MB -Verbose

        Split file "large.log" in current folder, write resulting files in subfolder "splitted", limit output file size to 100MB, be verbose.

    .Example
        Split-FileByLine -FileName '.\large.log' -OutPath '.\splitted' -MaxFileSize 100MB -Encoding 'utf-8'

        Split file "large.log" in current folder, write resulting files in subfolder "splitted", limit output file size to 100MB, use UTF-8 encoding.

    .Example
        Split-FileByLine -FileName '.\large_1.log', '.\large_2.log' -OutPath '.\splitted' -MaxFileSize 999MB

        Split files "large_1.log" and "large_2.log" in current folder, write resulting files in subfolder "splitted", limit output file size to 999MB.

    .Example
        '.\large_1.log', '.\large_2.log' | Split-FileByLine -OutPath '.\splitted' -MaxFileSize 999MB

        Split files "large_1.log" and "large_2.log" in current folder, write resulting files in subfolder "splitted", limit output file size to 999MB.
    #>
    function Split-FileByLine
    {
        [CmdletBinding()]
        Param
        (
            [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
            [string[]]$FileName,

            [Parameter(ValueFromPipelineByPropertyName = $true)]
            [string]$OutPath = (Get-Location -PSProvider FileSystem).Path,

            [Parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)]
            [long]$MaxFileSize,

            [Parameter(ValueFromPipelineByPropertyName = $true)]
            [string]$Encoding = 'Default'
        )

        Begin
        {
            # Scriptblocks for common tasks. They are dot-sourced below, so they
            # read and write the variables of the calling scope ($InFile, $OutFile, $_).
            $DisposeInFile = {
                Write-Verbose 'Disposing StreamReader'
                $InFile.Close()
                $InFile.Dispose()
            }

            $DisposeOutFile = {
                Write-Verbose 'Disposing StreamWriter'
                $OutFile.Flush()
                $OutFile.Close()
                $OutFile.Dispose()
            }

            $NewStreamWriter = {
                Write-Verbose 'Creating StreamWriter'
                $OutFileName = Join-Path -Path $OutPath -ChildPath (
                    '{0}_part_{1}{2}' -f [System.IO.Path]::GetFileNameWithoutExtension($_), $Counter, [System.IO.Path]::GetExtension($_)
                )

                $OutFile = New-Object -TypeName System.IO.StreamWriter -ArgumentList (
                    $OutFileName,
                    $false,
                    $FileEncoding
                ) -ErrorAction Stop
                $OutFile.AutoFlush = $true
                Write-Verbose "Writing new file: $OutFileName"
            }
        }

        Process
        {
            if($Encoding -eq 'Default')
            {
                # Set default encoding
                $FileEncoding = [System.Text.Encoding]::Default
            }
            else
            {
                # Try to set user-specified encoding
                try
                {
                    $FileEncoding = [System.Text.Encoding]::GetEncoding($Encoding)
                }
                catch
                {
                    throw "Not valid encoding: $Encoding"
                }
            }

            Write-Verbose "Input file: $FileName"
            Write-Verbose "Output folder: $OutPath"

            if(!(Test-Path -Path $OutPath -PathType Container)){
                Write-Verbose "Folder doesn't exist, creating: $OutPath"
                $null = New-Item -Path $OutPath -ItemType Directory -ErrorAction Stop
            }

            $FileName | ForEach-Object {
                # Open input file
                $InFile = New-Object -TypeName System.IO.StreamReader -ArgumentList (
                    $_,
                    $FileEncoding
                ) -ErrorAction Stop
                Write-Verbose "Current file: $_"

                $Counter = 0
                $OutFile = $null

                # Read lines from input file
                while(($line = $InFile.ReadLine()) -ne $null)
                {
                    if($OutFile -eq $null)
                    {
                        # No output file, create StreamWriter
                        . $NewStreamWriter
                    }
                    else
                    {
                        if($OutFile.BaseStream.Length -ge $MaxFileSize)
                        {
                            # Output file reached size limit, closing
                            Write-Verbose "OutFile length: $($OutFile.BaseStream.Length)"
                            . $DisposeOutFile
                            $Counter++
                            . $NewStreamWriter
                        }
                    }

                    # Write line to the output file
                    $OutFile.WriteLine($line)
                }

                Write-Verbose "Finished processing file: $_"

                # Close open files and clean up objects.
                # $OutFile is still $null if the input file was empty.
                if($OutFile -ne $null)
                {
                    . $DisposeOutFile
                }
                . $DisposeInFile
            }
        }
    }

You can use it in a script like this:

    function Split-FileByLine
    {
        # function body here
    }

    $InputFile = 'c:\log\large.log'
    $OutputDir = 'c:\log_split'

    Split-FileByLine -FileName $InputFile -OutPath $OutputDir -MaxFileSize 999MB
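After a split, it can be reassuring to confirm that no lines were lost. The sketch below is one way to do that under stated assumptions: `Get-LineCount` and `Test-SplitLineCount` are hypothetical helper names, not part of the answer, and the default filter assumes the `_part_` naming that Split-FileByLine uses. Both helpers stream the files through a StreamReader, so a multi-gigabyte log is never held in memory. Full paths are the safer choice here, because .NET resolves relative paths against the process working directory, which may differ from the current PowerShell location.

```powershell
# Hypothetical helpers, for illustration only.

# Count the lines in a file by streaming it line by line.
function Get-LineCount {
    param([string]$Path)

    $reader = New-Object System.IO.StreamReader -ArgumentList $Path
    try {
        $count = 0L
        while ($null -ne $reader.ReadLine()) { $count++ }
        $count
    }
    finally {
        $reader.Dispose()
    }
}

# Return $true when the part files together hold as many lines as the original.
function Test-SplitLineCount {
    param(
        [string]$OriginalPath,
        [string]$PartFolder,
        [string]$Filter = '*_part_*'   # assumes the "_part_" naming scheme
    )

    $expected = Get-LineCount -Path $OriginalPath
    $actual = 0L
    foreach ($part in (Get-ChildItem -Path $PartFolder -Filter $Filter)) {
        $actual += Get-LineCount -Path $part.FullName
    }

    $expected -eq $actual
}
```

For example, `Test-SplitLineCount -OriginalPath $InputFile -PartFolder $OutputDir` should return `$true` once the split has finished, provided the output folder contains parts from that one input file only. A matching line count does not prove the content is identical, only that no lines were dropped or duplicated.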