IO::Compress::FAQ

#NAME

IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

#DESCRIPTION

Common questions answered.

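#GENERAL

#Compatibility with Unix compress/uncompress.

Although Compress::Zlib has a pair of functions called compress and uncompress, they are not related to the Unix programs of the same name. The Compress::Zlib module is not compatible with Unix compress.

If you have the uncompress program available, you can use it to read compressed files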

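open F, "uncompress -c $filename |"
    or die "Cannot run uncompress: $!\n";
while (<F>)
{
    ...
}

Alternatively, if you have the gunzip program available, you can use it to read compressed files

open F, "gunzip -c $filename |"
    or die "Cannot run gunzip: $!\n";
while (<F>)
{
    ...
}

If you have the compress program available, you can use it to write compressed files

# compress -c writes to standard output, so redirect it into the target file
open F, "| compress -c >$filename"
    or die "Cannot run compress: $!\n";
print F "data";
...
close F ;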

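#Accessing .tar.Z files

The Archive::Tar module can optionally use Compress::Zlib (via the IO::Zlib module) to access tar files that have been compressed with gzip. Unfortunately, tar files compressed with the Unix compress utility cannot be read by Compress::Zlib, so they cannot be accessed directly by Archive::Tar.

If the uncompress or gunzip programs are available, you can use one of these workarounds to read .tar.Z files from Archive::Tar

Firstly with uncompress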

use strict;
use warnings;
use Archive::Tar;

open F, "uncompress -c $filename |";
my $tar = Archive::Tar->new(*F);
...

and this with gunzip

use strict;
use warnings;
use Archive::Tar;

open F, "gunzip -c $filename |";
my $tar = Archive::Tar->new(*F);
...

Similarly, if the compress program is available, you can use it to write a .tar.Z file

use strict;
use warnings;
use Archive::Tar;
use IO::File;

my $fh = IO::File->new( "| compress -c >$filename" );
my $tar = Archive::Tar->new();
...
$tar->write($fh);
$fh->close ;

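#How do I recompress using a different compression?

This is easier than you might expect once you realise that all the IO::Compress::* objects are derived from IO::File and that all the IO::Uncompress::* modules can read from an IO::File filehandle.

For example, say you have a file compressed with gzip that you want to recompress with bzip2. Here is all that is needed to carry out the recompression.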

use IO::Uncompress::Gunzip ':all';
use IO::Compress::Bzip2 ':all';

my $gzipFile = "somefile.gz";
my $bzipFile = "somefile.bz2";

my $gunzip = IO::Uncompress::Gunzip->new( $gzipFile )
    or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

bzip2 $gunzip => $bzipFile
    or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

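Note that this technique has a limitation. Some compression file formats store extra information along with the compressed data payload. For example, gzip can optionally store the original filename, and Zip stores a lot of information about the original file. If the original compressed file contains any of this extra information, it will not be transferred to the new compressed file when using the technique above.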
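The limitation can be seen in a small in-memory sketch. It assumes the payload and the name notes.txt, both made up for illustration; Name and getHeaderInfo are the documented IO::Compress::Gzip option and IO::Uncompress::Gunzip method. The gzip header carries the name, but only the payload reaches the bzip2 stream.

```perl
use strict;
use warnings;
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Uncompress::Gunzip  qw($GunzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

my $original = "hello, world\n" x 100;    # sample payload (assumption)

# Build a gzip stream that records an original filename (hypothetical name).
my $gz;
gzip(\$original => \$gz, Name => "notes.txt")
    or die "gzip failed: $GzipError\n";

# Open the gzip stream and inspect its header before recompressing.
my $gunzip = IO::Uncompress::Gunzip->new(\$gz)
    or die "Cannot gunzip: $GunzipError\n";
my $stored_name = $gunzip->getHeaderInfo()->{Name};

# Recompress: only the payload is passed on to bzip2.
my $bz;
bzip2($gunzip => \$bz)
    or die "bzip2 failed: $Bzip2Error\n";

# Check that the payload survived the round trip.
my $roundtrip;
bunzip2(\$bz => \$roundtrip)
    or die "bunzip2 failed: $Bunzip2Error\n";

print "name stored in gzip header: $stored_name\n";
print "payload intact: ", ($roundtrip eq $original ? "yes" : "no"), "\n";
```

If the extra information matters, it has to be read from the source header, as above, and carried over by hand.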

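#ZIP

#What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?

The following compression formats are supported by IO::Compress::Zip and IO::Uncompress::Unzip

Store (method 0)

No compression at all.

Deflate (method 8)

This is the default compression used when creating a zip file with IO::Compress::Zip.

Bzip2 (method 12)

Only supported if the IO-Compress-Bzip2 module is installed.

Lzma (method 14)

Only supported if the IO-Compress-Lzma module is installed.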
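As a rough sketch of how a method is selected, the snippet below zips the same in-memory buffer twice, once stored and once deflated. The payload and the member name sample.txt are assumptions made for illustration; Method and the ZIP_CM_* constants come from IO::Compress::Zip.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(:all);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $data = "abcdefgh" x 1000;    # highly compressible sample payload

my ($stored, $deflated);

# Method 0: the data is copied into the archive without compression.
zip(\$data => \$stored, Name => "sample.txt", Method => ZIP_CM_STORE)
    or die "zip (store) failed: $ZipError\n";

# Method 8: the default, spelled out here for comparison.
zip(\$data => \$deflated, Name => "sample.txt", Method => ZIP_CM_DEFLATE)
    or die "zip (deflate) failed: $ZipError\n";

# Either archive unzips back to the original data.
my $back;
unzip(\$stored => \$back)
    or die "unzip failed: $UnzipError\n";

printf "stored: %d bytes, deflated: %d bytes\n",
    length $stored, length $deflated;
```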

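#Can I Read/Write Zip files larger than 4 Gig?

Yes, both the IO-Compress-Zip and IO-Uncompress-Unzip modules support the zip feature called Zip64. That allows them to read/write files/buffers larger than 4 Gig.

If you are creating a Zip file using the one-shot interface, and any of the input files is larger than 4 Gig, a zip64 compliant zip file will be created.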

zip "really-large-file" => "my.zip";

Similarly with the one-shot interface, if the input is a buffer larger than 4 Gig, a zip64 compliant zip file will be created.

zip \$really_large_buffer => "my.zip";

The one-shot interface also allows you to force the creation of a zip64 zip file by including the Zip64 option.

zip $filehandle => "my.zip", Zip64 => 1;

If you want to create a zip64 zip file with the OO interface, you must specify the Zip64 option.

my $zip = IO::Compress::Zip->new( "whatever", Zip64 => 1 );

When uncompressing with IO-Uncompress-Unzip, it will automatically detect whether the zip file is zip64.
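A small sketch of the Zip64 option and the automatic detection, using a tiny in-memory payload so that it runs quickly. The payload and the member name a.txt are assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $data = "small payload\n";    # sample data; far below the 4 Gig limit

my ($plain, $zip64);

# Ordinary archive: no Zip64 structures are needed for data this small.
zip(\$data => \$plain, Name => "a.txt")
    or die "zip failed: $ZipError\n";

# Force Zip64 structures regardless of the input size.
zip(\$data => \$zip64, Name => "a.txt", Zip64 => 1)
    or die "zip (Zip64) failed: $ZipError\n";

# No option is needed on the reading side.
my $out;
unzip(\$zip64 => \$out)
    or die "unzip failed: $UnzipError\n";

printf "plain: %d bytes, zip64: %d bytes\n", length $plain, length $zip64;
```

The Zip64 archive is a little larger because it carries the additional Zip64 records.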

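If you intend to manipulate the Zip64 zip files created with IO-Compress-Zip using an external zip/unzip program, make sure that it supports Zip64.

In particular, if you are using Info-Zip you need zip version 3.x or better to update a Zip64 archive and unzip version 6.x to read a Zip64 archive.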

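#Can I write more than 64K entries in a Zip file?

Yes. Zip64 allows this. See the previous question.

#Zip Resources

The primary reference for zip files is the "appnote" document, available at http://www.pkware.com/documents/casestudies/APPNOTE.TXT

An alternative is the Info-Zip appnote. This is available from ftp://ftp.info-zip.org/pub/infozip/doc/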

#GZIP

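#Gzip Resources

The primary reference for gzip files is RFC 1952 https://datatracker.ietf.org/doc/html/rfc1952

The primary site for gzip is http://www.gzip.org.

#Dealing with concatenated gzip files

If the gunzip program encounters a file containing multiple gzip files concatenated together, it will automatically uncompress them all. The example below illustrates this behaviour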

$ echo abc | gzip -c >x.gz
$ echo def | gzip -c >>x.gz
$ gunzip -c x.gz
abc
def

By default, IO::Uncompress::Gunzip will not behave like the gunzip program. It will only uncompress the first gzip data stream in the file, as shown below

$ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
abc

To force IO::Uncompress::Gunzip to uncompress all the gzip data streams, include the MultiStream option, as shown below

$ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
abc
def
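The same behaviour can be reproduced inside a script without touching the filesystem. This is a minimal sketch that builds the concatenated input in memory; the variable names are illustrative.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build two independent gzip streams and join them, like "gzip -c >>x.gz".
my ($text1, $text2) = ("abc\n", "def\n");
my ($part1, $part2);
gzip(\$text1 => \$part1) or die "gzip failed: $GzipError\n";
gzip(\$text2 => \$part2) or die "gzip failed: $GzipError\n";
my $concatenated = $part1 . $part2;

# Default: only the first gzip data stream is uncompressed.
my $first_only;
gunzip(\$concatenated => \$first_only)
    or die "gunzip failed: $GunzipError\n";

# MultiStream: every gzip data stream is uncompressed in turn.
my $everything;
gunzip(\$concatenated => \$everything, MultiStream => 1)
    or die "gunzip failed: $GunzipError\n";

print "default:     ", length($first_only), " bytes\n";
print "MultiStream: ", length($everything), " bytes\n";
```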

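#Reading bgzip files with IO::Uncompress::Gunzip

A bgzip file consists of a series of valid gzip-compliant data streams concatenated together. To read a file created by bgzip with IO::Uncompress::Gunzip, use the MultiStream option as shown in the previous section.

See the section titled "The BGZF compression format" in http://samtools.github.io/hts-specs/SAMv1.pdf for a definition of bgzip.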

#ZLIB

#Zlib Resources

The primary site for the zlib compression library is http://www.zlib.org.

#Bzip2

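#Bzip2 Resources

The primary site for bzip2 is http://www.bzip.org.

#Dealing with Concatenated bzip2 files

If the bunzip2 program encounters a file containing multiple bzip2 files concatenated together, it will automatically uncompress them all. The example below illustrates this behaviour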

$ echo abc | bzip2 -c >x.bz2
$ echo def | bzip2 -c >>x.bz2
$ bunzip2 -c x.bz2
abc
def

By default, IO::Uncompress::Bunzip2 will not behave like the bunzip2 program. It will only uncompress the first bzip2 data stream in the file, as shown below

$ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
abc

To force IO::Uncompress::Bunzip2 to uncompress all the bzip2 data streams, include the MultiStream option, as shown below

$ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
abc
def

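#Interoperating with Pbzip2

Pbzip2 (http://compression.ca/pbzip2/) is a parallel implementation of bzip2. The output from pbzip2 consists of a series of concatenated bzip2 data streams.

By default, IO::Uncompress::Bunzip2 will only uncompress the first bzip2 data stream in a pbzip2 file. To uncompress the complete pbzip2 file you must include the MultiStream option, like this.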

bunzip2 $input => \$output, MultiStream => 1
    or die "bunzip2 failed: $Bunzip2Error\n";
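The OO interface accepts the same option. The sketch below reads a multi-stream input line by line. It does not run pbzip2 itself: as an assumption for illustration, the input is simulated by concatenating three separately compressed bzip2 streams.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Simulate pbzip2-style output: several bzip2 streams back to back.
my @chunks = ("line 1\n", "line 2\n", "line 3\n");
my $input  = '';
for my $chunk (@chunks) {
    my $stream;
    bzip2(\$chunk => \$stream)
        or die "bzip2 failed: $Bzip2Error\n";
    $input .= $stream;
}

# MultiStream makes the reader continue into each following stream.
my $z = IO::Uncompress::Bunzip2->new(\$input, MultiStream => 1)
    or die "bunzip2 failed: $Bunzip2Error\n";

my @lines;
while (my $line = <$z>) {
    push @lines, $line;
}
$z->close();

print "read ", scalar(@lines), " lines\n";
```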

#HTTP & NETWORK

#Apache::GZip Revisited

Below is a mod_perl Apache compression module, called Apache::GZip, taken from http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression

package Apache::GZip;
#File: Apache::GZip.pm

use strict vars;
use Apache::Constants ':common';
use Compress::Zlib;
use IO::File;
use constant GZIP_MAGIC => 0x1f8b;
use constant OS_MAGIC => 0x03;

sub handler {
    my $r = shift;
    my ($fh,$gz);
    my $file = $r->filename;
    return DECLINED unless $fh=IO::File->new($file);
    $r->header_out('Content-Encoding'=>'gzip');
    $r->send_http_header;
    return OK if $r->header_only;

    tie *STDOUT,'Apache::GZip',$r;
    print($_) while <$fh>;
    untie *STDOUT;
    return OK;
}

sub TIEHANDLE {
    my($class,$r) = @_;
    # initialize a deflation stream
    my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

    # gzip header -- don't ask how I found out
    $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

    return bless { r   => $r,
                   crc =>  crc32(undef),
                   d   => $d,
                   l   =>  0
                 },$class;
}

sub PRINT {
    my $self = shift;
    foreach (@_) {
      # deflate the data
      my $data = $self->{d}->deflate($_);
      $self->{r}->print($data);
      # keep track of its length and crc
      $self->{l} += length($_);
      $self->{crc} = crc32($_,$self->{crc});
    }
}

sub DESTROY {
   my $self = shift;

   # flush the output buffers
   my $data = $self->{d}->flush;
   $self->{r}->print($data);

   # print the CRC and the total length (uncompressed)
   $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
}

1;

Here is the Apache configuration entry you need in order to use it. Once set, everything in the /compressed directory will be compressed automatically.

<Location /compressed>
   SetHandler  perl-script
   PerlHandler Apache::GZip
</Location>

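Although at first sight there seems to be quite a lot going on in Apache::GZip, you can sum up what the code does as follows: read the contents of the file in $r->filename, compress it, and write the compressed data to standard output. That's all.

This code has to jump through a few hoops to achieve this because

The gzip support in Compress::Zlib version 1.x can only work with a real filesystem filehandle. The filehandles used by Apache modules are not associated with the filesystem.
That means all the gzip support has to be done by hand, in this case by creating a tied filehandle to deal with creating the gzip header and trailer.

IO::Compress::Gzip does not have that filehandle limitation (this was one of the reasons for writing it in the first place). So if IO::Compress::Gzip is used instead of Compress::Zlib, the whole tied filehandle code can be removed. Here is the rewritten code.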

package Apache::GZip;

use strict vars;
use Apache::Constants ':common';
use IO::Compress::Gzip;
use IO::File;

sub handler {
    my $r = shift;
    my $fh;
    my $file = $r->filename;
    return DECLINED unless $fh=IO::File->new($file);
    $r->header_out('Content-Encoding'=>'gzip');
    $r->send_http_header;
    return OK if $r->header_only;

    my $gz = IO::Compress::Gzip->new( '-', Minimal => 1 )
        or return DECLINED ;

    print $gz $_ while <$fh>;

    return OK;
}

or even more succinctly, like this, using a one-shot gzip

package Apache::GZip;

use strict vars;
use Apache::Constants ':common';
use IO::Compress::Gzip qw(gzip);

sub handler {
    my $r = shift;
    $r->header_out('Content-Encoding'=>'gzip');
    $r->send_http_header;
    return OK if $r->header_only;

    gzip $r->filename => '-', Minimal => 1
      or return DECLINED ;

    return OK;
}

1;

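The use of one-shot gzip above just reads from $r->filename and writes the compressed data to standard output.

Note the use of the Minimal option in the code above. When using gzip for Content-Encoding you should always use this option. In the example above it prevents the filename from being included in the gzip header and makes the gzip data stream slightly smaller.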
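The effect of Minimal can be checked with a short in-memory sketch. The body text and the name index.html are assumptions made for illustration. Because the input here is a buffer rather than a file, the name has to be supplied explicitly with the Name option for the comparison to be meaningful.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $body = "<html><body>hello</body></html>\n";    # sample response body

# A gzip stream whose header records a filename (hypothetical name).
my $with_name;
gzip(\$body => \$with_name, Name => "index.html")
    or die "gzip failed: $GzipError\n";

# A gzip stream with the smallest possible header.
my $minimal;
gzip(\$body => \$minimal, Minimal => 1)
    or die "gzip failed: $GzipError\n";

# Inspect the minimal header: no filename is present.
my $reader = IO::Uncompress::Gunzip->new(\$minimal)
    or die "gunzip failed: $GunzipError\n";
my $hdr = $reader->getHeaderInfo();

my $saved = length($with_name) - length($minimal);
print "bytes saved by Minimal: $saved\n";
print "name in minimal header: ",
    (defined $hdr->{Name} ? $hdr->{Name} : "(none)"), "\n";
```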

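#Compressed files and Net::FTP

The Net::FTP module provides two low-level methods called stor and retr that both return filehandles. These filehandles can be used with the IO::Compress/Uncompress modules to compress or uncompress files read from or written to an FTP server on the fly, without having to create a temporary file.

Firstly, here is code that uses retr to uncompress a file as it is read from the FTP server.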

use Net::FTP;
use IO::Uncompress::Gunzip qw(:all);

my $ftp = Net::FTP->new( ... );

my $retr_fh = $ftp->retr($compressed_filename);
gunzip $retr_fh => $outFilename, AutoClose => 1
    or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

and this to compress a file as it is written to the FTP server

use Net::FTP;
use IO::Compress::Gzip qw(:all);

my $stor_fh = $ftp->stor($filename);
gzip "filename" => $stor_fh, AutoClose => 1
    or die "Cannot compress '$filename': $GzipError\n";

#MISC

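#Using `InputLength` to uncompress data embedded in a larger file/buffer.

A fairly common use-case is where compressed data is embedded in a larger file/buffer and you want to read both.

As an example, consider the structure of a zip file. This is a well-defined file format that mixes both compressed and uncompressed sections of data in a single file.

For the purposes of this discussion you can think of a zip file as a sequence of compressed data streams, each of which is prefixed by an uncompressed local header. The local header contains information about the compressed data stream, including the name of the compressed file and, in particular, the length of the compressed data stream.

To illustrate how to use InputLength, here is a script that walks a zip file and prints out how many lines are in each compressed file (if you intend to write code to walk through a zip file for real, see "Walking through a zip file" in IO::Uncompress::Unzip). Also, although this example uses the zlib-based compression, the technique can be used by the other IO::Uncompress::* modules.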

use strict;
use warnings;

use IO::File;
use IO::Uncompress::RawInflate qw(:all);

use constant ZIP_LOCAL_HDR_SIG  => 0x04034b50;
use constant ZIP_LOCAL_HDR_LENGTH => 30;

my $file = $ARGV[0] ;

my $fh = IO::File->new( "<$file" )
            or die "Cannot open '$file': $!\n";

while (1)
{
    my $sig;
    my $buffer;

    my $x ;
    ($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH
        or die "Truncated file: $!\n";

    my $signature = unpack ("V", substr($buffer, 0, 4));

    last unless $signature == ZIP_LOCAL_HDR_SIG;

    # Read Local Header
    my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
    my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
    my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
    my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
    my $filename_length    = unpack ("v", substr($buffer, 26, 2));
    my $extra_length       = unpack ("v", substr($buffer, 28, 2));

    my $filename ;
    $fh->read($filename, $filename_length) == $filename_length
        or die "Truncated file\n";

    $fh->read($buffer, $extra_length) == $extra_length
        or die "Truncated file\n";

    if ($compressedMethod != 8 && $compressedMethod != 0)
    {
        warn "Skipping file '$filename' - not deflated $compressedMethod\n";
        $fh->read($buffer, $compressedLength) == $compressedLength
            or die "Truncated file\n";
        next;
    }

    # Parenthesise the bit test: == binds more tightly than &
    if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
    {
        die "Streamed Stored not supported for '$filename'\n";
    }

    next if $compressedLength == 0;

    # Done reading the Local Header

    my $inf = IO::Uncompress::RawInflate->new( $fh,
                        Transparent => 1,
                        InputLength => $compressedLength )
      or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

    my $line_count = 0;

    while (<$inf>)
    {
        ++ $line_count;
    }

    print "$filename: $line_count\n";
}

The majority of the code above is concerned with reading the zip local header data. The code to focus on is at the bottom.

while (1) {

    # read local zip header data
    # get $filename
    # get $compressedLength

    my $inf = IO::Uncompress::RawInflate->new( $fh,
                        Transparent => 1,
                        InputLength => $compressedLength )
      or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

    my $line_count = 0;

    while (<$inf>)
    {
        ++ $line_count;
    }

    print "$filename: $line_count\n";
}

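The call to IO::Uncompress::RawInflate creates a new filehandle $inf that can be used to read from the parent filehandle $fh, uncompressing it as it goes. The use of the InputLength option guarantees that at most $compressedLength bytes of compressed data will be read from the $fh filehandle (the only exception is an error case such as a truncated file or a corrupt data stream).

This means that once RawInflate is finished, $fh will be left at the byte directly after the compressed data stream.

Now consider what the code looks like without InputLength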

while (1) {

    # read local zip header data
    # get $filename
    # get $compressedLength

    # read all the compressed data into $data
    read($fh, $data, $compressedLength);

    my $inf = IO::Uncompress::RawInflate->new( \$data,
                        Transparent => 1 )
      or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

    my $line_count = 0;

    while (<$inf>)
    {
        ++ $line_count;
    }

    print "$filename: $line_count\n";
}

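The difference here is the addition of the temporary variable $data. This is used to store a copy of the compressed data while it is being uncompressed.

If you know that $compressedLength isn't that big, then using temporary storage won't be a problem. But if $compressedLength is very large, or you are writing an application that other people will use and so have no idea how big $compressedLength will be, it could be an issue.

Using InputLength avoids the use of temporary storage and means the application can cope with large compressed data streams.

One final point: obviously InputLength can only be used when you know the length of the compressed data beforehand, as here with a zip file.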
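The positioning guarantee can be tried out on a much simpler container than a zip file. The sketch below assumes a made-up layout (not a real file format): a 4-byte big-endian length, the raw deflate data, then the plain text TRAILER. After $inf reaches end of file, the remaining read on $fh returns only the trailer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::File;
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

my $text = "one\ntwo\nthree\n";
my $compressed;
rawdeflate(\$text => \$compressed)
    or die "rawdeflate failed: $RawDeflateError\n";

# Hypothetical container: length prefix, compressed data, plain trailer.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack("N", length $compressed), $compressed, "TRAILER";
close $tmp or die "Cannot close '$path': $!\n";

my $fh = IO::File->new($path, "r")
    or die "Cannot open '$path': $!\n";
binmode $fh;

# Read the uncompressed prefix to learn the compressed length.
my $prefix;
$fh->read($prefix, 4) == 4
    or die "Truncated file\n";
my $compressedLength = unpack("N", $prefix);

# Uncompress straight from $fh, consuming exactly $compressedLength bytes.
my $inf = IO::Uncompress::RawInflate->new($fh,
                    InputLength => $compressedLength)
    or die "Cannot uncompress: $RawInflateError\n";

my $line_count = 0;
while (<$inf>) {
    ++$line_count;
}

# $fh is now positioned directly after the compressed data.
my $rest = '';
$fh->read($rest, 100);

print "lines: $line_count, remaining data: $rest\n";
```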

#SUPPORT

General feedback/questions/bug reports should be sent to https://github.com/pmqs//issues (preferred) or https://rt.cpan.org/Public/Dist/Display.html?Name=.

#SEE ALSO

Compress::Zlib, IO::Compress::Gzip, IO::Uncompress::Gunzip, IO::Compress::Deflate, IO::Uncompress::Inflate, IO::Compress::RawDeflate, IO::Uncompress::RawInflate, IO::Compress::Bzip2, IO::Uncompress::Bunzip2, IO::Compress::Lzma, IO::Uncompress::UnLzma, IO::Compress::Xz, IO::Uncompress::UnXz, IO::Compress::Lzip, IO::Uncompress::UnLzip, IO::Compress::Lzop, IO::Uncompress::UnLzop, IO::Compress::Lzf, IO::Uncompress::UnLzf, IO::Compress::Zstd, IO::Uncompress::UnZstd, IO::Uncompress::AnyInflate, IO::Uncompress::AnyUncompress