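NAME

perldsc - Perl Data Structures Cookbook

DESCRIPTION

Perl lets us have complex data structures. You can write something like this and, all of a sudden, you have an array with three dimensions!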

for my $x (1 .. 10) {
    for my $y (1 .. 10) {
        for my $z (1 .. 10) {
            $AoA[$x][$y][$z] =
                $x ** $y + $z;
        }
    }
}

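Alas, however simple this may appear, underneath it is a much more elaborate construct than meets the eye!

How do you print it out? Why can't you just say print @AoA? How do you sort it? How can you pass it to a function or get one of these back from a function? Is it an object? Can you save it to disk to read back later? How do you access whole rows or columns of that matrix? Do all the values have to be numeric?

As you can see, it is quite easy to become confused. While some small portion of the blame can be attributed to the reference-based implementation, it is really more due to a lack of existing documentation with examples designed for the beginner.

This document is meant to be a detailed but understandable treatment of the many different sorts of data structures you might want to develop. It should also serve as a cookbook of examples. That way, when you need to create one of these complex data structures, you can just pinch, pilfer, or purloin a drop-in example from here.

Let's look at each of these possible constructs in detail. Each has its own section below: arrays of arrays, hashes of arrays, arrays of hashes, hashes of hashes, and more elaborate records.

But for now, let's look at general issues common to all these types of data structures.

REFERENCES

The most important thing to understand about all data structures in Perl, including multidimensional arrays, is that even though they might appear otherwise, Perl @ARRAYs and %HASHes are all internally one-dimensional. They can hold only scalar values (meaning a string, a number, or a reference). They cannot directly contain other arrays or hashes, but instead contain references to other arrays or hashes.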

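You can't use a reference to an array or hash in quite the same way that you would a real array or hash. For C or C++ programmers unused to distinguishing between arrays and pointers to the same, this can be confusing. If so, just think of it as the difference between a structure and a pointer to a structure.

You can (and should) read more about references in perlref. Briefly, references are rather like pointers that know what they point to. (Objects are also a kind of reference, but we won't be needing them right away, if ever.) This means that when you have something which looks to you like an access to a two-or-more-dimensional array and/or hash, what is really going on is that the base type is merely a one-dimensional entity that contains references to the next level. It's just that you can use it as though it were a two-dimensional one. This is actually the way almost all C multidimensional arrays work as well.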

$array[7][12]                       # array of arrays
$array[7]{string}                   # array of hashes
$hash{string}[7]                    # hash of arrays
$hash{string}{'another string'}     # hash of hashes
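
As a minimal sketch of the point above, the following shows that each top-level slot is just a scalar holding an array reference, and how a whole row or column might be pulled out. The names @matrix, @row, and @column are illustrative assumptions, not part of the original text.

```perl
use v5.24;
use warnings;

# A 3x3 "matrix": really a one-dimensional array of array references.
my @matrix = ( [1, 2, 3], [4, 5, 6], [7, 8, 9] );

# Each top-level slot holds a scalar that happens to be an ARRAY reference.
my $kind = ref $matrix[1];

# A whole row: dereference the reference stored in that slot.
my @row = $matrix[1]->@*;

# A whole column: pick the same index out of every row.
my @column = map { $_->[2] } @matrix;

say "slot 1 holds a reference of type $kind";
say "row 1:    @row";
say "column 2: @column";
```

Rows come cheaply because each row is its own array; columns have to be assembled by visiting every row.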

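Now, because the top level contains only references, if you try to print out your array with a simple print() function, you'll get something that doesn't look very nice, like this: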

  my @AoA = ( [2, 3], [4, 5, 7], [0] );
  print $AoA[1][2];
7
  print @AoA;
ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

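That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself, using either prefix typing indicators, like ${$blah}, @{$blah}, @{$blah[$i]}, or else postfix pointer arrows, like $a->[3], $h->{fred}, or even $ob->method()->[3].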
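
Here is a short sketch of doing that dereferencing by hand to print a nested array legibly. The variable names @lines and $last are assumptions made for illustration.

```perl
use v5.24;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Dereference each row yourself; Perl will not do it implicitly.
my @lines = map { join(", ", @$_) } @AoA;   # prefix form on each row reference
my $last  = $AoA[1]->[2];                    # arrow form, same as $AoA[1][2]

say for @lines;
say "last of row 1 is $last";
```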

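COMMON MISTAKES

The two most common mistakes made in constructing something like an array of arrays are accidentally counting the number of elements and taking a reference to the same memory location repeatedly. Here's the case where you just get the count instead of a nested array: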

for my $i (1..10) {
    my @array = somefunc($i);
    $AoA[$i] = @array;      # WRONG!
}

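That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this: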

for my $i (1..10) {
    my @array = somefunc($i);
    $counts[$i] = scalar @array;
}
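
To make the difference concrete, this small sketch (with the illustrative names $count and $copy, which are assumptions) contrasts assigning an array to a scalar with storing a reference built by the [] constructor:

```perl
use v5.24;
use warnings;

my @array = ("a", "b", "c");

my $count = @array;        # scalar assignment: stores 3, the element count
my $copy  = [ @array ];    # anonymous array constructor: stores a reference

say "count is $count";
say "copy holds @$copy";
```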

Here's the case of taking a reference to the same memory location again and again:

# Either without strict or having an outer-scope my @array;
# declaration.

for my $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = \@array;     # WRONG!
}

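So, what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references, so by golly, you've made me one!

Unfortunately, while this is true, it's still broken. All the references in @AoA refer to the very same place, and they will therefore all hold whatever was last in @array! It's similar to the problem demonstrated in the following C program: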

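#include <pwd.h>
#include <stdio.h>

int main(void) {
    /* getpwnam() may return a pointer to static storage that is
       overwritten by the next call */
    struct passwd *rp, *dp;
    rp = getpwnam("root");
    dp = getpwnam("daemon");

    printf("daemon name is %s\nroot name is %s\n",
            dp->pw_name, rp->pw_name);
    return 0;
}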

On systems where getpwnam() hands back the same static buffer on every call, this prints:

daemon name is daemon
root name is daemon

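The problem is that both rp and dp are pointers to the same location in memory! In C, you'd have to remember to malloc() yourself some new memory. In Perl, you'll want to use the array constructor [] or the hash constructor {} instead. Here's the right way to do the preceding broken code fragment: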

# Either without strict or having an outer-scope my @array;
# declaration.

for my $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = [ @array ];
}

The square brackets make a reference to a new array holding a copy of what's in @array at the time of the assignment. This is what you want.
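
The following sketch puts the broken and the correct forms side by side. The arrays @shared and @copied, and the expression standing in for somefunc($i), are assumptions made for illustration.

```perl
use v5.24;
use warnings;

my (@array, @shared, @copied);

for my $i (1 .. 3) {
    @array = ($i) x $i;          # stand-in for somefunc($i)
    $shared[$i] = \@array;       # every slot points at the one @array
    $copied[$i] = [ @array ];    # every slot gets its own fresh copy
}

say "shared row 1: @{ $shared[1] }";   # reflects the final contents of @array
say "copied row 1: @{ $copied[1] }";   # still what @array held on pass 1
```

Every slot of @shared holds the same reference, so all rows show the last thing written to @array, while each slot of @copied keeps its own snapshot.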

Note that this will produce something similar to the following:

# Either without strict or having an outer-scope my @array;
# declaration.
for my $i (1..10) {
    @array = 0 .. $i;
    $AoA[$i]->@* = @array;
}

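Is it the same? Well, maybe so, and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new copy of the data. Something else could be going on in this new case with the $AoA[$i]->@* dereference on the left-hand side of the assignment. It all depends on whether $AoA[$i] had been undefined to start with, or whether it already contained a reference. If you had already populated @AoA with references, as in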

$AoA[3] = \@another_array;

then the assignment with the indirection on the left-hand side would use the existing reference that was already there:

$AoA[3]->@* = @array;

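Of course, this would have the "interesting" effect of clobbering @another_array. (Have you ever noticed how when a programmer says something is "interesting", rather than meaning "intriguing", they're disturbingly more apt to mean that it's "annoying", "difficult", or both? :-)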
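
A small sketch of that clobbering, under the assumption that slot 3 already holds a reference and slot 4 is still undefined (the sample values are made up for illustration):

```perl
use v5.24;
use warnings;

my @another_array = ("keep", "me");
my @AoA;
$AoA[3] = \@another_array;       # slot already holds a reference

$AoA[3]->@* = (1, 2, 3);         # assigns THROUGH that existing reference
$AoA[4]->@* = (1, 2, 3);         # slot was undef: a new array springs into being

say "another_array is now: @another_array";
say "row 4 is: @{ $AoA[4] }";
```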

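So just remember always to use the array or hash constructors, [] or {}, and you'll be fine, although it's not always optimally efficient.

Surprisingly, the following dangerous-looking construct will actually work out fine: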

for my $i (1..10) {
    my @array = somefunc($i);
    $AoA[$i] = \@array;
}

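That's because my() is more of a run-time statement than it is a compile-time declaration per se. This means that the my() variable is remade afresh each time through the loop. So even though it looks as though you stored the same variable reference each time, you actually did not! This is a subtle distinction that can produce more efficient code at the risk of misleading all but the most experienced of programmers. So I usually advise against teaching it to beginners. In fact, except for passing arguments to functions, I seldom like to see the gimme-a-reference operator (backslash) used much at all in code. Instead, I advise beginners that they (and most of the rest of us) should try to use the much more easily understood constructors [] and {} instead of relying upon lexical (or dynamic) scoping and hidden reference counting to do the right thing behind the scenes.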
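
One way to convince yourself of this is to count how many distinct arrays the stored references point to. This is only a sketch; the %seen hash and $distinct counter are assumptions introduced for the check.

```perl
use v5.24;
use warnings;

my @AoA;
for my $i (1 .. 3) {
    my @array = (1 .. $i);       # a brand new @array on every pass
    $AoA[$i] = \@array;          # so each reference points somewhere different
}

my %seen;
$seen{ $AoA[$_] }++ for 1 .. 3;  # references stringify to their addresses
my $distinct = keys %seen;

say "distinct arrays: $distinct";
say "row 2: $AoA[2]->@*";
```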

Note also that there is another way to write a dereference! These two lines are equivalent:

$AoA[$i]->@* = @array;
@{ $AoA[$i] } = @array;

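The first form, called postfix dereference, is generally easier to read, because the expression can be read from left to right and there are no enclosing braces to balance. On the other hand, it is also newer. It was added to the language in 2014, so you will often encounter the other form, circumfix dereference, in older code.

In summary: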

$AoA[$i] = [ @array ];     # usually best
$AoA[$i] = \@array;        # perilous; just how my() was that array?
$AoA[$i]->@*  = @array;    # way too tricky for most programmers
@{ $AoA[$i] } = @array;    # just as tricky, and also harder to read

CAVEAT ON PRECEDENCE

Speaking of things like @{$AoA[$i]}, the following are actually the same thing:

$aref->[2][2]       # clear
$$aref[2][2]        # confusing

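That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: $ @ * % &) make them bind more tightly than the postfix subscripting brackets or braces! This will no doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using *a[i] to mean what's pointed to by the i'th element of a. That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C.

The seemingly equivalent construct in Perl, $$aref[$i], first dereferences $aref, treating it as a reference to an array, and only then applies the subscript, giving you the i'th value of the array pointed to by $aref. If you wanted the C notion, you could write $AoA[$i]->$* to explicitly dereference the i'th item, reading left to right.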
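
A quick sketch confirming that the three spellings reach the same element (the sample data and the names $arrow, $prefix, and $braces are illustrative assumptions):

```perl
use v5.24;
use warnings;

my $aref = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];

my $arrow  = $aref->[2][2];      # clear: follow $aref, then subscript twice
my $prefix = $$aref[2][2];       # same thing: $ binds to $aref before [2]
my $braces = ${$aref}[2][2];     # same again, with the binding spelled out

say "$arrow $prefix $braces";
```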

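WHY YOU SHOULD ALWAYS use VERSION

If this is starting to sound scarier than it's worth, relax. Perl has some features to help you avoid its most common pitfalls. One way to avoid getting confused is to start every program with: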

use strict;

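This way, you'll be forced to declare all your variables with my(), and accidental "symbolic dereferencing" will be disallowed. Therefore, if you'd done this: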

my $aref = [
    [ "fred", "barney", "pebbles", "bambam", "dino", ],
    [ "homer", "bart", "marge", "maggie", ],
    [ "george", "jane", "elroy", "judy", ],
];

print $aref[2][2];

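the compiler would immediately flag that as an error at compile time, because you were accidentally accessing @aref, an undeclared variable, and it would thereby remind you to write instead: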

print $aref->[2][2]
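
To see the check in action without aborting a whole program, one option is to compile the mistaken snippet inside a string eval and inspect the error. This is only a sketch; the variables $ok, $error, and $oops are assumptions made for the demonstration.

```perl
use v5.24;
use warnings;

# Compile the mistaken snippet at run time so the failure can be inspected.
my $ok = eval q{
    use strict;
    my $aref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
    my $oops = $aref[1][1];      # wrong: refers to an undeclared @aref
    1;
};
my $error = $@;

say $ok ? "compiled" : "rejected: $error";
```

The snippet never runs: strict rejects it while compiling, and the message names the undeclared @aref.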

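Since Perl version 5.12, a use VERSION declaration also enables the strict pragma. In addition, it enables a feature bundle, giving you more useful features. Since version 5.36, it also enables the warnings pragma. Often the best way to activate all these things at once is to start a file with: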

use v5.36;

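In this way, every file will start with strict, warnings, and many useful named features all switched on, as well as several older features being switched off (such as indirect). For more information, see "use VERSION" in perlfunc.

DEBUGGING

You can use the debugger's x command to dump out complex data structures. For example, given the nested array assigned above (held here in a reference named $AoA), here's the debugger output: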

DB<1> x $AoA
$AoA = ARRAY(0x13b5a0)
   0  ARRAY(0x1f0a24)
      0  'fred'
      1  'barney'
      2  'pebbles'
      3  'bambam'
      4  'dino'
   1  ARRAY(0x13b558)
      0  'homer'
      1  'bart'
      2  'marge'
      3  'maggie'
   2  ARRAY(0x13b540)
      0  'george'
      1  'jane'
      2  'elroy'
      3  'judy'
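
Outside the interactive debugger, a similar dump can be produced from within a program. The sketch below swaps in the core Data::Dumper module, which is a different tool from the debugger's x command; the sample data is an assumption.

```perl
use v5.24;
use warnings;
use Data::Dumper;

my $AoA = [
    [ "fred", "barney" ],
    [ "homer", "bart", "marge" ],
];

local $Data::Dumper::Indent   = 1;   # compact, fixed indentation
local $Data::Dumper::Sortkeys = 1;   # stable key order for any hashes

my $dump = Dumper($AoA);
print $dump;
```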

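CODE EXAMPLES

Presented with little comment, here are short code examples illustrating access to various types of data structures.

ARRAYS OF ARRAYS

Declaration of an ARRAY OF ARRAYS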

my @AoA = (
       [ "fred", "barney" ],
       [ "george", "jane", "elroy" ],
       [ "homer", "marge", "bart" ],
     );

Generation of an ARRAY OF ARRAYS

# reading from file
while ( <> ) {
    push @AoA, [ split ];
}

# calling a function
for my $i ( 1 .. 10 ) {
    $AoA[$i] = [ somefunc($i) ];
}

# using temp vars
for my $i ( 1 .. 10 ) {
    my @tmp = somefunc($i);
    $AoA[$i] = [ @tmp ];
}

# add to an existing row
push $AoA[0]->@*, "wilma", "betty";

Access and Printing of an ARRAY OF ARRAYS

# one element
$AoA[0][0] = "Fred";

# another element
$AoA[1][1] =~ s/(\w)/\u$1/;

# print the whole thing with refs
for my $aref ( @AoA ) {
    print "\t [ @$aref ],\n";
}

# print the whole thing with indices
for my $i ( 0 .. $#AoA ) {
    print "\t [ $AoA[$i]->@* ],\n";
}

# print the whole thing one at a time
for my $i ( 0 .. $#AoA ) {
    for my $j ( 0 .. $AoA[$i]->$#* ) {
        print "elem at ($i, $j) is $AoA[$i][$j]\n";
    }
}

HASHES OF ARRAYS

Declaration of a HASH OF ARRAYS

my %HoA = (
       flintstones        => [ "fred", "barney" ],
       jetsons            => [ "george", "jane", "elroy" ],
       simpsons           => [ "homer", "marge", "bart" ],
     );

Generation of a HASH OF ARRAYS

# reading from file
# flintstones: fred barney wilma dino
while ( <> ) {
    next unless s/^(.*?):\s*//;
    $HoA{$1} = [ split ];
}

# reading from file; more temps
# flintstones: fred barney wilma dino
while ( my $line = <> ) {
    my ($who, $rest) = split /:\s*/, $line, 2;
    my @fields = split ' ', $rest;
    $HoA{$who} = [ @fields ];
}

# calling a function that returns a list
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoA{$group} = [ get_family($group) ];
}

# likewise, but using temps
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    my @members = get_family($group);
    $HoA{$group} = [ @members ];
}

# append new members to an existing family
push $HoA{flintstones}->@*, "wilma", "betty";

Access and Printing of a HASH OF ARRAYS

# one element
$HoA{flintstones}[0] = "Fred";

# another element
$HoA{simpsons}[1] =~ s/(\w)/\u$1/;

# print the whole thing
foreach my $family ( keys %HoA ) {
    print "$family: $HoA{$family}->@* \n"
}

# print the whole thing with indices
foreach my $family ( keys %HoA ) {
    print "$family: ";
    foreach my $i ( 0 .. $HoA{$family}->$#* ) {
        print " $i = $HoA{$family}[$i]";
    }
    print "\n";
}

# print the whole thing sorted by number of members
foreach my $family ( sort { $HoA{$b}->@* <=> $HoA{$a}->@* } keys %HoA ) {
    print "$family: $HoA{$family}->@* \n"
}

# print the whole thing sorted by number of members and name
foreach my $family ( sort {
                           $HoA{$b}->@* <=> $HoA{$a}->@*
                                         ||
                                     $a cmp $b
           } keys %HoA )
{
    print "$family: ", join(", ", sort $HoA{$family}->@* ), "\n";
}

ARRAYS OF HASHES

Declaration of an ARRAY OF HASHES

my @AoH = (
       {
           Lead     => "fred",
           Friend   => "barney",
       },
       {
           Lead     => "george",
           Wife     => "jane",
           Son      => "elroy",
       },
       {
           Lead     => "homer",
           Wife     => "marge",
           Son      => "bart",
       }
 );

Generation of an ARRAY OF HASHES

# reading from file
# format: LEAD=fred FRIEND=barney
while ( <> ) {
    my $rec = {};
    for my $field ( split ) {
        my ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
    push @AoH, $rec;
}


# reading from file
# format: LEAD=fred FRIEND=barney
# no temp
while ( <> ) {
    push @AoH, { split /[\s=]+/ };
}

# calling a function  that returns a key/value pair list, like
# "lead","fred","daughter","pebbles"
while ( my %fields = getnextpairset() ) {
    push @AoH, { %fields };
}

# likewise, but using no temp vars
while (<>) {
    push @AoH, { parsepairs($_) };
}

# add key/value to an element
$AoH[0]{pet} = "dino";
$AoH[2]{pet} = "santa's little helper";

Access and Printing of an ARRAY OF HASHES

# one element (keys are case-sensitive; the declaration above uses "Lead")
$AoH[0]{Lead} = "fred";

# another element
$AoH[1]{Lead} =~ s/(\w)/\u$1/;

# print the whole thing with refs
for my $href ( @AoH ) {
    print "{ ";
    for my $role ( keys %$href ) {
        print "$role=$href->{$role} ";
    }
    print "}\n";
}

# print the whole thing with indices
for my $i ( 0 .. $#AoH ) {
    print "$i is { ";
    for my $role ( keys $AoH[$i]->%* ) {
        print "$role=$AoH[$i]{$role} ";
    }
    print "}\n";
}

# print the whole thing one at a time
for my $i ( 0 .. $#AoH ) {
    for my $role ( keys $AoH[$i]->%* ) {
        print "elem at ($i, $role) is $AoH[$i]{$role}\n";
    }
}

HASHES OF HASHES

Declaration of a HASH OF HASHES

my %HoH = (
       flintstones => {
               lead      => "fred",
               pal       => "barney",
       },
       jetsons     => {
               lead      => "george",
               wife      => "jane",
               "his boy" => "elroy",
       },
       simpsons    => {
               lead      => "homer",
               wife      => "marge",
               kid       => "bart",
       },
);

Generation of a HASH OF HASHES

# reading from file
# flintstones: lead=fred pal=barney wife=wilma pet=dino
while ( <> ) {
    next unless s/^(.*?):\s*//;
    my $who = $1;
    for my $field ( split ) {
        my ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}


# reading from file; more temps
while ( <> ) {
    next unless s/^(.*?):\s*//;
    my $who = $1;
    my $rec = {};
    $HoH{$who} = $rec;
    for my $field ( split ) {
        my ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
}

# calling a function  that returns a key,value hash
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoH{$group} = { get_family($group) };
}

# likewise, but using temps
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    my %members = get_family($group);
    $HoH{$group} = { %members };
}

# append new members to an existing family
my %new_folks = (
    wife => "wilma",
    pet  => "dino",
);

for my $what (keys %new_folks) {
    $HoH{flintstones}{$what} = $new_folks{$what};
}

Access and Printing of a HASH OF HASHES

# one element
$HoH{flintstones}{wife} = "wilma";

# another element
$HoH{simpsons}{lead} =~ s/(\w)/\u$1/;

# print the whole thing
foreach my $family ( keys %HoH ) {
    print "$family: { ";
    for my $role ( keys $HoH{$family}->%* ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}

# print the whole thing  somewhat sorted
foreach my $family ( sort keys %HoH ) {
    print "$family: { ";
    for my $role ( sort keys $HoH{$family}->%* ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}


# print the whole thing sorted by number of members
foreach my $family ( sort { $HoH{$b}->%* <=> $HoH{$a}->%* } keys %HoH ) {
    print "$family: { ";
    for my $role ( sort keys $HoH{$family}->%* ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}

# establish a sort order (rank) for each role
my $i = 0;
my %rank;
for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

# now print the whole thing sorted by number of members
foreach my $family ( sort { $HoH{$b}->%* <=> $HoH{$a}->%* } keys %HoH ) {
    print "$family: { ";
    # and print these according to rank order
    for my $role ( sort { $rank{$a} <=> $rank{$b} }
                                              keys $HoH{$family}->%* )
    {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}

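MORE ELABORATE RECORDS

Declaration of MORE ELABORATE RECORDS

Here's a sample showing how to create and use a record whose fields are of many different sorts: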

my $rec = {
    TEXT      => $string,
    SEQUENCE  => [ @old_values ],
    LOOKUP    => { %some_table },
    THATCODE  => \&some_function,
    THISCODE  => sub { $_[0] ** $_[1] },
    HANDLE    => \*STDOUT,
};

print $rec->{TEXT};

print $rec->{SEQUENCE}[0];
my $last = pop $rec->{SEQUENCE}->@*;

print $rec->{LOOKUP}{"key"};
my ($first_k, $first_v) = each $rec->{LOOKUP}->%*;

my $answer = $rec->{THATCODE}->($arg);
$answer = $rec->{THISCODE}->($arg1, $arg2);

# careful of extra block braces on fh ref
print { $rec->{HANDLE} } "a string\n";

use FileHandle;
$rec->{HANDLE}->autoflush(1);
$rec->{HANDLE}->print(" a string\n");

Declaration of a HASH OF COMPLEX RECORDS

my %TV = (
   flintstones => {
       series   => "flintstones",
       nights   => [ qw(monday thursday friday) ],
       members  => [
           { name => "fred",    role => "lead", age  => 36, },
           { name => "wilma",   role => "wife", age  => 31, },
           { name => "pebbles", role => "kid",  age  =>  4, },
       ],
   },

   jetsons     => {
       series   => "jetsons",
       nights   => [ qw(wednesday saturday) ],
       members  => [
           { name => "george",  role => "lead", age  => 41, },
           { name => "jane",    role => "wife", age  => 39, },
           { name => "elroy",   role => "kid",  age  =>  9, },
       ],
    },

   simpsons    => {
       series   => "simpsons",
       nights   => [ qw(monday) ],
       members  => [
           { name => "homer", role => "lead", age  => 34, },
           { name => "marge", role => "wife", age => 37, },
           { name => "bart",  role => "kid",  age  =>  11, },
       ],
    },
 );

Generation of a HASH OF COMPLEX RECORDS

# reading from file
# this is most easily done by having the file itself be
# in the raw data format as shown above.  perl is happy
# to parse complex data structures if declared as data, so
# sometimes it's easiest to do that

# here's a piece by piece build up
my $rec = {};
$rec->{series} = "flintstones";
$rec->{nights} = [ find_days() ];

my @members = ();
# assume this file in field=value syntax
while (<>) {
    my %fields = split /[\s=]+/;
    push @members, { %fields };
}
$rec->{members} = [ @members ];

# now remember the whole thing
$TV{ $rec->{series} } = $rec;

###########################################################
# now, you might want to make interesting extra fields that
# include pointers back into the same data structure so if
# change one piece, it changes everywhere, like for example
# if you wanted a {kids} field that was a reference
# to an array of the kids' records without having duplicate
# records and thus update problems.
###########################################################
foreach my $family (keys %TV) {
    my $rec = $TV{$family}; # temp pointer
    my @kids = ();
    for my $person ( $rec->{members}->@* ) {
        if ($person->{role} =~ /kid|son|daughter/) {
            push @kids, $person;
        }
    }
    # REMEMBER: $rec and $TV{$family} point to same data!!
    $rec->{kids} = [ @kids ];
}

# you copied the array, but the array itself contains pointers
# to uncopied objects. this means that if you make bart get
# older via

$TV{simpsons}{kids}[0]{age}++;

# then this would also change in
print $TV{simpsons}{members}[2]{age};

# because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
# both point to the same underlying anonymous hash table

# print the whole thing
foreach my $family ( keys %TV ) {
    print "the $family";
    print " is on during $TV{$family}{nights}->@*\n";
    print "its members are:\n";
    for my $who ( $TV{$family}{members}->@* ) {
        print " $who->{name} ($who->{role}), age $who->{age}\n";
    }
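    # the records have no {lead} field, so find the lead among the members
    my ($lead) = grep { $_->{role} eq "lead" } $TV{$family}{members}->@*;
    print "it turns out that $lead->{name} has ";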
    print scalar ( $TV{$family}{kids}->@* ), " kids named ";
    print join (", ", map { $_->{name} } $TV{$family}{kids}->@* );
    print "\n";
}

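DATABASE TIES

You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have the problem of how references are to be represented on disk. One experimental module that does partially attempt to address this need is the MLDBM module. Check your nearest CPAN site, as described in perlmodlib, for source code to MLDBM.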
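
To illustrate what representing references on disk involves, here is a minimal sketch that flattens a nested structure to a single string and rebuilds it. It uses the core Storable module rather than MLDBM, so it shows only the serialization step, not a tied dbm file; the sample data is an assumption.

```perl
use v5.24;
use warnings;
use Storable qw(freeze thaw);

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney" },
    jetsons     => { lead => "george", wife => "jane"   },
);

# freeze() flattens the nested structure, references and all, to one string,
# which could then be stored as a single value (for instance in a dbm file).
my $frozen   = freeze(\%HoH);
my $restored = thaw($frozen);

say "restored lead: $restored->{jetsons}{lead}";
```

The restored structure is an independent deep copy, so changing it does not affect the original %HoH.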

SEE ALSO

perlref, perllol, perldata, perlobj

AUTHOR

Tom Christiansen <tchrist@perl.com>