------------
Test data
------------

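Below is the PAUP* test data used in the demo (898 base pairs of primate mitochondrial DNA). PAUP* reads data in a format called "NEXUS". NEXUS is a general-purpose format that other major phylogenetic programs, including MacClade and PHYLIP, can also use as-is or after minor conversion.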

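The leading "#NEXUS" line declares that the file is a NEXUS data file. It is followed by the taxa (taxa block), the character data (characters block), the assumptions (assumptions block), the sets (sets block), and the commands to run (paup block).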
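The Python sketch below makes this block structure concrete. It checks for the #NEXUS header and splits a file into its named blocks. It assumes simple input: bracketed comments are not nested, and no quoted token contains a semicolon. Real NEXUS readers such as PAUP* handle far more of the format. The function names are hypothetical helpers.

```python
import re

# "begin <name>; ... end;" matched case-insensitively; group 2 is the block body.
BLOCK_RE = re.compile(r"\bbegin\s+(\w+)\s*;(.*?)\bend\s*;", re.IGNORECASE | re.DOTALL)


def strip_comments(text: str) -> str:
    """Remove [...] comments, including [! ...] output comments (no nesting support)."""
    return re.sub(r"\[[^\]]*\]", "", text)


def split_blocks(text: str) -> list[tuple[str, str]]:
    """Return (block name, block body) pairs in file order."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file: missing #NEXUS header")
    clean = strip_comments(text)
    return [(m.group(1).lower(), m.group(2)) for m in BLOCK_RE.finditer(clean)]


sample = """#NEXUS
[! a comment ]
begin taxa; dimensions ntax=2; taxlabels A B; end;
begin characters; dimensions nchar=3; matrix A acg B a.t; end;
begin paup; constraints ab = ((A,B)); end;
"""
print([name for name, _ in split_blocks(sample)])  # ['taxa', 'characters', 'paup']
```

Applied to the file below, the same function would report the five blocks in the order just described.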
......................................................................
#NEXUS

[!
Adapted from "primate-mtDNA.nex" of the PAUP* sample data file folder

Data from:
Hayasaka, K., T. Gojobori, and S. Horai. 1988. Molecular phylogeny
and evolution of primate mitochondrial DNA. Mol. Biol. Evol.
5:626-644.
]

The taxa block specifies the number of OTUs in the data matrix (ntax) and their names (taxlabels).

[!----------------------------------------------------------------------]
begin taxa;
dimensions ntax=10;
taxlabels
Lemur_catta
Homo_sapiens
Pan
Gorilla
Pongo
Hylobates
Macaca_fuscata
M._mulatta
Saimiri_sciureus
Tarsius_syrichta
;
end;
[!----------------------------------------------------------------------]
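The count given by ntax must match the number of names listed under taxlabels. Here is a minimal Python sketch of that consistency check. It assumes unquoted, whitespace-separated labels like the ones in this file. parse_taxa_block is a hypothetical helper, not a PAUP* function.

```python
import re


def parse_taxa_block(body: str) -> list[str]:
    """Return the taxlabels of a taxa-block body, checking their count against ntax."""
    ntax_match = re.search(r"\bntax\s*=\s*(\d+)", body, re.IGNORECASE)
    labels_match = re.search(r"\btaxlabels\b(.*?);", body, re.IGNORECASE | re.DOTALL)
    if ntax_match is None or labels_match is None:
        raise ValueError("taxa block needs both 'dimensions ntax=' and 'taxlabels'")
    ntax = int(ntax_match.group(1))
    # Assumes unquoted labels such as Homo_sapiens or M._mulatta (no spaces inside).
    labels = labels_match.group(1).split()
    if len(labels) != ntax:
        raise ValueError(f"ntax={ntax} but {len(labels)} taxlabels were given")
    return labels


body = """
dimensions ntax=3;
taxlabels
Lemur_catta
Homo_sapiens
M._mulatta
;
"""
print(parse_taxa_block(body))  # ['Lemur_catta', 'Homo_sapiens', 'M._mulatta']
```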

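The characters block first specifies three things:

* the number of characters (nchar);
* the data type (here, dna);
* the symbols used for gaps, matches, and missing data.

It then lists the character data for each taxon. Because matchchar=. is set, a dot in a row means "the same state as in the first taxon of the matrix". The line options gapmode=missing tells PAUP* to treat gaps as missing data.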

[!----------------------------------------------------------------------]
begin characters;
dimensions nchar=898;
format missing=? gap=- matchchar=. datatype=dna;
options gapmode=missing;
matrix

Homo_sapiens ........CC..C...GT......C........C..C..G........C..AT..C.......C.....A.....A...........C.CT..C..T..C.TC...A.C..CT.T.A...AC.T.....TC.A.....A..A.....TTTT........T..A..A.....CG.......C...T.....C..A....T..CC..C.G..A.....C.CT..GCTAG..A.CA.G.....C........T..C..TC.CC.AC.T.C...AC.C..C..A..AG....A..C..A..C........C.....T..C..A......T.G..CTC......CC.C..CATTA.....A....A..C..A..C.................C..G.T.......C..TC....A.......CC.AT..C.C......G.C..C...ACC..GTTTT.C.CT................CC......TC............TGAC..C...G...T.-CGA.CC.................CT-CAC.A.............ATGC.C..A.G.C...C..C.T....T...................AC..CT.........T......CC.......TT............................CCA.GCACA.TA..A.A...ACCC.A..C....C.TC.....T.CCC..C...C...CC.C.C.CGTT...CCTA.......AA..T.A..CC.C.AT..T........CAT.G.C....CCA....T..T.TC..T...T....C..A.C.A....C..G.G.CT..AC......GTT..T..C..G........C...GCC....C......AACCC.G..C..CC.A.....

Pan ........CC..C....TT..C..C........C..C..A........C..AT..........C.....A.....A..T..T.....C.C...C..T..C.TC...A.T..CT.C.A...AC.T.....TC.A.....A..A......TTT........C..A..A.....CG.......C.........C.TA.C..T...C.CC.A..G.....C.C...GCTAG..A.C..A.....C........T.CC..TC.CC.AC.C.C...A..C..C..A..A.....A..C..G..C........C..G..T..C..A......T....CTC......CC.C..CATTA.T...A....G..C..A..C...........T..T..C....TTT.....C..TC....A.C.....TC.AT..C.C..T..TG.T..C..CACT...TT...C.C.................CC......TC............TGAC..C...G.....-CGA.CC.................CT-.A..A..........T..ATAT.C..A.GCC.G.C..C.T....T...................AC..CC....G....T......CC.......TT............................CCA.G.ATA.TA.CA.A...ACCT.A..C..A.C.CC.T...T.C.C..C...C.C.CC.C.C.C.TT...CCTA.......AA..T.A...C.C.AT..T..G.....CAT...C..G.CCA....T....T......TT....C..A.C.A....C..A.G.CT..AC......G.T..T..C..A.....G..C...GC.....C......AACCC.G..C..CC.A.....

Gorilla ........CC..C...GTTG....T.....T..C..C..A...........AT..........C.....A.....A.............C...C.....C.TC...A.T..CT.T.A...AC........C.A.....A..A......CTT........T..G..A.....CG.C.....C...T.....C..A.C..T..CC..C.A..A..G..C.C...ACTAG..A.CA.A.....C..........CC..CC..T.AC.T.C...A.CT..C..A..A..T..A..C..G..C.....T........T..C..A......T....CCC.......C.C..CATCAC....A....A..C..A................T...C....T...G...C..TC....A.C.....CC.AT..C.C......G.T.....CACC..GTT...C.C.................CC......TC............TGAT..C...G.....-CAA.CC.................CT-C...A..G..........ATAC.C....GCT.G.C..C.T....T...................AC..CT.........T.......C.......TT............................C.A.G.ACG.TA.CA.A...ACCT.AG.C..A.C.TC.T...T.CCC......C...CC.C.T.C.TC..TCCTA.......AAG.T.A..CC.C.AT............ATCG.C....CCA....T....TC......T....C..A.C.A.....C.A.G.CT..AC......G.T..T..C..A.G......C...GC.....C......AA.TC....C..CC.A.....

Pongo ........CC..C.......CC..C..G..T..C.....A..C.....C...C..C.G.....C.....A.....A.............C...C.....C.TC...A.C..CT.T.A...CC.T.....TC.A.....C..A......CT.........T..A..A.....C........T........AC..A.C.....CC.TC.A..A.....C.C...ACTA.....CATA.....T.....T.....C..C..CC.AC...C...AC.C..C..A..A.....AA.C..A..C..T...........C..C..A...........T.C..C....C.C..CATCA.....A....A..T..T..C.....C.....T.....C..GCT.......C..TC....A.C.....C..AT..C.C.......GC..C..CGCT..GTT.G.C...................CC......T.............T.AT.....GGC.C..-CAA.CC.................CT-CAC.A..............CACT-..A.G.G.G.C..C.T....T.....G.............AC..CT....C....T.......T.......TT.........................C.GCCA.G..TA..A.CA.A..TGCCC....CT.A.C.TC.....TCCCC..C..T.CCGCT.C.C.C.TT...CCCA.......AA....A..CC.C.AC..T......A.GG.C..C....CC.....T.CT.TC.....T........A.C.A.........G.CT............C...G.CA.A......TGC....C...C.C...G..A...C....C..AC.A.....

Hylobates ......T.C...T......G.C..C........C..C..A..A..C..T...C.GC.......C..T..A.....A.............CT..C.....C.TC...A.C...T.T......C.....G..T.A.....A..G......TT.........CGCA..A.....CG.......C.........C..A....T..CC.CC.A..T.....C.....ACTA..G..C........C..GG......C...T....CAC.C.CC..GC.C..CG.A..A.....G..C..A..C.....T..C.....T.TC.TA...........C.C...T...C.C..CATTA.A...A....A..C..AC.C.............TAT.A...CTT..G...C.CT.......C.....CC.A...C.C.....T..C..C...ACT..CTTT..TCC.......C.........TC......T.............T.AC......G....G-.AA.C....GC............CC-CAC.A.............ACT.TC..A.G...G.C..C.T....T...................AC..CT.........T.......C.......TT...........................GCAA.G.ACA..A.CA.AG..A..C.A..G..A.CCTC.....T.CCC..C..T.CAGCC.C.C...TT...CCCA.T.....GA..TTA..CC.G.AC.........ATGA.C.....C.C.A....T..A.TC.....AT.T..C..A...A....C..G.G.A...AC........C..T.....A........C....CTG...C......G...G....C..CC.A.....

Macaca_fuscata ......TTCC..C........C..T..G.....T..C..A..C..C..T.......AT.....C........T..A.....T.....C.CT..C.A......C........GT.C.....AC.T....T.C.A..T..A..A.C....TTT......T..ACA..A.....T........T.........C..A........C..C.A..T.....C..T..A.TCGC.A.C..A.....C.....CC.T..C..C.....GC...C...AC.T..C..AT.A..T..G..C.....C..T...C.C.....C.....A.............C.......C....CATAAT....A....G..CC....C..............AT.A....T.......C.CGCT..AA..A....TC.AT..C.C........C..C..C..G..GTTT..C.C.....G...........CT....................T.ACC.....GA....-C.A.C.................ACT-C.C.A.G.........C.ATGTAC.....CC.....T.....T.T...................AC..CT........A.........T......C.............................TCA.GCACA..C.CA...TTA.AACA..C..T..CTC.......C.C..A..TT..GCC.C.C.C.TC...CCT........ACGT..A..CC..GAT.........A.AA.CG.AAT..A...T.......TC......CC.T....A.CT.....C....T....AAC..........C.....GG.G..............TG.C......A...G.C...A.GC.A.....

M._mulatta ......TTCT..C........C..C..G..T..T..C..A..C..C..T.......AT.....C........T..A.....T.....C.CT..C.A......C........GT.C..G..AC.T....T.C.A..T..A..A.C...TTT.......T..ACA..A.....T........T.........C..A.......CC..C.A..T.....C..T..A.TCGCGA.C..A.....C..G..CC.T..C..C.....A....C...A..T..C..A..A..T..G..C.....C......C.C.....C..C..A............GC.......C....CATAAT....A....A..CC....C.............TAT.A....T.......C.CGCT..AA.CA....CC.AT.TC.C........C..C..C..G..GTTT..T.C.....G...........CT......T.............T.ACC.....GA..T.-C.A.C.................ACT-C.CGA.G.........C.ATGTA......CC.....T.....T.T...................AC..CT........A.........T......T.............................TCA.GCACA..C..A...TAA.AACA..C..T..CTC.......C.C..A..TT..GCC.C.C.C.TC...CCT........ACGT..A..CC..GAT.........A.AA.CG.AAT..A...T.......TC......CC.T....A.CT.....C....T....AAC..........C.....GA.G..............T..C......A...G.C...A.AC.A.....

Saimiri_sciureus ........CC..C....TG..C...........T..C..GT....T..G..T..GC.......C.....A.....A..T...........T..C.....A......AC.T.TA.T......C.......AC.AT....G.......G.CT.........C..A..A.AT..CG.......C...........AG....T...C..G.A..A...T..C..ACA.TCG..T.T........T.....C...T....T.....A..C.C...AC.T.....A..A..T..A..A.....C..A..TC....G.ATG.CT.......G.....TCC...T.....CAGCA.CAG...TA....A...ATA..............T..G..A....TT.....T...ACA..AA.C.....CC.......G.G......GG.A...A........T.ACCT.....T.........GCT......T.............T.AT........AAT.-TAA.......A.............G-C.C.A..........T..ATGCTC..AAGAC...C..CTT....TC...................T...T.........T...............C............................---A..CAC.T...CA....T..AA.A.....A....G....CTAGCG..A...C.AGCT.C.C.C.TT...CCTA......G.ACA.TA..CC.GTAC.........CTAG.C..CATC.AC...C....T..C..TAC.T.AT.T.TA.....C......CTTA....C......T....A.....A........C........T.........A.C........CC.A.....

Tarsius_syrichta ...T.....T.....C....C...T.....T..C........C..C..C...C.......T..C.....A..TA.A............G....C..T..A......GC......C..T..CC.T......C.AT.A......GC...A..A........C.........T....C.....G.....T...C.AA.A..T........C..T.....G.C...A.TA......G.A..T.....G...C..C.A..T.........G.....C.T..C.CC..T........C..A........A......C...TC.TA..T............TA..........ATCA....TA.C.TG..CC.T..C..C........T..AT.A...AT.......C..T.T...T.AA.....C.AT.TACC.........G.A...A.....A...T.....................C......T..........G..T.AT.........C....GAT............A........-..C.A.............ATGC....A.......C...GT....T...T-..........................C..T.......A..G....-...............................T....T..A..C.C..TTT.AC.AT....T.CAC..T...T.ACC..AT.T.....T.CAACA..T..A.A...TG...CAC.TG.A..CC.TTAC.........AAC.....C..C..C..A..T..A..A.....AG......TGC.CA.....C.A.A.A..AAT.......T...........................G..T..T..TA.C...T...GCC.A.....
;
end;
[!----------------------------------------------------------------------]
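With matchchar=., the matrix must begin with a row written out in full, and every later row is read against it. The minimal Python sketch below expands dots into the reference states and mimics gapmode=missing by turning gaps into "?". The taxa Ref, T1, and T2 and their sequences are hypothetical toy data, not rows of the primate matrix.

```python
def expand_matchchar(rows: dict[str, str], matchchar: str = ".") -> dict[str, str]:
    """Replace each matchchar with the first taxon's state at the same position."""
    names = list(rows)  # dicts keep insertion order, i.e. matrix order
    reference = rows[names[0]]
    if matchchar in reference:
        raise ValueError("the first taxon must be written out in full")
    expanded = {}
    for name in names:
        seq = rows[name]
        if len(seq) != len(reference):
            raise ValueError(f"{name}: length {len(seq)} differs from {len(reference)}")
        expanded[name] = "".join(ref if c == matchchar else c for c, ref in zip(seq, reference))
    return expanded


def gaps_as_missing(seq: str, gap: str = "-", missing: str = "?") -> str:
    """Mimic gapmode=missing: treat every gap symbol as missing data."""
    return seq.replace(gap, missing)


rows = {"Ref": "ACGTAC", "T1": "..C.-.", "T2": "G....."}
full = expand_matchchar(rows)
print(full["T1"], gaps_as_missing(full["T1"]))  # ACCT-C ACCT?C
```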

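The assumptions block defines the following:

* character sets (charset);
* exclusion sets (exset);
* taxon sets (taxset);
* step matrices that weight character-state changes (usertype).

In a charset, "\3" selects every third character of a range, which is how the codon positions are specified. A "." stands for the last character.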

[!----------------------------------------------------------------------]
begin assumptions;
charset coding = 2-457 660-896;
charset noncoding = 1 458-659 897-898;

charset 1stpos = 2-457\3 660-896\3;
charset 2ndpos = 3-457\3 661-896\3;
charset 3rdpos = 4-457\3 662-.\3;

exset coding = noncoding;
exset noncoding = coding;

usertype 2_1 = 4 [weights transversions 2 times transitions]
a c g t
[a] . 2 1 2
[c] 2 . 2 1
[g] 1 2 . 2
[t] 2 1 2 .
;
usertype 3_1 = 4 [weights transversions 3 times transitions]
a c g t
[a] . 3 1 3
[c] 3 . 3 1
[g] 1 3 . 3
[t] 3 1 3 .
;

taxset hominoids = Homo_sapiens Pan Gorilla Pongo Hylobates;
end;
[!----------------------------------------------------------------------]
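A step matrix such as 2_1 is usually applied with Sankoff's algorithm for weighted parsimony. The following is a minimal Python sketch of that technique, not PAUP*'s own implementation. It scores a single site on a hypothetical rooted three-taxon tree: A, B, and C are made-up taxa. The costs are copied from usertype 2_1 above.

```python
import math

STATES = "acgt"

# Step matrix 2_1 from the assumptions block (rows = from-state, columns = to-state):
# transitions (a<->g, c<->t) cost 1, transversions cost 2, no change ("." in NEXUS) costs 0.
STEP_2_1 = [
    [0, 2, 1, 2],  # from a
    [2, 0, 2, 1],  # from c
    [1, 2, 0, 2],  # from g
    [2, 1, 2, 0],  # from t
]


def sankoff(tree, leaf_states: dict[str, str], step) -> float:
    """Minimum weighted cost of one site on a rooted tree given as nested tuples."""

    def costs(node) -> list[float]:
        # costs(node)[i] = cheapest cost of the subtree below node if node has state i
        if isinstance(node, str):
            return [0 if s == leaf_states[node] else math.inf for s in STATES]
        children = [costs(child) for child in node]
        return [
            sum(min(step[i][j] + child[j] for j in range(len(STATES))) for child in children)
            for i in range(len(STATES))
        ]

    return min(costs(tree))


tree = (("A", "B"), "C")
print(sankoff(tree, {"A": "a", "B": "g", "C": "a"}, STEP_2_1))  # 1: one transition
print(sankoff(tree, {"A": "a", "B": "c", "C": "a"}, STEP_2_1))  # 2: one transversion
```

Replacing STEP_2_1 with the costs of 3_1 would make the single transversion cost 3 instead of 2.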

The sets block defines a character partition. Here, a coding/noncoding partition is set up for the ILD (incongruence length difference) test.

[!----------------------------------------------------------------------]
begin sets;

CharPartition ILDpartition = coding: 2-457 660-896,
noncoding: 1 458-659 897-898;

end;
[!----------------------------------------------------------------------]
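The charset lines in the assumptions block and the CharPartition above use the same range syntax. The Python sketch below expands those ranges into position lists, so the partition can be checked for gaps and overlaps. expand_charset is a hypothetical helper that handles only the forms used in this file: numbers, ranges, ".", and a "\k" stride.

```python
def expand_charset(spec: str, nchar: int) -> list[int]:
    """Expand a character-set spec such as "2-457\\3 660-896\\3" into 1-based positions.

    Handles single numbers, ranges "a-b", "." for the last character, and a "\\k" stride.
    Other NEXUS forms (set names, quoted tokens) are not handled.
    """
    positions = []
    for token in spec.split():
        stride = 1
        if "\\" in token:
            token, step = token.split("\\")
            stride = int(step)
        start, _, end = token.partition("-")
        end = end or start  # a single number is a one-character range
        first = nchar if start == "." else int(start)
        last = nchar if end == "." else int(end)
        positions.extend(range(first, last + 1, stride))
    return positions


NCHAR = 898
coding = expand_charset("2-457 660-896", NCHAR)
noncoding = expand_charset("1 458-659 897-898", NCHAR)
codon_positions = [
    expand_charset(spec, NCHAR)
    for spec in ("2-457\\3 660-896\\3", "3-457\\3 661-896\\3", "4-457\\3 662-.\\3")
]

# The coding/noncoding partition (as in ILDpartition) covers every character exactly once.
print(sorted(coding + noncoding) == list(range(1, NCHAR + 1)))
# The three codon-position sets together make up exactly the coding set.
print(sorted(sum(codon_positions, [])) == sorted(coding))
```

Both checks print True for the ranges in this file. The coding region has 693 characters (456 + 237), which divides evenly into 231 codons.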

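Finally, the paup block is where the tree-building computations are run. Here, however, it only defines topological constraints (constraints). All other analyses are specified at the command line or through menu selections.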

[!----------------------------------------------------------------------]
begin paup;
constraints ch = ((Homo_sapiens,Pan));
constraints cg = ((Pan,Gorilla));
constraints chg = ((Homo_sapiens,Pan,Gorilla));
end;
[!----------------------------------------------------------------------]
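The constraint ch = ((Homo_sapiens,Pan)) can be read as requiring Homo_sapiens and Pan to form a group that excludes every other taxon. The Python sketch below assumes that reading and a rooted tree; PAUP*'s own handling of constraints, such as backbone constraints and unrooted trees, is richer. It parses a simple Newick string and tests the three constraints. The input tree is hypothetical, chosen only to exercise the check; it is not a result from this data.

```python
def parse_newick(text: str):
    """Parse a simple Newick string into nested tuples of leaf names.

    Assumes no branch lengths, quoted labels, or internal node labels.
    """
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def clades(tree) -> set[frozenset[str]]:
    """Leaf sets of all internal nodes of a rooted tree."""
    found = set()

    def leaves(node) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def satisfies(tree, group) -> bool:
    """True if the taxa in `group` form a clade (are monophyletic) in `tree`."""
    return frozenset(group) in clades(tree)


# A hypothetical rooted tree used only to exercise the check.
tree = parse_newick("((((Homo_sapiens,Pan),Gorilla),Pongo),Hylobates);")
print(satisfies(tree, {"Homo_sapiens", "Pan"}))             # ch:  True
print(satisfies(tree, {"Pan", "Gorilla"}))                  # cg:  False
print(satisfies(tree, {"Homo_sapiens", "Pan", "Gorilla"}))  # chg: True
```

Because {Homo_sapiens, Pan} and {Pan, Gorilla} overlap without one containing the other, no single tree can satisfy both ch and cg.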
......................................................................

This concludes the description of the test data.

For details of the NEXUS format, see:

Maddison, D.R., D.L. Swofford, and W.P. Maddison. 1997.
NEXUS: An extensible file format for systematic information.
Systematic Biology 46(4):590-621.
http://hydrodictyon.eeb.uconn.edu/systbiol/issues/46_4/vol46_4.html