HMM example

Original document

the plane can fly . the typical plane can see the plane . a typical fly can see . who might see ? the large can might see a can . the can can destroy a large can . who might see ? who might fly ? who can fly ? the can might see . the plane can fly a typical fly . who can fly ? the

Sentence splitting

./../sentsplit.pl example0.train example0.sentences

the plane can fly .
the typical plane can see the plane .
a typical fly can see .
who might see ?
the large can might see a can .
the can can destroy a large can .
who might see ?
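
The splitting step can be sketched in Python as follows. This is only an illustration of the idea, not the contents of sentsplit.pl: the function name `split_sentences` is hypothetical, and it assumes that `.` and `?` are the only sentence-ending tokens and that a trailing fragment without an end mark is dropped.

```python
SENTENCE_END = {".", "?"}  # assumption: only these two tokens close a sentence


def split_sentences(text):
    """Split whitespace-tokenized text into sentences (one token list each)."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    # Assumption: a trailing fragment with no end mark (like the final
    # "the" in the original document) is dropped.
    return sentences


raw = ("the plane can fly . the typical plane can see the plane . "
       "a typical fly can see . who might see ? the")
for sentence in split_sentences(raw):
    print(" ".join(sentence))
```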

Tagged data

the/at plane/nn can/md fly/vb ./.
the/at typical/jj plane/nn can/md see/vb the/at plane/nn ./.
a/at typical/jj fly/nn can/md see/vb ./.
who/wps might/md see/vb ?/.
the/at large/jj can/nn might/md see/vb a/at can/nn ./.
the/at can/nn can/md destroy/vb a/at large/jj can/nn ./.
who/wps might/md see/vb ?/.

Data numbering

./../create_key.pl words.key < example0.sentences > example0.seq

Words

1 the
6 typical
3 can
8 a
9 who
13 destroy
7 see
2 plane
11 ?
10 might
5 .
12 large
4 fly

Tags

6 jj
7 wps
2 nn
3 md
1 at
4 vb
5 .
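
The IDs in both key listings are consistent with numbering each symbol in order of first appearance, starting from 1. The sketch below reproduces them from the seven tagged sentences shown earlier; `build_key` is a hypothetical helper, and whether create_key.pl works exactly this way is an assumption.

```python
def build_key(sequences):
    """Assign 1-based IDs to symbols in order of first appearance."""
    key = {}
    for seq in sequences:
        for symbol in seq:
            if symbol not in key:
                key[symbol] = len(key) + 1
    return key


tagged = [
    "the/at plane/nn can/md fly/vb ./.",
    "the/at typical/jj plane/nn can/md see/vb the/at plane/nn ./.",
    "a/at typical/jj fly/nn can/md see/vb ./.",
    "who/wps might/md see/vb ?/.",
    "the/at large/jj can/nn might/md see/vb a/at can/nn ./.",
    "the/at can/nn can/md destroy/vb a/at large/jj can/nn ./.",
    "who/wps might/md see/vb ?/.",
]
# rsplit on the last "/" so that "./." and "?/." split into (word, tag).
pairs = [[tok.rsplit("/", 1) for tok in line.split()] for line in tagged]
word_key = build_key([[w for w, _ in sent] for sent in pairs])
tag_key = build_key([[t for _, t in sent] for sent in pairs])

for name, idx in sorted(word_key.items(), key=lambda kv: kv[1]):
    print(idx, name)
```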

Frequency counts

./../pretrain.pl example0.all lex ngram

Number of times each word type and POS tag combination occurs in the training set

plane nn 34
a at 58
see vb 45
? . 57
typical jj 25
large jj 22
destroy vb 9
can md 58
might md 42
can nn 39
fly nn 20
who wps 57
fly vb 46
. . 43
the at 35

Number of times each tag unigram and tag bigram occurs in the training set

md 100
wps 57
at 93
. 100
nn 93
vb 100
jj 47
vb . 50
wps md 57
at jj 47
nn . 50
nn md 43
vb at 50
at nn 46
md vb 100
jj nn 47
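
A minimal sketch of how such counts can be collected from tagged text is given below. It is not pretrain.pl itself: `count_events` is a hypothetical name, and the choice to count tag bigrams within a sentence only is an assumption based on the listing above, which contains no bigram that starts with the end-of-sentence tag. Run on the seven sample sentences, it yields much smaller numbers than the full example0.all file.

```python
from collections import Counter


def count_events(tagged_lines):
    """Count (word, tag) pairs, tag unigrams, and tag bigrams."""
    lex, unigram, bigram = Counter(), Counter(), Counter()
    for line in tagged_lines:
        tags = []
        for token in line.split():
            word, tag = token.rsplit("/", 1)
            lex[(word, tag)] += 1
            unigram[tag] += 1
            tags.append(tag)
        # Assumption: bigrams never cross a sentence boundary.
        bigram.update(zip(tags, tags[1:]))
    return lex, unigram, bigram


sample = [
    "the/at plane/nn can/md fly/vb ./.",
    "the/at typical/jj plane/nn can/md see/vb the/at plane/nn ./.",
    "a/at typical/jj fly/nn can/md see/vb ./.",
    "who/wps might/md see/vb ?/.",
    "the/at large/jj can/nn might/md see/vb a/at can/nn ./.",
    "the/at can/nn can/md destroy/vb a/at large/jj can/nn ./.",
    "who/wps might/md see/vb ?/.",
]
lex, unigram, bigram = count_events(sample)
for (word, tag), n in lex.items():
    print(word, tag, n)
```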

Model training

./../hmmtrain.pl words.key pos.key ngram lex example.hmm
M= 13
N= 7
A:
0.001002 0.505874 0.001150 0.001003 0.001004 0.001000 0.494967
0.001000 0.001000 0.001422 0.001004 0.001003 0.001003 0.999567
0.500420 0.001002 0.001001 0.001001 0.001000 0.500576 0.001000
0.001001 0.001000 0.999848 0.001002 0.001001 0.001003 0.001146
0.001000 0.001003 0.001000 0.999995 0.001000 0.001000 0.001003
0.424812 0.001003 0.001001 0.001000 0.576183 0.001000 0.001002
0.001000 0.001000 0.001003 0.462992 0.001000 0.538002 0.001002
B:
0.376957 0.001000 0.001000 0.001000 0.001002 0.001001 0.001000 0.624018 0.001001 0.001000 0.001021 0.001001 0.001000
0.001006 0.001004 0.001003 0.001002 0.001000 0.532372 0.001000 0.001005 0.001000 0.001001 0.001000 0.468607 0.001000
0.001000 0.001001 0.001001 0.460642 0.001001 0.001000 0.450462 0.001000 0.001000 0.001000 0.001002 0.001000 0.090891
0.001000 0.001000 0.580419 0.001000 0.001001 0.001000 0.001000 0.001000 0.001000 0.420578 0.001001 0.001000 0.001000
0.001004 0.001003 0.001002 0.001003 0.001000 0.001000 0.001000 0.001002 0.999987 0.001000 0.001000 0.001000 0.001000
0.001001 0.001000 0.001001 0.001001 0.430575 0.001000 0.001001 0.001002 0.001000 0.001000 0.570418 0.001000 0.001002
0.001000 0.366299 0.420021 0.215676 0.001000 0.001001 0.001001 0.001000 0.001001 0.001000 0.001000 0.001001 0.001000
pi:
0.999995 0.001005 0.001000 0.001000 0.001000 0.001000 0.001000
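
The large entries of A and B sit close to simple relative frequencies of the counts listed earlier: for instance 47/93 ≈ 0.505 and 46/93 ≈ 0.495 against the two large entries in the first row of A, and 35/93 ≈ 0.376 against the first entry of B. The sketch below computes those ratios. How hmmtrain.pl actually derives its values, including the small floor of about 0.001 on every other entry, is not shown in the source, so treat this as an approximation; `relative_frequencies` is a hypothetical helper.

```python
# Counts copied from the frequency listings above (subset of lex shown).
unigram = {"md": 100, "wps": 57, "at": 93, ".": 100, "nn": 93, "vb": 100, "jj": 47}
bigram = {
    ("vb", "."): 50, ("wps", "md"): 57, ("at", "jj"): 47, ("nn", "."): 50,
    ("nn", "md"): 43, ("vb", "at"): 50, ("at", "nn"): 46, ("md", "vb"): 100,
    ("jj", "nn"): 47,
}
lex = {("the", "at"): 35, ("a", "at"): 58, ("can", "md"): 58, ("might", "md"): 42}


def relative_frequencies(pair_counts, tag_counts, tag_index):
    """Divide each pair count by the count of the tag at position tag_index."""
    return {pair: n / tag_counts[pair[tag_index]] for pair, n in pair_counts.items()}


transition = relative_frequencies(bigram, unigram, 0)  # P(next tag | previous tag)
emission = relative_frequencies(lex, unigram, 1)       # P(word | tag)

print(round(transition[("at", "jj")], 4))
print(round(emission[("the", "at")], 4))
```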

Tagging
Test data

the can can destroy the typical fly .

Numbering

T= 8
1 3 3 13 1 6 4 5
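
Encoding the test sentence amounts to a lookup in words.key followed by a `T=` header line giving the sequence length. A small sketch, with `encode` as a hypothetical helper and the key copied from the word listing above; a word missing from the key raises `KeyError` here, which may differ from what the original scripts do.

```python
word_key = {
    "the": 1, "plane": 2, "can": 3, "fly": 4, ".": 5, "typical": 6, "see": 7,
    "a": 8, "who": 9, "might": 10, "?": 11, "large": 12, "destroy": 13,
}


def encode(sentence, key):
    """Return the sentence as a 'T= n' header plus space-separated word IDs."""
    ids = [key[word] for word in sentence.split()]  # KeyError on unknown words
    return f"T= {len(ids)}\n" + " ".join(str(i) for i in ids)


print(encode("the can can destroy the typical fly .", word_key))
```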

Prediction

./../testvit example.hmm example0.test

------------------------------------
Viterbi using direct probabilities
Viterbi MLE log prob = -1.223539E+01
Optimal state sequence:
T= 8
1 7 4 3 1 2 7 6
------------------------------------
Viterbi using log probabilities
Viterbi MLE log prob = -1.223539E+01
Optimal state sequence:
T= 8
1 7 4 3 1 2 7 6
------------------------------------
The two log probabilites and optimal state sequences
should identical (within numerical precision).

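Mapping the state sequence through pos.key as listed above gives:

the/at can/wps can/vb destroy/md the/at typical/nn fly/wps ./jj

The rows of B, however, show that the states in example.hmm are not in pos.key order: state 1 emits the/a (at), state 2 typical/large (jj), state 3 fly/see/destroy (vb), state 4 can/might (md), state 5 who (wps), state 6 ./? (.), and state 7 plane/can/fly (nn). Read against that order, the same sequence is:

the/at can/nn can/md destroy/vb the/at typical/jj fly/nn ./.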
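
For reference, the decoding step can be reproduced with a short log-space Viterbi in Python. This is a sketch, not the testvit source: `parse` and `viterbi` are hypothetical names, and A, B, and pi are copied from the example.hmm listing above.

```python
import math


def parse(block):
    """Turn whitespace-separated rows of numbers into a list of float lists."""
    return [[float(x) for x in row.split()] for row in block.strip().splitlines()]


A = parse("""
0.001002 0.505874 0.001150 0.001003 0.001004 0.001000 0.494967
0.001000 0.001000 0.001422 0.001004 0.001003 0.001003 0.999567
0.500420 0.001002 0.001001 0.001001 0.001000 0.500576 0.001000
0.001001 0.001000 0.999848 0.001002 0.001001 0.001003 0.001146
0.001000 0.001003 0.001000 0.999995 0.001000 0.001000 0.001003
0.424812 0.001003 0.001001 0.001000 0.576183 0.001000 0.001002
0.001000 0.001000 0.001003 0.462992 0.001000 0.538002 0.001002
""")
B = parse("""
0.376957 0.001000 0.001000 0.001000 0.001002 0.001001 0.001000 0.624018 0.001001 0.001000 0.001021 0.001001 0.001000
0.001006 0.001004 0.001003 0.001002 0.001000 0.532372 0.001000 0.001005 0.001000 0.001001 0.001000 0.468607 0.001000
0.001000 0.001001 0.001001 0.460642 0.001001 0.001000 0.450462 0.001000 0.001000 0.001000 0.001002 0.001000 0.090891
0.001000 0.001000 0.580419 0.001000 0.001001 0.001000 0.001000 0.001000 0.001000 0.420578 0.001001 0.001000 0.001000
0.001004 0.001003 0.001002 0.001003 0.001000 0.001000 0.001000 0.001002 0.999987 0.001000 0.001000 0.001000 0.001000
0.001001 0.001000 0.001001 0.001001 0.430575 0.001000 0.001001 0.001002 0.001000 0.001000 0.570418 0.001000 0.001002
0.001000 0.366299 0.420021 0.215676 0.001000 0.001001 0.001001 0.001000 0.001001 0.001000 0.001000 0.001001 0.001000
""")
pi = parse("0.999995 0.001005 0.001000 0.001000 0.001000 0.001000 0.001000")[0]


def viterbi(obs, A, B, pi):
    """Return (best 1-based state path, natural-log probability) for 1-based obs."""
    n = len(pi)
    o = [x - 1 for x in obs]  # convert symbol IDs to 0-based column indices
    delta = [math.log(pi[i]) + math.log(B[i][o[0]]) for i in range(n)]
    back = []
    for t in range(1, len(o)):
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + math.log(A[i][j]))
            ptr.append(best)
            delta.append(prev[best] + math.log(A[best][j]) + math.log(B[j][o[t]]))
        back.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    logp = delta[state]
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last step
        state = ptr[state]
        path.append(state)
    return [s + 1 for s in reversed(path)], logp


path, logp = viterbi([1, 3, 3, 13, 1, 6, 4, 5], A, B, pi)
print(" ".join(map(str, path)), f"{logp:.5f}")
```

Working in log space avoids underflow on long sequences, which is presumably why testvit reports both a direct and a log-probability variant.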

For untagged data:

Data numbering

./../create_key.pl words.key < example0.sentences > example0.seq

Words

1 the
6 typical
3 can
8 a
9 who
13 destroy
7 see
2 plane
11 ?
10 might
5 .
12 large
4 fly

Numbered document

T= 590
1 2 3 4 5 1 6 2 3 7 1 2 5 8 6 4 3 7 5 9 10 7 11 1 12 3 10 7 8 3 5 1 3 3 13 8 12 3 5 9 10 7 11 9 10 4 11 9 3 4 11 1 3 10 7 5 1 2 3 4 8 6 4 5 9 3 4 11 1 12 4 3 4 5 9 3 7 11 9 3 7 8 3 11 1 2 3 7 1 6 3 5 9 3 7 11 8 2 3 7 5 9 3 7 8 12 2 11 9 10 13 8 6 3 11 9 3 7 11 9 10 7 11 9 10 4 11 9 10 7 8 4 11 1 2 3 4 1 2 5 1 6 2 10 4 8 2 5 9 10 4 8 12 2 11 9 3 4 8 12 4 11 9 10 4 11 9 3 7 8 4 11 9 3 4 11 1 2 10 4 8 2 5 9 10 7 1 3 11 8 12 3 10 4 5 1 2 3 7 8 12 4 5 9 3 13 8 12 4 11 9 3 7 8 2 11 8 12 2 3 4 5 9 10 7 11 9 3 4 8 3 11 8 12 3 10 7 5 8 6 3 3 7 1 3 5 9 3 13 8 6 3 11 9 3 4 11 8 6 3 3 4 1 12 2 5 8 4 3 4 8 2 5 9 3 4 8 2 11 9 10 13 1 3 11 1 6 2 3 4 8 12 2 5 1 6 4 3 7 1 12 3 5 9 10 4 8 2 11 9 10 4 11 9 3 7 8 12 4 11 1 6 4 3 13 8 12 2 5 9 3 4 8 3 11 8 6 3 3 7 5 8 6 4 10 4 5 9 3 7 1 2 11 9 3 4 1 12 2 11 1 4 10 4 8 6 3 5 9 3 7 8 2 11 9 10 7 8 4 11 8 3 10 4 5 9 3 7 11 9 10 7 11 9 10 7 11 8 2 3 4 5 9 10 4 8 3 11 8 12 3 10 7 5 9 10 7 11 8 12 3 3 13 8 3 5 8 6 3 10 7 1 3 5 9 10 7 11 1 6 3 3 4 5 9 10 7 8 6 4 11 1 6 4 3 4 5 9 3 4 11 8 4 3 7 8 6 3 5 8 2 10 4 5 9 10 7 11 8 6 3 10 4 8 2 5 9 3 7 1 3 11 8 12 3 3 7 5 9 3 4 8 6 3 11 9 10 4 11 9 10 4 11 9 3 4 8 6 3 11 9 10 4 8 12 2 11 9 3 4 11 9 10 4 1 6 3 11 1 3 3 13 1 6 4 5 9 3 7 11 8 2 3 13 1 3 5 9 10 4 11 9 3 7 11 1 12 2 3 7 5 8 4 3 7 5 8 2 10 4 5 8 3 10 7 5 9 3 4 11

Training

./../esthmm -N 7 -M 13 example0.seq > example0.hmm

M= 13
N= 7
A:
0.001001 0.001041 0.001002 0.001001 0.001002 0.001001 0.999951
0.001001 0.001003 0.576307 0.001004 0.001001 0.001002 0.424681
0.001000 0.001000 0.001000 0.999991 0.001000 0.001008 0.001000
0.093382 0.002156 0.001001 0.003694 0.903712 0.001055 0.001001
0.001004 0.554405 0.001002 0.001018 0.001003 0.001001 0.446567
0.001256 0.354273 0.001008 0.307920 0.007258 0.333280 0.001005
0.001000 0.001000 0.001004 0.001011 0.001225 0.999749 0.001010
B:
0.001000 0.001000 0.001002 0.021460 0.001457 0.001000 0.019199 0.001000 0.001000 0.001001 0.001343 0.001000 0.960537
0.001002 0.001002 0.001004 0.001024 0.430486 0.001000 0.001025 0.001003 0.001000 0.001001 0.570408 0.001000 0.001046
0.001006 0.001006 0.001008 0.001009 0.001000 0.001000 0.001000 0.001003 0.999967 0.001000 0.001000 0.001000 0.001000
0.001000 0.001005 0.580741 0.001008 0.001001 0.001002 0.001000 0.001001 0.001000 0.420237 0.001001 0.001004 0.001000
0.001000 0.001012 0.001017 0.508569 0.001153 0.001000 0.491349 0.001000 0.001000 0.001002 0.001117 0.001000 0.001782
0.001001 0.244775 0.280068 0.140141 0.001001 0.180252 0.001011 0.001001 0.001000 0.001008 0.001001 0.158740 0.001000
0.376952 0.001000 0.001001 0.001000 0.001001 0.001006 0.001000 0.624012 0.001003 0.001001 0.001017 0.001006 0.001000
pi:
0.001000 0.001000 0.001000 0.001000 0.001000 0.001001 0.999999
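
Because esthmm trains on untagged text, its state numbers carry no tag names. One way to interpret them is to list, for each row of B, the words whose emission probability stands out. The sketch below does that with the B matrix copied from the listing above; `dominant_words` is a hypothetical helper and the 0.1 threshold is an arbitrary choice.

```python
words = ["the", "plane", "can", "fly", ".", "typical", "see",
         "a", "who", "might", "?", "large", "destroy"]  # words.key order, IDs 1..13

B = [[float(x) for x in row.split()] for row in """
0.001000 0.001000 0.001002 0.021460 0.001457 0.001000 0.019199 0.001000 0.001000 0.001001 0.001343 0.001000 0.960537
0.001002 0.001002 0.001004 0.001024 0.430486 0.001000 0.001025 0.001003 0.001000 0.001001 0.570408 0.001000 0.001046
0.001006 0.001006 0.001008 0.001009 0.001000 0.001000 0.001000 0.001003 0.999967 0.001000 0.001000 0.001000 0.001000
0.001000 0.001005 0.580741 0.001008 0.001001 0.001002 0.001000 0.001001 0.001000 0.420237 0.001001 0.001004 0.001000
0.001000 0.001012 0.001017 0.508569 0.001153 0.001000 0.491349 0.001000 0.001000 0.001002 0.001117 0.001000 0.001782
0.001001 0.244775 0.280068 0.140141 0.001001 0.180252 0.001011 0.001001 0.001000 0.001008 0.001001 0.158740 0.001000
0.376952 0.001000 0.001001 0.001000 0.001001 0.001006 0.001000 0.624012 0.001003 0.001001 0.001017 0.001006 0.001000
""".strip().splitlines()]


def dominant_words(B, words, threshold=0.1):
    """Map each 1-based state to the words it emits with probability > threshold."""
    return {
        state: [w for w, p in zip(words, row) if p > threshold]
        for state, row in enumerate(B, start=1)
    }


for state, emitted in dominant_words(B, words).items():
    print(state, " ".join(emitted))
```

On this model, state 7 collects the determiners, state 4 the modals, state 2 the punctuation, and state 6 mixes the nouns with the adjectives, so the unsupervised states only partly line up with the seven hand-assigned tags.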

Testing

the can can destroy the typical fly .

T= 8
1 3 3 13 1 6 4 5
./../testvit example0.hmm example0.test
Viterbi using direct probabilities
Viterbi MLE log prob = -1.401504E+01
Optimal state sequence:
T= 8
7 6 4 1 7 6 6 2
------------------------------------
Viterbi using log probabilities
Viterbi MLE log prob = -1.401504E+01
Optimal state sequence:
T= 8
7 6 4 1 7 6 6 2
------------------------------------
The two log probabilites and optimal state sequences
should identical (within numerical precision).