GPT2 HybridBlock #1165

Open
carter54 opened this issue Feb 20, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@carter54

carter54 commented Feb 20, 2020

Description

It seems the hybridized GPT-2 in v0.9.0 generates different results from previous versions (where the model was not a HybridBlock).
I compared the output of the sequence_sampling.py script in v0.9.0
(https://github.com/dmlc/gluon-nlp/blob/v0.9.0/scripts/text_generation/sequence_sampling.py)
with https://github.com/sxjscience/gluonnlp-gpt2/blob/master/sampling_demo.py from @sxjscience (which I used originally).

Error Message

None

To Reproduce

In https://github.com/dmlc/gluon-nlp/blob/v0.9.0/scripts/text_generation/sequence_sampling.py,
I used the following parameters:

--random-sample 
--lm-model gpt2_117m
--max-length 10
--print-num 5
--beam-size 5
--bos I think this works
--temperature 0.7
--use-top-k 40

In https://github.com/sxjscience/gluonnlp-gpt2/blob/master/sampling_demo.py,
I used the following parameters:

--model 117M
--num 5
--temperature 0.7
--context I think this works   (entered at the 'Type in the start of the sentence >>>' prompt)

Both scripts use the model, vocab, and BPE files downloaded from the gluonnlp model zoo
(gpt2_117m_openai_webtext-26416f2e.params, openai_webtext-f917dc78.vocab, openai_webtext_bpe_ranks-396d4d8e.json).
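
For reference, loading the same files directly looks roughly like this (a minimal sketch assuming the v0.9 get_model / GPT2BPETokenizer API; sequence_sampling.py may wire this up slightly differently):

import mxnet as mx
import gluonnlp as nlp

# Fetch the pretrained GPT-2 117M model and its vocab from the model zoo.
model, vocab = nlp.model.get_model('gpt2_117m',
                                   dataset_name='openai_webtext',
                                   pretrained=True,
                                   ctx=mx.cpu())
# BPE tokenizer/detokenizer built from the downloaded bpe_ranks file.
tokenizer = nlp.data.GPT2BPETokenizer()
detokenizer = nlp.data.GPT2BPEDetokenizer()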

In addition, to make the results reproducible, I set the random seeds in both scripts:

import mxnet as mx
import numpy as np

mx.random.seed(123)
np.random.seed(123)

In gluonnlp v0.9.0, I got the following results:

Generation Result:
["I think this works for Went'G to hang<|endoftext|>", -6.5630407]
['I think this works for conne in general,<|endoftext|>', -3.8045387]
['I think this works for Ö IcA<|endoftext|>', -12.143288]
['I think this works for savings! After various drops<|endoftext|>', -6.411801]
["I think this works for ö�'s<|endoftext|>", -8.568734]

With @sxjscience's script, I got the following results:

I think this works pretty well.

What little<|endoftext|>
 I think this works, but it's a bit too<|endoftext|>
 I think this works for a few reasons.

<|endoftext|>
 I think this works for almost every game of Hearthstone.<|endoftext|>
 I think this works, but I think I'm going<|endoftext|>

I believe the results from @sxjscience's script make more sense.

I simply printed out the state returned by each of these two lines:

inputs, begin_states = get_initial_input_state(decoder, bos_ids)

https://github.com/sxjscience/gluonnlp-gpt2/blob/5393325d10b449cf7b263336e5dc2a5b647ad346/sampling_demo.py#L134

These are the initial model states after feeding in the prompt ('I think this works' in this case), and there is no randomness involved in computing them. However, the two scripts give slightly different values.

In gluonnlp v0.9.0, I got the following:

bos_ids: [314, 892, 428, 2499]
eos_id: 50256
initial_state:
[
[[[[-1.5114625   2.022678    1.1816475  ... -1.272368   -0.77062
     1.5050724 ]
   [-2.2167938   2.2953174   1.8480954  ... -1.087445   -1.4295176
     1.7449284 ]
   [-2.0344348   2.347716    1.873954   ... -1.2934592  -1.8628635
     2.1281338 ]
   [-2.8145096   2.7656934   2.7020633  ... -3.3035498  -1.870746
     1.4930353 ]]

  [[-0.25974002 -0.7255037  -1.6046232  ... -0.4246013   1.4355485
    -0.09753075]
   [ 0.27772164 -1.0720729  -1.461111   ... -1.9736105   4.003873
     1.0251817 ]
   [-0.35341865 -0.9198358  -2.3204646  ... -0.28003415  4.191904
    -0.43986958]
   [ 1.0401388  -2.4093304  -1.9443365  ... -3.2719274   3.750334
     1.5413616 ]]
   
   ...
   
   [[-1.7475876e-01 -5.8656417e-02  5.7071950e-02 ... -5.0089438e-02
     3.8783681e-02 -9.8688319e-02]
   [-1.3505528e+00  2.2637476e-01  1.0895698e-01 ... -3.6589733e-01
     5.2980721e-01  2.0042861e-01]
   [-2.1730498e-03 -1.5134688e-01 -4.7372755e-01 ... -4.3929526e-01
     1.3135010e+00 -1.5517814e-01]
   [-1.9702104e-01 -6.0879529e-01 -8.5921329e-01 ...  1.1960924e+00
     7.3815721e-01 -3.5597485e-02]]

  [[ 9.2752598e-02 -1.1831909e-01  1.4589602e-01 ... -1.3848458e-01
    -9.4607342e-03 -1.6617517e-01]
   [-5.6394035e-01 -1.7828876e-01  9.7797394e-02 ... -5.7982695e-01
    -6.9218093e-01 -5.7731813e-01]
   [-2.2833836e-01 -2.0200127e-01 -2.4465438e-02 ... -1.3059924e+00
    -6.9617134e-01 -2.5296259e-01]
   [ 8.5663158e-01  4.6872348e-01 -1.7862669e-01 ... -1.1065848e+00
     1.8462117e+00  4.7560549e-01]]]]

With @sxjscience's script, I got the following:

bos_ids: [314, 892, 428, 2499]
eos_id: 50256
initial_state:
[
[[[[-1.5114625   2.022678    1.1816475  ... -1.272368   -0.77062
     1.5050724 ]
   [-2.2167938   2.2953174   1.8480954  ... -1.087445   -1.4295176
     1.7449284 ]
   [-2.0344348   2.347716    1.873954   ... -1.2934592  -1.8628635
     2.1281338 ]
   [-2.8145096   2.7656934   2.7020633  ... -3.3035498  -1.870746
     1.4930353 ]]

  [[-0.25974002 -0.7255037  -1.6046232  ... -0.4246013   1.4355485
    -0.09753075]
   [ 0.27772164 -1.0720729  -1.461111   ... -1.9736105   4.003873
     1.0251817 ]
   [-0.35341865 -0.9198358  -2.3204646  ... -0.28003415  4.191904
    -0.43986958]
   [ 1.0401388  -2.4093304  -1.9443365  ... -3.2719274   3.750334
     1.5413616 ]]
	 
	...
	 
   [[-1.74661696e-01 -5.87162003e-02  5.70323840e-02 ... -5.00452407e-02
     3.88349257e-02 -9.85485613e-02]
   [-1.35118222e+00  2.26310208e-01  1.08217351e-01 ... -3.66161257e-01
     5.30540764e-01  2.01224759e-01]
   [-1.97911495e-03 -1.51461348e-01 -4.74250615e-01 ... -4.38878387e-01
     1.31352234e+00 -1.54606119e-01]
   [-1.97200328e-01 -6.08567238e-01 -8.60525846e-01 ...  1.19605160e+00
     7.36668587e-01 -3.49551775e-02]]

  [[ 9.26816314e-02 -1.18253030e-01  1.45903692e-01 ... -1.38523147e-01
    -9.41133685e-03 -1.66166440e-01]
   [-5.63177526e-01 -1.77708015e-01  9.64018106e-02 ... -5.80199003e-01
    -6.92616522e-01 -5.75809360e-01]
   [-2.29232520e-01 -2.01344401e-01 -2.51663439e-02 ... -1.30508542e+00
    -6.96419179e-01 -2.53013432e-01]
   [ 8.56036127e-01  4.69387531e-01 -1.79144084e-01 ... -1.10490561e+00
     1.84545624e+00  4.75354373e-01]]]]

Notice that the first two items in the initial state list are identical between the two runs, while the last two differ slightly.

Any idea what causes this?
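
For completeness, this is roughly how the two dumps can be compared (a minimal numpy sketch; compare_states and its arguments are hypothetical names, not part of either script):

import numpy as np

def compare_states(states_a, states_b, atol=1e-5):
    # Compare two lists of per-layer state arrays element-wise.
    for i, (a, b) in enumerate(zip(states_a, states_b)):
        a, b = np.asarray(a), np.asarray(b)
        diff = np.max(np.abs(a - b))
        print('layer %d: max abs diff = %g, allclose = %s'
              % (i, diff, np.allclose(a, b, atol=atol)))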

Environment

----------pip list----------
certifi 2019.11.28
chardet 3.0.4
Cython 0.29.15
gluonnlp 0.9.0
graphviz 0.8.4
idna 2.8
mxnet 1.6.0b20200128
numpy 1.18.1
packaging 20.1
pip 19.0.3
pyparsing 2.4.6
regex 2020.2.18
requests 2.22.0
setuptools 40.8.0
six 1.14.0
urllib3 1.25.8

----------Python Info----------
Version : 3.7.3
Compiler : GCC 7.3.0
Build : ('default', 'Mar 27 2019 22:11:17')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.0.3
Directory : /extend_disk1/gluonnlp/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /extend_disk1/gluonnlp/lib/python3.7/site-packages/mxnet
Num GPUs : 0
Commit Hash : a15e1b900f3f6e1fbdce62176ab3d8c806a1c2bf
----------System Info----------
Platform : Linux-4.15.0-65-generic-x86_64-with-debian-buster-sid
system : Linux
node : test3
release : 4.15.0-65-generic
version : #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities

@carter54 carter54 added the bug Something isn't working label Feb 20, 2020
@carter54
Author

If I unset --use-top-k, the gluonnlp results make sense:

['I think this works pretty well - the only thing that<|endoftext|>', -14.89901]
["I think this works pretty well, but it doesn't<|endoftext|>", -8.94348]
["I think this works pretty well. It's not a<|endoftext|>", -8.87783]
['I think this works pretty well. I have a small<|endoftext|>', -12.630016]
['I think this works pretty well.\n\nIf you<|endoftext|>', -7.6234455]
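
For context, my understanding of what --use-top-k 40 plus --temperature 0.7 do before sampling (a minimal numpy sketch, not the library's actual implementation):

import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=40):
    # Scale logits by the temperature, keep only the k largest, then
    # sample from the renormalized distribution.
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k is not None:
        kth = np.sort(logits)[-min(top_k, logits.size)]
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

With top-k enabled, small numerical drift in the logits can change which tokens fall inside the top-40 cutoff, so tiny state differences may snowball into very different samples.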

In addition, in

tokens = bos_tokens + generated_tokens[1:]

why does the generated_tokens slice start at index 1? This drops the first output token from each result; in this case 'pretty' is deleted and the results become:

['I think this works well - the only thing that<|endoftext|>', -14.89901]
["I think this works well, but it doesn't<|endoftext|>", -8.94348]
["I think this works well. It's not a<|endoftext|>", -8.87783]
['I think this works well. I have a small<|endoftext|>', -12.630016]
['I think this works well.\n\nIf you<|endoftext|>', -7.6234455]
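
A toy illustration of the slice (hypothetical sampler output, assuming it starts at the first newly generated token):

bos_tokens = ['I', 'think', 'this', 'works']
generated_tokens = ['pretty', 'well', '.']

tokens = bos_tokens + generated_tokens[1:]
# -> ['I', 'think', 'this', 'works', 'well', '.']  ('pretty' is gone)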

@leezu
Contributor

leezu commented Feb 26, 2020

Please test with ebfc920 and the previous commit to check whether that PR introduced a regression in --use-top-k.

@haven-jeon
Member

haven-jeon commented Mar 1, 2020

As far as I know, the big difference between the previous version and the current one is that it now uses a fused GELU operator. Perhaps because of this operator the results look somewhat different. At the application level, however, the results are not expected to differ much; it should mainly be faster.
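
For reference, the erf-based GELU and the common tanh approximation differ only slightly; whether the fused operator switches between the two formulations is an assumption here, but the gap is easy to check (a minimal numpy sketch):

import math
import numpy as np

def gelu_erf(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Common tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi)
                                    * (x + 0.044715 * x ** 3)))

x = np.linspace(-5.0, 5.0, 1001)
print(np.max(np.abs(gelu_erf(x) - gelu_tanh(x))))  # small but nonzero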

@kaonashi-tyc

I also saw a difference with the RoBERTa model as well, compared to the fairseq release.

Might be a related issue: #1183
