We describe the steps in detail below. If any point is unclear, please contact the authors by emailing [email protected].
For example, use "BOS This is an anonymous Github EOS" to predict "This is an anonymous Github" (see RNN.py).
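The example above can be sketched as a shift-by-one pairing of input and target tokens. This is a minimal illustration rather than the code in RNN.py: the helper name `make_lm_pair`, the whitespace tokenization, and the choice to include EOS as the final target are all assumptions.

```python
BOS, EOS = "BOS", "EOS"

def make_lm_pair(sentence):
    """Build (inputs, targets) for next-token prediction.

    Assumption: whitespace tokenization and EOS as the last target;
    RNN.py may handle both differently.
    """
    tokens = [BOS] + sentence.split() + [EOS]
    inputs = tokens[:-1]   # every token except the last
    targets = tokens[1:]   # the same sequence shifted left by one
    return inputs, targets

inputs, targets = make_lm_pair("This is an anonymous Github")
for x, y in zip(inputs, targets):
    print(f"{x} -> {y}")
```

Each input position is paired with the token that follows it, so the model at "BOS" is trained to emit "This", at "This" to emit "is", and so on.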
Pre-train a sequence labeling neural network as the policy network, using labels produced by an unsupervised method, Integer Linear Programming.
Start from the pre-trained policy rather than a random policy, and use the pre-trained language model as the reward to fine-tune the policy network.
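The fine-tuning step can be sketched with a REINFORCE-style update. This is a toy illustration under several assumptions, not the repository's implementation: the policy is reduced to one logit per token with binary labels, `toy_reward` is a hypothetical stand-in for the language-model score, and the starting logits are made-up values standing in for the pre-trained policy.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_labels(logits, rng):
    # Assumption: one independent binary label (1 = keep, 0 = drop) per token.
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]

def reinforce_step(logits, labels, reward, baseline, lr):
    # Gradient of log Bernoulli(y; sigmoid(z)) with respect to z is y - sigmoid(z).
    advantage = reward - baseline
    return [z + lr * advantage * (y - sigmoid(z)) for z, y in zip(logits, labels)]

def toy_reward(tokens, labels):
    # Hypothetical stand-in for the language-model reward:
    # +1 for each kept ordinary token, -1 for each kept filler token.
    filler = {"really"}
    kept = [t for t, y in zip(tokens, labels) if y == 1]
    score = sum(-1.0 if t in filler else 1.0 for t in kept)
    return score / len(tokens)

def fine_tune(tokens, init_logits, steps=200, lr=1.0, seed=0):
    rng = random.Random(seed)
    logits = list(init_logits)  # start from the pre-trained policy, not a random one
    baseline = 0.0
    for _ in range(steps):
        labels = sample_labels(logits, rng)
        reward = toy_reward(tokens, labels)
        logits = reinforce_step(logits, labels, reward, baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward  # running-average baseline
    return logits

tokens = ["this", "is", "really", "an", "example"]
pretrained_logits = [0.5, 0.5, 0.5, 0.5, 0.5]  # hypothetical pre-trained values
tuned = fine_tune(tokens, pretrained_logits)
print([round(sigmoid(z), 2) for z in tuned])
```

In a real setup the per-token logits would come from the sequence labeling network and the reward from the pre-trained language model; only the update rule carries over from this sketch.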