This repository contains code to run our models, including the supervised baseline, the trained reward model, and the RL fine-tuned policy. Running the code requires a machine with an NVIDIA GPU.
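Before installing anything, it can help to confirm that the machine actually exposes a GPU. The snippet below is a minimal sketch, not part of this repository: the helper name `nvidia_gpu_names` is hypothetical, and it assumes the NVIDIA driver's `nvidia-smi` tool is on your `PATH`. It returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess


def nvidia_gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible.

    Hypothetical helper for illustration; it is not shipped with this project.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Driver tools are not installed or not on PATH.
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        # nvidia-smi exists but could not talk to the driver.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = nvidia_gpu_names()
    if names:
        print("Found GPU(s): " + ", ".join(names))
    else:
        print("No NVIDIA GPU detected; the models in this repository need one.")
```

This only checks that a GPU is visible to the driver. It does not verify the CUDA toolkit version.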
This project is developed on Ubuntu 16.04 with CUDA 9.0.176. First, clone this project to your local environment. Our pre-trained models are now available online. You ...