Assessing Generalization in Deep Reinforcement Learning