Many statistical methods have recently been developed for finding behaviorally relevant signals in neural population recordings. However, most existing techniques trade off flexibility against interpretability: simple ad hoc models are likely to distort defining features of the data, whereas flexible models such as artificial neural networks are difficult to interpret. We developed a flexible yet intrinsically interpretable framework for discovering neural population dynamics from data. Our framework simultaneously learns the dynamics and their nonlinear embedding in the neural activity space without rigid parametric assumptions. Taking advantage of this intrinsic interpretability, we show that good data prediction does not guarantee the correct interpretation of flexible models, and we propose an alternative model-selection strategy that prioritizes correct interpretation. We applied this framework to neural activity recorded from the primate cortex during decision making. We discovered that decision-related dynamics were inconsistent with simple hypotheses proposed previously and instead agreed with an attractor network mechanism. Our results reveal a distinction between prediction and interpretation and show that a flexible approach can discover new hypotheses from data.