on the Pareto frontier is reached by considering a single objective

Phase 2: Exploration
• Improvement step: move the solution towards one objective at a time
• Correction step: the improvement may lead to a point outside the frontier; the correction moves the point back onto the frontier

Parisi, S., Pirotta, M., Smacchia, N., Bascetta, L., & Restelli, M. Policy gradient approaches for multi-objective sequential decision making. IJCNN 2014
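
The two phases above can be sketched on a toy problem. The following is a minimal illustration, not the implementation from the cited paper: it replaces the policy-gradient setting with two hypothetical concave quadratic objectives (`A`, `a`, `B`, `b` are made-up values), and it assumes that the correction step follows the minimum-norm convex combination of the two gradients, a direction that improves both objectives and vanishes on the frontier. The paper's actual correction rule may differ.

```python
import numpy as np

# Hypothetical toy problem (not from the slides): two concave objectives to maximize,
#   J1(theta) = -(theta - a)^T A (theta - a),  J2(theta) = -(theta - b)^T B (theta - b).
A, a = np.diag([1.0, 4.0]), np.array([0.0, 0.0])
B, b = np.diag([4.0, 1.0]), np.array([1.0, 1.0])


def grad_j1(theta):
    return -2.0 * A @ (theta - a)


def grad_j2(theta):
    return -2.0 * B @ (theta - b)


def common_ascent_direction(g1, g2):
    """Minimum-norm point of the segment [g1, g2] (assumed correction direction).

    Moving along it does not decrease either objective to first order,
    and it is zero exactly at Pareto-stationary points.
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom < 1e-12:  # gradients coincide, any combination is the same
        return g1.copy()
    lam = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return lam * g1 + (1.0 - lam) * g2


def correct(theta, step=0.05, tol=1e-6, max_iter=10000):
    """Correction step: climb along the common ascent direction until it vanishes."""
    for _ in range(max_iter):
        d = common_ascent_direction(grad_j1(theta), grad_j2(theta))
        if np.linalg.norm(d) < tol:
            break
        theta = theta + step * d
    return theta


def pareto_following(n_points=10, improve_step=0.05):
    # Phase 1: reach one point of the frontier by optimizing J1 alone.
    theta = np.array([2.0, -1.0])
    for _ in range(2000):
        theta = theta + 0.05 * grad_j1(theta)
    frontier = [theta.copy()]

    # Phase 2: exploration, alternating improvement and correction.
    for _ in range(n_points):
        theta = theta + improve_step * grad_j2(theta)  # improvement towards J2
        theta = correct(theta)                         # back onto the frontier
        frontier.append(theta.copy())
    return np.array(frontier)


if __name__ == "__main__":
    print(pareto_following())
```

Each row of the returned array is one approximate frontier point; the sequence starts at the optimum of the first objective and moves towards the optimum of the second. The step sizes are chosen small relative to the curvature of these particular quadratics and would need tuning for other objectives.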