RL & LLMs

Cohere

Biography

I work on RL, LLMs, and their interactions at Cohere. Previously, I was a research scientist at InstaDeep, where I focused on using transformers for combinatorial optimization and discrete problems. I hold a Ph.D. in reinforcement learning for combinatorial optimization from Inria/CNRS, where I was part of the SequeL/ScooL team under the supervision of P. Preux.

Interests

Reinforcement Learning
Large Language Models
Combinatorial Optimization

Experience & Education

Member of Technical Staff, 2024-Present
Cohere
Research Scientist, 2023-2024
InstaDeep
PhD Student, 2019-2023
Inria Lille, SequeL/ScooL team

Publications


Combinatorial Optimization with Policy Adaptation using Latent Space Search

Felix Chalumeau, Shikha Surana, Clément Bonnet, Nathan Grinsztajn, Arnu Pretorius, Alexandre Laterre, Thomas D. Barrett

Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

Nathan Grinsztajn, Daniel Furelos-Blanco, Shikha Surana, Clément Bonnet, Thomas D. Barrett

More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

Nathan Grinsztajn, Toby Johnstone, Johan Ferret, Philippe Preux

MetaREVEAL: RL-based Meta-learning from Learning Curves

Manh Hung Nguyen, Nathan Grinsztajn, Lisheng Sun-Hosoya, Isabelle Guyon

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist


Experience

Member of Technical Staff

Cohere

Apr 2024 – Present, Paris, France

RL for LLMs.

Research Scientist

InstaDeep

Apr 2023 – Apr 2024, London, UK

RL and transformers for combinatorial optimization.

Research Intern

InstaDeep

Apr 2022 – Oct 2022, London, UK

RL for combinatorial optimization, under the supervision of Thomas D. Barrett. This work led to the paper "Population-Based Reinforcement Learning for Combinatorial Optimization".

PhD Student

Inria

Oct 2019 – Apr 2023, Lille, France

Reinforcement learning for combinatorial optimization and graph representation, under the supervision of P. Preux.

Graduate Research Intern

UC Berkeley

Apr 2018 – Aug 2018, California

Machine learning and statistics for the study of biological scRNA-seq data, under the supervision of S. Dudoit.

Blockchain Developer (intern)

BitSpread Ltd

Jun 2017 – Nov 2017, London, UK

Developed Ethereum smart contracts to create a decentralized investment fund.