BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Département de mathématiques et applications - ECPv6.2.2//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Département de mathématiques et applications
X-ORIGINAL-URL:https://www.math.ens.psl.eu
X-WR-CALDESC:Events for Département de mathématiques et applications
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20260329T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20261025T030000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Europe/Paris:20260602T120000
DTEND;TZID=Europe/Paris:20260602T130000
DTSTAMP:20260602T005723
CREATED:20260601T101208Z
LAST-MODIFIED:20260601T101650Z
UID:21779-1780401600-1780405200@www.math.ens.psl.eu
SUMMARY:ENS-Data Science colloquium - Rebecca Willett: How do simple rotations affect the implicit bias of Adam?
DESCRIPTION:Adaptive gradient methods such as Adam and Adagrad are widely used in machine learning\, yet their effect on the generalization of learned models – relative to methods like gradient descent – remains poorly understood. Prior work on binary classification suggests that Adam exhibits a “richness bias\,” which can help it learn nonlinear decision boundaries closer to the Bayes-optimal decision boundary relative to gradient descent. However\, the coordinate-wise preconditioning scheme employed by Adam renders the overall method sensitive to orthogonal transformations of feature space. We show that this sensitivity can manifest as a reversal of Adam’s competitive advantage: even small rotations of the underlying data distribution can make Adam forfeit its richness bias and converge to a linear decision boundary that is farther from the Bayes-optimal decision boundary than the one learned by gradient descent. To alleviate this issue\, we show that a recently proposed reparameterization method – which applies an orthogonal transformation to the optimization objective – endows any first-order method with equivariance to data rotations\, and we empirically demonstrate its ability to restore Adam’s bias towards rich decision boundaries. This is joint work with Adela DePavia and Vasileios Charisopoulos.\n\nThese seminars are being made possible through the support of the CFM-ENS Chair « Modèles et Sciences des Données ».\n\nThe organizers: Giulio Biroli\, Alex Cayco Gajic\, Bruno Loureiro\, Stéphane Mallat\, Gabriel Peyré.
URL:https://www.math.ens.psl.eu/evenement/ens-data-science-colloquium-rebecca-willett-how-do-simple-rotations-affect-the-implicit-bias-of-adam/
LOCATION:Amphi Jean Jaurès\, 29 rue d'Ulm\, PARIS\, 75005\, France
CATEGORIES:ENS-Data Science colloquium
END:VEVENT
END:VCALENDAR