Date: 2021-08-12
Join us on Zoom: https://ucsc.zoom.us/j/95375359167?pwd=am1PWjhJSWdxejBGVXBnRFBMMFBIUT09 / Passcode: 878716

Description: Machine learning systems frequently encounter noisy training annotations, and performing robust training in such scenarios remains a challenge. Earlier, classical approaches first estimate the noise rate of the labels and then use this estimate to perform label correction, loss correction, or both, among many other more carefully designed approaches. Recent works have started to propose robust loss functions or metrics that do not require such estimation. Clear advantages include ease of implementation and robustness to errors in the estimated noise parameters.

We propose two robust training methods, both with theoretical guarantees under synthetically corrupted labels. Our first approach is Robust f-divergence. We show when maximizing a properly defined f-divergence measure between a classifier's predictions and the supervised labels is robust to label noise. Given this robustness, this family of f-divergence functions provides useful metrics for learning with noisy labels, and these metrics do not require noise rate estimation.
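To make "maximizing an f-divergence between predictions and labels" concrete, here is a minimal NumPy sketch of one empirical objective of this kind. It is not necessarily the formulation presented in the talk: it assumes the Total Variation divergence (conjugate f*(t) = t on |t| <= 1/2), measures divergence between the joint distribution of (prediction, label) and the product of their marginals, and picks the variational function g(x, y) = probs[x, y] - 1/2. The function name `tv_divergence_objective` is hypothetical.

```python
import numpy as np


def tv_divergence_objective(probs: np.ndarray, labels: np.ndarray) -> float:
    """Empirical variational bound on TV(P_{h,Y} || P_h x P_Y).

    Assumption (for illustration only): g(x, y) = probs[x, y] - 1/2, which
    stays in [-1/2, 1/2] as the TV conjugate f*(t) = t requires. The -1/2
    offsets appear in both terms and cancel, so they are omitted below.

    probs:  (n, K) predicted class probabilities.
    labels: (n,) observed (possibly noisy) integer labels.
    """
    n, k = probs.shape
    # Joint term: average probability assigned to each sample's own label.
    joint = probs[np.arange(n), labels].mean()
    # Product-of-marginals term: pair every prediction with every label,
    # which equals weighting each row by the empirical label marginal P_Y.
    label_marginal = np.bincount(labels, minlength=k) / n
    product = (probs @ label_marginal).mean()
    return float(joint - product)


if __name__ == "__main__":
    labels = np.array([0, 1])
    confident = np.array([[1.0, 0.0], [0.0, 1.0]])
    uniform = np.full((2, 2), 0.5)
    print(tv_divergence_objective(confident, labels))  # 0.5
    print(tv_divergence_objective(uniform, labels))    # 0.0
```

A classifier would be trained to maximize this score. Because it compares the joint distribution against the product of marginals rather than fitting labels directly, it needs no noise rate estimate, which is the property the paragraph above highlights.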

Our second approach is Generalized Label Smoothing (GLS), which builds on Label Smoothing (LS), an emerging learning paradigm that uses a positively weighted average of the hard training labels and uniformly distributed soft labels. We demonstrate that several learning-with-noisy-labels solutions proposed in the literature relate closely to negative label smoothing (NLS), defined as combining the hard and soft labels with a negative weight. We unify (positive) LS and NLS into GLS and provide insights into the properties of GLS when learning with noisy labels.
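The LS/NLS distinction comes down to the sign of one weight. The sketch below assumes the common parameterization with smoothing rate r and K classes, target = (1 - r) * one_hot + r / K, so that 0 < r < 1 gives positive LS, r < 0 gives NLS, and r = 0 recovers hard labels. The helper names `gls_targets` and `gls_cross_entropy` are hypothetical.

```python
import numpy as np


def gls_targets(labels: np.ndarray, num_classes: int, r: float) -> np.ndarray:
    """Generalized label smoothing targets (assumed parameterization).

    target = (1 - r) * one_hot + r / K
      0 < r < 1 : positive label smoothing (LS)
      r < 0     : negative label smoothing (NLS); the true-class weight
                  exceeds 1 and other classes get negative weights
    Every row still sums to 1.
    """
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - r) * one_hot + r / num_classes


def gls_cross_entropy(probs: np.ndarray, labels: np.ndarray, r: float,
                      eps: float = 1e-12) -> float:
    """Mean cross-entropy between predicted probs and GLS targets."""
    targets = gls_targets(labels, probs.shape[1], r)
    # Clip to avoid log(0); the targets themselves may be negative when r < 0.
    log_probs = np.log(np.clip(probs, eps, 1.0))
    return float(-(targets * log_probs).sum(axis=1).mean())


if __name__ == "__main__":
    y = np.array([0])
    print(gls_targets(y, 2, 0.1))   # LS:  [[0.95 0.05]]
    print(gls_targets(y, 2, -0.2))  # NLS: [[ 1.1 -0.1]]
```

With NLS the target pushes the model to be more confident than the (possibly noisy) hard label alone would, which is one way to see why it relates to several noisy-label methods.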

Synthetic label noise, although its clean structure greatly facilitates statistical analysis, often fails to model real-world noise patterns. To better understand real-world label noise, we collect human-annotated, real-world noisy labels for the CIFAR datasets via Amazon Mechanical Turk. We show that real-world noise patterns indeed pose new and substantial challenges compared with synthetic label noise, in terms of noise transition vectors, hypothesis testing, and the memorization effect in model predictions. These observations call for rethinking how noisy labels are treated.
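One way to see the gap between synthetic and human noise is to compare transition matrices, where T[i, j] estimates P(noisy label = j | clean label = i). The sketch below is a simplification: it builds a class-level matrix from paired clean and noisy labels, whereas the talk discusses noise transition vectors, which may vary per instance. It also builds a symmetric-noise matrix as one common synthetic baseline. Both function names are hypothetical.

```python
import numpy as np


def empirical_transition_matrix(clean, noisy, num_classes: int) -> np.ndarray:
    """Class-level estimate of T[i, j] = P(noisy = j | clean = i)."""
    clean = np.asarray(clean)
    noisy = np.asarray(noisy)
    counts = np.zeros((num_classes, num_classes))
    # Accumulate one count per (clean, noisy) pair, handling repeated pairs.
    np.add.at(counts, (clean, noisy), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes that never appear as a clean label get an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)


def symmetric_transition_matrix(num_classes: int, noise_rate: float) -> np.ndarray:
    """Synthetic symmetric noise: flip to each other class equally often."""
    T = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T


if __name__ == "__main__":
    print(empirical_transition_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2))
    print(symmetric_transition_matrix(3, 0.3))
```

Symmetric noise spreads errors evenly across classes; human annotations tend to concentrate errors on semantically similar classes, so their empirical rows look quite different.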