Class Imbalance Problem of One-Stage Detector
- A one-stage detector regularly samples a much larger set of candidate object locations across an image (~100k locations), densely covering spatial positions, scales, and aspect ratios.
- The training procedure is therefore dominated by easily classified background examples. This imbalance is typically addressed via bootstrapping or hard example mining, but these approaches are not efficient enough.
alpha-Balanced CE Loss
CE(p_t) = −α_t log(p_t)
- To address the class imbalance, one method is to add a weighting factor α for class 1 and 1 − α for class −1. α may be set by inverse class frequency or treated as a hyperparameter tuned by cross-validation.
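As a concrete illustration, here is a minimal NumPy sketch of the α-balanced CE loss under the paper's y ∈ {1, −1} label convention. The function name and the eps clipping are illustrative choices, not from the paper; α is left as an argument since the paper treats it as a hyperparameter.

```python
import numpy as np

def alpha_balanced_ce(p, y, alpha, eps=1e-12):
    """Alpha-balanced cross entropy for binary labels y in {1, -1}.

    p is the model's estimated probability for the class y = 1;
    alpha weights the positive class and (1 - alpha) the negative one.
    (Function name and eps clipping are illustrative, not from the paper.)
    """
    p = np.clip(p, eps, 1.0 - eps)                  # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)              # p_t as defined in the paper
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # alpha_t mirrors p_t
    return -alpha_t * np.log(p_t)
```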
Focal Loss
FL = −∑_{i=1}^{C} (1 − p_i)^γ t_i log(p_i)
- The loss function is reshaped to down-weight easy examples and thus focus training on hard negatives. A modulating factor (1 − p_t)^γ is added to the cross entropy loss, where t_i is the one-hot ground-truth label, p_i is the predicted probability for class i, and γ is tested over [0, 5] in the experiments (see the numerical sketch after this list).
- There are two properties of the FL:
- When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.
- The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted. When γ = 0, FL is equivalent to CE; as γ increases, the effect of the modulating factor likewise increases. (γ = 2 works best in the experiments.)
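A small numerical sketch of these two properties, using the paper's scalar form FL(p_t) = −(1 − p_t)^γ log(p_t); the helper name and the probe values of p_t are illustrative only.

```python
import numpy as np

def focal_loss(p_t, gamma=2.0, eps=1e-12):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t); gamma = 0 recovers plain CE."""
    p_t = np.clip(p_t, eps, 1.0 - eps)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# An easy example (p_t = 0.9) is down-weighted 100x relative to CE at
# gamma = 2, while a hard example (p_t = 0.1) keeps most of its loss.
for pt in (0.1, 0.5, 0.9, 0.99):
    ce, fl = focal_loss(pt, gamma=0.0), focal_loss(pt, gamma=2.0)
    print(f"p_t={pt:4}  CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```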
alpha-Balanced Variant of FL
FL = −∑_{i=1}^{C} α_t (1 − p_i)^γ t_i log(p_i)
- In practice, this α-balanced form is the one used in the experiments, and it yields slightly better accuracy than the form without α. In addition, using the sigmoid activation function to compute p results in greater numerical stability (see the sketch after this list).
- γ: Focuses training more on hard examples.
- α: Offsets the class imbalance in the number of examples.
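Putting the pieces together, below is a hedged PyTorch sketch (not the official RetinaNet code) of the α-balanced focal loss computed from raw logits. It gets the sigmoid-based numerical stability mentioned above by reusing binary_cross_entropy_with_logits for the −log(p_t) term; the function name is my own, and targets are assumed to be 0/1 floats with the same shape as the logits.

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss from raw logits (per-element, no reduction).

    targets: 0/1 float tensor, same shape as logits. alpha = 0.25 and
    gamma = 2 are the settings the paper reports working best together.
    """
    # Numerically stable -log(p_t), computed without forming p explicitly.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1.0 - p) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return alpha_t * (1.0 - p_t) ** gamma * ce

# Usage: per-anchor, per-class loss for a toy batch of 2 anchors, 3 classes.
logits = torch.tensor([[2.0, -1.0, 0.5], [-3.0, 0.0, 1.5]])
targets = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
loss = sigmoid_focal_loss(logits, targets).sum()
```

Note how the modulating factor (1 − p_t)^γ and the weight α_t simply scale a numerically stable CE term, matching the decomposition of the α-balanced equation above.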