Reference

Consistent Rank Logits (CORAL) for Ordinal Regression

Network design

The last fully connected layer has a single output unit (the number of classes is set to one). A 1D bias vector with num_classes-1 entries is then added to its output.

self.fc = nn.Linear(4096, 1, bias=False)
self.linear_1_bias = nn.Parameter(torch.zeros(num_classes-1).float())

Loss function

Let $W$ denote the weight parameters of the neural network excluding the bias units of the final layer. The penultimate layer, whose output is denoted as $g(x_i, W)$, shares a single weight with all nodes in the final output layer. $K-1$ independent bias units are then added to $g(x_i, W)$ such that $\{g(x_i, W) + b_k\}_{k=1}^{K-1}$ are the inputs to the corresponding binary classifiers in the final layer. Let $s(z) = 1/(1+\exp(-z))$ be the logistic sigmoid function. The predicted empirical probability for task $k$ is defined as:

$$\hat{P}(y_i^k=1) = s(g(x_i, W) + b_k)$$
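The shared score plus per-task biases can be sketched without any framework. In this minimal illustration, `g` stands for the scalar output g(x_i, W) and `biases` for b_1, ..., b_{K-1}. The helper name `task_probabilities` and all numeric values are made up for the example and are not taken from the reference implementation.

```python
import math


def sigmoid(z):
    """Logistic sigmoid s(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def task_probabilities(g, biases):
    """Return P(y^k = 1) = s(g + b_k) for each of the K-1 binary tasks."""
    return [sigmoid(g + b_k) for b_k in biases]


# Hypothetical values: one shared score and K-1 = 6 bias units (K = 7 ranks).
g = 0.3
biases = [2.0, 1.0, 0.2, -0.5, -1.5, -2.5]
probas = task_probabilities(g, biases)
print(probas)
```

Because every task sees the same score g, the probabilities are ordered exactly like the biases: non-increasing biases give non-increasing probabilities.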

For model training, we minimize the loss function:

$$L(W,b) = -\sum_{i=1}^N \sum_{k=1}^{K-1} \lambda^{k} \left[ \log(s(g(x_i,W)+b_k))\, y_i^k + \log(1 - s(g(x_i,W)+b_k))\,(1-y_i^k) \right]$$

which is the weighted cross-entropy of K-1 binary classifiers. For rank prediction, the binary labels are obtained via:

$$f_k(x_i) = 1\{ \hat{P}(y_i^k=1) > 0.5 \}$$
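The loss and the thresholding rule above can be written out for a single sample as follows. This is a framework-free sketch, not the reference implementation: the function names `coral_loss_single` and `binary_predictions`, the use of a stable log-sigmoid, and the numeric inputs are assumptions made for illustration. Summing `coral_loss_single` over all samples gives L(W, b).

```python
import math


def log_sigmoid(z):
    """Numerically stable log(s(z)) = -log(1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def coral_loss_single(g, biases, levels, weights):
    """Weighted cross-entropy over the K-1 binary tasks for one sample."""
    total = 0.0
    for b_k, y_k, lam_k in zip(biases, levels, weights):
        z = g + b_k
        # log(1 - s(z)) equals log(s(-z)).
        total += lam_k * (y_k * log_sigmoid(z) + (1 - y_k) * log_sigmoid(-z))
    return -total


def binary_predictions(g, biases):
    """f_k = 1 if s(g + b_k) > 0.5, which is the same as g + b_k > 0."""
    return [1 if g + b_k > 0 else 0 for b_k in biases]


# Hypothetical inputs: K = 7 ranks, unit task weights.
biases = [2.0, 1.0, 0.2, -0.5, -1.5, -2.5]
levels = [1, 1, 1, 0, 0, 0]
weights = [1.0] * 6
loss = coral_loss_single(0.3, biases, levels, weights)
preds = binary_predictions(0.3, biases)
```

Thresholding the probability at 0.5 is equivalent to checking the sign of g + b_k, so prediction does not need the sigmoid at all.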

Example

Let's take a look at the labels, for 7 ranks:

  • Cross-Entropy: the one-hot encoded label for class 3 is $[0,0,1,0,0,0,0]^T$,
  • CORAL-loss: it is $[1,1,1,0,0,0]^T$

    levels = [[1] * label + [0] * (self.num_classes - 1 - label) for label in batch_y]
    

Suppose the sigmoid outputs of the K-1 binary classifiers for the CORAL-loss are $[0.9, 0.8, 0.6, 0.4, 0.2, 0.1]^T$. We look for the last value greater than 0.5; its position, 3, is our prediction.
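The encoding and decoding steps of this example can be checked with a few lines of plain Python. The helper names `label_to_levels` and `probas_to_label` are hypothetical. The first mirrors the `levels` list comprehension above, and the second counts the entries above the threshold, which matches "the last value greater than 0.5" whenever the probabilities are non-increasing, as they are here.

```python
def label_to_levels(label, num_classes):
    """Encode an integer label as num_classes - 1 binary targets."""
    return [1] * label + [0] * (num_classes - 1 - label)


def probas_to_label(probas, threshold=0.5):
    """Decode by counting how many binary tasks fire."""
    return sum(1 for p in probas if p > threshold)


levels = label_to_levels(3, 7)
predicted = probas_to_label([0.9, 0.8, 0.6, 0.4, 0.2, 0.1])
print(levels)     # [1, 1, 1, 0, 0, 0]
print(predicted)  # 3
```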

During training, the loss for the current sample is calculated as

$$
\begin{aligned}
L &= -\sum_{k=1}^{K-1} \Big( [1,1,1,0,0,0] * \log([0.9,0.8,0.6,0.4,0.2,0.1]^T) \\
&\qquad\quad + (1 - [1,1,1,0,0,0]) * \log(1 - [0.9,0.8,0.6,0.4,0.2,0.1]^T) \Big) \\
&= -\sum_{k=1}^{K-1} \Big( [1,1,1,0,0,0] * \log([0.9,0.8,0.6,0.4,0.2,0.1]^T) \\
&\qquad\quad + [0,0,0,1,1,1] * \log(1 - [0.9,0.8,0.6,0.4,0.2,0.1]^T) \Big)
\end{aligned}
$$

Here $*$ is the element-wise product and the sum runs over the $K-1$ entries of the resulting vector.
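To make the arithmetic concrete, the expression above can be evaluated directly. This short sketch assumes unit task weights (the example contains no λ term) and the natural logarithm.

```python
import math

levels = [1, 1, 1, 0, 0, 0]
probas = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]

# Element-wise binary cross-entropy, summed over the K-1 tasks.
loss = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p)
    for y, p in zip(levels, probas)
)
print(round(loss, 4))  # 1.6787
```

The first three tasks contribute -log(0.9), -log(0.8) and -log(0.6), and the last three contribute -log(0.6), -log(0.8) and -log(0.9), so the total is 2 * -log(0.432).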

Ordinal Regression

Network design

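The last fully connected layer outputs (num_classes-1)*2 logits, i.e. two logits for each of the num_classes-1 binary tasks. The snippets below index the logits along dim=2, so they expect a tensor of shape (batch, num_classes-1, 2).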

self.fc = nn.Linear(2048 * block.expansion, (self.num_classes-1)*2)

Final output is similar to CORAL-loss:

probas = F.softmax(logits, dim=2)[:, :, 1]
predict_levels = probas > 0.5
predicted_labels = torch.sum(predict_levels, dim=1)
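For a single sample, the same decoding can be sketched in plain Python. The function names and the example logits are made up for illustration. Each task has a pair of logits (z0, z1), and the softmax probability of index 1 plays the role that the sigmoid output plays in CORAL.

```python
import math


def softmax_pair(z0, z1):
    """Two-way softmax, shifted by the max for numerical stability."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def predict_label(pair_logits):
    """Count the tasks whose probability for index 1 exceeds 0.5."""
    probas = [softmax_pair(z0, z1)[1] for z0, z1 in pair_logits]
    return sum(1 for p in probas if p > 0.5)


# Hypothetical logits for K - 1 = 6 tasks, one (z0, z1) pair per task.
pair_logits = [(-1.0, 2.0), (0.0, 1.5), (0.2, 0.4),
               (1.0, -0.5), (2.0, -1.0), (3.0, -2.0)]
label = predict_label(pair_logits)
print(label)  # 3
```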

Loss function

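def cost_fn(logits, levels, imp):
    # logits: (N, num_classes-1, 2); levels: (N, num_classes-1) binary targets;
    # imp: importance weights applied to each task term.
    log_probas = F.log_softmax(logits, dim=2)  # computed once, used for both indices
    val = -torch.sum((log_probas[:, :, 1]*levels
                      + log_probas[:, :, 0]*(1-levels))*imp, dim=1)
    return torch.mean(val)  # average over the batch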
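A per-sample version of this cost can be written in plain Python to see what each term does. This is only a sketch under assumptions: `cost_single` and `log_softmax_pair` are hypothetical names, and the batch mean taken by `cost_fn` is left out.

```python
import math


def log_softmax_pair(z0, z1):
    """Return (log p0, log p1) of a two-way softmax via log-sum-exp."""
    m = max(z0, z1)
    lse = m + math.log(math.exp(z0 - m) + math.exp(z1 - m))
    return z0 - lse, z1 - lse


def cost_single(pair_logits, levels, imp):
    """Weighted cross-entropy over the K-1 two-way tasks for one sample."""
    total = 0.0
    for (z0, z1), y, w in zip(pair_logits, levels, imp):
        log_p0, log_p1 = log_softmax_pair(z0, z1)
        total += w * (log_p1 * y + log_p0 * (1 - y))
    return -total


# With all logits at zero every task is a coin flip: cost = 6 * log(2).
cost = cost_single([(0.0, 0.0)] * 6, [1, 1, 1, 0, 0, 0], [1.0] * 6)
print(cost)
```

The two-way softmax probability of index 1 equals the sigmoid of z1 - z0, so each task is still a logistic classifier. The difference from CORAL is that here every task gets its own pair of logits from the last layer instead of a shared score plus a bias.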