CategoricalCrossentropy VS SparseCategoricalCrossentropy

SparseCategoricalCrossentropy

CategoricalCrossentropy: use this cross-entropy loss when there are two or more label classes and the labels are provided in a one_hot representation. If the labels are integers, use SparseCategoricalCrossentropy instead. In both cases y_pred should contain # classes floating-point values per feature; for SparseCategoricalCrossentropy, y_true holds a single value per feature.

https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
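
The difference between the two losses is only in how the labels are encoded. The following is a minimal sketch, assuming TensorFlow 2.x in eager mode; the sample labels and predictions are made-up illustration values, not data from these notes.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 2 samples, 3 classes.
y_true_int = np.array([1, 2])                     # integer class ids
y_true_onehot = tf.one_hot(y_true_int, depth=3)   # [[0, 1, 0], [0, 0, 1]]
y_pred = np.array([[0.05, 0.95, 0.0],
                   [0.1, 0.8, 0.1]], dtype=np.float32)  # probabilities, not logits

cce = tf.keras.losses.CategoricalCrossentropy()          # expects one-hot labels
scce = tf.keras.losses.SparseCategoricalCrossentropy()   # expects integer labels

loss_onehot = float(cce(y_true_onehot, y_pred))
loss_sparse = float(scce(y_true_int, y_pred))

# Both encodings describe the same targets, so the two losses should agree.
print(loss_onehot, loss_sparse)
```

The sparse variant avoids building the one-hot matrix, which matters when the number of classes is large (for example a vocabulary).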

tf.keras.losses.Reduction

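import warnings

import tensorflow as tf

# Assumed import path: shape_list is a helper from Hugging Face transformers.
from transformers.modeling_tf_utils import shape_list


class TFTokenClassificationLoss: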
    """
    Loss function suitable for token classification.

    .. note::
        Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
    """

    def compute_loss(self, labels, logits):
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction=tf.keras.losses.Reduction.NONE
        )
        # make sure only labels that are not equal to -100
        # are taken into account as loss
        if tf.math.reduce_any(labels == -1):
            warnings.warn("Using `-1` to mask the loss for the token is deprecated. Please use `-100` instead.")
            active_loss = tf.reshape(labels, (-1,)) != -1
        else:
            active_loss = tf.reshape(labels, (-1,)) != -100
        reduced_logits = tf.boolean_mask(tf.reshape(logits, (-1, shape_list(logits)[2])), active_loss)
        labels = tf.boolean_mask(tf.reshape(labels, (-1,)), active_loss)

        return loss_fn(labels, reduced_logits)
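
To see what the masking does, here is a simplified standalone sketch of the same idea. It assumes TensorFlow 2.x in eager mode; the function name `masked_token_loss` and the toy tensors are hypothetical, and `logits.shape[-1]` stands in for the `shape_list` helper used above.

```python
import tensorflow as tf


def masked_token_loss(labels, logits, ignore_index=-100):
    """Per-token loss that skips positions labelled `ignore_index` (illustrative helper)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction="none"  # keep one loss value per token
    )
    flat_labels = tf.reshape(labels, (-1,))
    flat_logits = tf.reshape(logits, (-1, logits.shape[-1]))
    # Boolean mask: True where the token should contribute to the loss.
    active = tf.not_equal(flat_labels, ignore_index)
    return loss_fn(
        tf.boolean_mask(flat_labels, active),
        tf.boolean_mask(flat_logits, active),
    )


# Toy batch: 2 sequences of 3 tokens, 4 classes; three tokens are masked with -100.
labels = tf.constant([[0, 2, -100], [1, -100, -100]])
logits = tf.zeros((2, 3, 4))  # uniform logits, so each kept token costs log(4)
per_token_loss = masked_token_loss(labels, logits)
print(per_token_loss.shape)
```

Because the reduction is "none", the result has one entry per unmasked token; the caller decides how to average.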


tf.broadcast_to

Broadcast

  • expand_dims

  • implicit broadcasting does not copy data, unlike tf.tile

  • tf.broadcast_to

Key idea

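  • If tensors a and b have different numbers of dimensions, align their shapes starting from the trailing (rightmost) dimension and insert size-1 dimensions where one is missing.

  • Expand each size-1 dimension to match the size of the corresponding dimension in the other tensor.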

e.g. a has shape [4, 32, 32, 3]

b has shape [3] -> [1, 1, 1, 3] -> [4, 32, 32, 3]

tf.broadcast_to
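
The shape example above can be sketched as follows, assuming TensorFlow 2.x in eager mode; the tensor values are arbitrary. Note that, according to the TensorFlow documentation, an explicit tf.broadcast_to call does materialize the full tensor; the memory saving applies to implicit broadcasting inside an op such as `a + b`.

```python
import tensorflow as tf

a = tf.ones([4, 32, 32, 3])
b = tf.constant([1.0, 2.0, 3.0])               # shape [3]

# Implicit broadcasting: b is treated as [1, 1, 1, 3], then stretched to a's shape.
implicit = a + b

# Explicit broadcasting to the target shape.
explicit = tf.broadcast_to(b, [4, 32, 32, 3])

# The tf.tile equivalent: insert the size-1 dims by hand, then copy the data.
b4 = tf.reshape(b, [1, 1, 1, 3])
tiled = tf.tile(b4, [4, 32, 32, 1])

print(implicit.shape, explicit.shape, tiled.shape)
```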


tf.concat/split/stack

Merge and Split

  • tf.concat

  • tf.split

  • tf.stack

  • tf.unstack

1. tf.concat

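concat joins tensors along an existing dimension; to join along a new dimension, use stack.

  • concat: the dimension being concatenated may differ in size; all other dimensions must match. concat([3, 35, 8], [2, 35, 8]) -> [5, 35, 8]

  • stack: all dimensions must match. stack([3, 35, 8], [3, 35, 8]) -> [2, 3, 35, 8]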
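
A short sketch of the four operations with the shapes used above, assuming TensorFlow 2.x in eager mode; the tensors are filled with ones only for illustration.

```python
import tensorflow as tf

a = tf.ones([3, 35, 8])
b = tf.ones([2, 35, 8])
c = tf.ones([3, 35, 8])

# concat: join along an existing axis; only that axis may differ in size.
joined = tf.concat([a, b], axis=0)             # shape [5, 35, 8]

# stack: create a new axis; all input shapes must be identical.
stacked = tf.stack([a, c], axis=0)             # shape [2, 3, 35, 8]

# split: inverse of concat, here with explicit sizes along axis 0.
parts = tf.split(joined, [3, 2], axis=0)       # shapes [3, 35, 8] and [2, 35, 8]

# unstack: inverse of stack, removes the axis and returns a list of tensors.
unstacked = tf.unstack(stacked, axis=0)        # two tensors of shape [3, 35, 8]

print(joined.shape, stacked.shape, [p.shape for p in parts], len(unstacked))
```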



tf.math

  • +, -, *, /

  • **, pow, square

  • sqrt

  • //, %

  • exp, log

  • @, matmul

  • linear layer

element-wise

  • +, -, *, /

matrix-wise

  • @, matmul

dim-wise

  • reduce_mean/max/min/sum
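
The three groups above can be tried out in a few lines. This is a minimal sketch assuming TensorFlow 2.x in eager mode; the matrices and the weight/bias names `w` and `bias` are made up for illustration.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[2.0, 2.0], [2.0, 2.0]])

# element-wise: applied to each pair of entries
total = x + y
floordiv = x // y                        # floor division
remainder = x % y
squared = x ** 2                         # same as tf.pow(x, 2) or tf.square(x)
root = tf.sqrt(x)
roundtrip = tf.math.log(tf.exp(x))       # log undoes exp

# matrix-wise: matrix multiplication
product = x @ y                          # same as tf.matmul(x, y)

# linear layer sketch: out = x @ w + bias (bias is broadcast over rows)
w = tf.ones([2, 3])
bias = tf.zeros([3])
out = x @ w + bias

# dim-wise: reduce along a chosen axis
row_sum = tf.reduce_sum(x, axis=1)
col_mean = tf.reduce_mean(x, axis=0)

print(product.numpy(), row_sum.numpy(), col_mean.numpy())
```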