Longformer Torch2Paddle
PaddlePaddle-Longformer-model-base-4096
PyTorch | Shape | Paddle | Paddle shape (T = transpose; blank = same as PyTorch) |
---|---|---|---|
embeddings.word_embeddings.weight | [50265, 768] | embeddings.word_embeddings.weight | |
embeddings.position_embeddings.weight | [4098, 768] | embeddings.position_embeddings.weight | |
embeddings.token_type_embeddings.weight | [1, 768] | embeddings.token_type_embeddings.weight | |
embeddings.LayerNorm.weight | [768] | embeddings.layer_norm.weight | |
embeddings.LayerNorm.bias | [768] | embeddings.layer_norm.bias | |
encoder.layer.0.attention.self.query.weight | [768, 768] | encoder.layers.0.self_attn.query.weight | T |
encoder.layer.0.attention.self.query.bias | [768] | encoder.layers.0.self_attn.query.bias | |
encoder.layer.0.attention.self.key.weight | [768, 768] | encoder.layers.0.self_attn.key.weight | T |
encoder.layer.0.attention.self.key.bias | [768] | encoder.layers.0.self_attn.key.bias | |
encoder.layer.0.attention.self.value.weight | [768, 768] | encoder.layers.0.self_attn.value.weight | T |
encoder.layer.0.attention.self.value.bias | [768] | encoder.layers.0.self_attn.value.bias | |
encoder.layer.0.attention.self.query_global.weight | [768, 768] | encoder.layers.0.self_attn.query_global.weight | T |
encoder.layer.0.attention.self.query_global.bias | [768] | encoder.layers.0.self_attn.query_global.bias | |
encoder.layer.0.attention.self.key_global.weight | [768, 768] | encoder.layers.0.self_attn.key_global.weight | T |
encoder.layer.0.attention.self.key_global.bias | [768] | encoder.layers.0.self_attn.key_global.bias | |
encoder.layer.0.attention.self.value_global.weight | [768, 768] | encoder.layers.0.self_attn.value_global.weight | T |
encoder.layer.0.attention.self.value_global.bias | [768] | encoder.layers.0.self_attn.value_global.bias | |
encoder.layer.0.attention.output.dense.weight | [768, 768] | encoder.layers.0.self_attn.out.weight | T |
encoder.layer.0.attention.output.dense.bias | [768] | encoder.layers.0.self_attn.out.bias | |
encoder.layer.0.attention.output.LayerNorm.weight | [768] | encoder.layers.0.norm1.weight | |
encoder.layer.0.attention.output.LayerNorm.bias | [768] | encoder.layers.0.norm1.bias | |
encoder.layer.0.intermediate.dense.weight | [3072, 768] | encoder.layers.0.linear1.weight | T [768, 3072] |
encoder.layer.0.intermediate.dense.bias | [3072] | encoder.layers.0.linear1.bias | |
encoder.layer.0.output.dense.weight | [768, 3072] | encoder.layers.0.linear2.weight | T [3072, 768] |
encoder.layer.0.output.dense.bias | [768] | encoder.layers.0.linear2.bias | |
encoder.layer.0.output.LayerNorm.weight | [768] | encoder.layers.0.norm2.weight | |
encoder.layer.0.output.LayerNorm.bias | [768] | encoder.layers.0.norm2.bias | |
pooler.dense.weight | [768, 768] | pooler.dense.weight | T |
pooler.dense.bias | [768] | pooler.dense.bias | |
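A minimal sketch of the conversion this table describes. It assumes the PyTorch weights have already been loaded into a plain dict of NumPy arrays, and that the layer-0 rows apply unchanged to every other layer index. The rule list and the helper names (RENAME_RULES, torch_name_to_paddle, needs_transpose, convert_state_dict) are hypothetical, not part of either library.

```python
import numpy as np

# Ordered (torch substring, paddle substring) rename rules read off the table.
# Order matters: the attention-output rules must run before the generic
# ".output.dense." / ".output.LayerNorm." rules of the feed-forward block.
RENAME_RULES = [
    ("embeddings.LayerNorm.", "embeddings.layer_norm."),
    ("encoder.layer.", "encoder.layers."),
    (".attention.self.", ".self_attn."),
    (".attention.output.dense.", ".self_attn.out."),
    (".attention.output.LayerNorm.", ".norm1."),
    (".intermediate.dense.", ".linear1."),
    (".output.dense.", ".linear2."),
    (".output.LayerNorm.", ".norm2."),
]


def torch_name_to_paddle(name: str) -> str:
    """Map a PyTorch Longformer parameter name to its Paddle name."""
    for old, new in RENAME_RULES:
        name = name.replace(old, new)
    return name


def needs_transpose(torch_name: str, array: np.ndarray) -> bool:
    """Linear weights are [out, in] in PyTorch but [in, out] in Paddle (the "T" rows).
    Embedding tables keep the same [num, hidden] layout in both frameworks."""
    return (
        torch_name.endswith(".weight")
        and array.ndim == 2
        and not torch_name.startswith("embeddings.")
    )


def convert_state_dict(torch_state: dict) -> dict:
    """Rename every parameter and transpose the Linear weights."""
    paddle_state = {}
    for name, array in torch_state.items():
        value = np.ascontiguousarray(array.T) if needs_transpose(name, array) else array
        paddle_state[torch_name_to_paddle(name)] = value
    return paddle_state
```

In practice the input dict could come from torch.load(path, map_location="cpu") with .numpy() called on each tensor, and the converted values could be wrapped with paddle.to_tensor before paddle.save. Treat this as a starting point and compare shapes against the Paddle model's own state_dict before loading.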
Paddle gather index_select
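Using gather to implement PyTorch-style fancy (advanced) indexing
https://github.com/PaddlePaddle/Paddle/issues/42554 [inspiration] When indexing with several lists at once, it is better to handle them separately, one at a time.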
https://github.com/PaddlePaddle/Paddle/issues/35072
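A sketch of the idea above, with NumPy standing in for Paddle so it runs without Paddle installed. The assumption is that np.take(x, index, axis=k) with a 1-D index behaves like paddle.index_select(x, index, axis=k) and paddle.gather(x, index, axis=k). Note that "several index lists" has two readings: applying the lists one axis at a time gives the block x[rows][:, cols], while PyTorch's x[rows, cols] pairs the lists element-wise. The paired case can still be done with a single 1-D gather by flattening. select_block and select_pairs are hypothetical helper names.

```python
import numpy as np

# np.take(x, index, axis=k) selects along one axis with a 1-D index; here it
# plays the role of paddle.index_select / paddle.gather along axis k.


def select_block(x: np.ndarray, rows, cols) -> np.ndarray:
    """x[rows][:, cols]: apply one index list per axis, one at a time."""
    out = np.take(x, rows, axis=0)
    return np.take(out, cols, axis=1)


def select_pairs(x: np.ndarray, rows, cols) -> np.ndarray:
    """Torch-style x[rows, cols] on a 2-D array (rows[i] paired with cols[i]),
    rewritten as one 1-D gather over the flattened array."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    flat_index = rows * x.shape[1] + cols
    return np.take(x.reshape(-1), flat_index)


x = np.arange(12).reshape(3, 4)
print(select_pairs(x, [0, 2], [1, 3]).tolist())  # [1, 11]
print(select_block(x, [0, 2], [1, 3]).tolist())  # [[1, 3], [9, 11]]
```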
paddle.nn.functional.unfold
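PyTorch tensor.stride & tensor.as_strided
tensor.stride()
Stride is the jump necessary to go from one element to the next one in the specified dimension dim.
The jump is counted in elements, not bytes.
For a contiguous tensor, the stride of any dimension is the product of the sizes of all dimensions after it.
shape: (12, 512, 768) stride: (512x768x1, 768x1, 1x1) = (393216, 768, 1)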
tensor.as_strided()
input (Tensor) – the input tensor.
size (tuple or ints) – the shape of the output tensor
stride (tuple or ints) – the stride of the output tensor
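A pure-Python model of the two ideas above, with no PyTorch dependency. contiguous_strides reproduces the shape-to-stride rule, and as_strided here is a hypothetical helper (not the torch function) that reads a flat storage list through a given size, stride and offset, all counted in elements. With PyTorch installed, torch.zeros(12, 512, 768).stride() and torch.arange(10).as_strided((4, 3), (2, 1)) should give the same numbers.

```python
def contiguous_strides(shape):
    """Element strides of a contiguous (row-major) tensor: each dimension's
    stride is the product of the sizes of all dimensions after it."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def as_strided(storage, size, stride, storage_offset=0):
    """Model of as_strided over a flat storage list: the element at index
    (i0, i1, ...) is storage[storage_offset + i0*stride[0] + i1*stride[1] + ...]."""
    def build(dim, base):
        if dim == len(size):
            return storage[base]
        return [build(dim + 1, base + i * stride[dim]) for i in range(size[dim])]
    return build(0, storage_offset)


print(contiguous_strides((12, 512, 768)))  # (393216, 768, 1)
# Windows of length 3 that advance 2 elements per row, so neighbours overlap
print(as_strided(list(range(10)), size=(4, 3), stride=(2, 1)))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]
```

Because rows can share elements, as_strided can express overlapping sliding windows as a view instead of a copy, which is the kind of pattern that sliding-window attention code uses it for.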
PyTorch view vs reshape
Tensor.view has existed for a long time. It returns a tensor with the new shape that shares the underlying data with the original tensor.
On the other hand, torch.reshape was introduced more recently, in version 0.4. According to the documentation ...
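A small illustration of the difference, using NumPy as a stand-in so it runs without PyTorch. The assumption is that NumPy's reshape follows the same contract as torch.reshape: return a view when the memory layout allows it, otherwise silently copy. Tensor.view only covers the first case; in PyTorch, torch.arange(6).reshape(2, 3).t().view(6) raises a RuntimeError, while .reshape(6) on the same tensor returns a copy.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # contiguous
b = a.reshape(3, 2)             # compatible layout -> returns a view
t = a.T                         # transposed view: strides are no longer row-major
c = t.reshape(6)                # no view possible -> reshape copies

print(np.shares_memory(a, b))   # True
print(np.shares_memory(t, c))   # False
print(c.tolist())               # [0, 3, 1, 4, 2, 5]
```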
torch.nn.functional.pad
torch
Padding starts from the last dimension of input and works forward.
The last $\left\lfloor\frac{\text{len(pad)}}{2}\right\rfloor$ dimensions of input are padded.
To pad only the last dimension of input, pad has the form (padding_left, padding_right).
To pad only the last 2 dimensions of input, pad has the form (padding_left, padding_right, padding_top, padding_bottom).
To pad only the last 3 dimensions of input, pad has the form (padding_left, padding_right, padding_top, padding_bottom, padding_front, padding_back).
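A sketch of the argument order described above, translating an F.pad-style tuple into the per-dimension pad_width that numpy.pad expects, so the "last dimension first" rule can be checked without PyTorch. torch_pad_to_numpy is a hypothetical helper. With PyTorch installed, F.pad(torch.ones(2, 3, 4), (1, 2, 0, 3)).shape should also be (2, 6, 7).

```python
import numpy as np


def torch_pad_to_numpy(pad, ndim):
    """Translate an F.pad-style tuple into numpy.pad's pad_width.

    F.pad lists (before, after) pairs starting from the LAST dimension:
    (last_before, last_after, second_last_before, second_last_after, ...),
    so len(pad) // 2 trailing dimensions are padded.
    """
    if len(pad) % 2 != 0:
        raise ValueError("pad must have an even number of entries")
    n_dims = len(pad) // 2
    if n_dims > ndim:
        raise ValueError("pad covers more dimensions than the input has")
    pad_width = [(0, 0)] * ndim
    for i in range(n_dims):
        # pair i applies to dimension -(i + 1), counting back from the last one
        pad_width[ndim - 1 - i] = (pad[2 * i], pad[2 * i + 1])
    return pad_width


x = np.ones((2, 3, 4))
# (padding_left, padding_right, padding_top, padding_bottom) = (1, 2, 0, 3)
y = np.pad(x, torch_pad_to_numpy((1, 2, 0, 3), x.ndim))  # constant zero padding
print(y.shape)  # (2, 6, 7)
```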
Creating folds properly
Longformer BigBird
allenai/longformer-large-4096
Epoch | Setting | Lead | Position | Claim | Evidence | Concluding Statement | Counterclaim | Rebuttal | Overall | Online |
---|---|---|---|---|---|---|---|---|---|---|
3 | with pretrained | 0.7826552462526767 | 0.6857142857142857 | 0.6016325707951224 | 0.6062992125984252 | 0.7744827586206896 | 0.5159301130524152 | 0.43537414965986393 | 0.6288697623847826 | |
4 | without pretrained | 0.7926960257787325 | 0.6743119266055045 | 0.5527019174898314 | 0.6058080479229067 | 0.7251962883654532 | 0.4868686868686869 | 0.39381153305203936 | 0.6044849180118792 | |
4 | with pretrained | 0.7948164146868251 | 0.6745484400656815 | 0.5881818181818181 | 0.5861433087460485 | 0.7867698803659395 | 0.5420207743153919 | 0.43478260869565216 | 0.6296090350081938 | |
5 | without pretrained | 0.7926565874730022 | 0.6712629269821373 | 0.5932255111382362 | 0.6297068563718876 | 0.7207586933614331 | 0.48604860486048607 | 0.42297650130548303 | 0.6166622402132379 | 0.612 |
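A quick arithmetic check on the results: the Overall value appears to be the unweighted (macro) mean of the seven per-class scores. This is an observation from the numbers, not something stated in the notes; the snippet below reproduces it for epoch 3 "with pretrained".

```python
# Per-class scores for epoch 3 "with pretrained", copied from the results above.
epoch3_with_pretrained = {
    "Lead": 0.7826552462526767,
    "Position": 0.6857142857142857,
    "Claim": 0.6016325707951224,
    "Evidence": 0.6062992125984252,
    "Concluding Statement": 0.7744827586206896,
    "Counterclaim": 0.5159301130524152,
    "Rebuttal": 0.43537414965986393,
}

# Unweighted mean over the seven classes
overall = sum(epoch3_with_pretrained.values()) / len(epoch3_with_pretrained)
print(f"{overall:.10f}")  # 0.6288697624, matching the reported 0.6288697623847826
```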