Understanding BERT and Inference Optimization in Depth

2026-02-20 · AI

1. Transformer Basics

Self-Attention is the core mechanism...
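
To make the mechanism concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python, with no framework dependency. It assumes a single head, no masking, and that the query, key, and value matrices are already computed. The function names `softmax` and `self_attention` are illustrative only and are not taken from BERT's actual implementation.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """Single-head scaled dot-product attention over lists of row vectors.

    q, k, v are lists of equal-length float lists (one row per token).
    Returns (outputs, attention_weights).
    """
    d_k = len(k[0])
    outputs, weights = [], []
    for q_i in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows, column by column.
        outputs.append(
            [sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) for c in range(len(v[0]))]
        )
    return outputs, weights


# Hypothetical 3-token input with 2-dimensional embeddings (q = k = v).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(tokens, tokens, tokens)
```

Each row of `attn` sums to 1, so every output row is a convex combination of the value rows. In a real model, q, k, and v come from learned linear projections of the token embeddings.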

torch.onnx.export(model, ...)

Inference speed is optimized through FP16 / INT8 quantization.
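
As a rough illustration of what INT8 quantization does to a weight tensor, the sketch below applies symmetric per-tensor quantization in plain Python. This is one common scheme chosen here as an assumption, since the post does not say which scheme it uses. The helpers `quantize_int8` and `dequantize` are hypothetical names, not the API of any particular toolkit.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127].

    Returns (quantized_ints, scale) such that value ~= quantized_int * scale.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    # Map the integers back to approximate float values.
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.0, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
```

The round-trip error per element is bounded by half the scale, which is why tensors with a few large outliers tend to quantize poorly under a single per-tensor scale.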