oneAPI Deep Neural Network Library (oneDNN)
Performance library for Deep Learning
2.0.0
C++ API example demonstrating how one can use MatMul fused with ReLU in INT8 inference.
Concepts:
Asymmetric quantization
Run-time output scales: dnnl::primitive_attr::set_output_scales() and DNNL_RUNTIME_F32_VAL
Run-time zero points: dnnl::primitive_attr::set_zero_points() and DNNL_RUNTIME_S32_VAL
Operation fusion
Create primitive once, use multiple times
Run-time tensor shapes: DNNL_RUNTIME_DIM_VAL
Weights pre-packing: use dnnl::memory::format_tag::any
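
The arithmetic behind asymmetric quantization, output scales, zero points, and ReLU fusion can be sketched in plain C++ without calling oneDNN at all. The block below is only a reference-style illustration: the function names (saturate_u8, quantize_u8, matmul_relu_u8s8u8) are hypothetical, the u8 source / s8 weights / u8 destination combination is an assumed configuration, and the order of operations (scale the s32 accumulator, apply ReLU, then add the destination zero point and saturate) is an assumption about how the fused primitive behaves rather than a statement of the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an integer to the u8 range [0, 255].
inline std::uint8_t saturate_u8(long q) {
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, q)));
}

// Asymmetric quantization of one f32 value:
//   q = saturate(round(x / scale) + zero_point)
// std::nearbyint rounds to nearest-even under the default rounding mode.
inline std::uint8_t quantize_u8(float x, float scale, std::int32_t zero_point) {
    const long q = static_cast<long>(std::nearbyint(x / scale)) + zero_point;
    return saturate_u8(q);
}

// Reference u8 x s8 -> u8 MatMul with a fused ReLU (hypothetical helper).
// a: M x K row-major, asymmetric u8 with zero point zp_a
// b: K x N row-major, symmetric s8 (no zero point)
// output_scale: assumed to equal scale_a * scale_b / scale_c
// zp_c: zero point of the u8 destination
std::vector<std::uint8_t> matmul_relu_u8s8u8(
        const std::vector<std::uint8_t> &a, const std::vector<std::int8_t> &b,
        std::size_t M, std::size_t K, std::size_t N,
        float output_scale, std::int32_t zp_a, std::int32_t zp_c) {
    assert(a.size() == M * K && b.size() == K * N);
    std::vector<std::uint8_t> c(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            // Accumulate in s32 after removing the source zero point.
            std::int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k) {
                acc += (static_cast<std::int32_t>(a[m * K + k]) - zp_a)
                        * static_cast<std::int32_t>(b[k * N + n]);
            }
            // Rescale to the destination's quantization step.
            float v = output_scale * static_cast<float>(acc);
            // Fused ReLU: applied before the destination zero point is added.
            v = std::max(v, 0.0f);
            c[m * N + n]
                    = saturate_u8(static_cast<long>(std::nearbyint(v)) + zp_c);
        }
    }
    return c;
}
```

Because output_scale, zp_a, and zp_c are ordinary function arguments here, the same routine serves any quantization parameters and any M, K, N. That mirrors the intent of the run-time placeholders listed above: the scales, zero points, and dimensions are supplied at execution time, so one primitive can be created once and reused.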