CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Default kernel-level GEMM definitions combine threadblock-scoped matrix multiply-add with the appropriate threadblock-scoped epilogue.
#include "cutlass/cutlass.h"#include "cutlass/layout/matrix.h"#include "cutlass/numeric_types.h"#include "cutlass/arch/wmma.h"#include "cutlass/epilogue/threadblock/epilogue.h"#include "cutlass/epilogue/thread/linear_combination.h"#include "cutlass/gemm/gemm.h"#include "cutlass/gemm/kernel/gemm.h"#include "cutlass/gemm/kernel/gemm_pipelined.h"#include "cutlass/gemm/threadblock/default_mma_core_sm75.h"#include "cutlass/gemm/threadblock/default_mma_core_sm70.h"#include "cutlass/gemm/threadblock/default_mma.h"#include "cutlass/gemm/threadblock/default_mma_core_simt.h"#include "cutlass/gemm/threadblock/threadblock_swizzle.h"#include "cutlass/epilogue/threadblock/default_epilogue_tensor_op.h"#include "cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h"#include "cutlass/epilogue/threadblock/default_epilogue_simt.h"#include "cutlass/transform/threadblock/predicated_tile_iterator.h"

Namespaces

- cutlass
- cutlass::gemm
- cutlass::gemm::kernel
Note: CUTLASS epilogues universally target row-major outputs. Column-major outputs are accommodated by exchanging the A and B operands and assuming transposed layouts. Partial specializations here choose 'device::GemmTransposed' to implement this functionality.
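The operand exchange rests on the identity that a column-major M x N product C = A x B occupies exactly the same memory as the row-major N x M product C^T = B^T x A^T, so a row-major-only kernel can produce column-major output by swapping A and B and reinterpreting the layouts. The following standalone sketch illustrates the idea with plain reference loops; it does not use CUTLASS types, and the function names and the float element type are chosen only for this example.

```cpp
#include <cassert>
#include <vector>

// Reference GEMM on column-major data: C(i,j) = sum_k A(i,k) * B(k,j),
// where a column-major matrix stores element (i,j) at j * ld + i.
void gemm_col_major(int M, int N, int K,
                    const float* A, int lda,   // M x K, column-major
                    const float* B, int ldb,   // K x N, column-major
                    float* C, int ldc) {       // M x N, column-major
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < M; ++i) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k)
        acc += A[k * lda + i] * B[j * ldb + k];
      C[j * ldc + i] = acc;
    }
}

// A row-major-only GEMM: element (i,j) is stored at i * ld + j.
void gemm_row_major(int M, int N, int K,
                    const float* A, int lda,   // M x K, row-major
                    const float* B, int ldb,   // K x N, row-major
                    float* C, int ldc) {       // M x N, row-major
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k)
        acc += A[i * lda + k] * B[k * ldb + j];
      C[i * ldc + j] = acc;
    }
}

int main() {
  int M = 3, N = 4, K = 5;
  std::vector<float> A(M * K), B(K * N), C_ref(M * N), C_swap(M * N);
  for (int i = 0; i < M * K; ++i) A[i] = float(i % 7);
  for (int i = 0; i < K * N; ++i) B[i] = float(i % 5);

  // Column-major reference: C = A * B.
  gemm_col_major(M, N, K, A.data(), M, B.data(), K, C_ref.data(), M);

  // Operand exchange: a column-major M x K buffer with leading dimension M
  // is a row-major K x M buffer with the same leading dimension (and
  // likewise for B and C), so C^T (N x M, row-major) = B^T * A^T is computed
  // by the row-major kernel with the operands swapped.
  gemm_row_major(N, M, K, B.data(), K, A.data(), M, C_swap.data(), M);

  // Both results occupy identical memory layouts.
  for (int i = 0; i < M * N; ++i) assert(C_ref[i] == C_swap[i]);
  return 0;
}
```

Because the accumulation order over k is unchanged, the swapped computation produces bit-identical results, not merely numerically close ones; the exchange is purely a reinterpretation of how the same buffers are indexed.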