Mixed F16 / F32 precision; 4-bit and 5-bit integer quantization support; Low memory usage (Flash Attention); Zero memory allocations at runtime; Runs on the CPU ...