A possible hardware solution for ultra-fast (73x faster than H200) self-hosted small models that does not depend on RAM
dev.to
The approach hardwires model weights into transistors and uses an older 6nm process. They are targeting 70B model sizes (presumably 16-bit) by year end. It should cost much less than a 140 GB card, but I don't know the details.
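
For anyone wondering where the 140 GB comparison comes from: it looks like it is just the weights of a 70B model at 16 bits each. A rough sketch of that arithmetic, assuming decimal gigabytes and counting weights only (no KV cache or activations); the function name is mine, not from the linked post:

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


# Assumption: 70 billion parameters stored at 16 bits each.
print(weight_footprint_gb(70_000_000_000, 16))  # 140.0

# The same model at a hypothetical 4-bit quantization, for comparison.
print(weight_footprint_gb(70_000_000_000, 4))  # 35.0
```

So the "140 GB card" would be whatever GPU memory you need to hold those weights; if the chip really bakes the weights into silicon, that memory requirement is what goes away.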