Posts: 976 · Comments: 903 · Joined: 3 yr. ago

  • Correct.

  • It would be really cool if they didn't do that this time.

  • There’s also an option to bring your own LLM: when the manual option is enabled, you can enter a model name, endpoint, and API token. However, the page itself warns that local models may not work correctly.

    It looks like there's an option for people to self-host too. You won't have to send your history to someone else's computer.

  • I don't really understand the hostility. I just asked you a question, and rather than giving an explanation you answered with this and downvoted me. I guess that was my mistake for giving you the benefit of the doubt.

  • No one said it had to go in someone's home. If they could be acquired second-hand, wouldn't it be possible for them to find use with someone other than the original buyer(s)?

  • As long as the hardware is cheap enough and gets into the right hands, people will train and research to improve open models.

  • It depends on what you mean by improve.

  • Don't mind the downvotes, they're from people who don't even use this community.

  • Calling it a "Distillation Attack" is wild. Get fucked Anthropic.

  • Yeah.

  • Yeah, it's a different form of quantization. I think it's supposed to look a little better than conventional quants, but it still has all the tradeoffs of a quant.

  • The license is Apache 2.0, but I don't know how much VRAM it takes.

  • VAEs are used in image generation too: at the end of generation, the VAE decodes the latent image into pixel space.

  • It's basically when you use a larger model to train a smaller one. You train the student model on a dataset that mixes data generated by the teacher model with ground-truth data, and by some strange alchemy I don't quite understand you get a much smaller model that resembles the teacher model.

    It's really hard to train on top of a distilled model without breaking it, so people prefer undistilled models whenever possible. Without the teacher model, distilled models are basically cripple-ware.

  • Damn, that's really cool.
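On the quantization comment above ("it still has all the tradeoffs of a quant"): the sketch below is plain symmetric int8 "absmax" quantization, a conventional scheme and not the unnamed newer method being discussed. The helper names `quantize_int8` and `dequantize` are hypothetical. It shows the basic tradeoff any quant shares: every weight is snapped to one of 255 integer levels, so each restored value can be off by up to half a step, and very small weights can collapse to zero.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: map floats to integers in [-127, 127]."""
    absmax = max(abs(w) for w in weights)
    # One scale (step size) for the whole tensor; guard against all-zero input.
    scale = absmax / 127 if absmax > 0 else 1.0
    # Round to the nearest level and clamp to the int8 range we use.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.031, 0.9, -0.0004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per weight is at most half a step (scale / 2).
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Here the tiny weight `-0.0004` is smaller than half a step and comes back as exactly `0.0`, which is the kind of information loss that fancier quant formats try to reduce but cannot eliminate.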
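On the distillation explanation above: one common way to make the student "resemble" the teacher is the classic soft-target distillation loss. It blends a KL-divergence term against the teacher's temperature-softened output distribution with ordinary cross-entropy on the ground-truth label. This is a minimal sketch for a single example, assuming plain lists of logits; the function names, the temperature of 2.0, and the 0.5 weighting are illustrative choices, not taken from any particular model's training recipe. Training on teacher-generated text, as the comment describes, is a related variant where the teacher's outputs serve directly as targets.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    # Soft targets: how the teacher spreads probability across all classes.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Scale by T^2 so the soft-target term keeps a comparable magnitude as T changes.
    soft_loss = kl_divergence(teacher_soft, student_soft) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the ground-truth class index.
    hard_loss = -math.log(softmax(student_logits)[label])
    # Hypothetical weighting: alpha for teacher imitation, the rest for ground truth.
    return alpha * soft_loss + (1 - alpha) * hard_loss


teacher = [4.0, 1.0, 0.5]
student_close = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
student_far = [0.5, 1.0, 4.0]    # prefers a different class
loss_close = distillation_loss(student_close, teacher, label=0)
loss_far = distillation_loss(student_far, teacher, label=0)
```

A student whose outputs track the teacher's gets a lower loss than one that disagrees, so minimizing this pulls the small model toward the teacher's behavior, including how it ranks the "wrong" answers, which plain ground-truth training does not capture.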