Okay. So the AI bubble bursts.
-
@swelljoe Fucking hell.
-
@mos_8502 That's not entirely accurate, but it's the cheapest way I've found to run something close to a reasonable coding model (quantized, so still not comparable to today's frontier models, but comparable to frontier models from a couple of years ago). Someone on YouTube coupled two of them to run a ~200GB model that gets close to modern standards. So, $4200.
A Mac Studio with 512GB of unified memory gets you in the ballpark of the frontier. (Though there aren't open models at quite that level.)
-
@mos_8502 that's a quarter what it cost a year ago. 🤷
-
@mos_8502 cheapest option a year or two ago was probably 4 3090s on a server motherboard.
-
@mos_8502 GLM 4.7 can run in 205GB at 2-bit quantization. Some of that can be system memory, so with a couple of large GPUs and a ton of system RAM you could run a very good open model, among the best available, comparable to Sonnet 4.5. Still not Opus/Codex 5.2 level, but it'll write working code. https://unsloth.ai/docs/models/glm-4.7
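(For the "some of that can be system memory" part: llama.cpp is one common way to split a GGUF quant between GPU and system RAM, via its `--n-gpu-layers` flag. A sketch, where the model filename and layer count are placeholders you'd tune to your hardware:)

```shell
# Serve a 2-bit GGUF quant of GLM 4.7, split between GPU and system RAM.
# --n-gpu-layers sets how many transformer layers live in VRAM; the rest
# stay in system memory and run on the CPU. Model path is hypothetical.
llama-server -m ./glm-4.7-q2.gguf --n-gpu-layers 40 --ctx-size 8192 --port 8080
```

(Lower `--n-gpu-layers` if you run out of VRAM; the server then exposes an OpenAI-compatible API on localhost.)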
-
@vwbusguy I have an old GPU, but what about 64GB of system RAM and 24 logical cores in my Xeon workstation?
@mos_8502 PyTorch can use the CPU. You'll just have to be a little more patient.
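(A minimal sketch of what that fallback looks like in PyTorch — the same code runs on either device, just slower on CPU:)

```python
import torch

# Pick a device: fall back to CPU when no CUDA GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A toy matmul to show device-agnostic code; real model inference
# works the same way, with the model and inputs moved to `device`.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.shape)  # torch.Size([1024, 1024])
```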
-
@mos_8502 so, it continues to be cheapest to buy the investor-subsidized compute and GPUs that OpenAI, Anthropic, Google, Microsoft, etc. want to provide. But it feels bad to me to trust in that or become dependent on it, especially since the models are all proprietary and I don't know exactly what they're doing, what they're doing with my data, or what they'll do in the future.
-
@mos_8502 nvidia has a $3999 computer, the DGX Spark, that roughly matches the AMD option in benchmarked speed and specs, but the nvidia ecosystem is more mature. And ASUS makes a mini PC with the same nvidia chipset and specs for $2999. So if you need nvidia-level compatibility, you have to spend more for 128GB. But that's become less important in the past year or so as AMD has invested in AI tooling. So, for now, AMD is a bargain relative to Apple or nvidia.