
  • Yeah, the cache hierarchy is behaving kinda wonky lately. Many AI workloads (and that’s what’s driving hardware development right now) are constrained by memory bandwidth, and cache only helps with part of that: it helps with repeated access, not so much with streaming access to datasets much larger than the cache (e.g. many current AI models). There’s a quick sketch of the difference at the end of this comment.

    Intel already tried selling CPUs with both on-package HBM and slotted DDR RAM (the Xeon Max line). No one wanted them, as the performance gains from the expensive HBM evaporated completely as soon as you touched memory out-of-package (assuming workloads bound by memory bandwidth, which currently dominate the compute market).

    To get good performance out of that kind of split, you have to explicitly code the memory transfers so data is prefetched (preferably asynchronously) from the slower memory into the faster, à la classic GPU programming; the second sketch below shows the pattern. YMMV.
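
    First, a minimal sketch of the cache-vs-streaming point from the first paragraph, in plain C. The sizes are illustrative assumptions, nothing is tuned for a particular part: a small buffer that stays cache-resident gets re-read many times, a large one streams through once at whatever rate the DRAM can feed.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    enum { LINE = 64 };            /* assume 64-byte cache lines */

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    static volatile long sink;     /* defeat dead-code elimination */

    /* Touch one byte per cache line; each miss pulls in a whole line,
     * so bytes-spanned / time approximates the bandwidth actually used. */
    static double gbps(const char *buf, size_t len, int passes) {
        long acc = 0;
        double t0 = now_sec();
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < len; i += LINE)
                acc += buf[i];
        sink = acc;
        return (double)len * passes / (now_sec() - t0) / 1e9;
    }

    int main(void) {
        size_t small = (size_t)1 << 20;   /* 1 MiB: cached after pass 1  */
        size_t large = (size_t)1 << 30;   /* 1 GiB: bigger than any LLC  */
        char *a = malloc(small), *b = malloc(large);
        if (!a || !b) return 1;
        memset(a, 1, small);
        memset(b, 1, large);              /* also faults the pages in */

        printf("repeated, cache-resident: %6.1f GB/s\n", gbps(a, small, 1024));
        printf("streaming, DRAM-bound:    %6.1f GB/s\n", gbps(b, large, 1));
        free(a); free(b);
        return 0;
    }
    ```

    On a typical desktop part the first number should land well above the second, and that gap is exactly what a bigger cache can’t buy back once the working set stops fitting.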
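
    And second, a sketch of the explicit staging pattern from the last paragraph, using a plain pthread as a stand-in for a real async copy engine. Everything here is a made-up illustration: “slow” is just ordinary malloc’d memory, and on an actual HBM+DDR part the staging buffers would need to come from the fast tier (via memkind, NUMA policy, or whatever the platform offers). The shape is the classic GPU double-buffering loop: copy chunk k+1 while computing on chunk k.

    ```c
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK ((size_t)1 << 20)   /* 1 MiB staging chunks (arbitrary) */

    struct copy_job { const char *src; char *dst; size_t len; };

    /* Helper thread doing the "async memcpy" from slow to fast memory. */
    static void *copy_worker(void *arg) {
        struct copy_job *j = arg;
        memcpy(j->dst, j->src, j->len);
        return NULL;
    }

    /* Stand-in for real compute; reads the chunk out of the fast buffer. */
    static long process(const char *chunk, size_t len) {
        long acc = 0;
        for (size_t i = 0; i < len; i++)
            acc += chunk[i];
        return acc;
    }

    int main(void) {
        size_t total = (size_t)256 << 20;          /* 256 MiB "slow" dataset */
        char *slow = malloc(total);                /* pretend this is DDR    */
        char *stage[2] = { malloc(CHUNK), malloc(CHUNK) }; /* pretend: HBM   */
        if (!slow || !stage[0] || !stage[1]) return 1;
        memset(slow, 1, total);

        size_t nchunks = total / CHUNK;
        long sum = 0;
        pthread_t t;

        memcpy(stage[0], slow, CHUNK);             /* prime the pipeline */
        for (size_t k = 0; k < nchunks; k++) {
            int cur = (int)(k & 1);
            struct copy_job j;
            int pending = (k + 1 < nchunks);
            if (pending) {                         /* prefetch chunk k+1 ... */
                j.src = slow + (k + 1) * CHUNK;
                j.dst = stage[cur ^ 1];
                j.len = CHUNK;
                pthread_create(&t, NULL, copy_worker, &j);
            }
            sum += process(stage[cur], CHUNK);     /* ... while chunk k runs */
            if (pending) pthread_join(&t, NULL);
        }
        printf("sum = %ld\n", sum);
        free(slow); free(stage[0]); free(stage[1]);
        return 0;
    }
    ```

    With copy and compute overlapped, each iteration costs max(copy, compute) instead of their sum. That’s the whole trick, whether the engine is a pthread, a DMA unit, or a CUDA stream.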

  • Apparently AMD couldn’t make the signal integrity work out with socketed RAM. (source: LTT video with Framework CEO)

    IMHO: Up until now, using soldered RAM was lazy and cheap bullshit. But I do think we’re at the limit of what’s reasonable to do over socketed RAM. In high-performance datacenter parts, socketed RAM is on its way out (see: MI300A, Grace-{Hopper,Blackwell}, Xeon Max), with on-package memory gaining ground. I think we’ll see the same trend in consumer hardware as well. Requirements on memory bandwidth and latency keep going up with recent trends like powerful integrated graphics and AI-slop, and socketed RAM simply won’t keep up.

    It’s sad, but in a few generations I think only the lower-end consumer CPUs will still take socketed RAM. I’m betting the high-performance consumer CPUs will require RAM that’s not just soldered to the motherboard, but mounted on the CPU package itself.

    Finally, some Grace Hopper to make everyone happy: https://youtube.com/watch?v=gYqF6-h9Cvg