I’ve been running Ollama on my Mac Studio for local AI experiments. On a recommendation I tried oMLX instead, and it’s ludicrously faster: maybe 5-10x for both time to first token and total generation time. I haven’t benchmarked it, but subjectively it feels like when I replaced a hard drive with an SSD.