Yes, and this also calls into question Epicor’s assertion that individual businesses “can’t afford enough GPU” to run the AI that they feel “everyone” needs to run, and that this is why it’s OK for them to force everyone into Epicor’s cloud, where said resources are provisioned by Epicor.
Looking at the price difference between on-prem and what they want for cloud, I can safely say I could spin up an AI node with a few RTX 6000s that would make a few people weep and still have a good chunk of change left over.
Ultimately, we are using AI for more of our business than just the ERP system. I don’t need my ERP system to do AI; I need my AI system to do it. Epicor’s track record with BI hasn’t been the smoothest, and with AI being good at basically writing itself, it has the potential to marginalise what’s in the ERP.
It is just a matter of time before you can run AI on a single GPU. Both Google and DeepSeek are fine-tuning the algorithms.
Indeed. Compute will stop being the problem, but a competent solution around AI will be the key. Even then, I suspect that as they get better they will just sort themselves out. One of my team vibe coded a Near Miss app that blows most of them out of the water and has embedded AI for health and safety analysis. All hosted on-prem (in IIS; he really likes IIS for some reason).
That is what is keeping ERP execs up all night.
Running local AI is possible now. It needs a good GPU (16 GB VRAM), but it is definitely within reach.
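To put a rough number on the 16 GB figure, here is a back-of-the-envelope sketch of how much memory a model's weights take at a given quantization. The formula (parameters × bits per weight ÷ 8) covers weights only; the 2 GiB headroom for KV cache and runtime overhead is an assumption of mine, and real quant formats usually use slightly more bits per weight than their nominal size, so treat the results as optimistic.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, headroom_gib: float = 2.0) -> bool:
    """True if the weights plus a headroom allowance fit in VRAM.

    headroom_gib is an assumed allowance for KV cache and runtime
    overhead; real usage grows with context length.
    """
    return weights_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


if __name__ == "__main__":
    # Model sizes and bit widths here are illustrative, not recommendations.
    for params, bits in [(8, 4), (8, 8), (27, 4), (27, 8)]:
        size = weights_gib(params, bits)
        verdict = "fits" if fits(params, bits, 16) else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{size:.1f} GiB of weights, {verdict} in 16 GiB")
```

By this estimate a 16 GB card is comfortable for small models at 8-bit, while a 27B model only squeezes in at around 4-bit and with little room left for context.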
Look up LM Studio and Ollama for desktop apps that can run local models that are not bad: not as good as the cloud, but still showing potential. Things are rapidly evolving as well, so give it 12 months and see where we are.
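For anyone who wants to script against a local model rather than use the desktop UI, this is a minimal sketch of calling Ollama's local HTTP API from Python using only the standard library. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name `llama3.2` is just a placeholder for whatever you have installed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    try:
        print(ask("In one sentence, what is an ERP system?"))
    except OSError as exc:  # URLError and timeouts are subclasses of OSError
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
```

Nothing leaves the machine here, which is the whole point for on-prem use. The same pattern should work from any language that can POST JSON.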
If you absolutely need to run an LLM locally, NVIDIA DGX Spark hardware could do it. Don’t expect miracles though. You might want good controls and some agentic safety layers if you go that route.
Nvidia just announced a consumer-level RGX Spark. It lacks the NIC of the DGX but should be better priced for someone who wants to run local models.
As someone who bought a 128GB M5 Max MacBook Pro and then realized that decent Nvidia GPUs are still often more than 5x quicker for a lot of prompts and workflows with a decent-quality model loaded at a decent quant (e.g. Qwen3.6-27B @ Q8_K_XL), I would suggest checking for proper long-context benchmarks with a decent quantization (e.g. 8-bit). While the M5 Max MacBook Pro does well with small MoE (lower-quality) models at lower-quality quants, the limited memory bandwidth and lack of optimized options really limit its performance with better models and quants.
The best options by far at the moment, in my opinion, are the Nvidia 48GB/72GB RTX PRO 5000 and 96GB RTX PRO 6000 GPUs.
In my testing and workflows so far, my 72GB RTX PRO 5000 is over 4 times faster than my MBP, which is fairly similar to the DGX Spark performance-wise (similarly memory limited). The RTX PRO 6000 should be at least 20% faster again for larger-context prompts with a decent-quality LLM.
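For a feel of why memory bandwidth matters so much, here is a rough rule-of-thumb sketch, not a benchmark. It assumes a dense model where every weight is read once per generated token, so the ceiling on generation speed is roughly bandwidth divided by model size. It says nothing about prompt processing on long contexts, which is mostly compute-bound, and MoE models read only their active experts, so they behave differently. The bandwidth figures in the example are hypothetical placeholders, not the specs of any product mentioned above.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billion: float,
                             bits_per_weight: float) -> float:
    """Upper bound on tokens/s when every weight is read once per token."""
    model_gb = params_billion * bits_per_weight / 8  # weights in GB
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Hypothetical bandwidth figures, chosen only to show the scaling.
    for label, bw in [("slower memory", 300.0), ("faster memory", 1200.0)]:
        rate = decode_tokens_per_second(bw, params_billion=27, bits_per_weight=8)
        print(f"{label} ({bw:.0f} GB/s): at most ~{rate:.0f} tokens/s for 27B @ 8-bit")
```

The useful takeaway is the scaling: for the same model and quant, the generation ceiling moves in direct proportion to memory bandwidth, so plug in the published bandwidth of whatever hardware you are comparing.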