Notion migrated from Spark on EMR to Ray, cutting embedding costs 80% and improving query latency 10x. Uber and Salesforce shared similar AI infrastructure winsNotion migrated from Spark on EMR to Ray, cutting embedding costs 80% and improving query latency 10x. Uber and Salesforce shared similar AI infrastructure wins

Notion Slashes AI Embedding Costs 80% After Ditching Spark for Ray

2026/04/10 00:48
3 min read
For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

Notion Slashes AI Embedding Costs 80% After Ditching Spark for Ray

James Ding Apr 09, 2026 16:48

Notion migrated from Spark on EMR to Ray, cutting embedding costs 80% and improving query latency 10x. Uber and Salesforce shared similar AI infrastructure wins.

Notion Slashes AI Embedding Costs 80% After Ditching Spark for Ray

Notion has slashed its AI embedding pipeline costs by more than 80% after migrating from Apache Spark to Ray, the distributed computing framework backed by Anyscale. The productivity software company also achieved 10x improvements in query latency while consolidating three separate jobs per region into one.

The migration details emerged at Ray Day Seattle on April 9, 2026, where ML engineers from Notion, Uber, Salesforce, and Apple shared hard-won lessons about scaling AI infrastructure.

What Notion Actually Changed

Mickey Liu, a software engineer on Notion's search platform team, walked through the overhaul. Their original setup used a three-step Spark pipeline running on Amazon EMR: data chunking, third-party API calls for embedding generation, and writes to a vector store.

The pain points were predictable but severe. Double compute costs. Third-party API rate limits throttling throughput. Debugging nightmares when failures occurred across tools—driver and executor logs weren't even persisted in YARN.

The new architecture streams Kafka data directly into a Ray cluster handling CPU chunking, GPU embedding generation, and vector store writes in a single pipeline. No intermediate S3 handoffs. What started as the backend for a Q&A feature in 2023 now powers all of Notion AI and custom agents.

Uber and Salesforce Report Similar Gains

Uber's Peng Zhang detailed how their Michelangelo ML platform evolved from TensorFlow/Horovod to Ray with PyTorch. The standout move: separating CPU data-loading nodes from GPU training nodes in a heterogeneous cluster design. Result? GPU utilization jumped 20%, and training time dropped roughly 50% in select pipelines.

Salesforce tackled a different beast—summarizing documents up to 200,000 tokens long (roughly a short novel) with P95 latency under 15 seconds. Their team used Ray to chunk documents and run parallel inference across a distributed actor pool with vLLM, then merge results. They landed on 1-2 GPU data parallelism as the sweet spot after running scaling experiments directly on Ray.

Why This Matters Beyond These Companies

Robert Nishihara, Ray's co-creator and Anyscale co-founder, opened the event by framing the core problem: AI infrastructure keeps getting harder. Multimodal data processing, reinforcement learning workloads, and multi-node LLM inference are pushing existing tools past their limits.

Every speaker landed on the same conclusion from different angles—their previous tooling ran out of road.

Apple engineers Charlie Chen and Haocheng Bian highlighted foundation model training challenges: massive unstructured data, billion-plus parameters, and sparse architectures like Mixture of Experts. Traditional engines fail because data pipelines and training frameworks run in separate environments with no shared context.

What's Next

Ray Day Seattle kicked off Anyscale's 2026 "Ray on the Road" tour—eight cities across three countries. The company is also running invite-only customer roundtables at each stop to preview their product roadmap.

For teams hitting similar walls with Spark or other distributed frameworks, Notion's full technical writeup is available on their engineering blog under "Two Years of Vector Search at Notion." The 80% cost reduction and 10x latency improvement offer a concrete benchmark for anyone evaluating similar migrations.

Image source: Shutterstock
  • ai infrastructure
  • ray
  • machine learning
  • enterprise tech
  • cost optimization
Market Opportunity
Raydium Logo
Raydium Price(RAY)
$0.6015
$0.6015$0.6015
+3.10%
USD
Raydium (RAY) Live Price Chart

World Cup Combo: Aim for 200x

World Cup Combo: Aim for 200xWorld Cup Combo: Aim for 200x

Combine up to 20 World Cup matches in one order

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Could You Retire On Florida’s Space Coast And Watch Rocket Launches From Your Backyard?

Could You Retire On Florida’s Space Coast And Watch Rocket Launches From Your Backyard?

Can a million-dollar nest egg buy you a front-row seat to America’s new space race? Along Florida’s Space Coast, retirees can sip coffee on the patio, hear the
Share
247 Wall St.2026/06/19 03:47
Franklin Templeton CEO Dismisses 50bps Rate Cut Ahead FOMC

Franklin Templeton CEO Dismisses 50bps Rate Cut Ahead FOMC

The post Franklin Templeton CEO Dismisses 50bps Rate Cut Ahead FOMC appeared on BitcoinEthereumNews.com. Franklin Templeton CEO Jenny Johnson has weighed in on whether the Federal Reserve should make a 25 basis points (bps) Fed rate cut or 50 bps cut. This comes ahead of the Fed decision today at today’s FOMC meeting, with the market pricing in a 25 bps cut. Bitcoin and the broader crypto market are currently trading flat ahead of the rate cut decision. Franklin Templeton CEO Weighs In On Potential FOMC Decision In a CNBC interview, Jenny Johnson said that she expects the Fed to make a 25 bps cut today instead of a 50 bps cut. She acknowledged the jobs data, which suggested that the labor market is weakening. However, she noted that this data is backward-looking, indicating that it doesn’t show the current state of the economy. She alluded to the wage growth, which she remarked is an indication of a robust labor market. She added that retail sales are up and that consumers are still spending, despite inflation being sticky at 3%, which makes a case for why the FOMC should opt against a 50-basis-point Fed rate cut. In line with this, the Franklin Templeton CEO said that she would go with a 25 bps rate cut if she were Jerome Powell. She remarked that the Fed still has the October and December FOMC meetings to make further cuts if the incoming data warrants it. Johnson also asserted that the data show a robust economy. However, she noted that there can’t be an argument for no Fed rate cut since Powell already signaled at Jackson Hole that they were likely to lower interest rates at this meeting due to concerns over a weakening labor market. Notably, her comment comes as experts argue for both sides on why the Fed should make a 25 bps cut or…
Share
BitcoinEthereumNews2025/09/18 00:36
BREAKING: XRP Drops as BNB Slides in Market Chaos

BREAKING: XRP Drops as BNB Slides in Market Chaos

BNB, USDC, and XRP Show Mixed Performance as Crypto Market Remains Volatile The cryptocurrency market continues to show uneven performance across major digit
Share
Hokanews2026/06/19 03:33

Score Your Share of 50K USDT

Score Your Share of 50K USDTScore Your Share of 50K USDT

Complete DEX+ tasks to unlock the Champion Wheel