- Event: Scaling AI/ML with Anyscale and Ray
- Date: March 29, 2023
- Location: AWS Startup Loft in San Francisco, CA
- Commercial Sponsors: AWS & Anyscale
Summary by Nick Lee, Enterprise AI Consulting LLC
I attended “Scaling AI/ML with Anyscale and Ray” at the AWS Startup Loft in San Francisco on March 29, 2023. The event was well attended, attracting about 20-30 people from startups and consulting companies (a strong turnout for an event held in the middle of the workday on a rainy Wednesday!). The talks were all high quality and delivered by reputable speakers from well-known companies:
- Rob Ferguson, Global AI & ML Startups & VC, AWS
- Robert Nishihara, Co-founder and CEO, Anyscale
- Holden Karau, Open Source Engineer, Netflix
- Patrick Ames, Principal Engineer, Amazon
- M Waleed Kadous, Head of Engineering, Anyscale
My Takeaways from Various Talks
Presentation by Robert Nishihara:
- Ray has strong, mainstream adoption across the tech industry. Uber uses Ray for its deep learning workloads.
- Most common workloads seen on Ray at Anyscale:
- Model Training
- Model Serving and Deployment
- Batch Processing & Inference
- Most popular use-cases for Ray & Anyscale:
- Computer Vision
- Natural Language Processing & LLMs (e.g. ChatGPT)
- Time series forecasting
- If you want to submit a feature request to Ray, check out the Ray Enhancement Proposal GitHub page
Presentation by Patrick Ames:
I enjoyed Patrick Ames’ presentation about how he and his team used Ray at AWS to power BI tools and perform a massive migration from Apache Spark to Ray. Notably, this is the first time I’ve seen Ray positioned as a potential Spark killer.
Example of some of Ames’ work: https://aws.amazon.com/solutions/case-studies/amazon-migration-analytics/
Ames said the Business Intelligence (BI) department at Amazon is starting to replace Apache Spark with Ray-based applications. They are innovating on top of Apache Iceberg and Ray DeltaCat. The additional code they are writing at AWS will hopefully, according to Ames, be open-sourced by the end of 2023.
They (AWS BI team) rely heavily on Ray Datasets and Apache Arrow.
Note: I synced with Ames after his talk and asked why they decided to build on Iceberg instead of a more established technology like Delta Lake (I disclosed that I previously worked at Databricks and was curious from that perspective). He told me that AWS already had a lot of infrastructure built around Iceberg, so it made sense at the time, and that he knew and trusted some of the folks who worked on Iceberg. He does hope they can eventually support more table formats, such as Delta Lake.
M Waleed Kadous’ talk:
- Co:here and OpenAI use Ray internally
- Kadous foresees the need to bring LLM development in-house because it is possible to rack up huge bills for simple interactions with commercial LLMs. For example, a simple conversation with an OpenAI LLM could easily cost $2. If every customer chat costs you $2, you would quickly burn cash as you scaled up.
- Tale of Two Tech Stacks (expensive vs affordable):
- Expensive Stack: Used by big co’s with millions of dollars to throw at LLM problems
- Alpa for Automatic model parallelization
- Ray for orchestration
- Alternative stack: Convenient & cheaper
- Hugging Face for models
- DeepSpeed for optimized training
- PyTorch as a framework
- Ray for orchestration
- GPU or other hardware for compute
- I noticed that Ray is the one component common to both the “expensive” and the “alternative” stacks.
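Kadous’ warning about per-chat costs is easy to make concrete with back-of-the-envelope arithmetic. The ~$2-per-conversation figure is from his talk; the customer and chat counts below are hypothetical numbers I picked purely for illustration:

```python
# Back-of-the-envelope cost scaling for commercial LLM usage.
# COST_PER_CHAT_USD is the rough figure cited in Kadous' talk;
# the customer/chat counts are hypothetical.

COST_PER_CHAT_USD = 2.00

def monthly_llm_bill(customers: int, chats_per_customer: int) -> float:
    """Estimated monthly spend if every customer chat hits a commercial LLM API."""
    return customers * chats_per_customer * COST_PER_CHAT_USD

# A modest SaaS: 10,000 customers, 5 chats each per month
print(monthly_llm_bill(10_000, 5))  # 100000.0 -> $100k/month
```

At that rate, even a mid-sized customer base burns six figures a month, which is the economic pressure Kadous argued will push LLM development in-house.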
Nick’s (Subjective) Thoughts
- Ray is seeing incredible adoption and capability expansion thanks to huge open-source efforts. It truly is a valuable new technology for AI/ML.
- Since Ray is purposely designed to be easy to use in its open-source form, it’s unclear to me how compelling Anyscale’s value proposition is. Personally, I favor hosted solutions over spending cycles building and maintaining an in-house environment, but it looked like all the presenting companies (Netflix, Amazon) were happily running open-source Ray on their own hardware.
- Would have loved to hear from a company that decided to use Anyscale-hosted Ray and understand how much faster/cheaper/better they got into production compared with standing up their own clusters and managing Ray themselves.
- I was amazed to hear that at least one department at Amazon is starting to move away from Apache Spark. The migration seemed to require a substantial engineering effort, but Patrick Ames’ talk is proof that a company can tackle Big Data without Spark. I don’t think Ray will pose an existential threat to Spark in the near future (say, the next two years), but perhaps we’ll see Ray become a more accessible alternative to Spark, or at least a strong complement to it. If that happens, perhaps Anyscale can grab some share of wallet at the expense of Databricks or EMR. I think Ray will be an even more compelling alternative if it can support SQL, similar to Spark SQL.
- I’m also aware that Databricks recently announced Ray support in their clusters. That means you can run your Ray workloads on Databricks, although you probably don’t get everything you might get from Anyscale. If Databricks sees enough revenue coming in from Ray workloads, they might be incentivized to compete with Anyscale on providing the best place to run Ray… Or perhaps, acquire Anyscale for its founding team and technology?
These notes are provided to the public, free of charge by Enterprise AI Consulting LLC. We attempt to provide the most accurate information possible, but we do not offer any warranty on this information and it is offered “as is.” Use at your own discretion. Opinions expressed by the author do not necessarily represent the views of the company. This document does not contain any confidential information.