I have been trying to spin up Ray for inference on the Italian supercomputer Leonardo.
Before doing that, I have been experimenting with some demo code on Colab to test my inference pipeline.
I have tested my inference pipeline without Ray: it produces 7061 tokens per second and finishes batch inference in 0.065 seconds for 20,359-word prompts. I can obviously scale this up further, since I am currently only using 16 workers.
I have been experimenting with the code below:
import torch
import asyncio
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import itertools
import time
import ray
from typing import Dict
import numpy as np
model_name = "HuggingFaceTB/SmolLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)
texts = [
"Once upon a time", "In a galaxy far, far away", "The quick brown fox jumps over the lazy dog",
"Deep learning models are transforming AI research", "Artificial intelligence is revolutionizing the tech industry",
"The future of AI holds limitless possibilities", "Quantum computing will change everything",
"The mysteries of the universe are waiting to be discovered", "The rise of autonomous vehicles is inevitable",
"Natural language processing is a breakthrough in AI", "The development of self-driving cars is progressing rapidly",
"Space exploration is reaching new heights", "Climate change is one of the biggest challenges of our time",
"Blockchain technology is altering the financial landscape", "The Internet of Things is connecting the world in new ways",
"Robotic automation is reshaping industries", "5G networks will enable faster, more reliable connections",
"Biotechnology holds the key to future medical breakthroughs", "Virtual reality will transform the entertainment industry",
"Machine learning is driving the future of healthcare", "Augmented reality is changing how we interact with the world",
"Big data is fueling the next generation of innovations", "AI ethics will be critical in shaping the future",
"The metaverse will redefine online interaction", "The future of work is hybrid and flexible", "AI-powered tools are enhancing productivity",
"Sustainability is becoming a central focus for businesses", "Artificial general intelligence is a topic of intense debate",
"The human brain is still the most powerful computer", "Neural networks mimic the way the human brain processes information",
"Data privacy will continue to be a major concern", "Autonomous drones are revolutionizing logistics", "The digital economy is on the rise",
"Cybersecurity is more important than ever before", "The ethical implications of AI cannot be ignored", "Social media has drastically altered communication",
"Personalized medicine will improve healthcare outcomes", "Smart cities are becoming a reality", "The advancement of AI in finance is transforming the industry",
"Wearable technology is making life more convenient", "Autonomous robots will revolutionize manufacturing", "Artificial intelligence is unlocking new insights in science",
"The future of education is digital and personalized", "The role of women in tech is growing rapidly", "AI-driven innovation is a major competitive advantage",
"The role of big data in decision-making is growing", "AI research is opening up new frontiers in medicine", "The future of transportation is electric",
"Deep learning is enabling breakthroughs in various industries", "AI systems will augment human intelligence", "Smartphones have become an essential part of daily life",
"The role of artificial intelligence in automation is immense", "Autonomous vehicles are creating new challenges for regulators", "The power of cloud computing is enabling global collaboration",
"The growing importance of social impact in tech companies", "The rise of renewable energy is changing the global landscape",
"AI-based language translation is breaking down communication barriers", "Self-learning machines will revolutionize education", "Energy efficiency will drive future technological advancements",
"Space tourism is becoming more accessible", "The future of the internet is decentralized", "The rise of facial recognition technology is changing security",
"The development of AI is outpacing regulation", "Smart homes are enhancing convenience and energy efficiency", "The role of data in artificial intelligence is crucial",
"Digital twins are revolutionizing manufacturing processes", "The convergence of AI and IoT will create smarter ecosystems", "Health-tech innovations are improving patient care",
"The impact of AI on job markets is a growing concern", "The evolution of virtual assistants is reshaping consumer behavior", "AI is powering the next generation of entertainment",
"The potential of AI in disaster response is enormous", "AI is helping to solve some of the world’s toughest problems", "New breakthroughs in AI are accelerating drug discovery",
"The role of AI in cybersecurity is becoming more pronounced", "The future of AI will be collaborative, not competitive", "Digital currencies are challenging traditional banking systems",
"Sustainable technologies are gaining momentum worldwide", "The use of AI in agriculture is improving crop yields", "The role of AI in space exploration is expanding",
"Biotechnology and AI are converging to improve healthcare", "AI is enhancing the precision of medical diagnoses", "The impact of AI on the environment is a growing concern",
"The future of work will require new skill sets", "Smart robots are becoming a common sight in various industries", "Artificial intelligence will change the way we learn",
"AI-driven personalization is transforming retail experiences", "Human augmentation technology is advancing rapidly", "Blockchain is enabling transparent and secure transactions",
"The role of AI in creative industries is expanding", "Data science is an essential skill for the modern workforce", "The integration of AI in government services will improve efficiency",
"AI will play a key role in solving global challenges", "The use of AI in predictive analytics is transforming businesses", "The advancement of AI in diagnostics will save lives",
"The use of AI in supply chains is driving efficiency", "The future of communication is powered by AI", "AI is transforming the way we interact with technology",
"Autonomous systems are being integrated into military operations", "Artificial intelligence is enabling the next generation of space exploration", "Robotics is changing the future of healthcare",
"AI models are improving decision-making processes", "The rise of AI-driven content creation is reshaping the media landscape", "The future of AI is both exciting and uncertain",
"Technology will continue to evolve at a rapid pace", "The integration of AI in mobile apps is enhancing user experience", "The role of AI in reducing environmental impact is growing",
"Self-healing materials are a potential future breakthrough", "The combination of AI and 3D printing is opening new possibilities", "Technology is reshaping the way we interact with the world",
"AI will soon be capable of emotional intelligence", "Autonomous vehicles will redefine how we travel", "The power of AI in education is limitless"
]
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
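For reference, the non-Ray numbers above come from a plain padded model.generate() call over this list, roughly like the sketch below (not my exact code; max_new_tokens is just a placeholder and the 16-worker setup is simplified away):

start = time.time()
inputs = tokenizer(texts, return_tensors="pt", padding=True).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)  # max_new_tokens is a placeholder
elapsed = time.time() - start
# count only the newly generated tokens
new_tokens = (outputs.shape[-1] - inputs["input_ids"].shape[-1]) * outputs.shape[0]
print(f"{elapsed:.3f} s, ~{new_tokens / elapsed:.0f} tokens/s")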
and then I followed part of this doc:
https://docs.ray.io/en/latest/data/batch_inference.html
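Adapted to the model above, that pattern looks roughly like this. It is a sketch rather than my exact notebook code; the class name, batch_size, concurrency, and max_new_tokens values are placeholders:

import numpy as np
import ray
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

class TextGenerator:
    def __init__(self, model_name: str = "HuggingFaceTB/SmolLM-1.7B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            device_map="auto",
            quantization_config=BitsAndBytesConfig(load_in_8bit=True),
            torch_dtype=torch.float16,
        )

    def __call__(self, batch):
        # Ray Data passes a dict of numpy arrays; "text" holds the prompts
        inputs = self.tokenizer(
            list(batch["text"]), return_tensors="pt", padding=True
        ).to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(**inputs, max_new_tokens=32)
        batch["generated"] = np.array(
            self.tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
        return batch

ds = ray.data.from_items([{"text": t} for t in texts])
prediction = ds.map_batches(
    TextGenerator,
    batch_size=16,
    num_gpus=1,      # one GPU per actor (Colab has a single GPU)
    concurrency=1,   # one model replica; older Ray versions use compute=ActorPoolStrategy instead
)
prediction.show()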
When I run prediction.show(), my session crashes. The plain non-Ray code works fine with a batch of 30, but this Ray inference breaks my session. Going by this Medium article, I think I am on the right track: Scalable Batch Inference on Large Language Models Using Ray | by Büşra Korkmaz | KoçDigital | Medium
But I am struggling to understand what is actually going on here. I also had another question: how did you implement continuous batching
using Ray? I saw this code: llm-continuous-batching-benchmarks/benchmark_throughput.py at master · anyscale/llm-continuous-batching-benchmarks · GitHub
I don't really follow it. I tried a few variants, all unsuccessful, and the code itself feels like an implementation of asyncio and nothing else.
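To make the question concrete, this is roughly the kind of asyncio-style request collection I mean (a made-up sketch with placeholder names, not the benchmark's actual code). It batches whatever requests have arrived before each generate() call, which is dynamic batching at the request level rather than true token-level continuous batching:

import asyncio
import torch

request_queue: asyncio.Queue = asyncio.Queue()

async def submit(prompt: str) -> str:
    # callers put a prompt on the queue and wait for the result
    fut = asyncio.get_running_loop().create_future()
    await request_queue.put((prompt, fut))
    return await fut

async def batching_loop(model, tokenizer, max_batch_size: int = 16):
    while True:
        prompt, fut = await request_queue.get()
        batch = [(prompt, fut)]
        # drain whatever else is already waiting, up to max_batch_size
        while not request_queue.empty() and len(batch) < max_batch_size:
            batch.append(request_queue.get_nowait())
        prompts = [p for p, _ in batch]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        with torch.no_grad():
            # note: generate() blocks the event loop here; a real implementation
            # would run it in a thread or hand it off to a separate worker
            outputs = model.generate(**inputs, max_new_tokens=32)
        texts_out = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        for (_, f), text in zip(batch, texts_out):
            f.set_result(text)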
Is it possible to show an example using model.generate() with continuous batching as well?
thanks!