Concurrency in Ruby: Thread and Fiber
This article is based on my last tech-sharing session with my team at https://pixta.vn/.
Fibers and Threads
Thread
thread = Thread.new do
  # ...
end
thread.join
Fiber
fiber = Fiber.new do
  # ...
end
fiber.resume # transfer / Fiber.schedule
As you can see, they have quite similar syntax, so what are the differences between them?
- The level:
  - Threads are created 1:1 with OS threads.
  - Fibers are implemented at the programming-language level; multiple fibers can run inside one thread.
- Scheduling mechanism:
  - Threads are scheduled preemptively by almost every modern OS.
  - Fibers are scheduled cooperatively: a fiber runs until it explicitly hands control over.
Threads run automatically: they are scheduled by the OS.
With Thread, programmers can only create new threads, give them tasks, and call the join method to wait for them to finish (the value method also returns the block's result). The OS decides when each thread runs and when it is paused to achieve concurrency.
[
  Thread.new { }, # the code for the first task goes inside the block
  Thread.new { }  # the code for the second task goes inside the block
].each(&:join)
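As a small sketch of the difference between join and value (here sleep stands in for real blocking work, and the numbers are arbitrary):

```ruby
# Each thread's block returns a value.
threads = 3.times.map do |i|
  Thread.new do
    sleep 0.1 # stand-in for blocking work such as I/O
    i * 2
  end
end

threads.each(&:join)           # join only waits for the threads to finish
results = threads.map(&:value) # value waits too, then returns the block's result
p results # => [0, 2, 4]
```

Calling value alone would be enough here, since it joins the thread before returning.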
Meanwhile, Fiber gives us more control. With Fiber, programmers are free to start, pause, and resume fibers themselves.
- Fiber.new { }: creates a new fiber, which is started with resume.
- Fiber.yield: pauses the current fiber and moves control back to where the fiber was resumed.
- After suspension, a fiber can be resumed later at the same point with the same execution state.
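fib2 = nil # declared first so the block below can see it
fib = Fiber.new do
  puts "1 - fib started"
  fib2.transfer # hand control to fib2
  Fiber.yield   # pause and return control to the caller of resume
  puts "3 - fib resumed"
end
fib2 = Fiber.new do
  puts "2 - control moved to fib2"
  fib.transfer # hand control back to fib
end
fib.resume
puts ""
fib.resume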
1 - fib started
2 - control moved to fib2
3 - fib resumed
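To illustrate that a fiber resumes at the same point with the same execution state, here is a minimal sketch of a counter whose local variable n survives between pauses. The counter name and the number of iterations are only for illustration:

```ruby
counter = Fiber.new do
  n = 0
  loop do
    Fiber.yield n # pause here and hand n back to the caller of resume
    n += 1        # execution continues from here on the next resume
  end
end

first_three = 3.times.map { counter.resume }
p first_three # => [0, 1, 2]
```

The value passed to Fiber.yield becomes the return value of resume, so a fiber can also be used as a simple generator.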
Fiber over Thread
- A fiber is lighter-weight than a thread, so we can spawn more fibers than threads.
- Less context-switching time (an advantage of cooperative scheduling compared to preemptive scheduling).
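As a rough sketch of how cheap fibers are to create, the snippet below spawns a batch of fibers inside a single thread and drives each one through a pause and a resume. The count of 1_000 is an arbitrary number chosen for illustration, not a limit or a benchmark:

```ruby
# 1_000 is an arbitrary count for illustration only.
fibers = Array.new(1_000) do |i|
  Fiber.new do
    Fiber.yield i # pause immediately, handing back the index
    :finished
  end
end

first_pass  = fibers.map(&:resume) # each fiber runs until its Fiber.yield
second_pass = fibers.map(&:resume) # each fiber continues and returns :finished

puts first_pass.sum           # => 499500
puts second_pass.uniq.inspect # => [:finished]
puts fibers.none?(&:alive?)   # => true
```

Every switch here happens at a point the code chose (resume and Fiber.yield), which is what cooperative scheduling means in practice.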
Fiber scheduler
Fibers were introduced in Ruby 1.9, but before Ruby 3 they lacked a scheduler, which limited their usefulness. A scheduler is officially supported from Ruby 3. The Fiber Scheduler consists of two parts:
- Fiber Scheduler interface (the set of hooks that Ruby 3 defines and calls on blocking operations)
- Fiber Scheduler implementation (provided by a gem or by your own class)
To enable asynchronous behavior in Ruby, you need to set a Fiber Scheduler object for the current thread:
Fiber.set_scheduler(scheduler)
The list of Fiber Scheduler implementations and their main differences can be found in the Fiber Scheduler List project.
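To make the split between interface and implementation more concrete, here is a toy scheduler that only knows how to handle sleep. It is a sketch under stated assumptions, not a usable implementation: the class name SleepOnlyScheduler is hypothetical, it assumes sleep is always called with a numeric duration, and the other required hooks (block, unblock, io_wait) simply raise. Real schedulers, such as the ones in the list above, do much more.

```ruby
# Toy scheduler (hypothetical, for illustration): handles only `sleep`.
class SleepOnlyScheduler
  def initialize
    @sleeping = {} # fiber => monotonic time at which it should wake up
  end

  # Called by Fiber.schedule: create a non-blocking fiber and start it right away.
  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)
    fiber.resume
    fiber
  end

  # Called instead of a real sleep when a non-blocking fiber calls `sleep`.
  def kernel_sleep(duration = nil)
    @sleeping[Fiber.current] = now + duration.to_f
    Fiber.yield # give control back to whoever resumed this fiber
  end

  # Called when the scheduler is unset or the thread exits: a tiny event loop.
  def close
    until @sleeping.empty?
      fiber, wake_at = @sleeping.min_by { |_, time| time }
      @sleeping.delete(fiber)
      delay = wake_at - now
      sleep(delay) if delay.positive? # a real sleep, because this runs in the blocking main fiber
      fiber.resume
    end
  end

  # Required by the interface, but not exercised by this sketch.
  def block(*)
    raise NotImplementedError
  end

  def unblock(*)
    raise NotImplementedError
  end

  def io_wait(*)
    raise NotImplementedError
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

finished = []
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

Fiber.set_scheduler(SleepOnlyScheduler.new)
3.times do |i|
  Fiber.schedule do
    sleep 0.1 # goes through kernel_sleep instead of blocking the thread
    finished << i
  end
end
Fiber.set_scheduler(nil) # closes the scheduler, which runs the fibers to completion

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "finished: #{finished.sort.inspect}, elapsed: #{elapsed.round(2)}s"
```

The three sleeps overlap, so the whole run should take roughly 0.1s rather than 0.3s.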
Async gem
- One of the most mature and common Fiber Scheduler implementations is by Samuel Williams.
- Furthermore, he not only implemented a Fiber Scheduler but also created a gem called Async, which has a robust API for writing concurrent code.
The next part shows how to use Thread, Fiber, and the Async gem to make concurrent HTTP requests.
HTTP requests example
For example, we will fetch a list of UUIDs from https://httpbin.org/uuid:
require "net/http"
require "json"

def get_uuid
  url = "https://httpbin.org/uuid"
  response = Net::HTTP.get(URI(url))
  JSON.parse(response)["uuid"]
end
This request will take about 1s to finish.
Sequential version
def get_http_sequentially
  10.times.map { get_uuid }
end

now = Time.now
puts get_http_sequentially
puts "Sequential runtime: #{Time.now - now}" # about 11-12s
One request takes about 1s, so 10 sequential requests take at least 10s in total (about 11-12s when measured).
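Concurrent version with threads
def get_http_via_threads
  # Each thread returns its own UUID, so no shared array is mutated
  threads = 10.times.map { Thread.new { get_uuid } }
  # Thread#value waits for the thread and returns the block's result
  threads.map(&:value)
end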
# => 1.3s
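Concurrent version with fibers
require "async"

def get_http_via_fibers
  Fiber.set_scheduler(Async::Scheduler.new)
  results = []
  10.times do
    Fiber.schedule do
      results << get_uuid
    end
  end
  results
ensure
  # Unsetting the scheduler runs the scheduled fibers to completion,
  # so results is filled by the time the caller uses it
  Fiber.set_scheduler(nil)
end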
# => 1.2s
Because all requests are called concurrently, the total time is about the time of the slowest request.
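The same effect can be reproduced without the network. The sketch below uses threads, with sleep as a stand-in for an HTTP call; fake_request and the durations are hypothetical and exist only for illustration:

```ruby
# Hypothetical stand-in for an HTTP call: sleeps for `seconds`, then returns it.
def fake_request(seconds)
  sleep seconds
  seconds
end

durations = [0.1, 0.2, 0.3]

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = durations.map { |d| Thread.new { fake_request(d) } }.map(&:value)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "results: #{results.inspect}"
puts "elapsed: #{elapsed.round(2)}s, sequential would be about #{durations.sum.round(2)}s"
```

The elapsed time should be close to the slowest call (0.3s) rather than the sum of all calls (0.6s).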
More about Async
Another implementation uses Async gem like that, we use Kernel#Async
method instead of Async::Scheduler
def get_http_via_async
  results = []
  Async do
    10.times do
      Async do
        results << get_uuid
      end
    end
  end
  results
end
The general structure of Async Ruby programs:
- You always start with an Async block, which is passed a task.
- That main task is usually used to spawn more Async tasks with task.async.
- These tasks run concurrently with each other and the main task.
Each task is built on top of a fiber.
HTTP server example
A minimal HTTP server in Ruby can be implemented with the built-in TCPServer class. It looks like this:
require "socket"

# HOST, PORT, SOCKET_READ_BACKLOG, and RequestParser are defined elsewhere
socket = TCPServer.new(HOST, PORT)
socket.listen(SOCKET_READ_BACKLOG)
loop do
  conn = socket.accept # wait for a client to connect
  request = RequestParser.call(conn)
  # ... status, headers, body
end
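For a version that can be run as-is, here is a minimal sketch. It is not the server from this article: handle_connection is a hypothetical replacement for RequestParser that only reads the request line and echoes it back, and the script acts as its own client so it can run in one process.

```ruby
require "socket"

# Hypothetical minimal handler: reads the request, then replies with plain text.
def handle_connection(conn)
  verb, path = conn.gets.to_s.split # e.g. "GET /hello HTTP/1.1"
  # Skip the headers: they end at the first blank line.
  loop do
    line = conn.gets
    break if line.nil? || line == "\r\n"
  end
  body = "#{verb} #{path}\n"
  conn.write "HTTP/1.1 200 OK\r\n" \
             "Content-Type: text/plain\r\n" \
             "Content-Length: #{body.bytesize}\r\n" \
             "Connection: close\r\n\r\n" \
             "#{body}"
ensure
  conn.close
end

server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = server.addr[1]

# Run the accept loop in a background thread so this script can also be the client.
acceptor = Thread.new do
  loop { handle_connection(server.accept) }
end

client = TCPSocket.new("127.0.0.1", port)
client.write "GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n"
response = client.read # reads until the server closes the connection
client.close

acceptor.kill
server.close
puts response
```

Like the loop above, this sketch handles one connection at a time: while handle_connection is running, no other client is accepted.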
Now we'll make the server handle more than one request at a time.
Thread pool version
pool = ThreadPool.new(size: 5)
loop do
  conn = socket.accept # wait for a client to connect
  pool.schedule do
    # handle each request
    request = RequestParser.call(conn)
  end
end
The idea is to use a thread pool to limit the number of threads running concurrently.
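ThreadPool is not part of Ruby's standard library, and its implementation is not shown here. One possible minimal sketch, assuming only the size: keyword and the schedule method used above (shutdown is an extra, hypothetical helper, and error handling is left out):

```ruby
# A minimal thread pool sketch: a fixed set of workers pulling jobs from a queue.
class ThreadPool
  def initialize(size:)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives and returns nil once the
        # queue is closed and empty, which ends the worker loop.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Enqueue a block to be run by the next free worker.
  def schedule(&block)
    @jobs << block
  end

  # Stop accepting jobs and wait for the queued ones to finish.
  def shutdown
    @jobs.close
    @workers.each(&:join)
  end
end

pool = ThreadPool.new(size: 2)
results = Queue.new # Queue is thread-safe, so workers can push to it freely

6.times do |i|
  pool.schedule do
    sleep 0.05 # stand-in for handling a request
    results << i * i
  end
end
pool.shutdown

squares = Array.new(results.size) { results.pop }.sort
p squares # => [0, 1, 4, 9, 16, 25]
```

With size: 2, at most two jobs run at any moment no matter how many are scheduled, which is the limit the pool is meant to enforce.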
Async version
Async do
  loop do
    conn = socket.accept # wait for a client to connect
    Async do
      # handle each request
      request = RequestParser.call(conn)
    end
  end
end
Falcon is the best-known app server that uses Async to handle connections.
More details about the implementation and benchmark testing are available in this repo.
Conclusion
- Threads and fibers allow programmers to write concurrent code, which is very useful for handling blocking I/O operations.
- As Ruby developers, we don't use Thread directly most of the time. In practice, though, many web development tools use threads:
  - A web server like Puma or WEBrick
  - A background job processor like Sidekiq, GoodJob, or SolidQueue
  - An ORM like ActiveRecord or Sequel
  - An HTTP client like HTTParty or RestClient
- Fiber (with the Fiber Scheduler) has only been fully supported since Ruby 3, and it may have a bright future thanks to its advantages over Thread. Here are a couple of the most useful tools built on top of fibers:
  - async-http: a featureful HTTP client
  - falcon: an HTTP server built around the Async core
  - ...