Even though the first stable version of Rust was released in 2015, there are still some holes in its ecosystem for solving common tasks. One of them is background processing.
In software engineering, background processing is a common approach to solving several problems:
- Carrying out periodic tasks, such as delivering notifications or updating cached values.
- Deferring expensive work so your application stays responsive while calculations run in the background.
Most programming languages have go-to background processing frameworks/libraries. For example:
- Ruby - sidekiq. It uses Redis as a job queue.
- Python - dramatiq. It uses RabbitMQ as a job queue.
- Elixir - oban. It uses a Postgres DB as a job queue.
Async programming (async/await) can be used for background processing, but it has several major disadvantages when used directly:
- It gives no control over the number of tasks executing at any given time, so a large number of spawned tasks can overload the thread or threads they run on.
- It provides no monitoring, which is useful for investigating your system and finding bottlenecks.
- Tasks are not persistent, so all enqueued tasks are lost on every application restart.
To address these shortcomings, we implemented async processing in the fang library.
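To make the first shortcoming concrete, here is a minimal sketch of bounding concurrency with a fixed number of workers that drain a shared queue. It uses only the standard library (OS threads and channels) rather than tokio or fang, and `run_bounded` is a hypothetical helper invented for this illustration. The doubling of each number stands in for real work.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs every job on at most `workers` threads and returns the sorted results.
/// Hypothetical helper for illustration; not part of fang.
pub fn run_bounded(jobs: Vec<u32>, workers: usize) -> Vec<u32> {
    let (job_tx, job_rx) = mpsc::channel::<u32>();
    let (res_tx, res_rx) = mpsc::channel::<u32>();
    // All workers share one receiver; the mutex lets one worker pop at a time.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // before the job itself is processed.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(n) => res_tx.send(n * 2).unwrap(), // stand-in for real work
                    Err(_) => break, // queue is closed and drained
                }
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets idle workers exit
    drop(res_tx); // only the workers' clones keep the result channel open

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<u32> = res_rx.iter().collect();
    results.sort_unstable();
    results
}
```

No matter how many jobs are enqueued, at most `workers` of them run at once. A worker pool over a persistent queue applies the same idea, with the queue surviving restarts.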
Threaded Fang
Fang is a background processing library for Rust. The first version of Fang was released exactly one year ago. Its key features were:
- Each worker is started in a separate thread
- A Postgres table is used as the task queue
This implementation was written for a specific use case, the el monitorro bot, and it has proven itself over time. The bot processes more and more feeds every minute (currently more than 3000), and some users host it on their own infrastructure.
You can find out more about threaded processing in fang in this blog post.
Async Fang
Async provides significantly reduced CPU and memory overhead, especially for workloads with a large number of IO-bound tasks, such as servers and databases. All else equal, you can have orders of magnitude more tasks than OS threads, because an async runtime uses a small number of (expensive) threads to handle a large number of (cheap) tasks.
For some lightweight background tasks, it’s cheaper to run them on the same thread using async instead of starting one thread per worker. That’s why we implemented this kind of processing in fang. Its key features:
- Each worker is started as a tokio task
- If any worker fails during task execution, it’s restarted
- Tasks are saved to a Postgres database. tokio-postgres is used to interact with the database instead of diesel, because the diesel ORM used by the threaded processing blocks the thread.
- The implementation is based on traits, so it’s easy to implement additional backends (redis, in-memory) to store tasks.
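The trait-based design in the last point can be sketched as follows. This is a simplified, synchronous illustration and not fang's API: `TaskStore`, `InMemoryStore`, and `drain` are hypothetical names, and fang's own queue trait (`AsyncQueueable`) is async.

```rust
use std::collections::VecDeque;

/// Hypothetical storage trait for illustration; not fang's real trait.
pub trait TaskStore {
    fn insert(&mut self, payload: String);
    fn fetch_next(&mut self) -> Option<String>;
}

/// In-memory backend: tasks live in a FIFO queue and are lost on restart.
#[derive(Default)]
pub struct InMemoryStore {
    tasks: VecDeque<String>,
}

impl TaskStore for InMemoryStore {
    fn insert(&mut self, payload: String) {
        self.tasks.push_back(payload);
    }

    fn fetch_next(&mut self) -> Option<String> {
        self.tasks.pop_front()
    }
}

/// Worker logic that depends only on the trait, so any backend can be plugged in.
pub fn drain(store: &mut dyn TaskStore) -> Vec<String> {
    let mut done = Vec::new();
    while let Some(payload) = store.fetch_next() {
        done.push(format!("ran {}", payload));
    }
    done
}
```

Because `drain` only sees `&mut dyn TaskStore`, swapping the in-memory store for a Postgres-backed or Redis-backed one would not require changing the worker code.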
Usage
The usage is straightforward:
- Define a serializable task by adding serde derives to a task struct.
- Implement the AsyncRunnable trait for fang to be able to run it.
- Start workers.
- Enqueue tasks.
Let’s go over each step.
Define a job
use fang::serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(crate = "fang::serde")]
pub struct MyTask {
    pub number: u16,
}

impl MyTask {
    pub fn new(number: u16) -> Self {
        Self { number }
    }
}
Fang re-exports serde, so it’s not required to add it to the Cargo.toml file.
Implement the AsyncRunnable trait
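use fang::async_trait;
// The next two import paths are assumed from the fang release described in this post.
use fang::asynk::async_queue::AsyncQueueable;
use fang::asynk::async_runnable::Error;
use fang::typetag;
use fang::AsyncRunnable;
use std::time::Duration;

#[async_trait]
#[typetag::serde]
impl AsyncRunnable for MyTask {
    async fn run(&self, queue: &mut dyn AsyncQueueable) -> Result<(), Error> {
        // Enqueue a follow-up task with the next number.
        let new_task = MyTask::new(self.number + 1);
        queue
            .insert_task(&new_task as &dyn AsyncRunnable)
            .await
            .unwrap();

        log::info!("the current number is {}", self.number);
        // Simulate some work.
        tokio::time::sleep(Duration::from_secs(3)).await;

        Ok(())
    }
}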
- Fang uses the typetag library to serialize trait objects and save them to the queue.
- The async-trait library is used to implement async traits.
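To show why a type tag is needed when trait objects are stored in a queue, here is a hand-rolled sketch of the idea using only the standard library. It does not use typetag or serde, and `Task`, `CounterTask`, `encode`, and `decode` are hypothetical names for this illustration. typetag automates this kind of tagging and dispatch on top of serde.

```rust
/// Hypothetical stand-in for a runnable task trait; not fang's `AsyncRunnable`.
pub trait Task {
    /// Name stored next to the payload so the concrete type can be found again.
    fn tag(&self) -> &'static str;
    /// Serialized fields of the task.
    fn payload(&self) -> String;
    fn run(&self) -> String;
}

pub struct CounterTask {
    pub number: u16,
}

impl Task for CounterTask {
    fn tag(&self) -> &'static str {
        "CounterTask"
    }

    fn payload(&self) -> String {
        self.number.to_string()
    }

    fn run(&self) -> String {
        format!("the current number is {}", self.number)
    }
}

/// Turns any task into a row that can be stored in a queue.
pub fn encode(task: &dyn Task) -> String {
    format!("{}:{}", task.tag(), task.payload())
}

/// Rebuilds the boxed task from a stored row by dispatching on the tag.
pub fn decode(row: &str) -> Option<Box<dyn Task>> {
    let (tag, payload) = row.split_once(':')?;
    match tag {
        "CounterTask" => Some(Box::new(CounterTask {
            number: payload.parse().ok()?,
        })),
        _ => None, // unknown task type
    }
}
```

Without the tag, a worker reading a row back from storage would have no way to know which concrete type to deserialize into.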
Init queue
use fang::asynk::async_queue::AsyncQueue;

let max_pool_size: u32 = 2;
let mut queue = AsyncQueue::builder()
    .uri("postgres://postgres:postgres@localhost/fang")
    .max_pool_size(max_pool_size)
    .duplicated_tasks(true)
    .build();
Start workers
use fang::asynk::async_worker_pool::AsyncWorkerPool;
use fang::NoTls;

let mut pool: AsyncWorkerPool<AsyncQueue<NoTls>> = AsyncWorkerPool::builder()
    .number_of_workers(10_u32)
    .queue(queue.clone())
    .build();

pool.start().await;
Insert tasks
let task = MyTask::new(0);

queue
    .insert_task(&task as &dyn AsyncRunnable)
    .await
    .unwrap();
Pitfalls
Async processing is suitable for lightweight tasks. For heavier tasks, it’s advised to use one of the following approaches:
- Start a separate tokio runtime to run fang workers.
- Use the threaded processing feature implemented in fang instead of async processing.
Future directions
There are a couple of features planned for fang:
- Retries with different backoff modes
- Additional backends (in-memory, redis)
- Graceful shutdown for async workers (for the threaded processing this feature is implemented)
- Cron jobs
Conclusion
The project is available on GitHub.
The async feature and this post were written in collaboration by Ayrat Badykov (github) and Pepe Márquez Romero (github).