Export large file in Rails with Sidekiq
In some systems, you'll have to do something very slow. Maybe you need to export a large dataset to a CSV file, or need to generate a thousand row Exel file that takes some minutes to complete.
Instead of try building these kinds of reports in a normal Rails controller action, this task should be moved into a background job for processing outside the life cycle of a single HTTP request.
But when you move work to a background job, code execution does not stop and you need a way to “connect” back to the user that caused the job to be created. It’s outside of a single request-response life cycle so you cannot rely on normal Rails controllers.
We often need a way to report back progress and send back information once the job is completed.
Example: Export large xlsx file
This sample uses Sidekiq for Backgroud Job and a Stimulus controller to handle client-side polling/updating.
First, we use this command to generate new job rails generate sidekiq:job ExportUser
.
Now we move the export file logic to the export_job.rb file, and call the job inside the controller instead.
class UsersController < ApplicationController
def export
respond_to do |format|
format.json do
job_id = ExportUserJob.perform_async
render json: {
jid: job_id
}
end
end
end
So our controller will now be like this, the csv generation will now be handle in the background instead of in our main app
class ExportUserJob
include Sidekiq::Job
include Sidekiq::Status::Worker
def perform(*args)
users = User.pluck :id, :name, :email, :address, :phone
total users.size
# Create Axlsx package and workbook
xlsx_package = Axlsx::Package.new
xlsx_workbook = xlsx_package.workbook
xlsx_workbook.add_worksheet(name: "Users") do |worksheet|
worksheet.add_row %w(ID Name Email Address Phone)
users.each.with_index(1) do |user, idx|
worksheet.add_row user
# Caculate percentage
at idx
end
end
# Save file into tmp with suffix is jobId
xlsx_package.serialize Rails.root.join("tmp", "users_export_#{self.jid}.xlsx")
end
end
However, there was a problem with using Sidekiq, the job is executed in the background so it doesn’t inform the main server when it is completed. So we executed the job, got the exported file, and now what? We cannot give the file back to the server to return it to our client.
Polling + Progress
It is a repeating AJAX call to check the job status and a block of text
The first approach is to combine polling (periodically checking on the job via AJAX calls) with progress reporting. This approach is good in case the user wants to export large reports and wants to monitor progress.
In UserController
, We need 2 methods, one for tracking job's status and one for downloading file when job is completed:
class UsersController < ApplicationController
...
def export_status
respond_to do |format|
format.json do
job_id = params[:job_id]
# Check job status and percentage using JobId
job_status = Sidekiq::Status.get_all(job_id).symbolize_keys
render json: {
status: job_status[:status],
percentage: job_status[:pct_complete]
}
end
end
end
def export_download
job_id = params[:id]
exported_file_name = "users_export_#{job_id}.xlsx"
filename = "UserData_#{DateTime.now.strftime("%Y%m%d_%H%M%S")}.xlsx"
respond_to do |format|
format.xlsx do
send_file Rails.root.join("tmp", exported_file_name), type: :xlsx, filename: filename
end
end
end
end
Our Stimulus controller will be looks like this:
import { Controller } from "@hotwired/stimulus"
export default class extends Controller {
static targets = ["download", "status"];
export() {
fetch('/export_user')
.then(response => response.json())
.then(data => {
const jobId = data.jid;
this.statusTarget.textContent = "Exporting ...";
this.timer = setInterval(() => {
this.checkJobStatus(jobId)
}, 1000);
});
}
checkJobStatus(jobId) {
fetch(`/export_status?job_id=${jobId}`)
.then(response => response.json())
.then(data => {
const percentage = data.percentage;
this.statusTarget.textContent = `Exporting ${percentage}%`;
if(data.status == "error") {
this.stopCheckJobStatus();
}else if(data.status === "complete") {
this.stopCheckJobStatus()
this.downloadTarget.href = `/export_download.xlsx?id=${jobId}`;
this.downloadTarget.classList.remove("hidden");
}
})
}
stopCheckJobStatus() {
if(this.timer) {
clearInterval(this.timer);
}
}
disconnect() {
this.stopCheckJobStatus();
}
}
This approach is extendable and flexible. You can implement the progress reporting in a variety of ways: a numeric progress bar, a single “status” message, or even a detailed log of progress.
Action Cable to broadcast
In order to return the csv file to the client, we will create a channel using Action Cable and broadcast the file back when it’s ready.
Use rails g channel VideoConversion
to generate the new channel
class ExportUserChannel < ApplicationCable::Channel
def subscribed
stream_from "export_user"
end
end
Then we will add the code to broadcast our file in the job so it will return the csv when the job is performed.
class ExportUserJob
include Sidekiq::Job
def perform(*args)
....
# Save file into tmp with suffix is jobId
xlsx_package.serialize Rails.root.join("tmp", "users_export_#{self.jid}.xlsx")
ActionCable.server.broadcast "export_user", self.jid
end
end
In controller, we just need 2 methods:
class UsersController < ApplicationController
def export
ExportUserJob.perform_async
head :accepted
end
def export_download
end
end
Next, let’s add the Stimulus controller that will listen for events from this channel and update the UI as it receives them. The important parts are the channel we subscribe to in the export method and the received method
import { Controller } from "@hotwired/stimulus"
import consumer from "../channels/consumer"
export default class extends Controller {
static targets = ["status"];
export() {
fetch('/export_user')
.then(() => {
this.statusTarget.textContent = "Exporting ...";
this.subscribe_channel(this.downloadTarget)
})
.catch((error) => {
console.error('Error:', error);
});
}
subscribe_channel() {
this.channel = consumer.subscriptions.create("ExportUserChannel", {
connected() {
console.log("hello")
},
disconnected() {},
received(data) {
console.log(data)
window.location.href = `/export_download.xlsx?id=${data}`
}
});
}
}
You can find the full source code for this guide on this repo, or for action cable approach in branch export_action-cablel