Export large file in Rails with Sidekiq

steve lee
·
·
最初发布于 hungle00.github.io

In some systems, you'll have to do something very slow. Maybe you need to export a large dataset to a CSV file, or need to generate a thousand row Exel file that takes some minutes to complete.

Instead of try building these kinds of reports in a normal Rails controller action, this task should be moved into a background job for processing outside the life cycle of a single HTTP request.

But when you move work to a background job, code execution does not stop and you need a way to “connect” back to the user that caused the job to be created. It’s outside of a single request-response life cycle so you cannot rely on normal Rails controllers.

We often need a way to report back progress and send back information once the job is completed.

Example: Export large xlsx file

This sample uses Sidekiq for Backgroud Job and a Stimulus controller to handle client-side polling/updating.
First, we use this command to generate new job rails generate sidekiq:job ExportUser.

Now we move the export file logic to the export_job.rb file, and call the job inside the controller instead.

class UsersController < ApplicationController
  def export
    respond_to do |format|
      format.json do
        job_id = ExportUserJob.perform_async

        render json: {
          jid: job_id
        }
      end
    end
  end

So our controller will now be like this, the csv generation will now be handle in the background instead of in our main app

class ExportUserJob
  include Sidekiq::Job
  include Sidekiq::Status::Worker

  def perform(*args)
    users = User.pluck :id, :name, :email, :address, :phone
    total users.size
    # Create Axlsx package and workbook
    xlsx_package = Axlsx::Package.new
    xlsx_workbook = xlsx_package.workbook
    
    xlsx_workbook.add_worksheet(name: "Users") do |worksheet|
      worksheet.add_row %w(ID Name Email Address Phone)
      
      users.each.with_index(1) do |user, idx|
        worksheet.add_row user
        # Caculate percentage 
        at idx
      end
    end
    
    # Save file into tmp with suffix is jobId
    xlsx_package.serialize Rails.root.join("tmp", "users_export_#{self.jid}.xlsx")
  end
end

However, there was a problem with using Sidekiq, the job is executed in the background so it doesn’t inform the main server when it is completed. So we executed the job, got the exported file, and now what? We cannot give the file back to the server to return it to our client.

Polling + Progress

It is a repeating AJAX call to check the job status and a block of text

The first approach is to combine polling (periodically checking on the job via AJAX calls) with progress reporting. This approach is good in case the user wants to export large reports and wants to monitor progress.

In UserController, We need 2 methods, one for tracking job's status and one for downloading file when job is completed:

class UsersController < ApplicationController
 ...

  def export_status
    respond_to do |format|
      format.json do
        job_id = params[:job_id]
        # Check job status and percentage using JobId
        job_status = Sidekiq::Status.get_all(job_id).symbolize_keys

        render json: {
          status: job_status[:status],
          percentage: job_status[:pct_complete]
        }
      end
    end
  end
  
  def export_download
    job_id = params[:id]
    exported_file_name = "users_export_#{job_id}.xlsx"
    filename = "UserData_#{DateTime.now.strftime("%Y%m%d_%H%M%S")}.xlsx"

    respond_to do |format|
      format.xlsx do
        send_file Rails.root.join("tmp", exported_file_name), type: :xlsx, filename: filename
      end
    end
  end
end

Our Stimulus controller will be looks like this:

import { Controller } from "@hotwired/stimulus"

export default class extends Controller {
  static targets = ["download", "status"];

  export() {
    fetch('/export_user')
      .then(response => response.json())
      .then(data => {
        const jobId = data.jid;
        this.statusTarget.textContent = "Exporting ...";

        this.timer = setInterval(() => {
          this.checkJobStatus(jobId)
        }, 1000);
      });
  }

  checkJobStatus(jobId) {
    fetch(`/export_status?job_id=${jobId}`)
      .then(response => response.json())
      .then(data => {
        const percentage = data.percentage;
        this.statusTarget.textContent = `Exporting ${percentage}%`;
        if(data.status == "error") {
            this.stopCheckJobStatus();
        }else if(data.status === "complete") {
          this.stopCheckJobStatus()
          this.downloadTarget.href = `/export_download.xlsx?id=${jobId}`;
          this.downloadTarget.classList.remove("hidden");
        }
      })
  }

  stopCheckJobStatus() {
    if(this.timer) {
      clearInterval(this.timer);
    }
  }

  disconnect() {
    this.stopCheckJobStatus();
  }
}

This approach is extendable and flexible. You can implement the progress reporting in a variety of ways: a numeric progress bar, a single “status” message, or even a detailed log of progress.

Action Cable to broadcast

In order to return the csv file to the client, we will create a channel using Action Cable and broadcast the file back when it’s ready. Use rails g channel VideoConversion to generate the new channel

class ExportUserChannel < ApplicationCable::Channel
  def subscribed
    stream_from "export_user"
  end
end

Then we will add the code to broadcast our file in the job so it will return the csv when the job is performed.

class ExportUserJob
  include Sidekiq::Job

  def perform(*args)
    ....
    # Save file into tmp with suffix is jobId
    xlsx_package.serialize Rails.root.join("tmp", "users_export_#{self.jid}.xlsx")

    ActionCable.server.broadcast "export_user", self.jid
  end
end

In controller, we just need 2 methods:

class UsersController < ApplicationController
  def export
    ExportUserJob.perform_async
    head :accepted
  end

  def export_download
  end
end

Next, let’s add the Stimulus controller that will listen for events from this channel and update the UI as it receives them. The important parts are the channel we subscribe to in the export method and the received method

import { Controller } from "@hotwired/stimulus"
import consumer from "../channels/consumer"

export default class extends Controller {
  static targets = ["status"];

  export() {
    fetch('/export_user')
      .then(() => {
        this.statusTarget.textContent = "Exporting ...";
        this.subscribe_channel(this.downloadTarget)
      })
      .catch((error) => {
        console.error('Error:', error);
      });
  }

  subscribe_channel() {
    this.channel = consumer.subscriptions.create("ExportUserChannel", {
      connected() {
        console.log("hello")
      },

      disconnected() {},

      received(data) {
        console.log(data)
        window.location.href = `/export_download.xlsx?id=${data}`
      }
    });
  }
} 

You can find the full source code for this guide on this repo, or for action cable approach in branch export_action-cablel

References

1
评论
登录后评论

I think it can create a Model to store the export status, so that it can be queried at any time.

Then use the turbo stream broadcast to update the page.

I recently recorded a video (in Chinese) showing the usage of turbo stream broadcast: https://ruby-china.org/topics/42959

社区准则 博客 联系 社区 状态
主题