
CFQ IO Scheduler Tuning

CFQ IO Scheduler

Recall that the CFQ scheduler was written in response to some potential problems with the deadline scheduler. In particular, in an interview with Jens Axboe, the developer of CFQ, deadline, and a few other IO schedulers, Jens stated, “While deadline worked great from a latency and hard drive perspective, it had no concept of individual process fairness.” So one process doing a great deal of IO could starve the IO of other applications.

The CFQ scheduler created the concept of having queues for each process. These queues are created as needed for a particular process. Also, while perhaps not new, the scheduler divided the concept of IO into two parts, Synchronous IO and Asynchronous IO (AIO). Synchronous IO is important because the application stops running until the IO request is finished. In other words, synchronous IO “blocks” the execution of the application until it is done. This is fairly common in read operations because an application may need to read some input data prior to continuing execution.

On the other hand, AIO allows an application to continue operation because the result of the IO operation returns immediately. So rather than wait for confirmation that the application’s IO request has succeeded as in the case of synchronous IO, for AIO the result is returned immediately even if the IO operation has not yet finished. This is potentially very useful for write operations and allows the application to overlap IO with computation (helps efficiency – that is, it can improve run time).
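
The overlap between IO and computation can be sketched in shell terms: a background job stands in for an asynchronous write while the foreground keeps computing. This is only an analogy (shell job control, not kernel AIO), and the file path is purely illustrative.

```shell
#!/bin/sh
# Analogy only: a background dd stands in for an asynchronous write,
# while the foreground continues computing instead of blocking on it.
dd if=/dev/zero of=/tmp/aio_demo.bin bs=64k count=16 2>/dev/null &
io_pid=$!

# Computation proceeds while the "IO" is still in flight.
sum=0
i=1
while [ "$i" -le 1000 ]; do
  sum=$((sum + i))
  i=$((i + 1))
done

wait "$io_pid"            # only now do we confirm the IO has finished
rm -f /tmp/aio_demo.bin
echo "sum=$sum"
```

With synchronous IO, the loop could not start until the write returned; here the two proceed in parallel and only the final `wait` synchronizes them.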

CFQ was designed to separate synchronous and asynchronous IO operations, favoring synchronous operations (naturally). It also favors read operations for a couple of reasons: (1) reads have a tendency to block execution because the application needs the data to continue, and (2) with the elevator approach, a scheduler can “starve” a read operation that is far out on the disk geometry (near the outside of the disk). Favoring read operations improves read responsiveness and greatly reduces the possibility of starving such far-out reads.

CFQ goes even further by keeping the concept of deadlines from the Deadline IO Scheduler to prevent IO operations from being starved. Jens wrote the deadline scheduler and realized that, for good performance in some applications, it needed the concept of an IO operation “timing out.” That is, an IO operation may be put into a queue for execution, but subsequent IO operations may be put ahead of it in the queue. Therefore the IO request at the end of the queue may never get executed, or its execution may be seriously delayed. The deadline IO scheduler has the concept of a “time-out” period: if the IO request is not executed within this period, it will be executed immediately. This keeps IO operations from starving in the queue.

Jens combined all of these concepts along with the per-process concept to create CFQ. It can be rather complicated to define exactly how these concepts interact, and it’s beyond this article to go into detail, but understanding the concepts that go into CFQ is very important. This is particularly true if we are going to try tuning the scheduler for performance.

 

Tunable Parameters in CFQ

In addition to open-source giving you access to the code so you can adapt it to your requirements, developers often allow you to “tune” an application for your situation without having to hack the code base yourself. The IO schedulers in the Linux kernel are no exception. In particular, the CFQ scheduler has nine parameters for tuning performance. While discussing the parameters can get long, it is worthwhile to look at them in a bit more depth:

  1. back_seek_max
    This parameter, given in KBytes, sets the maximum “distance” for backward seeking. By default it is set to 16 MBytes. This distance is measured from the current head location to sectors that lie behind it. The idea comes from the Anticipatory Scheduler (AS), which anticipates the location of the next request. This parameter allows the scheduler to anticipate requests in the “backward” or opposite direction and consider them as being “next” if they are within this distance of the current head location.
  2. back_seek_penalty
    This parameter is used to compute the cost of backward seeking. If the backward distance of a request is just (1/back_seek_penalty) of the distance of a “front” request, then the seeking cost of the two requests is considered equivalent and the scheduler will not bias toward one or the other (otherwise the scheduler biases the selection toward “front direction” requests). Recall that CFQ has the concept of elevators, so it will try to keep seeking in the current direction as much as possible to avoid the latency associated with a seek. This parameter defaults to 2, so if the backward distance is only 1/2 of the forward distance, CFQ considers the backward request close enough to the current head location and treats it as if it were a forward request.
  3. fifo_expire_async
    This particular parameter is used to set the timeout of asynchronous requests. Recall that CFQ maintains a fifo (first-in, first-out) list to manage timeout requests. In addition, CFQ doesn’t check the expired requests from the fifo queue after one timeout is dispatched (i.e. there is a delay in processing the expired request). The default value for this parameter is 250 ms. A smaller value means the timeout is considered much more quickly than a larger value.
  4. fifo_expire_sync
    This parameter is the same as fifo_expire_async but for synchronous requests. The default value for this parameter is 125 ms. If you want to favor synchronous requests over asynchronous requests, then this value should be decreased relative to fifo_expire_async.
  5. slice_sync
    Remember that when a queue is selected for execution, the queue’s IO requests are only executed for a certain amount of time (the time slice) before switching to another queue. This parameter is used to calculate the time slice of the synchronous queue. The default value for this parameter is 100 ms, but this isn’t the true time slice. Rather, the time slice is computed as: time_slice = slice_sync + (slice_sync / 5) * (4 – io_priority). If you want the time slice for the synchronous queue to be longer (perhaps you have more synchronous operations), then increase the value of slice_sync.
  6. slice_async
    This parameter is the same as slice_sync but for the asynchronous queue. The default is 40 ms. Notice that synchronous operations are preferred over asynchronous operations.
  7. slice_async_rq
    This parameter limits the number of asynchronous requests dispatched to the device request queue within a queue’s slice time. The maximum number of requests that are allowed to be dispatched also depends on the io priority. The equation for computing the maximum number of requests is: max_nr_requests = 2 * (slice_async_rq + slice_async_rq * (7 – io_priority)). The default for slice_async_rq is 2.
  8. slice_idle
    This parameter is the idle time for the synchronous queue only. Within a queue’s time slice (the amount of time during which its operations can be dispatched), when there are no requests left in the synchronous queue, CFQ will not switch to another queue but will sit idle waiting for the process to create more requests. If no new requests are submitted within the idle time, the queue expires. The default value for this parameter is 8 ms. This parameter controls the amount of time the scheduler waits for further synchronous requests. This can be important since synchronous requests tend to block execution of the process until the operation is completed. Consequently, the IO scheduler watches, within this idle window, for synchronous requests that might come from a streaming video application or something else that needs synchronous operations.
  9. quantum
    This parameter controls the number of requests dispatched to the device request queue (i.e. the number of requests that are executed, or at least sent to the device for execution). In a queue’s time slice, a request will not be dispatched if the number of requests already in the device request queue exceeds this parameter. For the asynchronous queue, dispatching is further restricted by the slice_async_rq parameter. The default for this parameter is 4.
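
The per-parameter effects above can be checked numerically. The sketch below plugs the defaults into the two formulas quoted earlier (the slice_sync time slice and the slice_async_rq dispatch cap), then prints the live tunables from sysfs; the sysfs path assumes a device named sda with CFQ active, so adjust as needed.

```shell
#!/bin/sh
# Plug the defaults into the formulas quoted above.
slice_sync=100       # ms, default
slice_async_rq=2     # default
io_priority=4        # default best-effort priority

time_slice=$(( slice_sync + slice_sync / 5 * (4 - io_priority) ))
max_nr_requests=$(( 2 * (slice_async_rq + slice_async_rq * (7 - io_priority)) ))
echo "time_slice=${time_slice}ms max_nr_requests=${max_nr_requests}"
# prints time_slice=100ms max_nr_requests=16

# The live values sit in sysfs (path assumes sda with CFQ active):
if [ -d /sys/block/sda/queue/iosched ]; then
  grep -H . /sys/block/sda/queue/iosched/* 2>/dev/null
fi
```

Raising io_priority (i.e. lowering the numeric priority value toward 0) lengthens the synchronous time slice and raises the async dispatch cap, which matches the priority bias described above.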

You can see that the CFQ scheduler prefers synchronous IO requests. The reason is fairly simple – synchronous IO operations block execution, so until the IO operation is executed the application cannot continue to run. Such applications include streaming video or streaming audio (who wants their movie or music interrupted?), but many more applications perform synchronous IO.

 

On the other hand, Asynchronous IO (AIO) can be very useful because execution returns to the application immediately, without waiting for confirmation that the operation has completed. This allows the application to “overlap” computation and IO, which can be very useful for many operations depending upon the goals and requirements. There is quite a good article that talks about synchronous versus asynchronous and blocking versus non-blocking IO requests.

This article from: http://www.linux-mag.com/id/7572/

Linux I/O Scheduler

I/O scheduling controls how input/output operations will be submitted to storage.

The Completely Fair Queuing (CFQ) scheduler is the default algorithm in Red Hat Enterprise Linux and related distributions such as CentOS and Fedora. As the name implies, CFQ maintains a scalable per-process I/O queue and attempts to distribute the available I/O bandwidth equally among all I/O requests. CFQ is well suited for mid-to-large multi-processor systems and for systems which require balanced I/O performance over multiple LUNs and I/O controllers.

The Deadline elevator uses a deadline algorithm to minimize I/O latency for a given I/O request. The scheduler provides near real-time behavior and uses a round robin policy to attempt to be fair among multiple I/O requests and to avoid process starvation. Using five I/O queues, this scheduler will aggressively re-order requests to improve I/O performance.

The NOOP scheduler is a simple FIFO queue and uses the minimal amount of CPU/instructions per I/O to accomplish the basic merging and sorting functionality to complete the I/O. It assumes performance of the I/O has been or will be optimized at the block device (memory-disk) or with an intelligent HBA or externally attached controller.

The Anticipatory elevator introduces a controlled delay before dispatching the I/O to attempt to aggregate and/or re-order requests improving locality and reducing disk seek operations. This algorithm is intended to optimize systems with small or slow disk subsystems. One artifact of using the AS scheduler can be higher I/O latency.

cat /sys/block/sda/queue/scheduler

noop deadline [cfq]

cfq is the default on Fedora 19, my current OS 🙂

If you want to change the scheduler: echo noop > /sys/block/sda/queue/scheduler
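
To see which scheduler is active, you can parse the bracketed entry out of the sysfs file. The sketch below uses a sample string so it runs anywhere; on a real system, substitute the `cat` command shown in the comment (and note the `echo` to change the scheduler needs root).

```shell
#!/bin/sh
# Sample line in the format sysfs uses; on a real system use:
#   line=$(cat /sys/block/sda/queue/scheduler)
line="noop deadline [cfq]"

# The active scheduler is the bracketed entry.
active=$(printf '%s\n' "$line" | sed -n 's/.*\[\([^]]*\)\].*/\1/p')
echo "active=$active"          # prints active=cfq

# Switching (needs root); left as a comment so the sketch is safe to run:
#   echo noop > /sys/block/sda/queue/scheduler
```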

 

A little bit more:

  • noop is often the best choice for memory-backed block devices (e.g. ramdisks) and other non-rotational media (flash) where trying to reschedule I/O is a waste of resources
  • as (anticipatory) is conceptually similar to deadline, but with more heuristics that often improve performance (but sometimes can decrease it)
  • deadline is a lightweight scheduler which tries to put a hard limit on latency
  • cfq tries to maintain system-wide fairness of I/O bandwidth

The default was anticipatory for a long time, and it received a lot of tuning. cfq became the default a while ago, as its performance is reasonable and fairness is a good goal for multi-user systems (and even single-user desktops). For some scenarios – databases are often used as examples, as they tend to have their own peculiar scheduling and access patterns, and are often the most important service (so who cares about fairness?) – anticipatory has a long history of being tunable for best performance on these workloads, and deadline very quickly passes all requests through to the underlying device.