Sunday, April 5, 2009

learning kernel -- 进程调度

====================================
learning kernel -- 进程调度
====================================

linux调度算法
==============

Linux调度特点
--------------------
* Implement fully O(1) scheduling. Every algorithm in the new scheduler completes in constant-time, regardless of the number of running processes.

* Implement perfect SMP scalability. Each processor has its own locking and individual runqueue.

* Implement improved SMP affinity. Attempt to group tasks to a specific CPU and continue to run them there. Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.

* Provide good interactive performance. Even during considerable system load, the system should react and schedule interactive tasks immediately.

* Provide fairness. No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice.

* Optimize for the common case of only one or two runnable processes, yet scale well to multiple processors, each with many processes.

runqueue结构和O(1) scheduling
--------------------------------------
runqueue的数据结构:
struct runqueue {
spinlock_t lock; /* spin lock that protects this runqueue */
unsigned long nr_running; /* number of runnable tasks */
unsigned long nr_switches; /* context switch count */
unsigned long expired_timestamp; /* time of last array swap */
unsigned long nr_uninterruptible; /* uninterruptible tasks */
unsigned long long timestamp_last_tick; /* last scheduler tick */
struct task_struct *curr; /* currently running task */
struct task_struct *idle; /* this processor's idle task */
struct mm_struct *prev_mm; /* mm_struct of last ran task */
struct prio_array *active; /* active priority array */
struct prio_array *expired; /* the expired priority array */
struct prio_array arrays[2]; /* the actual priority arrays */
struct task_struct *migration_thread; /* migration thread */
struct list_head migration_queue; /* migration queue*/
atomic_t nr_iowait; /* number of tasks waiting on I/O */
};

其中里面active有间接的相关的进程信息,它的数据结构:
struct prio_array {
int nr_active; /* number of tasks in the queues */
unsigned long bitmap[BITMAP_SIZE]; /* priority bitmap */
struct list_head queue[MAX_PRIO]; /* priority queues */
};

Linux就是靠这个实现O(1)算法的,BITMAP_SIZE为5(对于32位机和MAX_PRIO为140的情况),这样就5×32=160位,每一位对应一个priority的任务列,要找出最高priority的任务列,只需要找到第一个为1的位。所以不管进程量有多少,通过这个位就保证了O(1)的算法。
还有一个比较有意思的地方就是存在active和expired两个数据结构。active结构里维护着系统中状态为TASK_RUNNING的进程,然后保证所有这些进程的时间片都用完之后再交给系统重新给每个进程分配时间片。但是一下子给这么多进程重新分配时间片,效率是O(n),还有就是多了你就要考虑锁来锁去,效率更低。linux想了个方法,每当active中一个进程的时间片用完,linux就给他重新分配时间片,然后添加到相应priority的expired结构中。这样当active全部进程的时间片都使用完的时候,只要把active和expired的指针换一下,active结构里又是系统重新分配过时间片的进程。
schedule()是负责从一个进程调度到另一个进程的kernel api

Sleeping and Waking Up
-----------------------
Linux为了使那些IO比较多的进程具有比较好的交互性,选择了给这些进程比较高的priority和比较久的时间片,但是这些进程其实是使用CPU比较少的。当这些进程等待IO的过程中,这些进程暂时就变成TASK_INTERRUPTIBLE(或TASK_UNINTERRUPTIBLE)的状态,处于sleeping中,从runqueue中暂时扔出去加入到waitqueue中,这样下次就不用对这些进程进行调度了。但他们等待到IO结束,它们又从waitqueue中移除加入到runqueue中,由于这些进程的priority比较高,所以马上可以抢占其他进程使自己得到执行,这样就既保证了CPU不浪费,又保证了这些进程的好的交互性。

The Load Balancer
------------------
按照前面所讲,每个CPU都有自己的runqueue,自己调度自己的进程。但是对于多cpu的情况,如果每个cpu上的任务分配不均匀,就会导致cpu利用率低的问题。为此linux还有个loadbalance的策略去平衡每个cpu上的任务数。方法不外乎是把某些具有比较多任务的runqueue元素扔到任务比较少的runqueue里。

No comments: