《Linux内核修炼之道》 Highlights and Discussion (15): Subsystem Initialization

Source: Internet  Author: anonymous  Date: 2018-02-05 15:25

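First, thanks to the nation. Thanks also to a couple of recent campus news stories, which convinced me of the importance of a well-rounded education and left me feeling duty-bound to write about subsystem initialization.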

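Initializing the individual subsystems is a basic task that the kernel's overall initialization must complete. The work follows a fixed pattern and falls into two parts: parsing the kernel options, and calling each subsystem's entry (initialization) function.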

Kernel options

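Linux lets the user pass configuration options to the kernel. During initialization the kernel calls the parse_args function to parse these options and invoke the corresponding handlers.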

parse_args parses strings of the form "variable=value". It is also called when a module is loaded, to parse the module's parameters.

Kernel options use the same "variable=value" format. Open the system's GRUB configuration file and find the kernel line, for example:

  kernel /boot/vmlinuz-2.6.18 root=/dev/sda1 ro splash=silent vga=0x314 pci=noacpi

Entries such as "pci=noacpi" on that line are kernel options.
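To make the "variable=value" format concrete, here is a minimal user-space sketch that splits such a line into name and value pairs. It is not the kernel's parse_args: the names split_cmdline, option_cb and print_option are invented for this illustration, and quoted values are deliberately not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives one option name and its value
 * (NULL when the token carries no '='). */
typedef void (*option_cb)(const char *name, const char *value, void *ctx);

/* Split a writable command line on spaces and report each token.
 * Returns the number of tokens seen. Unlike the real parser, this
 * sketch does not handle quoted values. */
static int split_cmdline(char *cmdline, option_cb cb, void *ctx)
{
    int count = 0;

    assert(cmdline != NULL && cb != NULL);

    for (char *tok = strtok(cmdline, " "); tok; tok = strtok(NULL, " ")) {
        char *eq = strchr(tok, '=');
        if (eq)
            *eq++ = '\0';   /* terminate the name, point at the value */
        cb(tok, eq, ctx);
        count++;
    }
    return count;
}

/* Example callback: print every option that was found. */
static void print_option(const char *name, const char *value, void *ctx)
{
    (void)ctx;
    printf("%s -> %s\n", name, value ? value : "(no value)");
}
```

Calling split_cmdline on a writable copy of the options from the kernel line above, with print_option as the callback, reports each option in turn; an option such as ro, which has no "=", is reported with a NULL value.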

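Kernel options are not the same as module parameters. Module parameters are normally given in "variable=value" form when the module is loaded, not when the kernel boots. To set a module parameter at boot time, you must prefix it with the module name and use the form "module.parameter=value". For example, the following command sets the autosuspend parameter to 2 while loading usbcore: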

 $ modprobe usbcore autosuspend=2

To set it at kernel boot time instead, you must use this form:

 usbcore.autosuspend=2
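The boot-time form can be taken apart mechanically. The sketch below is a user-space illustration only; split_module_option is a hypothetical helper, not a kernel function, and it assumes the module name itself contains no '=' character.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not a kernel API: split "module.param=value"
 * in place. On success the three output pointers refer to pieces of
 * the (modified) input buffer and 0 is returned; -1 means the token
 * has no "module." prefix or no "=value" part. */
static int split_module_option(char *token, char **module,
                               char **param, char **value)
{
    assert(token && module && param && value);

    char *eq = strchr(token, '=');
    if (!eq)
        return -1;
    *eq = '\0';                     /* cut off the value first */

    char *dot = strchr(token, '.');
    if (!dot) {
        *eq = '=';                  /* restore the token on failure */
        return -1;
    }
    *dot = '\0';

    *module = token;
    *param = dot + 1;
    *value = eq + 1;
    return 0;
}
```

The value is cut off before the dot is searched for, so a plain kernel option whose value happens to contain dots is not mistaken for a module parameter.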

Documentation/kernel-parameters.txt lists the kernel options that each subsystem has registered. For example, the PCI subsystem registers the following:
 pci=option[,option...] [PCI] various PCI subsystem options:
  off [X86-32] don't probe for the PCI bus
  bios [X86-32] force use of PCI BIOS, don't access
  the hardware directly. Use this if your machine
  has a non-standard PCI host bridge.
  nobios [X86-32] disallow use of PCI BIOS, only direct
  hardware access methods are allowed. Use this
  if you experience crashes upon bootup and you
  suspect they are caused by the BIOS.
  conf1 [X86-32] Force use of PCI Configuration
  Mechanism 1.
  conf2 [X86-32] Force use of PCI Configuration
  Mechanism 2.
  nommconf [X86-32,X86_64] Disable use of MMCONFIG for PCI
  Configuration
  nomsi [MSI] If the PCI_MSI kernel config parameter is
  enabled, this kernel boot option can be used to
  disable the use of MSI interrupts system-wide.
  nosort [X86-32] Don't sort PCI devices according to
  order given by the PCI BIOS. This sorting is
  done to get a device order compatible with
  older kernels.
  biosirq [X86-32] Use PCI BIOS calls to get the interrupt
  routing table. These calls are known to be buggy
  on several machines and they hang the machine
  when used, but on other computers it's the only
  way to get the interrupt routing table. Try
  this option if the kernel is unable to allocate
  IRQs or discover secondary PCI buses on your
  motherboard.
  rom [X86-32] Assign address space to expansion ROMs.
  Use with caution as certain devices share
  address decoders between ROMs and other
  resources.
  irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
  assigned automatically to PCI devices. You can
  make the kernel exclude IRQs of your ISA cards
  this way.
  pirqaddr=0xAAAAA [X86-32] Specify the physical address
  of the PIRQ table (normally generated
  by the BIOS) if it is outside the
  F0000h-100000h range.
  lastbus=N [X86-32] Scan all buses thru bus #N. Can be
  useful if the kernel is unable to find your
  secondary buses and you want to tell it
  explicitly which ones they are.
  assign-busses [X86-32] Always assign all PCI bus
  numbers ourselves, overriding
  whatever the firmware may have done.
  usepirqmask [X86-32] Honor the possible IRQ mask stored
  in the BIOS $PIR table. This is needed on
  some systems with broken BIOSes, notably
  some HP Pavilion N5400 and Omnibook XE3
  notebooks. This will have no effect if ACPI
  IRQ routing is enabled.
  noacpi [X86-32] Do not use ACPI for IRQ routing
  or for PCI scanning.
  routeirq Do IRQ routing for all PCI devices.
  This is normally done in pci_enable_device(),
  so this option is a temporary workaround
  for broken drivers that don't call it.
  firmware [ARM] Do not re-enumerate the bus but instead
  just use the configuration from the
  bootloader. This is currently used on
  IXP2000 systems where the bus has to be
  configured a certain way for adjunct CPUs.
  noearly [X86] Don't do any early type 1 scanning.
  This might help on some broken boards which
  machine check when some devices' config space
  is read. But various workarounds are disabled
  and some IOMMU drivers will not work.
  bfsort Sort PCI devices into breadth-first order.
  This sorting is done to get a device
  order compatible with older (<= 2.4) kernels.
  nobfsort Don't sort PCI devices into breadth-first order.
  cbiosize=nn[KMG] The fixed amount of bus space which is
  reserved for the CardBus bridge's IO window.
  The default value is 256 bytes.
  cbmemsize=nn[KMG] The fixed amount of bus space which is
  reserved for the CardBus bridge's memory
  window. The default value is 64 megabytes.

Registering kernel options

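We do not need to understand the implementation details of parse_args, but we do need to know how to register kernel options: module parameters are registered with the module_param family of macros, and kernel options with the __setup macro.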

The __setup macro is defined in include/linux/init.h.

 #define __setup(str, fn) \
         __setup_param(str, fn, fn, 0)

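__setup takes two arguments: str is the name of the kernel option, and fn is the handler associated with that option. The macro tells the kernel to run fn if it finds the option str at boot time. For an option that takes a value, str must end with the "=" character in addition to the option name, so that the handler receives only the text that follows it.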

Different kernel options can share the same handler. For example, the options netdev and ether are both associated with the netdev_boot_setup function.

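Besides __setup, kernel options can also be registered with the early_param macro. The two are used in much the same way; the difference is that options registered with early_param must be processed before all other kernel options.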
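The registration scheme behind __setup can be modelled in user space as a table of option strings and handlers. What follows is only a sketch under that assumption: struct setup_entry, setup_table, demo_netdev_setup and dispatch_option are names made up for the example, and the handler merely records its argument instead of doing what the real netdev_boot_setup does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for one registered option: the option string
 * and the handler to call when a token starts with it. */
struct setup_entry {
    const char *str;
    int (*fn)(char *arg);
};

static int netdev_calls;            /* counts handler invocations */
static char last_arg[64];           /* remembers the last argument */

/* Stand-in handler; the real netdev_boot_setup does far more. */
static int demo_netdev_setup(char *arg)
{
    netdev_calls++;
    strncpy(last_arg, arg, sizeof(last_arg) - 1);
    last_arg[sizeof(last_arg) - 1] = '\0';
    return 1;                       /* 1 = option handled */
}

/* Two option names registered against the same handler. */
static const struct setup_entry setup_table[] = {
    { "netdev=", demo_netdev_setup },
    { "ether=",  demo_netdev_setup },
};

/* Find the entry whose string is a prefix of the token and pass the
 * remainder of the token to its handler. Returns 0 if nothing matched. */
static int dispatch_option(char *token)
{
    assert(token != NULL);

    for (size_t i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
        size_t n = strlen(setup_table[i].str);
        if (strncmp(token, setup_table[i].str, n) == 0)
            return setup_table[i].fn(token + n);
    }
    return 0;
}
```

Because the match is a plain prefix comparison, including the trailing "=" in the registered string is what lets the handler start reading at the value.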

Two parsing passes

Corresponding to the two forms of registration, __setup and early_param, the kernel calls parse_args twice during initialization.

 parse_early_param();
 parse_args("Booting kernel", static_command_line, __start___param,
            __stop___param - __start___param,
            unknown_bootoption);

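The first call to parse_args happens inside parse_early_param. Why is parse_args called twice? Because kernel options fall into two kinds, much like people in the real world: ordinary ones and privileged ones, and the privileged ones must be handled before the ordinary ones.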

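In real life the definition of privilege seems rather vague, and different people interpret it differently. For example, when CCTV interviewed the discipline inspection secretary of the Second Affiliated Hospital of Harbin Medical University for its report on an elderly patient's 5.5 million yuan hospital bill, he said: "We are simply a people's hospital, a hospital for poor and lower-middle peasants. We have never used privilege to seek any benefit beyond what is ours. Not only did we not overcharge, we undercharged."
Life is just that complicated and strange. Kernel options are much simpler by comparison: privilege is out in the open, with nothing hidden. A privileged option is declared directly with the early_param macro, so you can tell at a glance that it is privileged. Options declared with early_param are parsed first, by parse_early_param.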
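The ordering rule can be shown with a small user-space model. This is a sketch, not the kernel's code: the option names are invented, tokens carry names only (values are left out for brevity), and the model assumes that each pass simply skips entries belonging to the other kind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a registered option; 'early' marks the privileged kind. */
struct option_entry {
    const char *name;
    int early;              /* 1: handled in the first pass, 0: second */
};

/* Hypothetical registrations, for illustration only. */
static const struct option_entry options[] = {
    { "normal_a", 0 },
    { "early_b",  1 },
    { "normal_c", 0 },
    { "early_d",  1 },
};

#define MAX_LOG 8
static const char *handled[MAX_LOG];    /* order in which options ran */
static size_t handled_count;

static const struct option_entry *find_option(const char *name)
{
    for (size_t i = 0; i < sizeof(options) / sizeof(options[0]); i++)
        if (strcmp(options[i].name, name) == 0)
            return &options[i];
    return NULL;
}

/* One pass over the tokens: handle only entries whose 'early' flag
 * matches the pass being run. */
static void run_pass(const char *const *tokens, size_t n, int early)
{
    assert(tokens != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct option_entry *e = find_option(tokens[i]);
        if (e && e->early == early && handled_count < MAX_LOG)
            handled[handled_count++] = e->name;
    }
}

/* Two passes over the same command line: early options first. */
static void parse_twice(const char *const *tokens, size_t n)
{
    handled_count = 0;
    run_pass(tokens, n, 1);
    run_pass(tokens, n, 0);
}
```

Whatever order the options appear in on the command line, the early ones end up at the front of the handled list, which is the guarantee early_param is meant to provide.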

 
