Random Solaris driver developer tips

This page has some hints and tips for folks who are starting out in Solaris driver development. I suggest you read **ALL** of it. But for your convenient reference, it has now been converted to HTML, with quick-access jump points.

Sections

Sections so far:

RESOURCES

GENERAL

MUTEXES

DISASTER RECOVERY (and preparation)

PCI REGISTERS

KERNEL DEBUGGING FEATURES

RESOURCES

Generally speaking, a good top-level jumping-off point for other information is developers.sun.com (hmm, wonder how long that will last). Particularly of note is Sun^H^H^H Oracle's Device Driver Tutorial.

A "seminal work" on writing Solaris device drivers is called, oddly enough, "Writing Device Drivers". It has been around for a long time, but is still highly valid. It is now available on the Oracle site, at http://download.oracle.com/docs/cd/E18752_01/html/816-4854/index.html (2010 version, Solaris 10). There are older versions referenced from the developers.sun.com page, above.

There is also the more basic Device Driver Tutorial. Oddly, this seems to have a different docID than the "Device Driver Tutorial" higher up, but looks to have the same table of contents. The formatting is very different, though, so pick whichever one you prefer.

And yet another driver oriented primer, Solaris (...) Driver writer orientation.

Since there are many different types of device driver, and a few different ways of thinking about them, you may well benefit from reading all of the above, rather than just picking one. Some may cover particular areas better than others.

GENERAL

The rest of this page has tips that are applicable to all driver development, not just specific types.

A quickie "right mindset" tip to start off: If you run into a bug (either in software or hardware) and things blow up because your error checking is inadequate: do NOT fix the bug yet! You're getting a free "test harness" here, so use it to improve your error checking routines. Once they are bulletproof, THEN go back and fix/work around the error that triggered the problem. This is driver writing, not WIMP programming. Your code *must* continue to work, or at minimum not crash, when EVERYTHING ELSE dies!!!

If you think you're being paranoid about error checking -- you're not being paranoid enough.

MUTEXES

Some general musings on mutexes:

Mutexes aren't always the one and only way to go. Sometimes a semaphore could be simpler.
Mutexes are usually SLOOOW, so avoid using them when possible.
Be really really really careful about where you enter and exit mutexes. Sit down ahead of time, and write down in which functions you plan to use them. This will make it easier for you to see whether your mutex scheme makes sense or not.
Always try to exit a mutex in the same routine you entered it from. Do NOT exit a mutex from a subroutine. Keeping it at the same level makes it a lot simpler to debug.
Don't forget to exit a mutex if you have an early return from a function (e.g. an error condition)!
Similar to the above, sometimes a mutex panic is not directly caused by re-entering a mutex, but by forgetting to exit it the last time you were in a particular function. The OS will then interpret that as similar to recursively entering a mutex, if the thread that held it is now dead.
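
The "same routine" and "early return" tips can be sketched roughly as below. This is only a userland illustration using pthreads so that it compiles outside the kernel; in a driver the lock would be a kmutex_t and the calls would be mutex_enter()/mutex_exit(). The names dev_state, dev_write and dev_lock_is_free are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-device state; a real driver would hold a kmutex_t. */
struct dev_state {
    pthread_mutex_t lock;
    int             busy;
    int             value;
};

/*
 * The mutex is entered and exited in this one routine. Every error path
 * taken after the lock is held jumps to the single "out" label, so no
 * early return can leave the lock held.
 */
int
dev_write(struct dev_state *sp, int value)
{
    int rc = 0;

    if (sp == NULL)
        return (EINVAL);    /* lock not taken yet: plain return is safe */

    pthread_mutex_lock(&sp->lock);      /* kernel: mutex_enter() */

    if (sp->busy) {
        rc = EBUSY;
        goto out;           /* NOT "return": the lock must be dropped */
    }
    if (value < 0) {
        rc = EINVAL;
        goto out;
    }
    sp->value = value;

out:
    pthread_mutex_unlock(&sp->lock);    /* kernel: mutex_exit() */
    return (rc);
}

/* Test helper: returns 1 if the lock is free, 0 if it is still held. */
int
dev_lock_is_free(struct dev_state *sp)
{
    assert(sp != NULL);
    if (pthread_mutex_trylock(&sp->lock) != 0)
        return (0);
    pthread_mutex_unlock(&sp->lock);
    return (1);
}
```

The single exit label is a design choice, not a rule: it just makes it hard to add a new error check later and forget the matching exit.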

DISASTER RECOVERY

It's a really really really really really good idea to back up everything you care about, >> BEFORE << starting development on kernel modules.

Make an emergency boot area

Normally, if things completely blow up, you'll have to boot off cdrom. But for a quicker resolution, you can prepare a backup copy of the usual driver environment to boot from instead (assuming your crash hasn't corrupted your filesystems). To make the emergency area, do

# cd /platform/`uname -m`
# cp -r kernel safety

You now have a directory tree starting at /platform/`uname -m`/safety/

At boot time, the OS automatically goes into the uname -m part. So when you need to, you can choose to do the appropriate alternate boot:

# Intel
b safety/unix
# sparc
boot safety/unix

This assumes that you install your driver into /platform/`uname -m`/kernel/drv. If you put it in /kernel/drv, or /usr/kernel/drv, you're up the creek without a paddle. The Makefile for my PCIbase driver skeleton installs into the /platform location by default.

Separate your root partition

If you're extra paranoid, I highly recommend having separate / and /usr partitions (and making /usr read-only). Then as root, do a ufsdump of /, and put it in another filesystem that's easy to get to, like /var, or /export/home. [it'll only be about 30 megs]

If you're not feeling THAT paranoid, then at minimum, do

# cd / ; ufsdump 0f /var/root.dump etc dev devices

This way, rather than restore/reinstall your ENTIRE SYSTEM if something goes wrong, you can probably boot off cdrom, mount your root filesystem, and just use ufsrestore to get back those most critical directories.

If you've forgotten to do this, and your system blows up and goes into a loop... Rather than do a reinstall, you might try removing that hardware device, if it is indeed removable. That way, maybe your driver won't attach, and thus won't corrupt the system.

But be warned, your system might be corrupted so much you may have to reinstall anyway.

PCI REGISTERS

This section gives more details on how to map PCI registers. This assumes your hardware HAS "mappable" registers. Otherwise, you'll have to look at Sun's "programmable IO" example, "pio", in the sample drivers code online.

You can use the "printregs" script here, to look at the registers for your hardware. printregs is used in relation to a specific device, so you first need to find the pci ID for the device you care about.
PCI IDs look like [vendornumber],[cardnumber]

(OoOOLD school Solaris x86) These are present on the old DCA (boot floppy) if you poke around on it. Otherwise, on a running system, you can look at the output of

 prtconf -pv

from the command line.

"printregs" actually uses that same output, but automatically splits out the register information for you into slightly more readable form.

Example for printregs:

Intel's vendor ID is 8086. So the pci identifier for a particular revision of Intel Pro100 Fast Ethernet card happens to be "8086,c".

# printregs 8086,c

on a system with that card installed, gives the following output:

  Looking for device 'pci8086,c'
  We will print the possible register mappings, that can be used with
  ddi_regs_map_setup, and also any physical mappings.
  The physical mappings are only to help you FIGURE OUT what they are
  !!!!  DO NOT TRY TO ACCESS THAT ADDRESS DIRECTLY !!!!!
   -------------------------
  register set 1 has tag 02006810 and length 00001000
  register set 2 has tag 01006814 and length 00000040
  register set 3 has tag 02006818 and length 00100000
  register set 4 has tag 02006830 and length 00100000
  tag 82006810 is mapped at addr ed100000
  tag 81006814 is mapped at addr 0000c800
  tag 82006818 is mapped at addr ed000000
  tag 82006830 is mapped at addr ea000000

If you ignore the leading '8' for the second set of tags, you can match them up to realize that

register set 1 is mapped at physaddr ed100000 for length 0x1000
register set 2 is mapped at physaddr 0000c800 for length 0x0040
register set 3 is mapped at physaddr ed000000 for length 0x00100000
register set 4 is mapped at physaddr ea000000 for length 0x00100000
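
If you are curious what is actually packed into those tags, here is a small decoder sketch. It assumes the tag is the first cell of a standard PCI "reg" entry, laid out as in the IEEE 1275 PCI bus binding as I understand it; the struct and function names are made up for the example. It only pulls bitfields apart, and is not something to put in a driver.

```c
#include <assert.h>
#include <stdint.h>

/* Fields packed into one tag (the first cell of a PCI "reg" entry). */
struct pci_tag {
    unsigned nonreloc;  /* bit 31: the leading '8' on the mapped tags */
    unsigned space;     /* bits 25-24: 1 = I/O, 2 = 32-bit memory */
    unsigned bus;       /* bits 23-16 */
    unsigned device;    /* bits 15-11 */
    unsigned function;  /* bits 10-8 */
    unsigned reg;       /* bits 7-0: config offset, 0x10 = first BAR */
};

struct pci_tag
pci_tag_decode(uint32_t tag)
{
    struct pci_tag t;

    t.nonreloc = (tag >> 31) & 0x1;
    t.space    = (tag >> 24) & 0x3;
    t.bus      = (tag >> 16) & 0xff;
    t.device   = (tag >> 11) & 0x1f;
    t.function = (tag >> 8) & 0x7;
    t.reg      = tag & 0xff;
    return (t);
}

/* Do two tags name the same register set, once bit 31 is ignored? */
int
pci_tag_same_set(uint32_t a, uint32_t b)
{
    return ((a & 0x7fffffffu) == (b & 0x7fffffffu));
}
```

Decoding 81006814 this way gives I/O space, device 13, config offset 0x14, which fits the small 0x40-byte mapping at 0000c800.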

But as the warnings in the output say, do NOT write, directly in your driver code,

volatile uint8_t *ptr = (volatile uint8_t *)0x0000c800; *ptr = 1;   /* WRONG */

or anything like that. You actually need to use

ddi_regs_map_setup(dip, 2, &reg2_ptr, 0, 0x40, &attrptr, &handleptr);

to map register set 2 into your own kernel memory. You will then have reg2_ptr pointing to the start of it.

But even then, you should not normally assign values directly through the pointer!! (There are some occasions where you can get away with this. But at least for initial testing, ...) You should use one of ddi_put8/ddi_put16/ddi_put32/ddi_put64 to change the contents of a mapped register.

See "man ddi_regs_map_setup" for more details on things like attrptr.

KERNEL DEBUGGING FEATURES

First of all, put the following in your /etc/system, and reboot:

**************************************************************
* "full" memory checking == 0xf ; "light" checking == 0x100
* CF: /usr/include/sys/kmem_impl.h
set kmem_flags=0x1f

* To auto-generate a coredump on a locked system.
* This is the "deadman kernel" enabler. Allegedly.
* It doesn't work under x86 very well, if at all.
set snooping=1
***************************************************************
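
To see what a given kmem_flags value turns on, here is a rough decoder. The bit values are as I recall them from sys/kmem_impl.h, so treat them as an assumption and check the header on your own system. The function name and the short flag names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED bit values, recalled from sys/kmem_impl.h; verify locally. */
struct kmf_name {
    unsigned    bit;
    const char *name;
};

static const struct kmf_name kmf_names[] = {
    { 0x001, "AUDIT" },     /* transaction auditing */
    { 0x002, "DEADBEEF" },  /* freed memory is filled with a pattern */
    { 0x004, "REDZONE" },   /* detect writes past the end of a buffer */
    { 0x008, "CONTENTS" },  /* log contents of freed buffers */
    { 0x010, "STICKY" },
    { 0x100, "LITE" },      /* the "light" checking mode */
};

/*
 * Write the names of the bits set in "flags" into buf, separated by
 * spaces. buflen must be at least 1. Returns buf.
 */
char *
kmem_flags_describe(unsigned flags, char *buf, size_t buflen)
{
    size_t i, used = 0;

    buf[0] = '\0';
    for (i = 0; i < sizeof (kmf_names) / sizeof (kmf_names[0]); i++) {
        int n;

        if ((flags & kmf_names[i].bit) == 0)
            continue;
        n = snprintf(buf + used, buflen - used, "%s%s",
            used ? " " : "", kmf_names[i].name);
        if (n < 0 || (size_t)n >= buflen - used)
            break;          /* out of room: stop with what fits */
        used += (size_t)n;
    }
    return (buf);
}
```

Under those assumed values, 0xf is the four "full checking" bits, and the 0x1f used above is the same four plus the 0x10 bit.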

When you do have a kernel panic, you can find out the routine in which it crashed by doing

# adb -k /var/crash/`uname -n`/unix.0 /var/crash/`uname -n`/vmcore.0
$C

Yes, type dollarsign, C. You won't have a prompt, but just type it anyway. (You can also use mdb the same way.)
See lower down for more details about $C usage.

If you want lots and LOTS of info about the kernel state at the time of a crash, you can use

$<threadlist

to look at every single kernel thread.

$<panicbuf

will show you the messages from when the system panicked.

::findleaks

in mdb, if you had kmem_flags = 0xf set, will help you with memory leaks. You might also be interested in ::kmastat, and ::memstat

*panicstr/s

should print the "panic string" from a kernel panic, if available.

If you have started the system with "b kadb", there are some other options. [To drop down to debugger in x86, use control-alt-D]

18::more	Turns on screen paging (like running "more")
:c		Continue running UNIX
lbolt/X		Shows "system tick" counter (system "clock" tick,64bit needs /J)
freemem/X	Shows number of free memory pages (64bit needs /J)
$<cpus		Shows thread id on each cpu
$<kmastat	Shows free memory (::kmastat in mdb)
[		Like "next" in gdb
]		Like "step" in gdb
module#f_name:b	Set a "breakpoint" at function f_name(), kernel module 'module'

 === General Memory examination, mdb or adb or kadb == 
addrname/D	Print 4byte int, decimal format
addrname/E	Print 8byte unsigned int, decimal format
addrname/X	Print 4byte unsigned int, hex format
addrname/J	Print 8byte unsigned int, hex format

Note that "man mdb" gives other modifiers to print memory, under the "Formatting dcmds" section.
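
If the difference between those four letters is hazy, this userland sketch mimics what each one does with the bytes at an address: how many bytes it reads, and how it prints them. The function name fmt_mem is made up, and the real debuggers pad and label their output differently; this only shows width, signedness and base.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the bytes at "addr" according to one of the letters listed
 * above. Returns the number of bytes read, or 0 for an unknown letter.
 */
size_t
fmt_mem(char letter, const void *addr, char *out, size_t outlen)
{
    int32_t  sw;
    uint32_t w;
    uint64_t q;

    switch (letter) {
    case 'D':               /* 4 bytes, signed decimal */
        memcpy(&sw, addr, sizeof (sw));
        snprintf(out, outlen, "%" PRId32, sw);
        return (sizeof (sw));
    case 'X':               /* 4 bytes, hex */
        memcpy(&w, addr, sizeof (w));
        snprintf(out, outlen, "%" PRIx32, w);
        return (sizeof (w));
    case 'E':               /* 8 bytes, unsigned decimal */
        memcpy(&q, addr, sizeof (q));
        snprintf(out, outlen, "%" PRIu64, q);
        return (sizeof (q));
    case 'J':               /* 8 bytes, hex */
        memcpy(&q, addr, sizeof (q));
        snprintf(out, outlen, "%" PRIx64, q);
        return (sizeof (q));
    default:
        if (outlen > 0)
            out[0] = '\0';
        return (0);
    }
}
```

This is also why lbolt and freemem "need /J" on a 64-bit kernel: /X would only look at 4 of the 8 bytes.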

Unfortunately, the $C trick gives you a location in a format like:

  functionname:+0x104

as the place where it crashed, and you're wondering what the heck line that is. But you CAN FIND OUT, if you grab the GNU binutils, and just install "objdump" from it.

Assuming you compiled your driver with -g,

  objdump -d -S yourdriver

will give you some assembly output, along with your C code in comments, AND hex offsets!

Adjusting variables in a live kernel

If there is a variable you'd like to adjust on the fly, on a running system, you can use either adb or mdb, to change it. Example:

adb -kw (or mdb -kw)
> my_var?W1

sets the contents of the 4 bytes at symbolic address "my_var" to be '1'.

However, sometimes you have a non-unique variable name, and that's where mdb comes in handy.
First, use it to find the address of all the symbols in a specific driver. Then you can adjust the address directly. Example:

mdb -kw
> ::nm wacom
0x00000000|0x00000000|NOTY |LOCL |0x0  |UNDEF   |
0xfea08754|0x00000073|FUNC |LOCL |0x0  |1       |wacom_detach
  [....]
0xfed0a50c|0x00000004|OBJT |LOCL |0x0  |4       |debug_level
  [....]

Oh, look! There's my debug_level variable, at address 0xfed0a50c. It's a 4byte value, with current value 0x0. But I want to tweak up the debug_level, without having to reload my driver. No problem..

> 0xfed0a50c?W1

and I've just changed the 4byte value to be 1. Let the debug messages roll!

If you need to write an 8byte value, use Z instead of W.
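
For this trick to be useful, the driver has to consult the variable every time it wants to print. A minimal sketch of that, written as userland C so it can be compiled anywhere: dbg_printf is a made-up helper name, and a real driver would call cmn_err() rather than vfprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* A plain int is 4 bytes, so it matches the ?W write shown above. */
int debug_level = 0;

/*
 * Hypothetical helper: print only when the current debug_level is at
 * least "level". Returns 1 if the message was printed, 0 if not.
 */
int
dbg_printf(int level, const char *fmt, ...)
{
    va_list ap;

    if (debug_level < level)
        return (0);

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);  /* driver: use cmn_err() here */
    va_end(ap);
    return (1);
}
```

Because the level is read on every call, bumping it from the debugger takes effect on the very next message, with no reload.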

Memory allocation debugging

It turns out there is a whole special built-in tool to help debug kernel memory allocation. A detailed writeup can be found in the Solaris Modular Debugger Guide, Kernel Memory Allocator chapter.

Written by: Philip Brown
Visit the Solaris driver pages at bolthole.com.