Random Solaris driver developer tips

This page has some hints and tips for folks who are looking to start out in Solaris driver development. I suggest you read **ALL** of this file. But for your convenient reference, it has now been converted to HTML, with quick-access jump points.

Sections so far:
GENERAL
MUTEXES
DISASTER RECOVERY (and preparation)
PCI REGISTERS
KERNEL DEBUGGING FEATURES

GENERAL

"Generally speaking", most of the information you will need that is specific to a particular type of driver can be found at Sun's Driver Development site. The rest of this page has tips that apply to all driver development, not just to specific types.

A quickie "right mindset" tip to start off: if you run into a bug (either in software or hardware) and things blow up because your error checking is inadequate, do NOT fix the bug yet! You're getting a free "test harness" here, so use it to improve your error checking routines. Once they are bulletproof, THEN go back and fix or work around the error that triggered the problem. This is driver writing, not WIMP programming. Your code *must* continue to work, or at minimum not crash, when EVERYTHING ELSE dies!!!

If you think you're being paranoid about error checking -- you're not being paranoid enough.

MUTEXES

Some general musings on mutexes:

DISASTER RECOVERY

It's a really really really really really good idea to back up everything you care about, >> BEFORE << starting development on kernel modules.

Make an emergency boot area

Normally, if things completely blow up, you'll have to boot off cdrom. But for a quicker resolution, you can prepare a backup copy of the usual kernel environment to boot from (assuming your crash hasn't corrupted your filesystems). To make the emergency area, do:

# cd /platform/`uname -m`
# cp -r kernel safety
You now have a directory tree starting at /platform/`uname -m`/safety/

At boot time, the path you give is automatically looked up under /platform/`uname -m`. So when you need to, you can do the appropriate alternate boot:

# Intel
b safety/unix
# sparc
boot safety/unix

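This assumes that you install your driver into /platform/`uname -m`/kernel/drv, and that you made the safety copy before installing it, so the safety tree doesn't contain your driver. If you put it in /kernel/drv or /usr/kernel/drv, you're up the creek without a paddle: those directories are normally still on the module search path when you boot the safety copy, so your driver gets loaded anyway. The Makefile for my PCIbase driver skeleton installs into /platform/`uname -m`/kernel/drv by default.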

Separate your root partition

If you're extra paranoid, I highly recommend having separate / and /usr partitions (and making /usr read-only). Then, as root, do a ufsdump of / and put it in another filesystem that's easy to get to, like /var or /export/home. [It'll only be about 30 megs.]

If you're not feeling THAT paranoid, then at minimum, do

# cd / ; ufsdump 0f /var/root.dump etc dev devices
This way, rather than restoring or reinstalling your ENTIRE SYSTEM if something goes wrong, you can probably boot off cdrom, mount your root filesystem, and just use ufsrestore to get back those most critical directories.

If you've forgotten to do this, and your system blows up and gets stuck in a loop, then rather than doing a reinstall, you might try removing that hardware device, if it is indeed removable. That way, maybe your driver won't attach, and thus won't corrupt the system.

But be warned: your system might be so corrupted that you have to reinstall anyway.

PCI REGISTERS

This section gives more details on how to map PCI registers. This assumes your hardware HAS "mappable" registers. Otherwise, you'll have to look at Sun's "programmable IO" example, "pio", in the sample driver code online.

You can use the "printregs" script in this directory to look at the registers for your hardware. You first need to find the PCI ID for it. PCI IDs are of the form [vendornumber],[cardnumber], that is, the vendor ID and the device ID, in hex.

You can do this either by going through the "device tasks" option of the DCA (boot floppy) to look at details, or by running

 prtconf -pv
from the command line, and looking through the results. "printregs" actually uses that same output, but automatically splits out the register information for you.


Example for printregs:

Intel's vendor ID is 8086. So the PCI identifier for a particular revision of the Intel Pro100 Fast Ethernet card happens to be "8086,c".

# printregs 8086,c

When run on a system with that card installed, it gives the following output:

  Looking for device 'pci8086,c'
  We will print the possible register mappings, that can be used with
  ddi_regs_map_setup, and also any physical mappings.
  The physical mappings are only to help you FIGURE OUT what they are
  !!!!  DO NOT TRY TO ACCESS THAT ADDRESS DIRECTLY !!!!!
   -------------------------
  register set 1 has tag 02006810 and length 00001000
  register set 2 has tag 01006814 and length 00000040
  register set 3 has tag 02006818 and length 00100000
  register set 4 has tag 02006830 and length 00100000
  tag 82006810 is mapped at addr ed100000
  tag 81006814 is mapped at addr 0000c800
  tag 82006818 is mapped at addr ed000000
  tag 82006830 is mapped at addr ea000000

If you ignore the leading '8' on the second set of tags, you can match the two sets up and see that:
register set 1 is mapped at physaddr ed100000 for length 0x1000
register set 2 is mapped at physaddr 0000c800 for length 0x0040
register set 3 is mapped at physaddr ed000000 for length 0x00100000
register set 4 is mapped at physaddr ea000000 for length 0x00100000
But as the warnings say, do NOT try to do something like uint8_t *ptr = (uint8_t *)0x0000c800; *ptr = 1; or anything of that sort. You actually need to use
ddi_regs_map_setup(dip, 2, &reg2_ptr, 0, 0x40, &acc_attr, &acc_handle);
to map register set 2 into your own kernel address space. You will then have reg2_ptr (a caddr_t) pointing to the start of it, and acc_handle (a ddi_acc_handle_t) for the mapping.

But even then, you shouldn't dereference that pointer directly!! Use one of ddi_put8/ddi_put16/ddi_put32/ddi_put64, passing the access handle, to change the contents of a mapped register, and the matching ddi_get8/ddi_get16/ddi_get32/ddi_get64 to read one.

"man ddi_regs_map_setup" for more details on things like acc_attr, which is a ddi_device_acc_attr_t that you fill in before the call.

KERNEL DEBUGGING FEATURES

First of all, put the following in your /etc/system file, and reboot:
**************************************************************
* "full" memory checking == 0xf ; "light" checking == 0x100
* CF: /usr/include/sys/kmem_impl.h
set kmem_flags=0x1f

* To auto-generate a coredump on a locked system.
* This is the "deadman kernel" enabler. Allegedly.
* It doesn't work under x86 very well, if at all.
set snooping=1
***************************************************************

When you do have a kernel panic, you can find out the routine in which it crashed by running:
# adb -k /var/crash/`uname -n`/unix.0 /var/crash/`uname -n`/vmcore.0
$C
Yes, type dollar sign, C. You won't get a prompt, but just type it anyway. (You can also use mdb the same way.)

If you want lots and LOTS of info about the kernel state at the time of a crash, you can use

$<threadlist
to look at every single kernel thread.
$<panicbuf
will show you the messages from when the system panicked.

::findleaks
in mdb, if you had kmem_flags set to 0xf, will help you track down memory leaks. You might also be interested in ::kmastat and ::memstat.
*panicstr/s
should print the "panic string" from a kernel panic, if available.

If you have booted the system with "b kadb", there are some other options. [To drop into the debugger on x86, use Control-Alt-D.]

18::more	Turns on screen paging (like running "more")
:c		Continue running UNIX
$<cpus		Shows thread on each cpu
freemem/X	Shows number of free memory pages
$<kmastat	Shows kernel memory allocator statistics
lbolt/X		Shows "system tick" counter (like system "clock" tick)
[		Like "next" in gdb
]		Like "step" in gdb
module#f_name:b	Set a "breakpoint" at function f_name(), kernel module 'module'


Unfortunately, the $C trick gives you a location in a format like:

  functionname:+0x104
as the place where it crashed, and you're wondering what the heck line that is. But you CAN FIND OUT, if you grab the GNU binutils and install "objdump" from it.

Assuming you compiled your driver with -g,

  objdump -d -S yourdriver
will give you the disassembly, interleaved with your C source, AND hex offsets!

Adjusting variables live

If there is a variable you'd like to adjust on the fly on a running system, you can use either adb or mdb to change it. Example:

adb -kw (or mdb -kw)
> my_var/W 1
writes the value 1 into the 4 bytes at symbolic address "my_var".

However, sometimes you have a non-unique variable name, and that's where mdb comes in handy.
First, use ::nm to find the addresses of all the symbols in a specific driver. Then you can write to the address directly. Example:

mdb -kw
> ::nm wacom
0x00000000|0x00000000|NOTY |LOCL |0x0  |UNDEF   |
0xfea08754|0x00000073|FUNC |LOCL |0x0  |1       |wacom_detach
  [....]
0xfed0a50c|0x00000004|OBJT |LOCL |0x0  |4       |debug_level
  [....]

Oh, look! There's my debug_level variable, at address 0xfed0a50c. It's a 4-byte object. (The 0x0 column is the symbol's "Other" field, not its current value; 0xfed0a50c/X would show the value.) But I want to tweak up the debug_level without having to reload my driver. No problem:

> 0xfed0a50c/W 1

and I've just changed the 4-byte value to 1. Let the debug messages roll!



Written by: Philip Brown
Visit the Solaris driver pages at bolthole.com.