When booting over the network in a supernetted environment, you may run into the following error:
Rebooting with command: boot /pci@8,700000/pci@3/SUNW,qfe@0,1
Boot device: /pci@8,700000/pci@3/SUNW,qfe@0,1 File and args:
2ae00
Requesting Internet address for 0:3:ba:34:a3:12
SunOS Release 5.8 Version Generic_108528-13 64-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
whoami: no domain name
rtioctl: kstr_ioctl failed: error 128
whoami: couldn't add route: error 128.
WARNING: nfsdyn_mountroot: NFS3 mount_root failed: error 128
Cannot mount root on /pci@8,700000/pci@3/SUNW,qfe@0,1 fstype nfsdyn

panic[cpu0]/thread=10408000: vfs_mountroot: cannot mount root

0000000010407970 genunix:vfs_mountroot+70 (10435c00, 0, 0, 10410918, 10, 14)
  %l0-3: 0000000010435c00 0000000010439250 000000007e000000 0000000010435e38
  %l4-7: 0000000000000000 00000000104136b0 00000000000b7798 0000000000001798
0000000010407a20 genunix:main+94 (10410160, 2000, 10407ec0, 10408030, fff2, 1004ec8c)
  %l0-3: 0000000000000001 0000000000000001 0000000000000015 0000000000000e9a
  %l4-7: 0000000010428c38 0000000010462318 00000000000cd4c0 0000000000000540

skipping system dump - no dump device configured
rebooting...
Resetting ...
This problem is registered with Sun as Bug ID 4832595. The last time I checked, its status was "closed, because not a bug". That is not actually true, and unfortunately we have a supernetted environment, so I had to help myself.
The system panics while the default route is being added in
<install_image>/Solaris_8/Tools/Boot/etc/rcS. Look for
the line /sbin/hostconfig -p bootparams 2> /dev/null. At
this point the network interface is already up, but it is configured with the
"classful" netmask. The netmask itself is configured a few lines
later: the program /sbin/get_netmask obtains it with an
ICMP type 17 (address mask request) message sent to the server.
So if, at the moment hostconfig runs, your default gateway lies
in a different network under the classful netmask on the interface, you
get the panic. Of course! You are trying to add as your default gateway a host
that you do not know how to reach, because it is in another network!
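To check in advance whether a given client will hit this, the classful decision can be reproduced by hand. The following is only a minimal sketch: the function names classful_netmask, ip_to_int and gateway_reachable are made up for illustration and are not part of rcS. It relies on POSIX arithmetic expansion, so it is meant for an admin workstation rather than for the Bourne shell in the install miniroot.

```shell
#!/bin/sh
# Sketch: which netmask applies to an address before an explicit netmask
# has been set, and is the gateway reachable under that mask?

# classful_netmask ADDR -> prints the class A/B/C default mask for ADDR.
classful_netmask() (
    first=${1%%.*}              # the first octet decides the address class
    if [ "$first" -lt 128 ]; then
        echo 255.0.0.0          # class A
    elif [ "$first" -lt 192 ]; then
        echo 255.255.0.0        # class B
    elif [ "$first" -lt 224 ]; then
        echo 255.255.255.0      # class C
    else
        echo "no classful mask for $1 (class D/E)" >&2
        exit 1
    fi
)

# ip_to_int DOTTED -> prints a dotted quad as a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1                   # split the address on dots
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# gateway_reachable CLIENT GATEWAY -> succeeds if both addresses are in
# the same network under the client's classful mask.
gateway_reachable() (
    cmask=$(classful_netmask "$1") || exit 1
    m=$(ip_to_int "$cmask")
    c=$(ip_to_int "$1")
    g=$(ip_to_int "$2")
    [ $(( c & m )) -eq $(( g & m )) ]
)
```

For example, with the hypothetical addresses 192.168.1.10 (client) and 192.168.0.1 (gateway), gateway_reachable fails, which is exactly the situation in which adding the default route cannot work.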
Solution? The netmask should be set before the default gateway is configured. Sounds logical, doesn't it?
In theory, you can figure out the IP address of the machine to ask for the netmask, either with
hostconfig -p bootparams -n -v or by looking at where your root partition was mounted from.
I have not tried these "clean" ways. I have a "quick-and-dirty" hack.
The interface configuration part looks like this:
old_ifs=$IFS
IFS=":"
set -- $net_device_list
for i
do
    #
    # skip the auto-revarp for the loopback device
    #
    if [ "$i" = "lo0" ]; then
        continue
    fi
    /sbin/ifconfig $i auto-revarp -trailers >/tmp/dev.$$ 2>&1
    ipaddr=`/sbin/ifconfig $i |grep inet |awk '{print $2;}'`
    if [ "X$ipaddr" != "X0.0.0.0" ] ; then
        # The interface configured itself correctly
        echo "Configured interface $i"
        /sbin/ifconfig $i up
    else
        echo "Skipping interface $i"
    fi
done
IFS=$old_ifs
Let's rewrite it like this:
old_ifs=$IFS
IFS=":"
set -- $net_device_list
for i
do
    #
    # skip the auto-revarp for the loopback device
    #
    if [ "$i" = "lo0" ]; then
        continue
    fi
    /sbin/ifconfig $i auto-revarp -trailers >/tmp/dev.$$ 2>&1
    ipaddr=`/sbin/ifconfig $i |grep inet |awk '{print $2;}'`
    if [ "X$ipaddr" != "X0.0.0.0" ] ; then
        # The interface configured itself correctly
        echo "Configured interface $i"
        #
        # Netmask workaround: set it up right here!
        #
        if [ -f /tmp/._set_supernet ] ; then
            echo "Supernet workaround is applied on the interface $i"
            /sbin/ifconfig $i netmask 0xfffffc00 up
        else
            /sbin/ifconfig $i up
        fi
    else
        echo "Skipping interface $i"
    fi
done
IFS=$old_ifs
What happens here? If the semaphore file /tmp/._set_supernet
exists, we set the netmask right in the script. You know the
netmask of your network: the example uses 0xfffffc00 (255.255.252.0), so
substitute your own value. If the semaphore file does not exist, we proceed
normally.
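If you know your network only by its prefix length, the hexadecimal form used in the ifconfig line can be computed. This is just a sketch under my own assumptions: prefix_to_hexmask is a hypothetical helper name, and it uses POSIX arithmetic expansion, so run it on a workstation and paste the result into rcS rather than calling it from the miniroot.

```shell
#!/bin/sh
# prefix_to_hexmask LEN -> prints the netmask for a /LEN prefix in the
# hexadecimal notation used in the workaround (e.g. 0xfffffc00 for /22).
prefix_to_hexmask() (
    case $1 in
        ''|*[!0-9]*)
            echo "prefix length must be a number" >&2
            exit 1
            ;;
    esac
    if [ "$1" -gt 32 ]; then
        echo "prefix length must be 0..32" >&2
        exit 1
    fi
    # shift 32 one-bits left, then keep only the low 32 bits
    printf '0x%08x\n' $(( (0xffffffff << (32 - $1)) & 0xffffffff ))
)
```

For instance, prefix_to_hexmask 22 prints 0xfffffc00, the value hard-coded in the rewritten loop.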
Now, where do we create the semaphore file? This is a long topic in itself, and the best source of information is the Blueprints book "JumpStart Technology: Effective Use in the Solaris Operating Environment" by John S. Howard and Alex Noordergraaf. Information about this book is available on the Sun Blueprints pages. I'll just tell you what to do.
When you boot your system, you can pass parameters to the kernel.
Normally you would start your JumpStart installation like this:
boot net - install nowin. The kernel does not process all of the parameters;
the remaining ones are passed to init and then to the startup scripts. In the same
rcS script, look for /sbin/getbootargs. After it you will see
a "case" statement that processes the parameters.
So you can define your own parameter and add its processing there, like this:
supernet)
    cat < /dev/null > /tmp/._set_supernet
    shift
    ;;
Now just add one more parameter to your boot command: boot net - install nowin supernet
and that's it! Again, for a detailed discussion of parameter
processing, refer to the "JumpStart" book (ISBN 0-13-062154-4) or to the Sun
Blueprints articles.
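To show how the new branch sits inside an argument loop, here is a stand-alone sketch. It is not the actual rcS code: process_boot_args and flag_dir are hypothetical names I introduced so that the fragment can be tried outside the miniroot, and the catch-all branch merely stands in for whatever handling rcS already has.

```shell
#!/bin/sh
# Directory for the semaphore file; /tmp as in the workaround, but
# overridable for experiments (hypothetical variable).
flag_dir=${flag_dir:-/tmp}

# process_boot_args ARG... -> walks the boot arguments one by one.
process_boot_args() {
    while [ $# -gt 0 ]; do
        case $1 in
        supernet)
            # create an empty semaphore file for the netmask workaround
            cat < /dev/null > "$flag_dir/._set_supernet"
            shift
            ;;
        *)
            # every other argument is left to the existing handling
            shift
            ;;
        esac
    done
}
```

Calling process_boot_args install nowin supernet creates the semaphore file, while process_boot_args install nowin leaves things untouched. In rcS itself the argument list comes from /sbin/getbootargs instead of being passed by hand.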
More fun...
Well... It appears not to solve all the problems.
Let's review how the network boot process works. With snoop you should be able
to observe the following traffic:
- initial RARP broadcast and response (done by the OBP)
- TFTP transfer of the inetboot file
- second RARP pair - this time done by the kernel
- BPARAM WHOAMI - the kernel gets the workstation's parameters
- BPARAM GETFILE root - the kernel requests the location of the root fs
- MOUNT and NFS traffic
- ..
- somewhere later, another BPARAM WHOAMI - the final configuration, done by hostconfig in rcS
A BPARAM dump looks like this (produced with snoop -v):
BPARAM:  ----- Boot Parameters -----
BPARAM:
BPARAM:  Proc = 1 (Who am I?)
BPARAM:  Client name = client01
BPARAM:  Domain name = my.domain.name
BPARAM:  Router addr = 10.0.0.1
BPARAM:
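The interesting field here is the router address, because that is what the client will try to install as its default route. A small filter can pull it out of saved snoop -v text; this is only a sketch that assumes the "Router addr = ..." layout shown in the dump, and bparam_router is a hypothetical name.

```shell
#!/bin/sh
# bparam_router: reads snoop -v text on stdin and prints every router
# address reported on a BPARAM line (the last field of that line).
bparam_router() {
    awk '/^BPARAM:/ && /Router addr *=/ { print $NF }'
}
```

Fed with the dump above, it prints 10.0.0.1, which you can then compare against the classful network of the client.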
First, you will have a problem if the router's address does not belong to the classful
network of the machine being installed. Note that this is the kernel phase, so you cannot work
around it by patching rcS. You could remove the default gateway entry on
the JumpStart server (route delete default <ip.of.your.gateway>,
check with netstat -rn), but then you run into a second problem: accessing
the root partition.
Be sure that the JumpStart server can be reached directly: it has to belong to the same classful network as your workstation. Alternatively, you can put a "router" in every classful network and then "route" between the classful and classless parts.
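The difference between the two views of the network can be made concrete with a small comparison. This is a sketch only: to_int and same_network are hypothetical helper names, the addresses are invented, and the script needs a POSIX shell (not the miniroot's Bourne shell).

```shell
#!/bin/sh
# to_int DOTTED -> 32-bit integer value of a dotted quad (address or mask).
to_int() (
    IFS=.
    set -- $1                   # split on dots
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_network ADDR1 ADDR2 MASK -> succeeds if both addresses fall into
# the same network under MASK.
same_network() (
    a=$(to_int "$1")
    b=$(to_int "$2")
    m=$(to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
)

# Invented example: a client and a JumpStart server in one /22 supernet.
client=192.168.1.10
server=192.168.2.5

if same_network "$client" "$server" 255.255.252.0; then
    echo "supernet mask: same network"
fi
if ! same_network "$client" "$server" 255.255.255.0; then
    echo "classful mask: different networks, server not directly reachable"
fi
```

With these values both messages are printed: the two hosts are neighbours under the real netmask, but during the kernel phase the client still sees the server as being in a foreign network.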
Another solution is to separate the Boot and Install servers as described in the Advanced Installation Guide. This may help; I did not test it. Finally, you can use DHCP to boot, though I am not sure how buggy it is.
Last change: Thursday, 28-Aug-2008 10:01:58 MSD