Saturday, October 14, 2017

HPE are the worst.

Everyone knows that HP is just terrible, right? The latest one that bit me is this:

ProLiant G7 Series Servers - VMware ESXi 5.5 and 6.0 Host Loses Connection to the Network

What they say is that, basically, if you want to use VMware on a BL465 G7 and you've made the stupid decision to upgrade the firmware on that BL465 G7, bad luck. It's now a brick. And HPE don't care - "Note: HPE will not provide any additional fixes moving forward." This is the same BL465 G7 that I had to fight with to upgrade the Power Controller firmware.

There are two problems. First, they say you need firmware 4.9.416.15, which was (Yay me for tracking it down!) almost impossible to find. Second, they explicitly say not to use the HPE images. Sigh.

But, Good News, Everyone! I did the hard work for you. The Emulex Firmware 4.9.416.15 is contained in Legacy_OneConnect-Flash-10.7.110.38. Just download that ISO, boot your blade from it, and it'll upgrade the firmware of the NC551i to 4.9.416.15.

That's step one. But it's still not going to work, as the drivers on the HPE ISOs are wrong and come up with 'No Network Detected'. So you need to roll your own ISO, working through the same chain of problems and issues that I did. But if you don't want to actually do it, I've (again) done the hard work for you, so here is the complete, working ISO (build 5310538).

Here's what I had to do (and this is assuming you have the PowerCLI tools already installed).

Get the bits and pieces from VMware.

1. The latest VMware 6.5 HPE custom image
2. The elxnet 10.7.110.44 driver (which works in 6.5) specified by HPE

Open up the driver download, extract the -offline-bundle.zip file, and put both it and the image into C:\HP (or wherever. I put them into C:\HP because I'm lazy. If you change the path, please engage brain before typing.)

Import them into PowerShell

Remember, you can use tab completion for the URLs; you don't need to type them out in full.

Add-EsxSoftwareDepot -DepotUrl C:\HP\VMware-ESXi-6.5.0-5310538-HPE-650.10.1.3.5-Oct2017-depot.zip
Add-EsxSoftwareDepot -DepotUrl C:\HP\VMW-ESX-6.0.0-elxnet-10.7.110.44-offline_bundle-4014430.zip
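If you want to sanity-check that both depots actually loaded before going any further, a quick look at what they expose doesn't hurt (this is just a verification step I'd suggest, not part of HPE's procedure):

Get-EsxImageProfile | Select-Object Name, Vendor
Get-EsxSoftwarePackage | findstr elxnet

You should see at least one profile whose name starts with 'HPE' (the wildcard clone below relies on that), and both elxnet packages - the 11.2 one from the HPE depot and the 10.7.110.44 one from the offline bundle.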

Make your new Image

You clone the HPE image to make your own. Let's call it 'BL465G7', to be imaginative.

New-EsxImageProfile -CloneProfile HPE* -vendor xrobau -name "BL465G7"

Update (Downgrade, actually) the driver

You should now see both elxnet drivers

PowerCLI C:\hp> Get-EsxSoftwarePackage | findstr elx
elx-esx-libelxima.so     11.2.1238.0-03                 ELX        2/05/2017 4:2...
elxiscsi                 11.2.1238.0-1OEM.650.0.0.45... EMU        2/05/2017 4:2...
emulex-esx-elxnetcli     11.1.28.0-0.0.4564106          VMware     27/10/2016 4:...
elxnet                   10.7.110.44-1OEM.600.0.0.27... EMU        3/06/2016 7:4...
elxnet                   11.2.1149.0-1OEM.650.0.0.42... EMU        2/11/2016 8:3...

We want to REMOVE the one that's in the clone, and replace it with the older one

Remove-EsxSoftwarePackage BL465G7 elxnet
Add-EsxSoftwarePackage -imageprofile BL465G7 -SoftwarePackage "elxnet 10.7.110.44-1OEM*"
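To double-check that the clone now carries the 10.7.110.44 driver (and not the 11.2 one) before you go to the trouble of building an ISO, you can peek at the profile's VIB list - again, just a suggested sanity check:

(Get-EsxImageProfile -Name BL465G7).VibList | findstr elxnet

Only the 10.7.110.44-1OEM.600 package should be listed.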

Fix HP's other mistakes

Well, we can't fix ALL of them. They've made so many. So, so many. But we can fix the one that's causing our machine to PSOD. There's something broken with the Mellanox 4 (nmlx4) drivers in the HPE image that makes the machine PSOD, so you need to remove them.

Remove-EsxSoftwarePackage BL465G7 nmlx4-en
Remove-EsxSoftwarePackage BL465G7 nmlx4-core
Remove-EsxSoftwarePackage BL465G7 nmlx4-rdma

There's also an ongoing issue with the 'smx-provider' (Smart Array, aka your P410i) which also causes PSODs, so rip that out, too.

Remove-EsxSoftwarePackage BL465G7 smx-provider

Finally, Build your ISO

Export-EsxImageProfile -ImageProfile BL465G7 -ExportToIso -filepath c:\HP\VMware-ESXi-BL465.G7-6.5.0

Just boot from that ISO, and you're off and running. No PSODs, and an NC551i that works!

Monday, December 5, 2016

Replacing disks in the Thecus ZFS NAS

Expanding on my previous post about the Thecus, I'm finally at the stage where I'm going to be swapping out some of the disks. To start with, I had left an empty slot in the chassis, so I've thrown a new 4TB drive in there (/dev/sdi).

I then want to replace one of my existing smaller disks (I picked /dev/sdd, which was a 500GB drive) with the new one.

ZFS loves this sort of stuff.

root@thecus:~# zpool status
  pool: n8800pool
 state: ONLINE
  scan: scrub repaired 0 in 5h6m with 0 errors on Sun Nov 13 05:30:41 2016
config:

        NAME        STATE     READ WRITE CKSUM
        n8800pool   ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
root@thecus:~# zpool replace n8800pool sdd sdi
invalid vdev specification
use '-f' to override the following errors:
/dev/sdi does not contain an EFI label but it may contain partition
information in the MBR.
root@thecus:~# zpool replace -f n8800pool sdd sdi
root@thecus:~# zpool status
  pool: n8800pool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec  6 12:27:22 2016
    248M scanned out of 1.94T at 4.50M/s, 125h42m to go
    35.1M resilvered, 0.01% done
config:

        NAME             STATE     READ WRITE CKSUM
        n8800pool        ONLINE       0     0     0
          raidz2-0       ONLINE       0     0     0
            sdb          ONLINE       0     0     0
            sdc          ONLINE       0     0     0
            replacing-2  ONLINE       0     0     0
              sdd        ONLINE       0     0     0
              sdi        ONLINE       0     0     0  (resilvering)
            sde          ONLINE       0     0     0
            sdf          ONLINE       0     0     0
            sdg          ONLINE       0     0     0
            sdh          ONLINE       0     0     0

errors: No known data errors
root@thecus:~#

Now, after performing this for every 500GB device (sigh), you just need to expand each replaced disk with 'zpool online -e n8800pool sdi' (or whatever the new device name is). A quick sketch of the whole expansion step is below.
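For what it's worth, here's the rough sequence I'd expect once the last resilver has finished (the pool and device names are from this box - substitute your own):

zpool status n8800pool              # make sure the resilver has actually completed
zpool online -e n8800pool sdi       # tell ZFS to use the full size of the replaced disk
zpool set autoexpand=on n8800pool   # optional: future replacements grow the pool automatically
zpool list n8800pool                # SIZE only grows once every disk in the raidz2 has been upgraded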

Saturday, October 15, 2016

Updating firmware on BL465c G7

HP Hardware is really good. I have a couple of C7000 and C3000 blade chassis, because they're amazing value for money, and are super reliable.

So yeah. HP Hardware = Really good.

HP Software and Firmware, on the other hand, is amazingly, terribly AMAZINGLY bad.

I spent far too much time this morning trying to upgrade the BIOS on a BL465c G7 blade. It didn't work.

I'd get the error  "Can't execute the Program table" when running the BIOS upgrade, and that was it.

After doing a bunch of research, I came to the following conclusions:
  1. HP knows it doesn't work.
  2. HP had previously published the fix.
  3. HP have removed the fix (for Windows).
  4. HP proudly proclaim that the fix won't be made available for Windows again, and that you should use the Linux fix instead.
  5. The Linux fix doesn't exist.
Did I mention how awful HP is?

Anyway, here's how to solve it.
  • You'll have to coldboot your blade (as in, physically pull it out of the chassis. Sorry)
  • Download this ISO - That was made by a user in this thread on the HP forums. I can only assume that HP missed it, because it's one of the few useful things in the entire place.
  • Boot from that ISO. It will boot and look like this
  • Select yes. Let it flash.
  • You will get kicked out of iLO. This wasn't an issue; I immediately reconnected without a problem.
  • You'll then see this screen. Do what it says.  (Note the ISO does *not* shut the machine down, you'll have to do that manually - shut it down, and then pull the blade)
  • Re-run the SPP and everything will work.
Note that I am using the SPP with the exact filename of SPP2015100.2015_0921.6.iso to do this, and it works.

Sunday, April 24, 2016

Thecus N8800 Pro reborn anew, with ZFS.

I have a Thecus N8800 Pro that I picked up for $100 on eBay. (No, not the new N8800 ProV2, I have the old and busted one). I was thinking that something with 8 SATA bays would HAVE to be useful. Unfortunately, I was quite disappointed when I saw that the latest release for the old one was in 2013 (3 years ago!), and nothing had happened to the firmware since then.

However, it's a NAS, how hard can it be to make it work and just leave it alone?

Well, it seems like it's quite hard.  For some reason known only to themselves and their crazy programmers, Thecus had hard-coded the number of iSCSI connections that the NAS could handle to 8.  With multipathing, that meant that you could only connect 4 devices to it, and no more.

That left me somewhat disheartened with the device, so I exported a few disks via NFS (which was unusually slow anyway), but pretty much gave up on it.

That was, until Ubuntu 16.04 turned up, with built-in ZFS and clever things like that, *and* a nice easy way to install it. So here's the walkthrough.

Parts Required

1 x N8800
1 x Screwdriver to undo the screws holding the top on
1 x USB Device to turn into a Ubuntu 16.04 Installer
1 x USB Device to run the system from
1 x Ubuntu Server 16.04 LTS amd64 ISO file
1 x Something to talk to the serial port of the Thecus

I happened to have a couple of identical 16GB USB2 thumb drives lying around, but it looks like even a 4GB one would be sufficient. Note that something that small is unlikely to be USB2; it will be USB1, and will take *forever* to install. Go spend a few dollars and get a few USB3 thumb drives. They won't work at USB3 speed, but they'll at least be faster than that USB1 thing you found behind the couch.

Serial Port?

Yeah. Serial port. Unless you're extremely lucky, your Thecus doesn't have a VGA port. If you don't have anything that can talk serial, your other option is grabbing a cheap VGA card and plugging it into the onboard PCIe slot. (If you do that, then just install as per normal, and you can skip down to the 'Plug the HDDs back in and Configure ZFS' section below. If you're not comfortable messing around with serial connections, that may be the best idea.)

This is actually the fiddliest bit if you don't have the correct cable. If you don't, but you can join a few wires together, you can make a very basic null modem cable: simply join pin 3 from one end to pin 2 on the other end (both ways), and pin 5 to pin 5. That's all you need. (That sends the 'Transmit' from each side to the 'Receive' of the other, and joins up the ground wire.)
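As for the 'something to talk to the serial port': any terminal emulator will do. As a rough example, with a USB serial adapter on a Linux laptop (assuming it shows up as /dev/ttyUSB0 - yours may differ), either of these works at the 115200 speed we'll configure below:

screen /dev/ttyUSB0 115200
minicom -D /dev/ttyUSB0 -b 115200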

Preparing your N8800

Remove all the HDDs in it. You *will* be able to access the data on them when you finish, but as soon as you do, you're going to blow it all away anyway. So make sure everything's backed up and copied across somewhere else. You then need to open it up and remove the little flash drive. This is what it looks like:

Just pull it directly up, wiggling it from side to side. It'll pop off. Discard it; you'll never need it again. It's an interesting design, actually: an old-style Parallel ATA (PATA) interface, with two 128MB drives as master and slave.

These days you'd just use a USB device... Oh wait, that's what we're doing next!

Create a bootable USB drive

This is pretty easy. Download the Ubuntu 16.04 ISO and then (if you're on Windows) follow the 'How to make a bootable USB drive' instructions here. Don't use the DD method, as you need to edit some files slightly.

Enable serial installation on the drive

This is extremely simple.  When you look at your USB drive, it'll have a file called 'syslinux.cfg' in the root. Add this line to the start:

SERIAL 0 115200 0x003

That must be the first line in the file, before the 'DEFAULT loadconfig' option.

Then, because I knew what I wanted to do, I was lazy and bypassed all the menu options: I changed the 'CONFIG' line to point straight at the 'txt.cfg' file. This is what my 'syslinux.cfg' file looked like:

SERIAL 0 115200 0x003
DEFAULT loadconfig
LABEL loadconfig
  CONFIG /isolinux/txt.cfg
  APPEND /isolinux

Then opening up the /isolinux/txt.cfg file, I had to tell Linux that I wanted to use a serial console, too:

default install
label install
 menu label ^Install Ubuntu Server
 kernel /install/vmlinuz
 append file=/cdrom/preseed/ubuntu-server.seed initrd=/install/initrd.gz console=ttyS0,115200n8 ---

Those are the only changes needed. (In case that's unclear: the 'append' entry starts with 'file=' and ends with '---', and it is all one line, not two.)

Boot from it!

All you need to do now is plug it in and wait for it to boot. Do not plug in your other drive just yet. You don't want ANY other drives plugged in that could confuse the machine into booting from something else (you DID remove your HDDs earlier, right?).

You may also note that I disabled 'quiet' mode, because I really want to be sure that it's booting when I'm sitting in front of a USB stick that's flashing without any other explanation! You should see a bunch of things fly up the console, and then it'll ask you what language you want to use. Only after you see that question should you plug your new USB drive in.

Install as per normal.

This is pretty uneventful. Install as per normal, and don't forget to turn on SSH when you're at the package selection page, otherwise you'll have a really bad time trying to log into it!

That was the hardest bit. Honestly. Now it's EASY!

Install our needed packages

We want zfsutils-linux, of course, plus the iSCSI target packages.

root@thecus:~# apt-get install zfsutils-linux iscsitarget targetcli

Plug the HDDs back in and Configure ZFS

The HDDs that I used were still perfectly visible, with all the data on them. Once I plugged them back in, they all came back up and re-established the RAID settings they were using previously. This is not what I wanted. I had to remove the RAID partitions and volumes manually, using 'vgremove' (you can run 'vgdisplay' to get a list; you'll have at least vg0, maybe vg1 and vg2).

root@thecus:~# vgremove vg1
Do you really want to remove volume group "vg1" containing 3 logical volumes? [y/n]: y
Do you really want to remove and DISCARD active logical volume syslv? [y/n]: y
  Logical volume "syslv" successfully removed
Do you really want to remove and DISCARD active logical volume lv0? [y/n]: y
  Logical volume "lv0" successfully removed
Do you really want to remove and DISCARD active logical volume iscsi0? [y/n]: y
  Logical volume "iscsi0" successfully removed
  Volume group "vg1" successfully removed
root@thecus:~#

Then stop the RAIDs (note: I've edited out a lot of stuff here - I had to delete three RAIDs, and there are 7 HDDs in this machine - but this should be enough for you to get the idea).

root@thecus:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md125 : active (auto-read-only) raid10 sdh2[0] sdg2[1] sdf2[2] sde2[3]
      972674688 blocks super 1.0 64K chunks 2 near-copies [4/4] [UUUU]
root@thecus:~# mdadm --manage /dev/md125 --stop
mdadm: stopped /dev/md125
root@thecus:~# mdadm --zero-superblock /dev/sdh2 /dev/sdg2 /dev/sdf2 /dev/sde2
root@thecus:~# 

You will probably have to repeat that for any number of md12?s; there's a rough one-liner below if you'd rather do them all in one go.
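If you'd rather not do them one at a time, a rough sketch like this stops whatever md12? arrays are left (check /proc/mdstat before and after, and remember that the member partitions you feed to --zero-superblock will be different for each array):

for md in /dev/md12?; do mdadm --manage "$md" --stop; done
cat /proc/mdstat    # should show no remaining active arrays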

Then nuke the partitions on those drives (where we're going, we don't NEED partitions!)


root@thecus:~# fdisk /dev/sdh
Welcome to fdisk (util-linux 2.27.1)
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): d
Partition number (1,2, default 2):
Partition 2 has been deleted.
Command (m for help): d
Selected partition 1
Partition 1 has been deleted.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@thecus:~# 
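If you have a lot of drives to do, driving fdisk interactively gets old fast. As an alternative I'd suggest, wipefs (part of util-linux, so it's already installed) clears the partition table signature in one non-interactive shot - same end result as the fdisk dance above, so pick whichever you prefer:

wipefs -a /dev/sdh      # destroys the partition table - be VERY sure of the device name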

And finally, you can now assign them to a ZFS pool! (Yes, use raidz2, there's no reason not to. That's perfect for a device of this size).

Just fill it up with as many disks as you can get your hands on, and upgrade the disks as you want to. The pool will automatically grow as you replace disks with larger ones!

root@thecus:~# zpool create -o ashift=12 n8800pool raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
root@thecus:~#

You now have a ZFS pool that's ready to go! (You almost always want 'ashift=12' - even if you're adding an old 512-byte-sector disk, writing to THAT in 4K chunks won't slow it down, but writing to a NEW 4K-sector disk in 512-byte chunks WILL slow it down.)

Create an iSCSI target for your new ZFS pool.

This is the easy part. I'm now going to create a 1TB iSCSI volume that we'll actually use to store our data.

zfs create -o compression=off -o dedup=off -o volblocksize=32K -V 1024G n8800pool/iscsi-1
zfs set sync=disabled n8800pool/iscsi-1
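Before heading into targetcli, it's worth a quick check that the zvol exists and that its device node has appeared where we're about to point the backstore:

zfs list -t volume                    # should show n8800pool/iscsi-1 at 1T
ls -l /dev/zvol/n8800pool/iscsi-1     # this is the device targetcli will use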

Now all we need to do is set up the iSCSI target on our Thecus. Note that I'm disabling all authentication in targetcli because I'm comfortable that this machine will never be accessible by any nefarious hacker. You may not want to do that.

/> cd backstores/iblock
/backstores/iblock> create name=iscsi1  dev=/dev/zvol/n8800pool/iscsi-1
Created iblock storage object iscsi1 using /dev/zvol/n8800pool/iscsi-1.
/backstores/iblock> cd /iscsi
/iscsi> create
Created target iqn.2003-01.org.linux-iscsi.thecus.x8664:sn.0a4b33134abb.
Selected TPG Tag 1.
Created TPG 1.
/iscsi> cd iqn.2003-01.org.linux-iscsi.thecus.x8664:sn.0a4b33134abb/tpg1/


Remember, you can use tab completion here. You don't need to copy and paste that huge string. The keypresses I used were 'create[enter]cd iq[tab][tab][enter]'.

You now need to link the target to the block device you created earlier:

/iscsi/iqn.20...33134abb/tpg1> cd luns
/iscsi/iqn.20...abb/tpg1/luns> create /backstores/iblock/iscsi1
Selected LUN 0.
Created LUN 0.
/iscsi/iqn.20...abb/tpg1/luns> cd ..
/iscsi/iqn.20...33134abb/tpg1> set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
Parameter demo_mode_write_protect is now '0'.
Parameter authentication is now '0'.
Parameter generate_node_acls is now '1'.
Parameter cache_dynamic_acls is now '1'.
/iscsi/iqn.20...33134abb/tpg1> cd portals
/iscsi/iqn.20.../tpg1/portals> create
Using default IP port 3260
Automatically selected IP address 10.91.80.189.
Created network portal 10.91.80.189:3260.
/iscsi/iqn.20.../tpg1/portals> saveconfig
Save configuration? [Y/n]:
Saving new startup configuration
/iscsi/iqn.20.../tpg1/portals> exit
Comparing startup and running configs...
Startup config is up-to-date.

One thing I have noticed is that the 'demo_mode' sometimes won't start working until you reboot the machine. If you DO want to enable authentication, then configure that as per normal; but if you do NOT want auth and discover things like this in your 'dmesg', then you'll need to reboot the Thecus:

[ 6779.673114] iSCSI Initiator Node: iqn.1998-01.com.vmware:ssd-4142c0bc is not authorized to access iSCSI target portal group: 1.
[ 6779.684631] iSCSI Login negotiation failed.

Congratulations, you're done!

That is 100% it. You now have a 1TB iSCSI target that's visible from your Thecus. You can now do other things, like create an NFS share from your pool (there's a quick sketch below), but that's well documented elsewhere. From here, you have a fully functional Ubuntu 16.04 machine, with ZFS and super-fast iSCSI.
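If you do want a quick NFS share straight off the pool, a minimal sketch looks like this (it assumes you've installed nfs-kernel-server, and the dataset name 'media' is just an example):

apt-get install nfs-kernel-server    # ZFS's sharenfs does nothing without the NFS server installed
zfs create n8800pool/media           # example dataset name
zfs set sharenfs=on n8800pool/media  # exports with default options - tighten these to suit your network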

Enjoy!