Blog Archives

Oracle Sweden User Group (ORCAN) Event

I was in Sweden (at a Japanese spa hotel near Stockholm City) from Monday to Wednesday to attend the ORCAN event. Thanks to Patrik Norlander and his friends, the event was really perfect. I gave two presentations and attended presentations by other ACEs and experts.

I spent my time talking with Dan Morgan about a possible Turkey Oracle User Group event, with Jose Senegacnik about Oracle and planes, with Dimitri Gielis about whether APEX 4.0 is mature enough to build large-scale applications, and finally with Luca Canali about recent Oracle Streams projects at CERN.

Thanks guys, it was a perfect time for me.

My Presentations

How to Install Oracle 11g Release 2 on OEL 5.4 on VirtualBox: Installing Grid Infrastructure

In Oracle 11g Release 2 you will find that things have changed even for a single instance database installation. In this series of posts I will try to illustrate how to install a single instance Oracle 11g Release 2 database on your Linux machines.

As the first part of our installation series, we will start by installing the brand new Grid Infrastructure, which you might think is just a fancy name for CRS+ASM, but you will find out later that it is a bit more.

VirtualBox Configuration

Here is a sufficient VirtualBox virtual hardware configuration for your 11g Release 2 playground (keep in mind that this is the bare minimum configuration for a painless installation; more resources are obviously better):

Hardware | Amount | Description
Memory | 512MB | Although the minimum memory requirement for Oracle 11g Release 2 in a real production environment is documented to be 1024MB, 512MB will be sufficient for all practical requirements of your playground.
Root Disk | 16GB | A 16GB root disk will be sufficient for OS + swap + Oracle binaries.
ASM Disks | 6 x 2GB SCSI disks | We will be doing an ASM based installation, so 6 disks over the SCSI interface will be enough to simulate a real life experience with ASM.

You are now ready to start your VirtualBox for Oracle 11g Release 2 installation.

Install Grid Infrastructure

Pre-work

There are a few important OS-level tasks we should complete before starting the Grid Infrastructure installation.

Physical Partition Creation

The first thing is to create physical partitions on your virtual SCSI devices. Actually this is not crucial for an ASM installation, because ASM can use a physical disk as a whole without any partition. However, if you wish to use ASMLIB you will need those partitions.

[root@localhost ~]# fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table will be corrected by w(rite)

Command (m for help): n
Command action
e     extended
p     primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-261, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-261, default 261):
Using default value 261

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Synching disks.
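
The same dialog has to be repeated for each of the six ASM disks. A minimal sketch of how the answers typed above could be fed to fdisk non-interactively is shown here; the function names fdisk_answers and partition_disk are hypothetical, and the keystrokes simply mirror the session above (new, primary, partition 1, default first and last cylinder, write). Treat it as an illustration and double-check the device names before running anything like it, since it rewrites partition tables.

```shell
# Keystrokes answered at the fdisk prompts shown above:
# n (new), p (primary), 1 (partition number),
# two empty lines (accept default first/last cylinder), w (write).
fdisk_answers() {
    printf 'n\np\n1\n\n\nw\n'
}

# Feed those answers to fdisk for a single disk, e.g. /dev/sdb.
# Destructive: only call this for the empty disks reserved for ASM.
partition_disk() {
    fdisk_answers | fdisk "$1"
}
```

For example, partition_disk /dev/sdb would create /dev/sdb1 covering the whole disk, assuming the prompts appear in the same order as in the session above.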

Changing ASM devices ownership to oracle

The physical partitions we have created are owned by root. Their ownership should be changed to the ASM user (oracle in our case) so that they are visible to ASM discovery.

[root@localhost ~]# ls -la /dev/sd?1
brw-r----- 1 root disk 8,  1 Sep 21 05:03 /dev/sda1
brw-r----- 1 root disk 8, 17 Sep 21 05:03 /dev/sdb1
brw-r----- 1 root disk 8, 33 Sep 21 05:03 /dev/sdc1
brw-r----- 1 root disk 8, 49 Sep 21 05:03 /dev/sdd1
brw-r----- 1 root disk 8, 65 Sep 21 05:03 /dev/sde1
brw-r----- 1 root disk 8, 81 Sep 21 05:03 /dev/sdf1

[root@localhost ~]# chown oracle:dba /dev/sd?1

[root@localhost ~]# ls -la /dev/sd?1
brw-r----- 1 oracle dba 8,  1 Sep 21 05:03 /dev/sda1
brw-r----- 1 oracle dba 8, 17 Sep 21 05:03 /dev/sdb1
brw-r----- 1 oracle dba 8, 33 Sep 21 05:03 /dev/sdc1
brw-r----- 1 oracle dba 8, 49 Sep 21 05:03 /dev/sdd1
brw-r----- 1 oracle dba 8, 65 Sep 21 05:03 /dev/sde1
brw-r----- 1 oracle dba 8, 81 Sep 21 05:03 /dev/sdf1
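
Instead of comparing the listing by eye, the ownership check can be scripted. The sketch below is only an illustration: wrong_owner is a hypothetical helper that reads an ls -la style listing on standard input and prints every device whose owner or group differs from the expected pair.

```shell
# Read `ls -la` style lines on stdin; the owner is field 3, the group is
# field 4 and the device path is the last field. Print the path of every
# entry that is not owned by the expected user and group.
wrong_owner() {
    awk -v owner="$1" -v group="$2" '
        $3 != owner || $4 != group { print $NF }
    '
}
```

It could be used as ls -la /dev/sd?1 | wrong_owner oracle dba, which prints nothing once all six partitions belong to oracle:dba.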

To make those changes permanent (otherwise Linux will set all device owners back to root when you reboot the system), you should create a udev permission file:

[root@localhost ~]# more /etc/udev/rules.d/99-oracle.rules
#ASM disks
KERNEL=="sda1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sdb1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sdc1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sdd1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sde1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sdf1", OWNER="oracle", GROUP="dba", MODE="0660"
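
Typing one rule per device by hand is error prone, so the file can also be generated. This is a minimal sketch under my own assumptions: gen_asm_udev_rules is a hypothetical function, and the kernel device names passed to it should be the ones that ASM discovers (the partitions matched by /dev/sd?1 in this post).

```shell
# Print a udev rules file for ASM devices.
# $1 = owner, $2 = group, remaining arguments = kernel device names.
gen_asm_udev_rules() {
    owner=$1
    group=$2
    shift 2
    echo "#ASM disks"
    for dev in "$@"; do
        printf 'KERNEL=="%s", OWNER="%s", GROUP="%s", MODE="0660"\n' \
            "$dev" "$owner" "$group"
    done
}
```

For example, gen_asm_udev_rules oracle dba sda1 sdb1 sdc1 sdd1 sde1 sdf1 > /etc/udev/rules.d/99-oracle.rules would write a file in the format shown above.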

Ensure /dev/shm is sufficiently sized

You should ensure that /dev/shm is at least 256MB for a successful ASM installation (and 512-750MB for an RDBMS installation) when the MEMORY_TARGET parameter is used.

[root@localhost ~]# df -ha
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 15G 9.9G 3.9G 72% /
proc 0 0 0 /proc
sysfs 0 0 0 /sys
devpts 0 0 0 /dev/pts
/dev/hda1 99M 32M 63M 34% /boot
tmpfs 125M 0 125M 0% /dev/shm
none 0 0 0 /proc/sys/fs/binfmt_misc
sunrpc 0 0 0 /var/lib/nfs/rpc_pipefs
oracleasmfs 0 0 0 /dev/oracleasm

You can resize tmpfs online by using:

[root@localhost ~]# mount -o remount,size=750M /dev/shm

To make this mount operation persistent, you should modify the tmpfs line in the /etc/fstab file as follows:

[root@localhost ~]# grep tmpfs /etc/fstab
tmpfs                   /dev/shm                tmpfs   size=750m        0 0

Notice that although we have a VirtualBox instance with 512MB of memory, Linux allows us to mount tmpfs with a size of 750MB. This is most probably due to the lazy allocation of memory in tmpfs.
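
The size requirement can be verified from a script before the installer is started. The following is a sketch only; shm_at_least is a hypothetical helper that reads the output of df -P -m (sizes in megabytes) on standard input and succeeds when the filesystem mounted on /dev/shm is at least as large as the given minimum.

```shell
# Read `df -P -m` output on stdin. Field 2 is the size in MB and the
# last field is the mount point. Exit status 0 means /dev/shm was found
# and is at least $1 megabytes; any other case returns 1.
shm_at_least() {
    awk -v min="$1" '
        $NF == "/dev/shm" { found = 1; size = $2 }
        END { exit !(found && size + 0 >= min + 0) }
    '
}
```

A possible use is df -P -m /dev/shm | shm_at_least 256 || echo "/dev/shm is too small for ASM".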

Create installation directory and set its ownership

The final step is to create our software directory and set the required ownership on it.

[root@localhost ~]# mkdir -p /u01/app/oracle
[root@localhost ~]# chown -R oracle:oinstall /u01/app

Installation

[oracle@localhost ~]$ unzip linux_11gR2_grid.zip
[oracle@localhost ~]$ cd grid
[oracle@localhost grid]$ ./runInstaller
Choose the Install and Configure Grid Infrastructure for a Standalone Server option and click on the Next > button.
Set Selected Language to English and click on the Next > button.
The next step is to perform the ASM configuration for your grid. Choose External as the redundancy of the DATA diskgroup. Now click on the Change Discovery Path… button to define the asm_diskstring parameter.
Set Disk Discovery Path to /dev/sd?1 and click on OK.
Since /dev/sd?1 matches all six SCSI partitions and they are not members of any other diskgroup, the installer will list all of them as Candidate disks. Check the /dev/sda1, /dev/sdb1 and /dev/sdc1 devices as member disks of the DATA diskgroup, then click on Next >. The other disks will be used for the Flash Recovery Area later on.
Choose Use same passwords for these accounts for a simple configuration, then set the Specify Password and Confirm Password fields to the same password string and click on Next >. I will be using sysadm throughout the post for any Oracle password required.
One of the security enhancements introduced in Release 2 is the separation of different levels of ASM access. This defines different roles for "Who can start/stop the ASM instance?", "Who can add/drop disks to/from diskgroups?" or "Who can use those diskgroups at the RDBMS level?" To keep the installation simple we will set all roles to the dba group. Now set all three select lists to dba and click on Next >.
Set Oracle Base to /u01/app/oracle and Software Location to /u01/app/oracle/product/11.2.0/grid. Then click on Next >.
The next step is unique to the first Oracle software installation on a host, as you all may know. You should set Inventory Directory (if it is not already set by the installer) to /u01/app/oraInventory. Keep oraInventory Group Name as oinstall and click on Next >.
In this step the installer will check the installation prerequisites as it did in previous releases.
As of 11g Release 2, if any of the prerequisites fail, the failures are reported in a tree structure with different categories. In my case the majority of the kernel settings are already in place because I installed the oracle-validated-configuration rpm during the OEL installation. The only problem seems to be an insufficiently sized net.core.wmem_max, which defines the maximum socket send buffer size. When you click on the Fix & Check Again button, the installer will generate a single shell script for you to correct all fixable errors, and after its execution it will recheck for any remaining problems.
Run the generated script as the root user:

[root@localhost ~]# /tmp/CVU_11.2.0.0.2_oracle/runfixup.sh
Response file being used is :/tmp/CVU_11.2.0.0.2_oracle/fixup.response
Enable file being used is :/tmp/CVU_11.2.0.0.2_oracle/fixup.enable
Log file location: /tmp/CVU_11.2.0.0.2_oracle/orarun.log
Setting Kernel Parameters…
net.core.wmem_max = 262144
net.core.wmem_max = 1048576

When the script is executed click OK to restart the prerequisite check process.
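
The fixup output above shows net.core.wmem_max being raised from 262144 to 1048576. If you prefer to check such a parameter yourself before starting the installer, a small sketch like the following could help; param_below_minimum is a hypothetical function that reads "name = value" lines, the format printed by sysctl, and reports the parameter only when it is below the minimum you pass in.

```shell
# Read "name = value" lines on stdin.
# $1 = parameter name, $2 = required minimum value.
# Prints one line when the parameter is below the minimum, else nothing.
param_below_minimum() {
    awk -v want="$1" -v min="$2" -F' *= *' '
        $1 == want && $2 + 0 < min + 0 {
            printf "%s is %s, expected at least %s\n", $1, $2, min
        }
    '
}
```

For example, sysctl net.core.wmem_max | param_below_minimum net.core.wmem_max 1048576 prints a message only if the current value is too small.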

As you see, the kernel parameter problem has gone. The other three errors cannot be corrected by the installer automatically, but we know that they are not critical. The first one is the PhysicalMemory error due to our 512MB VirtualBox memory size. The second one is an insufficient SwapSize, which can also be bypassed for a playground. The final problem is the RunLevel of Linux, which is also not a big deal for us. Now check Ignore All and click on Next > (the button will be enabled after checking Ignore All) to continue.
On the summary screen confirm that everything is OK and click on Finish to start the installation.
After the installer successfully completes the copy, install, link, etc. steps, it will pop up a root.sh execution dialog. Run the required scripts as root.

[root@localhost ~]# /u01/app/oraInventory/orainstRoot.sh
[root@localhost ~]# /u01/app/oracle/product/11.2.0/grid/root.sh

Then switch back to the installation dialog and click OK in the Execute Configuration scripts dialog to proceed.

The final tasks for the installer are to configure the new HA service for ASM, the diskgroup (DATA), and the default listener (on port 1521), which is also configured automatically.
Finally we are done 🙂 Your ASM, default listener, and HA service are ready to be used.

Post Installation Checks

Login ASM

[oracle@localhost ~]$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0/grid
[oracle@localhost ~]$ export ORACLE_SID=+ASM
[oracle@localhost ~]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@localhost ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.0.2 Beta on Mon Sep 21 05:45:58 2009

Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.0.2 - Beta
With the Automatic Storage Management option

SQL> select name from v$asm_diskgroup;

NAME
------------------------------
DATA

Check Default Listener Status

[oracle@localhost ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 31-OCT-2009 12:05:36

Copyright (c) 1991, 2009, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                31-OCT-2009 11:51:00
Uptime                    0 days 0 hr. 14 min. 37 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost.localdomain)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
The command completed successfully

Check HA Targets

[oracle@localhost ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    localhost
ora....ER.lsnr ora....er.type ONLINE    ONLINE    localhost
ora.asm        ora.asm.type   ONLINE    ONLINE    localhost
ora.cssd       ora.cssd.type  ONLINE    ONLINE    localhost
ora.diskmon    ora....on.type ONLINE    ONLINE    localhost
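
For a quick health check it can be handy to list only the resources that are not ONLINE. The sketch below assumes the five-column crs_stat -t layout shown above (two header lines, then Name, Type, Target, State, Host); offline_resources is a hypothetical helper name.

```shell
# Read `crs_stat -t` output on stdin, skip the two header lines and
# print the name and state of every resource whose State column
# (field 4) is not ONLINE.
offline_resources() {
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1, $4 }'
}
```

With the output above, crs_stat -t | offline_resources prints nothing, because every resource is ONLINE.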

“How to Achieve All in One with Oracle 11g” Material

Here is the content of my first presentation at Open World 2009:

How to Achieve All in One with Oracle 11g

Full Coverage in Infiniband Monitoring with OSWatcher 3.0: IB Monitoring

Recently, I needed to use OSWatcher in our large data warehouse environment running on Solaris OS. When I downloaded (Metalink Note 301137.1) and untarred the osw3.tar file, I noticed some new scripts within the bundle. Once I checked the README file in the bundle, I saw that the new scripts are the ones introduced to track Infiniband performance and status on the Exadata Database Machine (or any other server using the IB stack).

In this series of posts, I will try to explain the importance of those scripts for successful IB stack and RDS monitoring.

oswib.sh

oswib.sh is the first script we will be discussing. The script content is:

   1: #!/bin/ksh
   2: #
   3: # IB Diagnostics
   4: #
   5: #
   6: echo "zzz ***"`date` >> $1
   7: echo "IB Config on Hosts..." >> $1
   8: echo "ibconfig...." >> $1
   9: ifconfig >> $1
  10: echo "" >> $1
  11: echo "ib-bond..." >> $1
  12: ib-bond --status >> $1
  13: echo "" >> $1
  14: echo "ibstat..." >> $1
  15: ibstat >> $1
  16: echo "" >> $1
  17: echo "ibstatus..." >> $1
  18: ibstatus >> $1
  19: echo "" >> $1
  20: echo "lspci -vv..." >> $1
  21: lspci -vv |grep InfiniBand -A27 >> $1
  22: echo "" >> $1
  23: rm locks/iblock.file
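
Because the script appends to the file given as $1, every snapshot in that file starts with a "zzz ***" line followed by the date. A minimal sketch for pulling the most recent snapshot out of such a file is given below; last_snapshot is a hypothetical helper and the file name in the usage example is only a placeholder.

```shell
# Print the last snapshot of an oswib.sh output file.
# Each snapshot begins with a line starting with "zzz ***", so the
# buffer is reset whenever such a line is seen and printed at the end.
last_snapshot() {
    awk '
        /^zzz \*\*\*/ { n = 0 }
        { buf[++n] = $0 }
        END { for (i = 1; i <= n; i++) print buf[i] }
    ' "$1"
}
```

For example, last_snapshot ib_output.dat would print everything from the final "zzz ***" marker to the end of the file.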

Gather Basic Network Information

Let's try to explain what each statement does. ifconfig at line 9, as you might expect, displays the list of all network interfaces including the bond interfaces. In the output below, the Infiniband devices and the IB bond device are the ones with Link encap:InfiniBand (bond1, ib0 and ib2). You might notice that there are some crude network statistics attached to them. What is important here is the errors, dropped and collisions statistics for RX/TX. Ensure that those values are either 0 or negligible compared with the total number of network packets/frames sent/received.

[root@dbkon01:~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:22:64:F7:12:BC
          inet addr:10.210.51.171  Bcast:10.210.51.255  Mask:255.255.255.0
          inet6 addr: fe80::222:64ff:fef7:12bc/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2718135321 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2609232342 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:675554897860 (629.1 GiB)  TX bytes:1582459760400 (1.4 TiB)

bond0:1   Link encap:Ethernet  HWaddr 00:22:64:F7:12:BC
          inet addr:10.210.51.172  Bcast:10.210.51.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond1     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.51.71  Bcast:172.16.51.255  Mask:255.255.255.0
          inet6 addr: fe80::216:35ff:ffbf:2b11/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:54530368 errors:0 dropped:0 overruns:0 frame:0
          TX packets:53991683 errors:0 dropped:29 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11648571830 (10.8 GiB)  TX bytes:17492409378 (16.2 GiB)

eth0      Link encap:Ethernet  HWaddr 00:22:64:F7:12:BC
          inet6 addr: fe80::222:64ff:fef7:12bc/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2242122831 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1554180534 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:573128361579 (533.7 GiB)  TX bytes:1111525331344 (1.0 TiB)
          Interrupt:169 Memory:f8000000-f8012100

eth1      Link encap:Ethernet  HWaddr 00:22:64:F7:12:BE
          inet6 addr: fe80::222:64ff:fef7:12be/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:476012490 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1055051808 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:102426536281 (95.3 GiB)  TX bytes:470934429056 (438.5 GiB)
          Interrupt:177 Memory:fa000000-fa012100

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet6 addr: fe80::216:35ff:ffbf:2b11/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:54259452 errors:0 dropped:0 overruns:0 frame:0
          TX packets:53991677 errors:0 dropped:29 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:11625191478 (10.8 GiB)  TX bytes:17492408922 (16.2 GiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet6 addr: fe80::216:35ff:ffbf:2b05/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:270916 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:23380352 (22.2 MiB)  TX bytes:456 (456.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:130512043 errors:0 dropped:0 overruns:0 frame:0
          TX packets:130512043 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:17683019151 (16.4 GiB)  TX bytes:17683019151 (16.4 GiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:468 (468.0 b)
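
Scanning this output by eye for non-zero error counters gets tedious on a host with many interfaces. The following sketch, which assumes the classic ifconfig layout shown above, prints only the interfaces and directions that have errors or drops, together with their share of the total packets; iface_error_summary is a hypothetical helper name.

```shell
# Read classic `ifconfig` output on stdin. A line starting in column 1
# names the interface; "RX packets:" and "TX packets:" lines carry
# key:value counters. Print a line only when errors + dropped > 0.
iface_error_summary() {
    awk '
        /^[^ \t]/ { iface = $1 }
        /(RX|TX) packets:/ {
            dir = $1
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                val[kv[1]] = kv[2]
            }
            bad = val["errors"] + val["dropped"]
            pk = val["packets"] + 0
            if (bad > 0 && pk > 0)
                printf "%s %s packets=%s errors=%s dropped=%s (%.4f%%)\n", \
                    iface, dir, val["packets"], val["errors"], val["dropped"], \
                    bad * 100 / pk
        }
    '
}
```

Run against the output above (ifconfig | iface_error_summary), it would report only the TX side of bond1 and ib0, each with 29 dropped packets.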

Infiniband Bonding Information

Infiniband bonding is somewhat similar to classical network bonding (or aggregation), with some behavioral differences. The major difference is that the Infiniband bonding interface runs in active/passive mode over the Infiniband HCAs. No trunking is allowed, as is possible with classical Ethernet networks. So if you have two 20 Gbit interfaces, you will still have 20 Gbit of theoretical throughput in an active IB network, even though you have two (or more) interfaces. This can easily be seen in the output of ifconfig as well: while the ib0 interface has send/receive statistics, there is almost no traffic running over the ib2 interface.

In case of a failure (or when it is triggered manually), the bonding interface will detect the failure in the active component and will fail over to the passive one, and you will see an informative warning message in the /var/log/messages file, just like in Ethernet bonding.

In a successful RAC configuration, the failover duration should be less than any CRS or watchdog timeout value. That's because no interconnect traffic (heartbeats or cache fusion) will be available for a period of time. So if this failover takes too long due to host CPU utilization, a problem in the HCA firmware, a configuration problem at the IB switch, or any other problem, the clusterware or some watchdog will assume that the node should be evicted from the cluster to protect cluster integrity.

To check the current status of IB bonding, oswib.sh uses the ib-bond command (line 12). However, there is a problem with the way this command is used. Look at the output of the ib-bond command with the --status option.

[root@dbkon01:~]# ib-bond --status
bond0: 00:22:64:f7:12:bc 10.210.51.171/24 10.210.51.172/24
slave0: eth0 *
slave1: eth1

The bond0 interface has nothing to do with IB bonding; it is the public network bond (the IPs over this bond are the public and VIP IPs). This seems to be a problem with the ib-bond command, because with --status it reports this non-IB bonding interface and the IB bond is missing from the output. To correct this, you can modify the related line of your oswib.sh script as:

ib-bond --status-all

[root@dbkon01:~]# ib-bond --status-all
bond0: 00:22:64:f7:12:bc 10.210.51.171/24 10.210.51.172/24
slave0: eth0 *
slave1: eth1
bond1: 80:00:00:48:fe:80:00:00:00:00:00:00:00:16:35:ff:ff:bf:2b:11 172.16.51.71/24
slave0: ib0 *
slave1: ib2

As you see bond1 is the IB bonding interface and ib0 is the active port.
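
If you log this output periodically, a change of the active port is the interesting event to look for. A minimal sketch for extracting the active slave of a given bond from the ib-bond output format shown above follows; active_slave is a hypothetical helper, and it relies on the active slave being the one marked with an asterisk.

```shell
# Read `ib-bond --status-all` output on stdin and print the slave marked
# with "*" that belongs to the bond named in $1 (for example bond1).
active_slave() {
    awk -v bond="$1" '
        $1 ~ /^bond[0-9]+:$/  { current = substr($1, 1, length($1) - 1) }
        $1 ~ /^slave[0-9]+:$/ && current == bond && $3 == "*" { print $2 }
    '
}
```

With the output above, ib-bond --status-all | active_slave bond1 prints ib0.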

Infiniband Adapters & Configuration Information

ibstat is yet another important command, which displays the status of each IB HCA port. Depending on your configuration, different outputs are possible:

[root@dbkon01:~]# ibstat
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.5.0
    Hardware version: a0
    Node GUID: 0x001635ffffbf2b10
    System image GUID: 0x001635ffffbf2b13
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 7
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x001635ffffbf2b11
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510868
        Port GUID: 0x001635ffffbf2b12
CA 'mlx4_1'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.5.0
    Hardware version: a0
    Node GUID: 0x001635ffffbf2b04
    System image GUID: 0x001635ffffbf2b07
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 6
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x001635ffffbf2b05
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510868
        Port GUID: 0x001635ffffbf2b06

In this sample configuration there are two 20 Gbit Mellanox HCAs with two ports each. Only Port 1 of each HCA is actively connected to an Infiniband switch port and the other ports are not in use. This output is specific to our configuration. For example in Exadata there is only one dual port HCA on each computation/storage cell and those ports are paired for redundant IB network configuration.
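
To compare snapshots quickly, the ibstat output can be reduced to one line per port. The sketch below assumes the layout shown above (a CA line with the adapter name in single quotes, then Port N: blocks in which State precedes Rate); ib_port_states is a hypothetical helper name.

```shell
# Read `ibstat` output on stdin and print "adapter port state rate"
# for every port. The CA name line has exactly two fields, which
# distinguishes it from the "CA type:" line.
ib_port_states() {
    awk -v q="'" '
        $1 == "CA" && NF == 2            { ca = $2; gsub(q, "", ca) }
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = substr($2, 1, length($2) - 1) }
        $1 == "State:"                   { state = $2 }
        $1 == "Rate:"                    { print ca, port, state, $2 }
    '
}
```

For the configuration above, ibstat | ib_port_states would print four lines, with port 1 of each HCA reported as Active at rate 20 and port 2 as Down at rate 10.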

ibstatus is another way to check the same thing. Its output also includes the port speeds.

[root@dbkon01:~]# ibstatus
Infiniband device 'mlx4_0' port 1 status:
    default gid:     fe80:0000:0000:0000:0016:35ff:ffbf:2b11
    base lid:     0x7
    sm lid:         0x1
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         20 Gb/sec (4X DDR)

Infiniband device 'mlx4_0' port 2 status:
    default gid:     fe80:0000:0000:0000:0016:35ff:ffbf:2b12
    base lid:     0x0
    sm lid:         0x0
    state:         1: DOWN
    phys state:     2: Polling
    rate:         10 Gb/sec (4X)

Infiniband device 'mlx4_1' port 1 status:
    default gid:     fe80:0000:0000:0000:0016:35ff:ffbf:2b05
    base lid:     0x6
    sm lid:         0x1
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         20 Gb/sec (4X DDR)

Infiniband device 'mlx4_1' port 2 status:
    default gid:     fe80:0000:0000:0000:0016:35ff:ffbf:2b06
    base lid:     0x0
    sm lid:         0x0
    state:         1: DOWN
    phys state:     2: Polling
    rate:         10 Gb/sec (4X)

As you know, lspci displays the details of the PCI devices attached to a host. In order to check the details of your HCAs, OSW periodically logs the output of this command as well. This output is also specific to the HCA you are using. The meaning of some parameters can be found in the manuals of your vendor.

[root@dbkon01:~]# lspci -vv |grep InfiniBand -A27
13:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)
    Subsystem: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 209
    Region 0: Memory at fdc00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at f6800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable+ Mask- TabSize=256
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag+
        Device: Latency L0s <64ns, L1 unlimited
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 256 bytes, MaxReadReq 4096 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
        Link: Latency L0s unlimited, L1 unlimited
        Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
    Capabilities: [100] Unknown (14)

19:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)
    Subsystem: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 185
    Region 0: Memory at fde00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at f7000000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable+ Mask- TabSize=256
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag+
        Device: Latency L0s <64ns, L1 unlimited
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 256 bytes, MaxReadReq 4096 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
        Link: Latency L0s unlimited, L1 unlimited
        Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
    Capabilities: [100] Unknown (14)