Friday, September 26, 2008

Troubleshooting Kerberos issues

By Hristo Yankov

Dealing with Kerberos in a distributed MOSS/K2 environment is inevitable. First, let's start with a quick overview of what Kerberos is.

From Wikipedia:
Kerberos is a computer network authentication protocol, which allows individuals communicating over a non-secure network to prove their identity to one another in a secure manner. It is also a suite of free software published by Massachusetts Institute of Technology (MIT) that implements this protocol. Its designers aimed primarily at a client-server model, and it provides mutual authentication — both the user and the server verify each other's identity. Kerberos protocol messages are protected against eavesdropping and replay attacks.



What this means in the MOSS (with the K2 worklist web part)/K2 context is:
The end client's browser requests a MOSS page featuring the K2 web part, MOSS delegates the client's credentials to K2, and K2, now knowing who the end user is, retrieves his or her worklist items.

Since this is a 'troubleshooting Kerberos' article and not a 'setting up Kerberos for the first time' one, we will assume that:
  • You have already configured your network to work with it by following the K2 documentation (Getting Started).
  • You went to Active Directory on the Domain Controller and trusted the computers and users participating in the process for delegation.
  • You have configured IIS to use Negotiate rather than NTLM.
  • You DO have worklist items waiting for you.
  • If you have more than one domain controller in the network, you realize that replication time could be a problem. If you make domain changes (adding or removing SPNs, trusting for delegation, etc.), either force the replication or wait out the replication interval. You might want to decrease it to 10-15 minutes.
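One way to check the Negotiate assumption is to look at the Authorization header the browser actually sends, copied from an HTTP trace. The Python sketch below is only a rough aid and not part of the K2 setup: the function name negotiate_token_kind is hypothetical, and it inspects nothing but the first bytes of the token. A token that begins with NTLMSSP means the browser fell back to NTLM, in which case Kerberos delegation is not happening. A token that begins with byte 0x60 is GSS-API framed, which usually (though not always) means a Kerberos ticket is being sent.

```python
import base64


def negotiate_token_kind(authorization_header):
    """Roughly classify the value of an HTTP 'Authorization' request header.

    Returns "ntlm", "gssapi (likely Kerberos)" or "unknown".
    This is a heuristic based on the leading bytes only.
    """
    scheme, _, token = authorization_header.strip().partition(" ")
    scheme = scheme.lower()
    if scheme == "ntlm":
        return "ntlm"
    if scheme != "negotiate" or not token.strip():
        return "unknown"
    try:
        raw = base64.b64decode(token.strip())
    except ValueError:
        # Not valid base64, so nothing can be said about the token.
        return "unknown"
    if raw.startswith(b"NTLMSSP\x00"):
        # NTLM messages always begin with this ASCII signature.
        return "ntlm"
    if raw[:1] == b"\x60":
        # ASN.1 application tag used by GSS-API initial tokens.
        return "gssapi (likely Kerberos)"
    return "unknown"
```

If the result is "ntlm" for requests to the MOSS site, revisit the IIS and browser settings before digging into SPNs and delegation.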

Now, if you don't see any of your items in the MOSS K2 web part, that's the first sign your Kerberos setup is not working. Log in to your K2 server (using the K2 Service credentials!), go to Services and stop the BlackPearl service. Then run it in console mode so you can clearly see what the error is. Chances are you will see error messages telling you that the user 'NT AUTHORITY\ANONYMOUS LOGON' was denied access.


Obviously, what happens is that IIS is not passing the end client's credentials to the next server in the chain, which in our particular case is the K2 server. Now the fun part starts: trying to determine why your Kerberos configuration is not working. Logically, you would start by enabling Kerberos logging on all participating machines. You do that by running regedit and navigating to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters. There, add a REG_DWORD value named LogLevel with data '1'. Also, on the machine issuing TGTs, add the value KerbDebugLevel with data '1' under the same Parameters key. Enabling Kerberos logging is also explained here.
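Since the same registry values have to be set on every participating machine, one option is to generate a .reg file once and import it on each of them. The following is only a minimal sketch in Python. The helper name build_reg_file is hypothetical, LogLevel is the value name Microsoft documents for Kerberos event logging, and KerbDebugLevel with data 1 is taken from the paragraph above.

```python
# Registry key that holds the Kerberos parameters discussed above.
KERBEROS_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
)


def build_reg_file(dword_values):
    """Return the text of a .reg file that sets REG_DWORD values under the key.

    dword_values maps a value name to a non-negative integer.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + KERBEROS_PARAMETERS_KEY + "]",
    ]
    for name, value in dword_values.items():
        # .reg files store DWORD data as eight hexadecimal digits.
        lines.append('"{}"=dword:{:08x}'.format(name, value))
    # Windows tools expect CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # On the machine issuing TGTs, KerbDebugLevel could be added as well.
    print(build_reg_file({"LogLevel": 1}))
```

Writing the returned text to a file (for example with open("kerberos_logging.reg", "w", encoding="utf-16", newline=""), where the file name is just an illustration) should give you something regedit can import. Review the file before importing it, and remove the values again once troubleshooting is done.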

You might notice that Kerberos delays error logging, skips some errors, or just doesn't output anything when you expect it to. Eventually you will start seeing entries in your Event Viewer, in the System log. There might be gems such as:
  • KDC_ERR_S_PRINCIPAL_UNKNOWN
  • KDC_ERR_BADOPTION
  • KRB_ERR_RESPONSE_TOO_BIG
  • And others...
Microsoft tries to explain them here, but the descriptions of those errors are often unclear or even misleading. Over the course of a week-long Kerberos troubleshooting session, the problem was pinpointed to the following items.

  1. Internet Explorer settings on the client side. Make sure that the client (using IE) has added your MOSS website to the list of Trusted Sites: go to IE -> Tools -> Internet Options -> Security, click Trusted Sites, click the Sites button, type the URL of the site and click Add. Also, make sure that the Security Level for the Trusted zone is Low. If that's not possible, choose Custom Level (by pressing its button), scroll down to the bottom of the pop-up screen and select the "User Authentication->Logon->Automatic logon with current user name and password" radio button. This tells the browser to pass the user's credentials to the web site. Otherwise it will connect as an anonymous user and delegation will never work.
  2. It is worth mentioning that IE 6 won't work with Kerberos if the web site is not running on port 80. It is explained in detail here, but you can easily overcome this problem by using host headers instead of ports.
  3. Kerberos UDP fragmentation. Yes, by default Kerberos runs over the unreliable UDP protocol, which means there is no guarantee that the packets will actually reach their destination. So if you get a lot of KDC_ERR_S_PRINCIPAL_UNKNOWN and KRB_ERR_RESPONSE_TOO_BIG errors in the logs, this might be the reason. It is explained here and the solution is right here. By setting the max packet size (the MaxPacketSize registry value) to 1, you force Kerberos to run over TCP. This requires a reboot of the system, though.
  4. Duplicate SPNs. That is my favorite, and a less documented, problem! To understand what is considered a duplicate, read this short article carefully! It turns out you cannot have "HTTP/portal.mydomain.com DOMAIN\ServiceA" AND "HTTP/portal.mydomain.com DOMAIN\ServiceB" at the same time! An SPN must be assigned to only one account!
Getting rid of the duplicate SPNs is no fun at all. Currently, there is no good tool to do that for you, so you will have to do it manually. First, you will need to output all the current SPNs in your domain. It's all explained here (3rd method).
  1. Get the spnquery.vbs script from here (click on the download button)
  2. Run it by executing "cscript spnquery.vbs * > my_SPNs.txt" in the command prompt
  3. Now you have all your SPNs dumped into the my_SPNs.txt file
Open it in your favorite text editor for review. Let's assume your MOSS server is called "SRV-MOSS" and your domain is "domain.company". Search the file for "SRV-MOSS". You should see an entry like:
CN=SRV-MOSS,[...]
Class: computer
Computer DNS: [...]
-- HOST/SRV-MOSS.domain.company
-- HOST/SRV-MOSS
That is fine; these are the HOST SPNs registered to the server's own computer account. Keep searching. You should see an entry similar to:
CN=[Application Pool Running User],[...]
Class: user
User Logon: [ApplicationPoolRunningUser]
-- HTTP/SRV-MOSS
-- HTTP/SRV-MOSS.domain.company
If [ApplicationPoolRunningUser] is the domain user running your MOSS web application pool, that is great, because it means that you have set a correct SPN! If you don't find such entries at all, you have missed an important step and you need to register the SPNs by running commands similar to:
setspn -A HTTP/SRV-MOSS domain.company\ApplicationPoolRunningUser
setspn -A HTTP/SRV-MOSS.domain.company domain.company\ApplicationPoolRunningUser
However, if you find another user entry (different from your MOSS app pool user) listing HTTP/SRV-MOSS as a service principal name, that's a problem, because it's a duplicate! You will need to remove it by executing:
setspn -D HTTP/SRV-MOSS domain.company\AnotherUser
setspn -D HTTP/SRV-MOSS.domain.company domain.company\AnotherUser
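Scanning a large dump by eye gets tedious, so a small script can help with the search. The Python sketch below assumes the dump looks like the excerpts shown above: each account starts with a "CN=" line and each of its SPNs follows on a line starting with "-- ". The function name find_duplicate_spns is hypothetical, and SPNs are compared case-insensitively.

```python
from collections import defaultdict


def find_duplicate_spns(dump_text):
    """Return {spn: [accounts]} for every SPN listed under more than one account.

    dump_text is the content of the SPN dump; SPNs are lower-cased so that
    HTTP/SRV-MOSS and http/srv-moss count as the same name.
    """
    owners = defaultdict(set)
    current_account = None
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith("CN="):
            # A new account entry begins.
            current_account = line
        elif line.startswith("-- ") and current_account is not None:
            spn = line[3:].strip().lower()
            owners[spn].add(current_account)
    # Keep only SPNs that appear under two or more different accounts.
    return {
        spn: sorted(accounts)
        for spn, accounts in owners.items()
        if len(accounts) > 1
    }


if __name__ == "__main__":
    # my_SPNs.txt is the file produced by the cscript command above.
    with open("my_SPNs.txt", errors="replace") as dump:
        for spn, accounts in find_duplicate_spns(dump.read()).items():
            print(spn)
            for account in accounts:
                print("    " + account)
```

Each SPN it prints still has to be reviewed and removed from the wrong account with setspn -D; the script only points at candidates.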
Search for other duplicates; if there are none, you are one step closer to resolving the problem. At this point, it is a great idea to purge all current Kerberos tickets in the system, as they (and the lack of them?!) are cached for more than 20 hours. For that purpose, you need to obtain a copy of the free KerbTray program. You might already have it installed, so check in your Program Files\Resource Kits. If not, get it from here and disregard the note saying the program is for Windows 2000 only. Here is a tip: if you are lazy and don't want to install it on all machines, you can install it on one and access it from the others by opening \\servername\C$\... (if enabled). The program usually runs fine that way; worst case, you will have to copy it.

So, after you start KerbTray you will notice a new green icon in the system tray.


If you double-click it, it will show you the current Kerberos tickets obtained for the system.


What you want to do is right click the icon and select 'Purge Tickets'. Don't worry, the system will obtain them again and that's the whole point of the exercise.

Now go ahead and retest the network connectivity. Keep monitoring the K2 blackpearl server output. You should not see anonymous logons any more. Every network environment and its Kerberos configuration has its own specifics and flavor. Explore the references below for further information and ideas on how to troubleshoot Kerberos issues.

References:
