Troubleshoot SharePoint On-Premises High CPU Usage (100%) – Part 2

In Part 1, I explained how to collect diagnostic data while the CPU is running high, potentially because of custom apps installed on a SharePoint farm.

This post covers analyzing the results and finding the root cause of the problem.

Part 1: https://jeffangama.wordpress.com/2017/04/26/troubleshoot-sharepoint-on-premise-high-cpu-usage-100/

In this article, we will:

  1. Analyze the result using DebugDiag 2 Analysis
  2. Dig deeper using WinDbg
  3. Find the root cause and the corrective action

Analyze the previously collected dumps using DebugDiag 2 Analysis

From the Start menu, run DebugDiag 2 Analysis.

Select PerfAnalysis and click the Add Data Files button.

Select the files generated in Part 1 of this blog, but do not select the largest file (the full dump).

Disconnect from the internet first; otherwise DebugDiag Analysis can be very slow (in some cases the analysis could take two or three days).
Click Start Analysis.
Wait about 10 minutes.
Once the results are available, the report opens in the browser.

Save this page as an .mht file for future analysis.
We can see that the top functions in most of the threads are in mscorlib.dll.

mscorlib.dll is an out-of-the-box .NET DLL, called here by the SharePoint code we implemented.

On its own, this analysis is not very helpful, because it doesn't show which functions called into mscorlib. Let's dig into the details using WinDbg.

Further analyze the results using WinDbg – Installation

The DebugDiag 2 Analysis report does not show which code calling into mscorlib is taking the most CPU, so we need another tool to find the root cause.

Follow this tutorial to install WinDbg.

WinDbg is part of the Windows SDK, which you can download from here.

Once installed, move the .exe to the folder F:\Installs\WinDbg\resource.

Below is how to load the dump in WinDbg and find the root cause:

Load the mini dump: load one of the mini dumps collected in Part 1 of this blog.

Load psscor4:
  1. Download psscor4 (note that it has some dependencies).
  2. Copy the DLL to a folder, here F:\Installs\WinDbg\resource.
  3. In the WinDbg console, type:

.load F:\Installs\WinDbg\resource\psscor4.dll

Build the SharePoint project.

Copy the project's .pdb and .dll files to the same folder, F:\Installs\WinDbg\resource. Then add that folder to the symbol path and reload the symbols:

.sympath+ F:\Installs\WinDbg\resource

.reload

List the threads sorted by user-mode CPU time: !runaway
Switch to the top thread (thread 165 in this dump): ~165s
Show its managed call stack: !clrstack

If the call stack does not show, run these commands again:

.sympath+ F:\Installs\WinDbg\resource

.reload

!clrstack

Here, WinDbg shows the function being called from this thread.

If we switch to other threads (using ~Ns, where N is the thread number) and show their call stacks as in the previous step, we see this same function.

This is the root cause: our custom C# code calls the SharePoint taxonomy class.

Finding the root cause

Digging into the SharePoint code (SharePoint 2013 SP1), we can see that it uses a Dictionary to retrieve terms from the term store.

However, the Dictionary class is not safe for concurrent reads and writes from multiple threads. A website must handle multiple threads because many users connect to the site concurrently.

https://blogs.msdn.microsoft.com/asiatech/2009/05/11/asp-net-application-100-cpu-caused-by-system-collections-generic-dictionary/

In the first dump, the thread was searching the Dictionary. In the second dump, the same thread was still searching the same Dictionary.

This is implausible, because the Dictionary holds only three items. These threads had definitely entered an endless loop.
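To see why a three-item Dictionary can keep a thread busy forever, here is a simplified model written in Java for illustration only. The class and method names are hypothetical, and this is not the actual .NET source. Like .NET's Dictionary, the model chains entries through "next" indexes stored in an array, and the lookup loop stops only when it reaches an index of -1. If an unsynchronized write leaves a chain pointing back on itself, the loop never stops. The maxSteps bound is an assumption of this sketch, added so the demo terminates; the real lookup has no such bound.

```java
import java.util.Arrays;

// Simplified model of a chained hash table that keeps its chains as indexes
// into arrays, in the style of .NET's Dictionary. Illustration only.
final class ChainedTableModel {
    static final int NONE = -1;   // end-of-chain / not-found marker
    static final int CYCLE = -2;  // returned when the demo's step bound is hit

    final int[] buckets; // bucket index -> index of the first entry in its chain
    final int[] keys;    // entry index -> key stored in that entry
    final int[] next;    // entry index -> index of the next entry in the chain

    ChainedTableModel(int bucketCount, int capacity) {
        buckets = new int[bucketCount];
        Arrays.fill(buckets, NONE);
        keys = new int[capacity];
        next = new int[capacity];
        Arrays.fill(next, NONE);
    }

    // Stores key in entry 'slot' and pushes it onto the front of its bucket's chain.
    void insert(int slot, int key) {
        int b = Math.floorMod(key, buckets.length);
        keys[slot] = key;
        next[slot] = buckets[b];
        buckets[b] = slot;
    }

    // Walks the chain the way a FindEntry-style lookup does: it only stops on a
    // match or on an index below zero. maxSteps exists only for this demo.
    int findEntry(int key, int maxSteps) {
        int steps = 0;
        for (int i = buckets[Math.floorMod(key, buckets.length)]; i >= 0; i = next[i]) {
            if (keys[i] == key) return i;
            if (++steps > maxSteps) return CYCLE;
        }
        return NONE;
    }

    public static void main(String[] args) {
        // One bucket, so all three keys share a single chain: 2 -> 1 -> 0 -> end.
        ChainedTableModel table = new ChainedTableModel(1, 3);
        table.insert(0, 10);
        table.insert(1, 20);
        table.insert(2, 30);
        System.out.println(table.findEntry(20, 100)); // 1
        System.out.println(table.findEntry(99, 100)); // -1: absent, chain ends normally

        // Simulate a corrupted chain: the last entry now points back to the first.
        table.next[0] = 2;
        System.out.println(table.findEntry(99, 100)); // -2: the walk never reaches the end
    }
}
```

Without the step bound, the last lookup would spin forever. That matches a thread that is still "finding" in the same three-item Dictionary across two dumps.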

Reviewing the code, we found that it reads and modifies the Dictionary object without any lock. A quick look at the FindEntry code in Reflector confirms that this is the cause of the problem.

Below is the relevant note from MSDN: Dictionary is not thread-safe.

A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
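The MSDN note leaves the synchronization to the developer. As a sketch of what that can look like, here is a Java analog in which HashMap stands in for Dictionary, since neither is thread-safe. LockedTermCache is a hypothetical name, not SharePoint code. A read/write lock lets readers proceed together while writers get exclusive access. The snapshot method copies the map while holding the lock instead of enumerating it unguarded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache: every access to the non-thread-safe HashMap goes through
// one read/write lock (shared for reads, exclusive for writes).
final class LockedTermCache {
    private final Map<String, String> terms = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    String get(String id) {
        lock.readLock().lock();
        try {
            return terms.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(String id, String label) {
        lock.writeLock().lock();
        try {
            terms.put(id, label);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Enumeration must also be guarded, so copy the entries while holding the lock
    // and let callers iterate over the private copy.
    Map<String, String> snapshot() {
        lock.readLock().lock();
        try {
            return new HashMap<>(terms);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedTermCache cache = new LockedTermCache();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            final int writer = t;
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    cache.put("term-" + writer + "-" + i, "label " + i);
                    cache.get("term-" + writer + "-0");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(cache.snapshot().size()); // 8000 distinct keys
    }
}
```

A single exclusive lock around every access (lock in C#, synchronized in Java) would also be correct. The read/write lock only pays off when reads greatly outnumber writes.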

The code from the latest SharePoint CU (January 2017 at the time of writing) shows that Microsoft has fixed this issue when fetching terms from the Managed Metadata Service by using a ConcurrentDictionary instead.
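For comparison with the fix in the CU, here is a hedged Java sketch of the same idea. ConcurrentHashMap is Java's counterpart of ConcurrentDictionary, and computeIfAbsent plays a role similar to GetOrAdd. ConcurrentTermCache and its loader function are hypothetical stand-ins for a term store lookup, not the SharePoint implementation. No explicit lock is needed, because the map itself supports concurrent reads and writes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical term cache built on ConcurrentHashMap; no explicit locking required.
final class ConcurrentTermCache {
    private final ConcurrentHashMap<String, String> terms = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // assumed: fetches a term's label

    ConcurrentTermCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached label, or loads and stores it atomically if missing.
    String get(String id) {
        return terms.computeIfAbsent(id, loader);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger loads = new AtomicInteger();
        ConcurrentTermCache cache = new ConcurrentTermCache(id -> {
            loads.incrementAndGet();
            return "label for " + id;
        });

        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    cache.get("term-" + (i % 3)); // only three distinct terms
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(loads.get()); // 3: each term is loaded exactly once
    }
}
```

One difference is worth knowing. ConcurrentHashMap.computeIfAbsent applies the function at most once per key. In .NET, GetOrAdd(key, valueFactory) may call the factory more than once when threads race, although only one result is stored.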

Conclusion

Using DebugDiag, we collected information while the CPU was peaking at 100%. Using DebugDiag Analysis and WinDbg, we then identified the function calling the out-of-the-box code (mscorlib).

Looking into the SharePoint code revealed the root cause: a Dictionary used without any locking mechanism to support multithreading.

A cumulative update fixes this issue by using ConcurrentDictionary<TKey, TValue> instead of Dictionary, because ConcurrentDictionary is designed for concurrent access from multiple threads.

Have you also fixed issues using these tools? Please share your findings.
