Considerations for NetTcpBinding/NetNamedPipeBinding you may not be aware of

 

NetTcpBinding is a strange beast, and chances are you will encounter several problems in production that you never experienced during development or staging. The information you will find here is either fragmented or hidden in the fine print throughout the MSDN documentation.

Considerations about net.tcp binding

Port Sharing

Net.tcp services that use a shared port must run under SYSTEM or NETWORK SERVICE, or under an account that is a member of IIS_IUSRS or of the local Administrators group. If the account is a member of local Administrators, the application hosting the service must run with elevated privileges.

To add new accounts that may use port sharing, edit the <allowAccounts> element in SMSvcHost.exe.config (normally in C:\Windows\Microsoft.NET\Framework\v4.0.30319 for .NET 4.0+ or in C:\Windows\Microsoft.NET\Framework\v3.0\Windows Communication Foundation\). Note that users and groups must be specified in SID format. You can use PowerShell to translate a user account into a SID: https://technet.microsoft.com/en-us/library/ff730940.aspx

# Translate DOMAIN\user (here the sample account fabrikam\kenmyer) into a SID
$objUser = New-Object System.Security.Principal.NTAccount("fabrikam", "kenmyer")
$strSID = $objUser.Translate([System.Security.Principal.SecurityIdentifier])
$strSID.Value
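If you would rather do the translation from C#, the sketch below performs the same lookup with the NTAccount and SecurityIdentifier classes that the PowerShell script uses. It also formats the result as an entry for <allowAccounts>.

This is a minimal sketch under these assumptions:
- It runs on a Windows machine, because NTAccount.Translate only resolves accounts there.
- PortSharingAccounts, ToSid and AllowAccountEntry are hypothetical names.
- The entry uses the `<add securityIdentifier="..."/>` element shape that appears inside <allowAccounts> in SMSvcHost.exe.config.

```csharp
using System;
using System.Security.Principal;

// Hypothetical helper class; not part of WCF or SMSvcHost.
public static class PortSharingAccounts
{
    // Resolves DOMAIN\account to its SID string, like the PowerShell above.
    // Windows only; throws IdentityNotMappedException if the account is unknown.
    public static string ToSid(string domain, string account)
    {
        var ntAccount = new NTAccount(domain, account);
        IdentityReference sid = ntAccount.Translate(typeof(SecurityIdentifier));
        return sid.Value;
    }

    // Formats a SID as an <add> entry for the <allowAccounts> element
    // in SMSvcHost.exe.config.
    public static string AllowAccountEntry(string sid)
    {
        if (string.IsNullOrWhiteSpace(sid))
            throw new ArgumentException("A SID is required.", nameof(sid));
        return $"<add securityIdentifier=\"{sid}\" />";
    }
}
```

For example, `PortSharingAccounts.AllowAccountEntry(PortSharingAccounts.ToSid("fabrikam", "kenmyer"))` returns the line to paste under <allowAccounts>.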

If you perform an in-place upgrade from Windows Server 2008 R2 to Windows Server 2012 R2, net.tcp port sharing services may stop working with an error similar to the one below. The cause is a mismatch between the .NET Framework and WCF:

Log Name:      System
Source:        SMSvcHost 4.0.0.0
Date:          10/22/2015 11:49:42 AM
Event ID:      7
Task Category: Sharing Service
Level:         Error
Keywords:      Classic
User:          LOCAL SERVICE
Computer:      SERVER1.contoso.local
Description:
A request to start the service failed.  Error Code: System.TypeLoadException: Could not load type ‘System.Runtime.Diagnostics.ITraceSourceStringProvider’ from assembly ‘System.ServiceModel.Internals, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35’.
at System.ServiceModel.Channels.BinaryMessageEncoderFactory..ctor(MessageVersion messageVersion, Int32 maxReadPoolSize, Int32 maxWritePoolSize, Int32 maxSessionSize, XmlDictionaryReaderQuotas readerQuotas, Int64 maxReceivedMessageSize, BinaryVersion version, CompressionFormat compressionFormat)
at System.ServiceModel.Channels.BinaryMessageEncodingBindingElement.CreateMessageEncoderFactory()
at System.ServiceModel.Channels.ConnectionOrientedTransportChannelListener..ctor(ConnectionOrientedTransportBindingElement bindingElement, BindingContext context)
at System.ServiceModel.Channels.NamedPipeChannelListener..ctor(NamedPipeTransportBindingElement bindingElement, BindingContext context)
at System.ServiceModel.Channels.NamedPipeTransportBindingElement.BuildChannelListener[TChannel](BindingContext context)
at System.ServiceModel.Channels.Binding.BuildChannelListener[TChannel](Uri listenUriBaseAddress, String listenUriRelativeAddress, ListenUriMode listenUriMode, BindingParameterCollection parameters)
at System.ServiceModel.Description.DispatcherBuilder.MaybeCreateListener(Boolean actuallyCreate, Type[] supportedChannels, Binding binding, BindingParameterCollection parameters, Uri listenUriBaseAddress, String listenUriRelativeAddress, ListenUriMode listenUriMode, ServiceThrottle throttle, IChannelListener& result, Boolean supportContextSession)
at System.ServiceModel.Description.DispatcherBuilder.BuildChannelListener(StuffPerListenUriInfo stuff, ServiceHostBase serviceHost, Uri listenUri, ListenUriMode listenUriMode, Boolean supportContextSession, IChannelListener& result)
at System.ServiceModel.Description.DispatcherBuilder.InitializeServiceHost(ServiceDescription description, ServiceHostBase serviceHost)
at System.ServiceModel.ServiceHostBase.InitializeRuntime()
at System.ServiceModel.ServiceHostBase.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Activation.SharingService.StartControlService()
at System.ServiceModel.Activation.SharingService.Start()
at System.ServiceModel.Activation.TcpPortSharing.OnStart(String[] args)
Process Name: SMSvcHost
Process ID: 2208

To resolve this issue, install .NET Framework 4.5.2 or later so that .NET and WCF are back in sync (http://www.microsoft.com/en-us/download/details.aspx?id=42642).

Net.tcp/net.pipe channels are sessionful and may leak sessions if configured differently

This problem is also common when using NetNamedPipeBinding (net.pipe). The net.tcp channel is sessionful: it establishes a session between client and service. If you configure the instance mode differently (e.g. PerCall), it still creates sessions.

If a misbehaving client does not close the connection properly, or if reliable sessions are enabled (see Reliable Session later), the service holds a zombie session for as long as the receive timeout is set (10 minutes by default).

Clients should follow this pattern:
- Instantiate a service proxy (or use a channel factory to get one).
- Perform the operation.
- Close or abort the inner channel afterwards.

Reusing a proxy for the lifetime of the client application is always a bad idea. If you are using a load balancer, never hold a proxy between calls unless your service actually uses sessions. This is a snippet of the correct way to invoke a WCF service from a client application:

            var client = new Service1Client();
            try
            {
                var response = client.WhoAmI(); // Call the service
                Console.WriteLine("Response = {0}", response);
                client.Close(); // Close afterwards
            }
            catch (Exception ex)
            {
                Console.WriteLine("\n\nError: {0}", ex.Message);
                var inner = ex.InnerException;
                while (inner != null)
                {
                    Console.WriteLine("Inner: {0}", inner.Message);
                    inner = inner.InnerException;
                }
                if (client.State != CommunicationState.Closed)
                {
                    client.Abort(); // The proxy did not close cleanly (the call or Close() threw), so abort it
                }
            }
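The same close-or-abort discipline can be wrapped in a helper when proxies come from a ChannelFactory<T> instead of a generated client. This is a minimal sketch under these assumptions:
- It targets a .NET Framework project that references System.ServiceModel.
- IService1 and WcfCall are hypothetical names. IService1 stands in for the contract behind the generated Service1Client above.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract matching the WhoAmI operation used by Service1Client above.
[ServiceContract]
public interface IService1
{
    [OperationContract]
    string WhoAmI();
}

// Hypothetical helper: one channel per call, always closed or aborted.
public static class WcfCall
{
    // Creates a fresh channel, runs a single operation, then closes the channel.
    // If the operation or Close() throws, the channel is aborted instead, so the
    // client never leaves a half-open session behind on the service.
    public static TResult Invoke<TChannel, TResult>(
        ChannelFactory<TChannel> factory, Func<TChannel, TResult> operation)
    {
        TChannel channel = factory.CreateChannel();
        var communicationObject = (ICommunicationObject)channel;
        bool closed = false;
        try
        {
            TResult result = operation(channel);
            communicationObject.Close();
            closed = true;
            return result;
        }
        finally
        {
            if (!closed)
            {
                communicationObject.Abort();
            }
        }
    }
}
```

Usage would look like `string who = WcfCall.Invoke(factory, proxy => proxy.WhoAmI());`, where `factory` is a `ChannelFactory<IService1>` built once with the client's NetTcpBinding and endpoint address. The factory is the expensive object and can be kept for the lifetime of the application. The channels are cheap and are created and disposed of per call.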

When WCF sessions are leaking, the client typically receives occasional timeout errors and communication exceptions. These only subside when the service is restarted, or when the 10-minute receive timeout elapses without any new connection. Client-side exceptions look like these:

System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue.

System.TimeoutException: This request operation sent to net.tcp://contoso:9090/servicemodelsamples/nettcp did not receive a reply within the configured timeout

However, there will be times when you cannot control all the client applications using the service. In that case, set the receiveTimeout property of NetTcpBinding to a short interval, 30 seconds for instance, so zombie sessions die after 30 seconds instead of the default 10 minutes. Don’t worry about shortening it: this setting is mostly a misnomer for “session timeout” and is not directly related to how long a request may take to be received. This is a sample configuration that sets receiveTimeout to 30 seconds:

<bindings>
  <netTcpBinding>
    <binding receiveTimeout="00:00:30">
      <security mode="Transport">
        <transport clientCredentialType="Windows" protectionLevel="EncryptAndSign" />
      </security>
    </binding>
  </netTcpBinding>
</bindings>
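If the service host is configured in code rather than in the config file, the same binding can be built programmatically. This is a minimal sketch under these assumptions:
- It targets a .NET Framework project that references System.ServiceModel.
- TcpBindingFactory and CreateShortSessionTcpBinding are hypothetical names.
- The properties it sets are the standard NetTcpBinding properties that the <binding> element above maps to.

```csharp
using System;
using System.Net.Security;
using System.ServiceModel;

// Hypothetical helper class; programmatic equivalent of the config above.
public static class TcpBindingFactory
{
    // Mirrors <binding receiveTimeout="00:00:30"> with Transport/Windows/EncryptAndSign.
    public static NetTcpBinding CreateShortSessionTcpBinding(TimeSpan receiveTimeout)
    {
        var binding = new NetTcpBinding(SecurityMode.Transport);
        binding.Security.Transport.ClientCredentialType = TcpClientCredentialType.Windows;
        binding.Security.Transport.ProtectionLevel = ProtectionLevel.EncryptAndSign;

        // How long a session may stay idle (no message received) before the
        // service tears it down; this is what ends zombie sessions early.
        binding.ReceiveTimeout = receiveTimeout;
        return binding;
    }
}
```

On the service side, this binding would be passed to ServiceHost.AddServiceEndpoint together with the contract type and the relative address (for example "nettcp"). Those values come from your own service and are not shown here.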

If you have .NET 4.5 or later, it is easy to troubleshoot the issue using ETW traces (.etl files). When you see clients stop responding, capture ETW traces on the service side (server) using a batch file like the one below (I like to call it get-traces.bat):

ECHO These commands will enable tracing:
@echo on

logman create trace "redist_mspartners" -ow -o %temp%\redist_mspartners.etl -p "Microsoft-Windows-Application Server-Applications" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode Circular -f bincirc -max 4096 -ets

logman update trace "redist_mspartners" -p {7F3FE630-462B-47C5-AB07-67CA84934ABD} 0xffffffffffffffff 0xff -ets

@echo off
echo.
ECHO Reproduce your issue and enter any key to stop tracing
@echo on

pause

logman stop "redist_mspartners" -ets

@echo off

echo Tracing has been captured and saved successfully at %temp%\redist_mspartners.etl

pause

Run it only while the problem is happening, and only for a short time, because the file grows fast and is circular. Open the resulting trace in Message Analyzer and look for “Concurrent sessions ratio” in the summary. In the example below, all 30 of 30 possible concurrent connections are in use. While this was happening, the client was throwing the exceptions described above.

[Figure: Message Analyzer summary showing a concurrent sessions ratio of 30/30]

If you have an earlier version of .NET, consider using the custom service behavior I discuss here to monitor your service when necessary: http://blogs.msdn.com/b/rodneyviana/archive/2014/10/08/verifying-current-calls-and-sessions-during-runtime.aspx

Identifying session leak for Advanced Users (also valid for net.pipe)

If you are comfortable analyzing dump files in WinDBG, you can use NetExt to check the runtime throttling counters and to look for mismatches between the service session mode and the channel session mode. To learn how to install NetExt, see Getting started with NetExt.

Open the dump file in WinDBG. Load netext and index the heap:


0:000> .load netext
0:000> !windex

Then list all services with !wservice. The complete session looks like this:

0:000> .load netext

netext version 2.1.0.5000 Oct  5 2015

License and usage can be seen here: !whelp license

Check Latest version: !wupdate

(...)

 

0:000> !windex

Starting indexing at 19:52:07 PM

Indexing finished at 19:52:09 PM

7,916,949 Bytes in 43,512 Objects

Index took 00:00:01

0:000> !wservice

Address             State   (...)  Calls/Max   Sessions/Max    ConfigName,.NET Type

00000016d97fd998    Opened  (...)  0n31/0n100     0n30/0n30    "VanillaService.DataContractSample",VanillaService.DataContractSample

 

1 ServiceHost object(s) found
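WinDBG prints decimal values with the 0n prefix, so 0n30/0n30 in the Sessions/Max column means 30 sessions out of a maximum of 30. If you collect this output from many dumps, a tiny helper like the hypothetical one below can flag counters that have hit their maximum. It is only a parsing sketch of that notation and does not interact with NetExt itself.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for reading "Calls/Max" and "Sessions/Max" values.
public static class ThrottleCounter
{
    // Parses a WinDBG counter pair such as "0n30/0n30" into (current, max).
    // "0n" is WinDBG's prefix for a decimal number.
    public static (int Current, int Max) Parse(string pair)
    {
        string[] parts = pair.Split('/');
        if (parts.Length != 2)
            throw new FormatException($"Unexpected counter format: {pair}");
        return (ParseDecimal(parts[0]), ParseDecimal(parts[1]));
    }

    // True when the counter has reached its configured maximum,
    // e.g. Sessions/Max = 0n30/0n30.
    public static bool IsSaturated(string pair)
    {
        var (current, max) = Parse(pair);
        return current >= max;
    }

    private static int ParseDecimal(string value)
    {
        string digits = value.Trim();
        if (digits.StartsWith("0n", StringComparison.OrdinalIgnoreCase))
            digits = digits.Substring(2);
        return int.Parse(digits, NumberStyles.None, CultureInfo.InvariantCulture);
    }
}
```

With the output above, `ThrottleCounter.IsSaturated("0n30/0n30")` (sessions) is true, while `ThrottleCounter.IsSaturated("0n31/0n100")` (calls) is false.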

Notice that the numbers of calls and sessions are very similar, although this service should increase only one of them. If both calls and sessions are piling up, it is an indication that there is a leak. Clicking the object address link shows the details:

0:000> !wservice 00000016d97fd998

 

Service Info

================================

Address            : 00000016D97FD998

Configuration Name : VanillaService.DataContractSample

State              : Opened

(...)

Calls/Max Calls    : 0n31/0n100

Sessions/Max       : 0n30/0n30 <-- Max session reached

(...)

Session Mode       : False  <-- Service level session mode is FALSE

 

Service Behaviors

================================

Concurrency Mode   : Multiple

Instance Mode      : PerCall <-- Instancing is not PerSession

Add Error in Faults: false

(...)

 

Service Base Addresses

================================

net.tcp://localhost:9090/servicemodelsamples/

 

Channels

================================

Address            : 00000016D9889AA8

Listener URI       : net.tcp://localhost:9090/servicemodelsamples/nettcp

Binding Name       : http://tempuri.org/:NetTcpBinding

Aborted            : No

State              : Opened

Transaction Type   : No transaction

Listener State     : Opened

Timeout settings   : Open [00:01:00] Close [00:01:00] Receive: [00:10:00] Send: [00:01:00]

Server Capabilities: SupportsServerAuth [Yes] SupportsClientAuth [Yes] SupportsClientWinIdent [Yes]

Request Prot Level : EncryptAndSign

Response Prot Level: EncryptAndSign

Events Raised      : No Event raised

Handles Called     : OnOpeningHandle OnOpenedHandle

Session Mode       : True <-- Tcp Channel is Session oriented by design

(...)

 

 

Endpoints

 

================================

 

Address            : 00000016D98770F8

URI                : net.tcp://localhost:9090/servicemodelsamples/nettcp

Is Anonymous       : False

Configuration Name : VanillaService.IDataContractSample

Type Name          : VanillaService.IDataContractSample

Listening Mode     : Explicit

Class Definition   : 00007ffebf71c160 VanillaService.IDataContractSample

Behaviors          : 00000016d98773c8

Binding            : 00000016d9871b18

(...)

These details are very important because we will compare the service session mode with the net.tcp channel session mode, which is sessionful by design. The service session mode is defined by the instance context mode, which you declare with the ServiceBehavior attribute on the service class. WCF's default instance context mode is PerSession, so the PerCall mode in this dump was set explicitly. See: https://msdn.microsoft.com/en-us/library/system.servicemodel.servicebehaviorattribute.instancecontextmode(v=vs.110).aspx
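For reference, this is roughly what declaring the instancing mode on the service class looks like. This is a minimal sketch under these assumptions:
- It targets a project that references System.ServiceModel.
- The type names are taken from the dump above, and WhoAmI comes from the client snippet earlier.
- The contract shape and the method body are hypothetical.

Declaring PerSession explicitly documents the intent and keeps the service from being quietly switched to PerCall.

```csharp
using System.ServiceModel;

namespace VanillaService
{
    // Hypothetical contract; the real sample's operations are not shown in the dump.
    [ServiceContract]
    public interface IDataContractSample
    {
        [OperationContract]
        string WhoAmI();
    }

    // PerSession ties one service instance to each net.tcp session, so the
    // instancing mode matches the sessionful channel instead of fighting it.
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerSession)]
    public class DataContractSample : IDataContractSample
    {
        public string WhoAmI()
        {
            // Null outside of a WCF operation (e.g. in a unit test).
            ServiceSecurityContext context = ServiceSecurityContext.Current;
            return context == null ? "anonymous" : context.PrimaryIdentity.Name;
        }
    }
}
```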

Further down, the net.tcp channel session mode is True. This is by channel design: the channel is sessionful, and there is no way to change this. Because the service session mode (declared by the developer) does not match the channel session mode (defined by WCF), there will be a leak. This happens even though the client in this scenario closes the proxy properly after each request.

In this situation, the only remedy is to decrease the receive timeout in the net.tcp binding configuration, as mentioned previously. So, if you are using net.tcp and do not intend to leverage sessions, set the receive timeout to 15 or 30 seconds. If you don’t know whether your application requires sessions, it most likely does not. If you control the WCF service side, use the PerSession instancing mode (and still keep the receive timeout low).

Bottleneck on client side

By default, a net.tcp or net.pipe client is limited to 10 concurrent outbound connections. That is a good number if the client is a standalone application. If the WCF client is a web application, the limit of 10 concurrent outbound connections may become a bottleneck. You can increase the limit by changing the maxConnections attribute of the net.tcp binding in the client-side configuration. Raising it on the service side does not lift the client's outbound limit. See: https://msdn.microsoft.com/en-us/library/ms731343(v=vs.110).aspx
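When the client binding is created in code, the same limit is the MaxConnections property of NetTcpBinding and NetNamedPipeBinding. This is a minimal sketch under these assumptions:
- It targets a client project that references System.ServiceModel.
- ClientBindings is a hypothetical name.
- 100 is an illustrative value, not a recommendation; size it to the concurrency your web application actually needs.

```csharp
using System.ServiceModel;

// Hypothetical helper for client-side bindings.
public static class ClientBindings
{
    // Client-side bindings with a larger outbound connection limit.
    // MaxConnections defaults to 10; 100 is only an illustrative value.
    public static NetTcpBinding CreateTcp(int maxConnections = 100)
    {
        return new NetTcpBinding { MaxConnections = maxConnections };
    }

    public static NetNamedPipeBinding CreatePipe(int maxConnections = 100)
    {
        return new NetNamedPipeBinding { MaxConnections = maxConnections };
    }
}
```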

If you are an advanced user, run these commands against a dump file to see the outbound calls (requires NetExt):

!wfrom -implement System.ServiceModel.Channels.TransportOutputChannel where((!$implement("*preamble*"))&&($enumname(state)!="Closed")) $a("Address",$addr()),$a("Url",to.uri.m_String),$a("State",$enumname(state)),$a("Open",channelManager.connectionPool.openCount),$a("Max",channelManager.connectionPool.maxCount)

0:000> !wfrom -implement System.ServiceModel.Channels.TransportOutputChannel where((!$implement("*preamble*"))&&($enumname(state)!="Closed")) $a("Address",$addr()),$a("Url",to.uri.m_String),$a("State",$enumname(state)),$a("Open",channelManager.connectionPool.openCount),$a("Max",channelManager.connectionPool.maxCount)

Address: 0000000002C29590

Url: net.tcp://localhost:9090/servicemodelsamples/nettcp

State: Opened

Open: 0n30

Max: 0n30

Address: 0000000002C98238

Url: net.tcp://localhost:9090/servicemodelsamples/nettcp

State: Opened

Open: 0n30

Max: 0n30

Address: 0000000002C9DCC8

Url: net.tcp://localhost:9090/servicemodelsamples/nettcp

State: Opened

Open: 0n30

Max: 0n30

(...)

 

30 Object(s) listed

36 Object(s) skipped by filter

Reliable Session

Reliable Session is the WCF implementation of OASIS WS-ReliableMessaging (WS-RM). Reliable session is overkill for net.tcp binding, because net.tcp already implements everything reliable sessions implement except reconnection. Moreover, reliable session uses a different and complex code path at a high performance cost. It brings no added benefit; on the contrary, it opens up a myriad of potential issues. I heard this from the WCF product team while working on a case where reliable session was being used.

The problem is the same channel leak caused by the session mode mismatch explained before, with the extra hassle of reliable messaging on top. So, if you do need that kind of reliability, use reconnection logic instead of reliable session.
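A minimal sketch of such reconnection logic follows. It assumes:
- A project that references System.ServiceModel.
- A hypothetical helper name (Reconnect.WithRetry) and hypothetical retry and back-off values.
- An attempt delegate that creates a new proxy on every call and closes or aborts it itself, as in the client snippet earlier. A faulted channel cannot be reused.

FaultException is rethrown immediately. It derives from CommunicationException, but it means the service did answer, so reconnecting will not help.

```csharp
using System;
using System.ServiceModel;
using System.Threading;

// Hypothetical reconnection helper, used instead of reliable session.
public static class Reconnect
{
    // Runs the attempt; on a transport-level failure it waits and retries,
    // up to maxAttempts in total. Each attempt must use a brand-new proxy.
    public static T WithRetry<T>(Func<T> attempt, int maxAttempts = 3, int delayMs = 500)
    {
        for (int i = 1; ; i++)
        {
            try
            {
                return attempt();
            }
            catch (FaultException)
            {
                throw; // The service replied with a fault; reconnecting will not help.
            }
            catch (Exception ex) when ((ex is CommunicationException || ex is TimeoutException)
                                       && i < maxAttempts)
            {
                Thread.Sleep(delayMs * i); // Simple linear back-off before reconnecting.
            }
        }
    }
}
```

Only retry operations that are safe to repeat. A request that reached the service before the connection dropped may otherwise be executed twice.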

Below is an advanced troubleshooting technique: search for reliable session channels in a dump file and check whether the inner channel can be reopened (that is, it was not aborted). This is THE NUMBER ONE source of performance problems when using WCF with net.tcp or net.pipe. NetExt query:

!wfrom -type System.ServiceModel.Channels.ReliableChannelBinder?ChannelSynchronizer* where (($contains($typename(),"_ChannelSynchronizer<"))&&($enumname(state)!="Closed")) $a("Address",$addr()), $a("State",$enumname(state)), $a("Inner Channel State",$enumname(currentChannel.state)),$a("Fault Mode",$enumname(faultMode)), $a("Is Aborted?",currentChannel.aborted)

0:000> !wfrom -type System.ServiceModel.Channels.ReliableChannelBinder?ChannelSynchronizer* where (($contains($typename(),"_ChannelSynchronizer<"))&&($enumname(state)!="Closed")) $a("Address",$addr()), $a("State",$enumname(state)), $a("Inner Channel State",$enumname(currentChannel.state)),$a("Fault Mode",$enumname(faultMode)), $a("Is Aborted?",currentChannel.aborted)
Address: 01102AF4
State: ChannelOpening
Inner Channel State: Closed
Fault Mode: IfNotSecuritySession
Is Aborted?: 1 <--- Leak: Inner channel aborted, Reliable Session will time out with receiveTimeout
Address: 01123844
State: ChannelOpening
Inner Channel State: Closed
Fault Mode: IfNotSecuritySession
Is Aborted?: 1
Address: 01138E74
State: ChannelOpening
Inner Channel State: Closed
Fault Mode: IfNotSecuritySession
Is Aborted?: 1
Address: 01139F30
State: ChannelOpening
Inner Channel State: Closed
Fault Mode: IfNotSecuritySession
Is Aborted?: 1
(...)
Address: 18262A8C
State: ChannelOpened
Inner Channel State: Opened
Fault Mode: IfNotSecuritySession
Is Aborted?: 0
Address: 182EFD90
State: ChannelOpened
Inner Channel State: Opened
Fault Mode: IfNotSecuritySession
Is Aborted?: 0
Address: 184818C4
State: ChannelOpened
Inner Channel State: Opened
Fault Mode: IfNotSecuritySession
Is Aborted?: 0
Address: 185EBB28
State: ChannelOpened
Inner Channel State: Opened
Fault Mode: IfNotSecuritySession
Is Aborted?: 0
Address: 18615B84
State: ChannelOpened
Inner Channel State: Opened
Fault Mode: IfNotSecuritySession
Is Aborted?: 0
 
100 Object(s) listed
208 Object(s) skipped by filter

Load balancing net.tcp

Working around the load balancer idle timeout is the only justified reason to use reliable session with net.tcp, and only if you are really leveraging sessions. But, again, the same result can be achieved without reliable session if the WCF service receive timeout matches the load balancer session idle timeout. Don’t feel encouraged to follow this path.

A load balancer is used to provide scalability. Using sessions requires configuring sticky sessions in the load balancer. Most modern load balancers offer this type of sticky session, which is different from HTTP's. Sticky sessions always bind a client connection to the same server, which defeats the purpose of scalability.