Optimizing the Downloading of Large Files in ASP.NET

Maximizing performance is the holy grail of every web site. Developers are always looking for that edge…the way to squeeze more bytes of throughput from the same pipeline. Whether files are large or small, download speed is of paramount importance. Large files make a useful test case, if only because even the smallest difference in speed is magnified when you are trying to push files of 200 MB or more to the client.

What are the Choices?

That, by the way, is the intent of this article: to describe the different choices available for moving very large files from a web server to a browser client. As well as itemizing the choices, the article provides some empirical evidence of the relative performance of each technique and discusses some of the reasons behind the differences. The choices that we are faced with are as follows:

·         Allowing direct access to the file

·         Using Response.WriteFile

·         Streaming the file with Response.BinaryWrite

·         Using an ISAPI filter

The next section looks at the mechanics of each of these choices.

Direct Access

The most obvious approach to delivering a file across the Internet is to place it in a directory accessible by a web server, where anyone can use a browser to retrieve it. As easy as that sounds, a number of problems make this alternative unworkable for all but the simplest of applications. What if, for example, you don’t want to make the file available to everyone? While adding NTFS-based security would protect the file from unwanted access, there is the administrative hassle of creating a local machine account for every user.

More flexible access control mechanisms run into similar problems. If your authentication scheme uses SQL Server or Oracle as its data store, direct access is not going to be effective: there is no easy way to validate the credentials of an incoming browser request against a database (without getting ASP.NET involved, that is). When all of these factors are taken into consideration, it becomes clear that direct access only works in very limited situations.

Response.WriteFile

Since direct access isn’t realistic, ASP.NET must be part of the solution. Using the WriteFile method on the Response object is the simplest way to programmatically send a file to the client. Create an .ASPX page or other HttpHandler to process the request. As part of processing the request, determine which file to download and use the Response.WriteFile method to send it to the client. The following simple .ASPX page demonstrates this.

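<%@ Page language="c#" AutoEventWireup="false" %>

<html>
   <body>
      <%
         // NOTE: passing the raw query string to WriteFile lets a caller
         // request any file the worker process can read; a real page must
         // validate the name and map it to a permitted folder first.
         if (Request.QueryString["File"] != null)
         {
            Response.Clear();        // discard the HTML buffered so far
            Response.ContentType = "application/octet-stream";
            Response.WriteFile(Request.QueryString["File"]);
            Response.End();          // stop before the closing tags are sent
         }
      %>
   </body>
</html>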

One of the benefits of using Response.WriteFile is that security for the request is much more extensible. The incoming requests are processed through the normal Internet Information Services (IIS) pipeline, which means that IIS authentication can be applied to the request. In addition, all of the ASP.NET events needed to plug in your own custom authentication (such as AuthenticateRequest) are available.
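To make that extensibility concrete, here is a minimal sketch of the kind of check a WriteFile page can run before sending anything, something direct access cannot do. It assumes a hypothetical in-memory permission table standing in for the SQL Server or Oracle store mentioned earlier; the class name `DownloadAuthorizer`, the `downloadRoot` folder and the sample grant are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DownloadAuthorizer
{
    // Hypothetical permission table: user name -> file names that user may
    // download. A real site would query its SQL Server or Oracle store here.
    static readonly Dictionary<string, HashSet<string>> Grants =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase)
        {
            ["alice"] = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "report.zip" },
        };

    // Returns the full path to serve, or null if the request must be refused.
    // downloadRoot is an assumed folder that holds the downloadable files.
    public static string Resolve(string user, string requested, string downloadRoot)
    {
        if (string.IsNullOrEmpty(user) || string.IsNullOrEmpty(requested))
            return null;

        // Keep only the file name, so "../secret/report.zip" cannot climb
        // out of downloadRoot.
        string name = Path.GetFileName(requested);
        if (string.IsNullOrEmpty(name))
            return null;

        HashSet<string> allowed;
        if (!Grants.TryGetValue(user, out allowed) || !allowed.Contains(name))
            return null;

        return Path.Combine(downloadRoot, name);
    }
}
```

In the WriteFile page, the value returned for `User.Identity.Name` and the `File` query-string parameter (with a root obtained from, say, `Server.MapPath`) would be handed to Response.WriteFile, and a null result answered with a 403 status instead.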

So what is the downside to using WriteFile? It does not work well when large files are involved.  To understand why, a brief description of IIS’s architecture helps.

Figure 1 – IIS 5.0 Request Processing Architecture

Figure 1 illustrates the processes used in IIS 5.0. When a request arrives at the web server, Inetinfo.exe determines how it should be processed. For .aspx requests, the aspnet_isapi.dll ISAPI extension is used. Aspnet_isapi.dll in turn forwards the request to the worker process (aspnet_wp.exe). The worker process contains one or more application domains (typically one per virtual directory). The web site actually runs as an assembly loaded into the appropriate AppDomain within the worker process. It is this assembly that ultimately handles the request, generating and transmitting the response as necessary.

Going into a little more detail, aspnet_isapi.dll is a Win32 (i.e., unmanaged) DLL. Its purpose is threefold. First, it routes incoming requests to the worker process. Second, it monitors the health of the worker process, killing aspnet_wp.exe if performance falls below a specified threshold. Finally, it starts the worker process before passing along the first request after IIS has been reset. It is the first of these three tasks that is of interest to us.

The routing performed by aspnet_isapi.dll requires a communication channel between it and the worker process. This is provided by a set of named pipes. A named pipe is a mechanism that, not surprisingly, works like a pipe: data is pushed into one end and retrieved from the other. For local interprocess communication, pipes are among the most efficient techniques available.
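The "push in one end, read from the other" behaviour can be seen with the .NET named pipe classes in System.IO.Pipes. This is only an illustration of named pipes in general, not of the protocol aspnet_isapi.dll actually speaks (which the article does not describe); both ends run in one process purely to keep the sketch self-contained, and the pipe name is arbitrary.

```csharp
using System.IO;
using System.IO.Pipes;
using System.Text;
using System.Threading.Tasks;

public static class PipeDemo
{
    // Pushes a message into one end of a named pipe and reads it from the
    // other. Both ends live in one process only to keep the sketch
    // self-contained; in IIS 5.0 they sit in different processes.
    public static string RoundTrip(string pipeName, string message)
    {
        using (var server = new NamedPipeServerStream(pipeName, PipeDirection.In))
        using (var client = new NamedPipeClientStream(".", pipeName, PipeDirection.Out))
        {
            Task connected = server.WaitForConnectionAsync();
            client.Connect();          // blocks until it can reach the server end
            connected.Wait();

            byte[] payload = Encoding.UTF8.GetBytes(message);
            client.Write(payload, 0, payload.Length);
            client.Dispose();          // closing the writer signals end-of-data

            using (var reader = new StreamReader(server, Encoding.UTF8))
                return reader.ReadToEnd();
        }
    }
}
```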

Given this information, the flow of each .aspx request is: Inetinfo.exe, then aspnet_isapi.dll, then through a named pipe to the worker process. Once the request has been evaluated and a response formulated, that response is pushed back across the named pipe to aspnet_isapi.dll, and then back through Inetinfo.exe to the requestor.

If you put this architecture into the context of processing requests for large files, you can see why there might be a performance problem. As the file is moved back through the pipe to aspnet_isapi.dll, it is placed into memory, and only once the entire file has been piped is it transmitted to the requestor. Empirical evidence suggests an almost one-to-one relationship between the growth in memory consumed by inetinfo.exe (as shown by perfmon) and the size of the file being retrieved. Figure 2 contains the perfmon output for inetinfo.exe as two separate requests for a 23 MB file are processed using Response.WriteFile.

 


Figure 2 – Memory Spikes in the Transfer of Large Files

Although there is no way to illustrate it here, this memory growth cannot be avoided, even when buffering is turned off at every step along the way, from the page level (the Buffer attribute of the @ Page directive) out to Response.Buffer = false. Sure, the growth is temporary, but think of how much fun your server will have processing 3 or 4 simultaneous requests for 50 MB files: at roughly one byte of inetinfo.exe memory per byte of file, that is 150 to 200 MB. Or 30 or 40 requests, which means 1.5 to 2 GB. You get the picture.

Response.BinaryWrite and Response.OutputStream.Write

Given the memory growth that is the symptom of using the WriteFile method, the next logical step is to try to break the outgoing file into pieces.  After all, if inetinfo.exe is placing the entire response into memory, then giving it smaller pieces should minimize the impact of transmitting a large file. Before the next piece comes in, the previous piece is sent on to the client, keeping the overall memory usage down. Fortunately, this is not a challenging problem, as the following code demonstrates.

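// ctx (the current HttpContext), fileName and bufferSize are supplied by
// the surrounding handler; FileStream and Stream require System.IO.
using( Stream s = new FileStream( fileName, FileMode.Open,
                                  FileAccess.Read, FileShare.Read, bufferSize ) )
{
      byte[] buffer = new byte[bufferSize];
      int count;

      // Read the file one buffer at a time and write each chunk out.
      while( ( count = s.Read( buffer, 0, buffer.Length ) ) > 0 )
      {
            // Stop early if the browser has disconnected.
            if( !ctx.Response.IsClientConnected )
                  break;

            ctx.Response.OutputStream.Write( buffer, 0, count );
            ctx.Response.Flush();   // hand off this chunk before reading the next
      }
}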

The code that adds the appropriate headers to the response has been left out for conciseness.
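For completeness, here is a minimal sketch of the headers a download handler commonly sets. It is not the article's omitted code: the class and method names are hypothetical, and it simply returns name/value pairs so it can run outside ASP.NET. In a handler, Content-Type would be applied through ctx.Response.ContentType and the other pairs through ctx.Response.AppendHeader.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class DownloadHeaders
{
    // Hypothetical helper: the header name/value pairs a download response
    // commonly carries.
    public static List<KeyValuePair<string, string>> For(string path, long length)
    {
        // Drop double quotes so the quoted file name below stays well formed.
        string name = Path.GetFileName(path).Replace("\"", "");

        return new List<KeyValuePair<string, string>>
        {
            // Generic binary type, so the browser saves rather than renders it.
            new KeyValuePair<string, string>("Content-Type", "application/octet-stream"),
            // "attachment" asks for a Save As dialog using the suggested name.
            new KeyValuePair<string, string>("Content-Disposition",
                "attachment; filename=\"" + name + "\""),
            // Lets the browser show progress and detect a truncated transfer.
            new KeyValuePair<string, string>("Content-Length",
                length.ToString(CultureInfo.InvariantCulture)),
        };
    }
}
```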

One of the benefits of this approach is that it gives the developer much more control over the download process. Want to change the size of the buffer? No problem. Want to put a short pause between chunks in an attempt to give inetinfo.exe a chance to reclaim some memory? No sweat. Unfortunately, none of these tweaks helps when it comes to large files. Regardless of how the transmitted files are broken up, or at which level buffering is enabled or disabled, large files still end up as a large memory sink for inetinfo.exe.
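The two knobs mentioned above, buffer size and a pause between chunks, can be pulled out into a small helper that works on any pair of streams. This is a sketch under the assumption that the destination would be ctx.Response.OutputStream in a real handler; the class name `ChunkedCopy` is hypothetical. Per the measurements described here, tuning these values did not cure the memory growth under IIS 5.0; the helper only shows where the knobs live.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ChunkedCopy
{
    // Copies source to destination in chunks of at most bufferSize bytes,
    // flushing after each chunk and optionally pausing between chunks.
    // Returns the number of chunks written.
    public static int Copy(Stream source, Stream destination, int bufferSize, TimeSpan pause)
    {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        byte[] buffer = new byte[bufferSize];
        int chunks = 0;
        int count;
        while ((count = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, count);
            destination.Flush();          // push this chunk onward now
            chunks++;

            if (pause > TimeSpan.Zero)
                Thread.Sleep(pause);      // the "short pause" between chunks
        }
        return chunks;
    }
}
```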

ISAPI Filters

Given what has been discussed so far, the problem with returning large files appears to be rooted in inetinfo.exe. More precisely, it lies in the area where the named pipes are used to communicate between aspnet_isapi.dll and the worker process. After all, when aspnet_isapi.dll isn’t involved (as with direct access), there is no problem, but for ASP.NET requests, large file transfers mean large memory consumption. So what can be done to reduce the amount of data that moves through the named pipe? Ideally, you would combine the speed of direct access with the authentication and authorization capabilities offered by .ASPX pages. Luckily, that combination is within our power to deliver.

The purpose of an ISAPI filter is suggested by its name. It is a DLL that sits between the requester and the web service. With an ISAPI filter, it is possible to intercept both incoming and outgoing messages. As part of the interception process, the messages going in either direction can be modified. As well, because the filter is not an endpoint for a request, there is no need for the client to be aware that the filter is even being used. The term for this type of functionality is orthogonal, a fact that I mention only because it’s my favorite word.

So let’s consider what the purpose of this ISAPI filter is in the context of our dilemma.  In the .ASPX page, we will take the name of the requested file and perform the necessary authentication and authorization. Then, as part of the process, the path to the requested file is placed into the headers that are part of the response.  The response is then directed back towards the requestor. This is where the ISAPI filter kicks in.

The ISAPI filter in question interposes itself between the worker process and the requestor. It examines each outgoing message looking for a special header…one that contains a path to a file. When that header is detected, the filter extracts the file path, removes the header from the response and appends the contents of that file to the response.
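A real filter would be native code written against the ISAPI filter API, which the article does not show. As a model of the header rewrite only, here is a C# sketch of what the filter does to one outgoing response. The header name `X-Download-Path` and the class name are hypothetical. On the page side, the .ASPX code would simply call Response.AppendHeader with that name and the authorized path once its checks succeed. The filter trusts the header because only server-side code can add it to a response, which is also why it must always strip it.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class FileHeaderFilter
{
    // Hypothetical header name; the article does not say which name it used.
    public const string FileHeader = "X-Download-Path";

    // Models the filter's handling of one outgoing response. If the page
    // left the special header, strip it and stream the named file into the
    // body; otherwise leave the response alone. Returns true if it acted.
    // headers should use a case-insensitive comparer, as HTTP header names are.
    public static bool Apply(IDictionary<string, string> headers, Stream body)
    {
        string path;
        if (!headers.TryGetValue(FileHeader, out path))
            return false;                  // ordinary response: pass through

        headers.Remove(FileHeader);        // the client never sees the path
        using (FileStream file = File.OpenRead(path))
        {
            headers["Content-Length"] =
                file.Length.ToString(CultureInfo.InvariantCulture);
            file.CopyTo(body);             // copied in buffered chunks, not all at once
        }
        return true;
    }
}
```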

From the client side, the response now looks exactly like what is expected.  From the server side, the request was authenticated and authorized properly.  From the performance side, the file was streamed into the response as part of the inetinfo.exe process.  Most importantly, it didn’t come through the named pipe that is used to communicate with the worker process.  And the problem with the momentary memory growth goes away.

Comparing the Choices

As part of the investigation, each of these alternatives was compared from a speed and a memory usage perspective. The speed results can be found in the table below.

Download Technique                               Download Speed (MB/s)
Download.ashx                                     5.89
WriteFileDownload.aspx                           20.69
IsapiDownload.ashx                               50.40
WriteFileDownload.aspx with IIS 6.0 Isolation    34.50

Table 1 – Relative Performance of the Different Techniques

Please realize that this is not a formal benchmark. A client application was created that transmitted a request to an appropriately configured server. The test was run 20 times for each technique, with an IIS reset and a dummy request (used to spin up the IIS process) performed before each test. The numbers in the table are the averages across all of the runs.
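To put the averages in perspective, here is a rough worked example. It assumes, as an idealization, that each measured average rate holds for an entire 200 MB transfer (the article reports only the averages, not per-size timings), so the transfer time is simply size divided by rate:

```latex
t = \frac{S}{r}, \qquad S = 200\ \text{MB}

t_{\text{Download.ashx}} = \frac{200}{5.89} \approx 34.0\ \text{s}, \qquad
t_{\text{WriteFile}} = \frac{200}{20.69} \approx 9.7\ \text{s}

t_{\text{WriteFile, IIS 6.0}} = \frac{200}{34.50} \approx 5.8\ \text{s}, \qquad
t_{\text{ISAPI}} = \frac{200}{50.40} \approx 4.0\ \text{s}
```

On these numbers the ISAPI filter is about 8.6 times faster (50.40 / 5.89) than the chunked Download.ashx handler.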

The More Things Change…

One of the challenges of working with technology is that the pace of change frequently causes old ideas to be tossed and new ones embraced. The introduction of IIS 6.0 had that effect on the architecture of ASP.NET. Although IIS 6.0 can run using the same architecture as before (IIS 5.0 isolation mode, complete with aspnet_isapi.dll and an aspnet_wp.exe worker process), it also offers a new model, worker process isolation mode, that changes the solutions to the large download problem we have been discussing. As before, a few words on this architecture are useful.


Figure 3 – IIS 6.0 Request Processing Architecture

When a request arrives at the web server, it is first processed by HTTP.sys, a kernel-mode HTTP listener. HTTP.sys is responsible for determining whether the request can be satisfied from its cache. If not, the request is placed into the request queue of the application pool that hosts the target application.

The request is actually serviced by a worker process (w3wp.exe), which listens on the appropriate request queue for the requests placed there by HTTP.sys. The worker process generates the response and sends it back to HTTP.sys for transmission to the original requestor.

Turning our attention back to the problem of large files, remember that the memory growth issue seemed to be rooted in the transfer of files into the Inetinfo.exe process, or more specifically, in the fact that Inetinfo.exe seemed to hold the entire file before sending it on to the browser. From the IIS 6.0 Isolation Mode model shown in Figure 3, you can see that Inetinfo.exe is no longer part of the request pipeline. As a result, when the request is run in this manner, memory growth is no longer an issue. While the transfer rate (34.50 MB/s in Table 1) is not as high as with the ISAPI filter technique (50.40 MB/s), since there are still a number of process boundaries to be crossed, it is a clear improvement over the same WriteFile mechanism under IIS 5.0 (20.69 MB/s).

Summary

So there you have it. To serve up large files to a client, you can go with the highest speed and the most difficult security administration (direct access), the slowest but simplest option (the Response object methods), or a combination of techniques (with a combination of pros and cons). Or you can improve on the Response object methods by upgrading to IIS 6.0 and running in worker process isolation mode. As always, there is no ‘best’ answer. All that I’ve done is provide you with the information necessary to make your own decision, whatever the situation.

 

Comments

  • bruce May 22, 2005 7:00 PM

    Do not miss Response.TransmitFile which is MSFT's response to problems mentioned here. See http://support.microsoft.com/kb/823409/EN-US/

  • bruce October 6, 2005 7:52 AM

    Could you do a performance test of TransmitFile? And review that option?

  • bruce November 6, 2005 4:28 PM

    Hi Alex,

    I'm having some trouble storing files with ASP.NET. I don't need any code; just an idea will help.

    I have a directory that has more than 80,000 subdirectories. I granted “write permission” to the ASPNET user on the parent directory and made the children inherit the permission, but in the last few weeks the OS (Windows Server 2003) just doesn't recognize the permission I granted; in other words, it just doesn't “inherit”. In fact, if I go to one of the 80,000 folders and look at the ACL, the permission is granted correctly, but any attempt to change, delete or insert a file fails.

    I have searched a lot for a standard solution, but I didn't find anything.

    I'm thinking about reorganizing the tree into smaller folders (with a few subfolders each), but I'm not sure it will solve my problem.

    Thanks.

    [],
    Charles
    charles@onwaytecnologia.com
    Brazil

  • bruce February 21, 2006 6:40 AM

    hi Bruce
    How about some code that shows how to involve the ISAPI Filters

  • bruce May 6, 2006 2:36 PM

    Thank you for a fine article. I have been searching for such a great explanation of everything that goes on when downloading huge files; I am presently writing ASP (not .aspx) code that will download big files.

    Thank you,
    deDogs
    meander@nomadicdog.com

  • bruce May 11, 2006 8:16 AM

    This Article is fantastic.
    We faced this exact situation, minus the handler thing; honestly, I didn't know about it. But Response.Write and its siblings crashed when the size of the file streamed to the client grew to, say, 15-20 MB.
    Then I opted for the binary stream method of the Response class, which gave me far better performance; I was able to stream 40 MB files and also tested it with varying buffer chunk sizes. Trying varying chunk sizes may help.
    Regards
    Chiranjiv

  • bruce May 30, 2006 1:49 PM

    Interesting article Bruce... but could you elaborate more on ISAPI? I have scoured the web on this subject to no avail.

    Your help would be appreciated!

  • bruce September 6, 2006 1:02 AM

    Response.TransmitFile rocks. It does not buffer bytes before transmitting.

  • bruce September 12, 2006 3:45 AM

    This article is really great. We were developing the same kind of application, and you really helped us. Thanks
    Andy

  • bruce September 12, 2006 4:10 AM

    Thanks for the fantastic article on optimizing the download of large files in ASP.NET.

    Cheers

  • bruce September 13, 2006 3:05 AM

    Hi,
    My simple question is: is there any way, through IIS or any built-in ASP.NET feature, to enable the download of a large file (approx. 200 MB) without errors like "Application Server is Unavailable"? I have already tried all the procedures listed in the article above.

    Please specify whether there are any hardware constraints (like RAM size, etc.).

    Need and hope for a robust solution from all the experts.

    http://www.webserverindia.com/

    Anand Sah

  • bruce October 20, 2006 11:53 AM

    Thanks for the fantastic article on optimizing

  • bruce November 24, 2006 1:59 PM

    Thank you for your article "Optimizing the Downloading of Large Files in ASP.NET" - Greetings. Keep up the good work.

  • bruce November 27, 2006 6:25 AM

    Well done, thanks for showing it in images - easy to understand.... Thx a lot and greetz!

  • bruce November 28, 2006 11:44 AM

    Hi there, great article!!!! Do any of you experts know whether it is possible to count the number of bytes that were sent with 100% accuracy? I need to know this in my application because we are supposed to bill by downloaded bytes. I can't really find any good article or doc about this. Anyone?

    And again, thanks for your article

  • Aziz ur Rahman December 11, 2007 1:31 AM

    Very nicely researched article. It explains the whole big picture in a very easy manner.
