HTTP File Transfer Classes
by Michael Nuwer

This article is aimed at programmers who need to upload or download binary files in their web applications. I began working on these HTTP File Transfer Classes as a way to manage assignments in my Introduction to Computing course at the college where I teach. The course typically enrolls 100 students each semester. My problem was rooted in the need to have students retrieve a Microsoft Word or Excel file whose name had to include the student's username.

For example, if my username is nuwermj and I download the first word processing assignment, the file is named wp1nuwermj.doc. After the students retrieved the file and completed their project, they needed a way to submit the file back to the web server so that the instructor (yours truly) could evaluate the work.

By using the HTTP File Transfer Classes, I am able to ensure that the correct username is included in the filename when the student downloads an assignment, and to control where the file is stored when the finished assignment is submitted. Since I must manage over 1,000 files each semester, these dBL classes have saved me an immeasurable amount of time and aggravation.

While refining the HTTP File Transfer Classes, I have also discovered additional benefits associated with their use. So even if your circumstances are different from mine, you may find these classes useful for your Web applications.

While working on the HTTP File Transfer Classes, I have received a great deal of help and advice from Bowen Moursund. Although I am responsible for any errors or omissions, his guidance has greatly improved the final results. Moreover, his advice has been a rewarding learning experience, as I now have a much better understanding of Object Oriented Programming.

A copy of the HTTP File Transfer Classes is contained in the archive file that accompanies this article. HTTPFileTransfer.cc contains two classes. The first is CGIUpload, which is used for uploading files from a web browser to a web server via a CGI application. The second is CGIDownload, which is used for streaming a file from a CGI application to a web browser. The remainder of this article discusses these classes.

Downloading files

The Internet is loaded with hyper-links that cause a binary file to be downloaded to a local computer. These links often use the HTML Anchor tag which looks like the following:
 
 
<A HREF="SomeFile.zip">Click here</A>
   

For downloading files it couldn't be easier, but this method has its limits. First, the file must be stored within the web server’s webroot (or its web space), which, in some circumstances, can be a security risk. Second, the file must be static. That is, the file must exist on the server before the user can download it. Finally, the name of the file must also be static so that it always matches the HREF value.

The CGIDownload class makes it possible to overcome these limitations. Downloadable files can be stored outside of web space, which can give you better control over who has access to those resources. Additionally, the content of a file can be customized before it is downloaded. For example, your CGI application may create a temporary table that contains a subset of data from your database (copy to temp for x = y). This temporary table can then be streamed to the user and deleted from the server.

The CGIDownload class is actually quite simple. The code creates an instance of the dBASE file object, reads the binary file to be downloaded, and writes that data to the StdOut port where the web server is waiting. What makes this technique work is the CGI header that is sent to the web server.

In a standard CGI application the CGI Header sent to the web server is:
 
 
Content-type: text/html
   

This “MIME” type tells the web server that the data which follows is HTML, and it is then served to the Web browser as such. In the case of our CGIDownload class, the CGI Header is this:
 
 
Content-Type: application/x-unknown
Content-Disposition: attachment; filename=myFile.zip
   

This tells the Web server to handle the subsequent data as an attachment. When the Web browser gets this message, it pops up a Save As dialog box so that the user can store the file on their system. (Note: when streaming any CGI header to a web server, the header information must be followed by an empty line. This is required and indicates to the server where the header ends and the data begins. It is typically referred to as a boundary.)
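
To make the header mechanics concrete, here is a minimal sketch written in Python rather than dBL. It is not the CGIDownload source; the function name build_download_response is an assumption made for illustration. It assembles the two header lines shown above, the mandatory empty line, and then the raw file bytes, which is the sequence a CGI program would write to standard output.

```python
def build_download_response(data: bytes, filename: str) -> bytes:
    """Return the CGI header block, the empty line, and the file body.

    Illustrative sketch only: the name and signature are hypothetical,
    not part of HTTPFileTransfer.cc.
    """
    headers = (
        "Content-Type: application/x-unknown\r\n"
        f"Content-Disposition: attachment; filename={filename}\r\n"
        "\r\n"  # the empty line that marks the end of the header
    )
    # Headers are plain ASCII text; the body is passed through untouched.
    return headers.encode("ascii") + data
```

A CGI script would write the returned bytes to standard output in binary mode (in Python, sys.stdout.buffer.write) so that no newline translation alters the file.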

Using the CGIDownload class is quite simple; the following code is all that is required:
 
 
set procedure to HTTPFileTransfer.cc additive
oCgi = new CGIDownload()
oCgi.connect()
oCgi.download("c:\data\someFile.dbf","someFile.dbf")
quit
   

The main work is done by the download() method. This method accepts two arguments. The first is the full path for the target file, including the file name, as it resides on the web server. As noted above, that file can be located anywhere on the web server or even in a network location. (You must be sure that the web server has sufficient permissions to read from the storage location.)

The second parameter is the name you want to assign to the downloaded file. In many cases the file name will not change when it is downloaded, but in those cases where you do wish to rename the file, this is where it is done. This second parameter is the file name that will appear in the client's Save File As dialog.

When you use this technique to download files, note that there is no response page.

CGIDownload is subclassed from webclass.cc. This means that the developer has access to all the methods contained in webclass.cc. Connect() is the method that connects your dBL CGI application to the web server and decodes the data sent from the web browser. This means that we can use information passed by the user to process the download file.

Consider, for example, the following form which is similar to an HTML form that I use in my “Introduction to Computing” course.

[Form: "Computing 101, Retrieve a Project". A select list (p_name) labeled "Select the project you want to Retrieve." and a text field labeled "Enter your username".]


 

The user must select the project that they wish to download and enter their username. These two parameters are sent to my CGI application and added to the oCGI associative array. I can use this information to get the file identified by the value of p_name and rename it, on the fly, with the username value. (For details see DownloadSample.prg in the archive accompanying this article.)
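
The renaming step can be sketched as follows. This is a Python illustration of the idea, not the code in DownloadSample.prg; the function name download_name and the folder in the usage note are assumptions. It inserts the username between the base name and the extension of the stored file, and it drops any character from the username that is not a letter or digit, since that value comes straight from the web form.

```python
import ntpath  # Windows-style path handling, usable on any platform


def download_name(stored_file: str, username: str) -> str:
    """Build the name offered to the browser, e.g. wp1 + nuwermj + .doc.

    Hypothetical helper for illustration only.
    """
    base, ext = ntpath.splitext(ntpath.basename(stored_file))
    # Keep letters and digits only, so the form value cannot smuggle in
    # path separators or other special characters.
    safe_user = "".join(ch for ch in username if ch.isalnum())
    return f"{base}{safe_user}{ext}"
```

For instance, download_name(r"c:\assignments\wp1.doc", "nuwermj") yields wp1nuwermj.doc, which would then be passed as the second argument of download().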

Uploading Files

Uploading a file from a Web client to a server is much more problematic than downloading a file. In the early days of the World Wide Web the only way to accomplish an upload was to use an FTP server. This could be rather cumbersome because user accounts needed to be set up and separate software needed to be used. Moreover, it was not possible to integrate binary attachments with a web form. Thus, HTML message boards and Web based email systems could not handle attachments.

The CGIUpload class makes it possible to transfer files from web users without using an FTP server. Moreover, the files can be stored outside of web space and the developer can control filenames. The specification for HTML file uploads is found in RFC 1867 (Request For Comments: 1867), which is titled “Form-based File Upload in HTML”. A copy of that document is included in the archive that accompanies this article.

There are two key elements to implementing uploads from an HTML form. First, the TYPE attribute of the HTML INPUT tag must be set to the FILE option. This places an object on an HTML form that lets the user supply a file as input. When the form is submitted, the content of the specified file is sent to the server as the value portion of this object's name/value pair. When this object is used, the Web browser displays a “Browse” button next to the file input field that lets the user select a file from their system. The following is an example of this HTML form element:
 

File name: [file input field with a Browse button]

The second element necessary for implementing HTML uploads is to declare multipart/form-data as the MIME type of the HTML form. Your form tag will then look similar to this:
 
 
<FORM ENCTYPE="multipart/form-data" ACTION="upload.exe" METHOD=POST>
   

In a standard HTML form, the ENCTYPE attribute of the FORM tag is set to application/x-www-form-urlencoded. If your form does not define this attribute, application/x-www-form-urlencoded is assumed. However, in order to use HTML file uploads, the MIME type must be changed to multipart/form-data.

Putting these two elements together, the HTML form that we must use to upload a file is similar to the following:
 
 
<FORM ENCTYPE="multipart/form-data" ACTION="upload.exe" METHOD=POST>
File to process: <INPUT NAME="userfile1" TYPE="file">
<INPUT TYPE="submit" VALUE="Send File">
</FORM>
   

With these modifications to a standard HTML form, the CGI data is sent to the Web server as individual parts separated by boundary markers rather than as a string of name/value pairs separated by ampersands. A boundary marker is selected by the Web browser and is sent as part of the header; in the body, each delimiter line is the boundary preceded by two hyphens, and the last delimiter is also followed by two hyphens. The multipart form data sent by a Web browser looks similar to the following:
 
 
Content-type: multipart/form-data; boundary=-------16740297823514

---------16740297823514
content-disposition: form-data; name="field1"

Joe Blow
---------16740297823514
content-disposition: form-data; name="upload_file1"; filename="file1.gif"
Content-Type: image/gif

... contents of file1.gif ...
---------16740297823514--

   

The primary task of the dBL CGIUpload class is to parse this multipart data stream.

CGIUpload is subclassed from WebClass.cc; however, the connect() method is overridden and the functionality in loadArrayFromCGI is replaced with the ReadMultipart() method. This is an important point because it means that when you call connect(), the oCGI array is not loaded with name/value pairs.

To load the oCGI array, you must call ReadMultipart(). This method begins by reading the entire multipart data stream into a temporary file on the Web server. Then the method starts parsing that file line by line. If the parsing finds an uploaded file, it extracts the data and saves it as a file with the submitted filename. That filename is then added to the oCGI associative array. ReadMultipart can, if needed, handle more than one uploaded file in a session.

For the data depicted above, the name would be "upload_file1" and the value would be "file1.gif". If a second INPUT tag is used in the HTML form so that a second file is uploaded, that name might be "upload_file2" and its value might be "file2.gif". ReadMultipart also loads standard name/value pairs into the oCGI associative array. Access to the name/value pairs then works the same as with a standard CGI application.
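
The parsing just described can be sketched in a few lines. The following Python function is an illustration of the technique only and is not a translation of ReadMultipart; the name read_multipart and its parameters are assumptions. It works on a body already held in memory (the dBL class uses a temporary file instead), splits it on the delimiter lines (two hyphens plus the boundary), saves any part that carries a filename, and returns the name/value pairs with the file name as the value of a file field.

```python
import ntpath
import os
import re


def read_multipart(body: bytes, boundary: str, save_path: str) -> dict:
    """Parse a multipart/form-data body (hypothetical helper, sketch only)."""
    marker = b"\r\n--" + boundary.encode("ascii")
    fields = {}
    # Prefixing CRLF lets the first delimiter match like all the others.
    for part in (b"\r\n" + body).split(marker)[1:]:
        if part.startswith(b"--"):
            break  # closing delimiter: nothing follows
        if part.startswith(b"\r\n"):
            part = part[2:]  # drop the line end after the delimiter
        headers, _, content = part.partition(b"\r\n\r\n")
        header_text = headers.decode("latin-1")
        name = re.search(r'\bname="([^"]*)"', header_text)
        if name is None:
            continue
        filename = re.search(r'\bfilename="([^"]*)"', header_text)
        # Some browsers send a full client path; keep the base name only.
        safe_name = ntpath.basename(filename.group(1)) if filename else ""
        if safe_name:
            with open(os.path.join(save_path, safe_name), "wb") as out:
                out.write(content)
            fields[name.group(1)] = safe_name
        else:
            fields[name.group(1)] = content.decode("utf-8", errors="replace")
    return fields
```

Applied to the sample stream shown earlier, the function would return field1 with the value Joe Blow and upload_file1 with the value file1.gif, and it would write file1.gif into save_path.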

After the temporary file is parsed by ReadMultipart, it is deleted. By default the temporary file and the uploaded file(s) are saved in the same folder as the application. If you would like to change this location, use the TempPath and SavePath properties of the CGIUpload Class.

These paths can be hard coded into your program file, or they can be stored in the application's INI file and read with INI.cc. (For examples, see UploadSample1.prg and UploadSample2.prg found in the archive accompanying this article.) If the desired location for saving the uploaded file is determined by information contained in the HTML form data, then you will need to let ReadMultipart save the file on the Web server and then copy it to the desired location. This is because the oCGI associative array is loaded at the same time as the uploaded file is saved to disk. We therefore cannot know the save location dictated by the web form before the uploaded file is saved.

Consider, for example, the following form. The user selects a project to upload in the HTML Select list, and enters the file name in the HTML INPUT element. The name of the former element is p_name and the name of the latter is FileToUpload. On the server, after ReadMultipart is finished, these parameters are used to copy the uploaded file from the default location to the appropriate project folder (see UploadSample1.prg for details):
 
 
copyFrom = oCGI.SavePath + oCgi['FileToUpload']

copyTo = "d:\cmpt301\projects\" + oCGI['p_name'] + "\" ;
         + oCgi['FileToUpload']

   

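[Form: "Computing 101, Submit Project". A select list (p_name) labeled "Select the project you want to submit.", followed by a file input field (FileToUpload) with a Browse button and these instructions:

  • Type the name of the file you want to submit.
  • Or, use the browse button to select the file you want to submit.
  • Be sure the full path to the file is included.]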

 

There are a few issues regarding CGIUpload that the developer should consider. First, the CGIUpload class does not limit the size of the upload. There is nothing that prevents a user from uploading 100 megabytes or more in a single session. To control this, you may wish to test the size of the upload before calling ReadMultipart. Something like the following would be appropriate.
 
 
#define MAXIMUM_UPLOAD   1000000 

if MAXIMUM_UPLOAD > 0 and val(getEnv("CONTENT_LENGTH")) > MAXIMUM_UPLOAD
   oCGI.SorryPage('The file uploaded is beyond the maximum limits')
endif

   

When MAXIMUM_UPLOAD is 0 (zero), there is no limit on the size of the upload; otherwise, any request whose CONTENT_LENGTH exceeds MAXIMUM_UPLOAD bytes triggers the SorryPage message.
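
The same test can be expressed as a small function, shown here in Python as an illustration rather than as part of the dBL classes. The name upload_too_large is an assumption; CONTENT_LENGTH is the standard CGI environment variable and covers the whole request body, not only the file. As a design choice in this sketch, a value that cannot be read as a number is treated as too large when a limit is in force.

```python
import os

MAXIMUM_UPLOAD = 1_000_000  # bytes; 0 means no limit


def upload_too_large(environ=os.environ, maximum=MAXIMUM_UPLOAD) -> bool:
    """Return True when the request body exceeds the limit (sketch only)."""
    if maximum <= 0:
        return False  # limit disabled
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        return True  # unreadable length: refuse rather than guess
    return length > maximum
```

The check is meant to run before the request body is read, so that an oversized upload is turned away without being parsed or saved.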

Another issue the developer needs to consider is that CGIUpload has no facility for checking whether an uploaded file already exists on the server. If a file is uploaded with the same name as an existing file, the existing file will be overwritten. In some circumstances, a developer will want to obtain confirmation from the user before overwriting the existing file.
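
One way to approach this, sketched in Python and not taken from the classes themselves, is to test for a name clash before a file is written or copied to its final folder. The function name unique_save_name is hypothetical. It returns the requested name when it is free and otherwise appends a counter, so an application could either use the alternative name or compare it with the original to decide whether to ask the user for confirmation.

```python
import os


def unique_save_name(save_path: str, filename: str) -> str:
    """Return filename, or a numbered variant if it already exists.

    Hypothetical helper for illustration only.
    """
    base, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    # Try file1.gif, then file1_1.gif, file1_2.gif, ... until one is free.
    while os.path.exists(os.path.join(save_path, candidate)):
        candidate = f"{base}_{counter}{ext}"
        counter += 1
    return candidate
```

This sketch does not guard against two requests arriving at the same moment; a busy site would need some form of locking as well.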

Conclusion

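The HTTP File Transfer Classes give Web developers the ability to upload and download files between the client and the server. As described above, the benefits of HTTP file transfers include storing files outside of web space, controlling file names and customizing file content at transfer time, and accepting uploads without a separate FTP server.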

The HTTP File Transfer Classes are distributed in the hope that they will be useful in your dBL Web application. The classes are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License. A copy of this license is included in the archive accompanying this article.

To download the HTTPFileTransfer custom class and its sample applications, click here
(it's a 24Kb zipped file)


The author would like to thank Robert Newman, his proof-reader, for the improvements he brought to this text.