While developing our HTML Parser for .NET, we ran into the problem of parsing documents other than HTML files. For example, a page being crawled may contain a link that points to a PDF or Word document, so we needed to detect the MIME type of such documents. There are two ways to do this. The first approach is to trust the file extension and parse the document based on it. The second, more robust approach is to look at the file's signature (characteristic byte patterns in its content, usually at the very start) and determine the type from that.
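
To make the second approach concrete, here is a minimal sketch of signature-based detection in C#. It is not the MimeDetector API: the SignatureSniffer class and its small, hard-coded table are hypothetical and exist only for illustration. It compares the first bytes of the data with a few well-known magic numbers, such as "%PDF" for PDF files and the OLE2 compound-file header used by legacy Word documents.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Winista MimeDetector API.
public static class SignatureSniffer
{
    // Known signatures ("magic numbers") paired with the MIME type they indicate.
    private static readonly (byte[] Magic, string MimeType)[] Signatures =
    {
        (new byte[] { 0x25, 0x50, 0x44, 0x46 }, "application/pdf"), // "%PDF"
        // OLE2 compound file header, used by legacy .doc, .xls and .ppt files.
        (new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }, "application/x-ole-storage"),
        // ZIP local file header "PK\x03\x04"; .docx files are ZIP containers too.
        (new byte[] { 0x50, 0x4B, 0x03, 0x04 }, "application/zip"),
        (new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, "image/png"),
    };

    // Returns the MIME type whose signature matches the start of the data, or null.
    public static string Detect(byte[] data)
    {
        foreach (var (magic, mimeType) in Signatures)
        {
            if (StartsWith(data, magic))
                return mimeType;
        }
        return null;
    }

    // True if data begins with every byte of prefix.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The sketch only checks patterns at offset zero and knows four formats; MimeDetector instead reads its signature definitions from the mime-types.xml file described below.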

This problem is not limited to our parsing scenario. Many content management applications let users upload documents to the server, and you often need to restrict the types of files a user can upload. If you do not detect the file type by looking at its content, a user can fool your system simply by changing the file extension, bypassing your restrictions.
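
The same idea applies to upload restrictions. The following is a minimal sketch, again a hypothetical helper rather than part of MimeDetector: UploadGuard accepts a file only if its extension is on a small allow-list and its leading bytes match the signature expected for that extension. An executable (which starts with "MZ") renamed to report.pdf is therefore rejected.

```csharp
using System;
using System.IO;

// Hypothetical upload check for illustration only; not part of the Winista library.
public static class UploadGuard
{
    // Accepts the file only when its content signature agrees with its extension.
    public static bool IsAllowed(string fileName, byte[] content)
    {
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        switch (ext)
        {
            case ".pdf":
                return HasPrefix(content, new byte[] { 0x25, 0x50, 0x44, 0x46 }); // "%PDF"
            case ".png":
                return HasPrefix(content, new byte[] { 0x89, 0x50, 0x4E, 0x47 }); // 0x89 "PNG"
            default:
                return false; // extension is not on the allow-list
        }
    }

    // True if data begins with every byte of prefix.
    private static bool HasPrefix(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In a real application you could replace the hard-coded checks with the type reported by a content-based detector and compare that against the list of types you allow.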

We came across these wonderful MIME reader utility classes in the open source Nutch crawler. They are Java classes, so we decided to convert them to C# and bring them to the .NET user community. This is our first public release of the converted classes, which we have wrapped in a utility library that you can use in any project.

How to use MimeDetector?

There is one XML file, "mime-types.xml", that contains information about file types and the signatures used to identify them. You need this file to create an instance of the MimeTypes class. Once you have a MimeTypes object, call its GetMimeType method to get the MimeType of the file's contents. If the MIME type cannot be determined, the method returns null. The following code snippet demonstrates how to use the library.

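    using Winista.Mime;   // MimeTypes, MimeType, SupportUtil
    // strFile holds the path of the file to inspect.
    // Load the type definitions and signatures once; reuse the object for many files.
    MimeTypes g_MimeTypes = new MimeTypes("mime-types.xml");
    // Read the whole file. File.ReadAllBytes keeps reading until every byte is
    // returned, whereas a single FileStream.Read call may return fewer bytes.
    byte [] data = System.IO.File.ReadAllBytes(strFile);
    // GetMimeType takes an sbyte[], so convert the raw bytes first.
    sbyte [] fileData = Winista.Mime.SupportUtil.ToSByteArray(data);
    // Returns null if the type could not be determined from the content.
    MimeType oMimeType = g_MimeTypes.GetMimeType(fileData);
    if (oMimeType == null)
    {
        // Unknown content: handle it as untrusted, e.g. reject the upload.
    }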

If you have questions or suggestions, please post a comment in our HTML Parser forum.

Copyright © Netomatix