Post Data Programmatically with Web-Scraping

Written by Eric Smith, Northstar Computer Systems LLC

Screen scraping was once a popular way to migrate mainframe applications to PCs gradually. A PC application would connect to the mainframe, read data from the terminal screen, and redisplay it in a Windows-based interface. Data entered into the Windows application would then be transmitted back to the mainframe.

If you have a Web-based application that doesn't support Web services, you can do Web-based screen scraping using the HttpWebRequest and HttpWebResponse classes covered in a previous tip. The example in this tip posts a query to the Weather Channel and extracts the current temperature from the HTML that comes back. Here's the code you can put into a Web page for testing:

using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Net;
using System.IO;
 
public partial class PostingData : System.Web.UI.Page
{
  protected void Page_Load(object sender, EventArgs e)
  {
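    // Form data in POST format: name=value pairs separated by ampersands.
    string outputBuffer = "where=46038";
    // ContentLength must be the number of bytes sent, not the number of characters.
    byte[] postBytes = System.Text.Encoding.UTF8.GetBytes(outputBuffer);
 
    HttpWebRequest req = 
      (HttpWebRequest)WebRequest.Create("http://www.weather.com/search/enhanced");
    req.Method = "POST";
    req.ContentType = "application/x-www-form-urlencoded";
    req.ContentLength = postBytes.Length;
 
    // Write the form data into the request body; using closes the stream even on error.
    using (Stream reqStream = req.GetRequestStream())
    {
      reqStream.Write(postBytes, 0, postBytes.Length);
    }
 
    // GetResponse() completes the request and waits for the server's reply.
    string buffer;
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
    {
      buffer = sr.ReadToEnd();
    }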
 
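    string startTag = "<B CLASS=obsTempTextA>";
    // Closing tag that pairs with startTag. An empty endTag would match
    // immediately at the start position and make Substring throw.
    string endTag = "</B>";
    int start = buffer.IndexOf(startTag, StringComparison.OrdinalIgnoreCase);
    int end = start < 0 ? -1
      : buffer.IndexOf(endTag, start + startTag.Length, StringComparison.OrdinalIgnoreCase);
 
    if (end < 0)
    {
      // The page layout changed or the search returned something unexpected.
      Response.Write("Could not find the temperature in the response.");
    }
    else
    {
      Response.Write("Current temperature in ZIP 46038: " 
        + buffer.Substring(start + startTag.Length, end - start - startTag.Length));
    }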
 
//    Response.Write(Server.HtmlEncode(buffer));
  }
}

It starts by creating an HttpWebRequest to weather.com's search URL, which I found by looking at its home page search form. Part of the "fun" of webscraping is figuring out what has to be sent in a post in order to get back valid results. In this case, you have to send only a single field named where, containing the ZIP code you want to query. That information is stored in the outputBuffer variable in POST format, which means the name/value pairs are separated by ampersands, similar to what you would see in a query string.
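
When a form needs more than one field, or a value may contain spaces, ampersands, or equals signs, it can help to build the post data from name/value pairs rather than typing the string by hand. The following is a minimal sketch of that idea, not part of the original example: the FormEncoder class, the BuildPostBody method, and the units field in the usage note are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of the original tip.
public static class FormEncoder
{
    // Joins name/value pairs as name=value&name=value, escaping each part
    // so that characters such as '&', '=' and spaces cannot break the format.
    public static string BuildPostBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0)
            {
                body.Append('&');
            }
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }
}
```

Passing the pairs where=46038 and a made-up units field with the value "deg F" would produce where=46038&units=deg%20F. Uri.EscapeDataString writes a space as %20 rather than the + that browsers typically send; servers generally decode both the same way, but that is an assumption worth checking against the site you are posting to.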

Next, the example populates the request with the post information and then requests the response, which has the effect of sending the data to the remote server. It retrieves the information into a string buffer and closes up the response stream.

This, unfortunately, is the tedious part of webscraping. You have to find the information you want in the response buffer. For this page, the resulting HTML (which can be dumped out to the page using the commented line at the end of the code) is 224 KB of markup to search through. However, the data you want sits between a pair of tags that is reasonably easy to find. Using some simple string manipulation, you can extract the value and show it on the screen.
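
The string manipulation described above can be wrapped in a small reusable method. This is a sketch under stated assumptions rather than the author's code: HtmlScraper and ExtractBetween are hypothetical names, and the method returns null when a marker is missing instead of letting Substring throw.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original tip.
public static class HtmlScraper
{
    // Returns the text between the first occurrence of startTag and the
    // next occurrence of endTag, or null when either marker is missing.
    public static string ExtractBetween(string html, string startTag, string endTag)
    {
        if (string.IsNullOrEmpty(html) ||
            string.IsNullOrEmpty(startTag) ||
            string.IsNullOrEmpty(endTag))
        {
            return null;
        }

        // Ordinal, case-insensitive matching suits HTML tag names.
        int start = html.IndexOf(startTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return null;
        }

        // Search for the end marker only after the start marker.
        int valueStart = start + startTag.Length;
        int end = html.IndexOf(endTag, valueStart, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
        {
            return null;
        }

        return html.Substring(valueStart, end - valueStart).Trim();
    }
}
```

Called with the response buffer, "<B CLASS=obsTempTextA>" as the start marker, and "</B>" as the end marker (assuming that is the closing tag on the page), it returns whatever text sits between the two tags, or null if the page no longer contains them.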

As you might guess, this is fairly "fragile" code. If the Weather Channel decides to change its page or the tag you're looking for, the code will fail to find the information it needs. That's one of the major reasons why Web services have become popular. The Weather Channel's page is designed for humans to read, not computers. The Web services that handle weather, on the other hand, send back only the relevant content and not all the formatting found in the page. However, if you don't have another option, webscraping can be a handy tool.
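
One way to make that fragility visible is to check that the scraped text still looks like a temperature before displaying it. The sketch below assumes the value begins with a whole number, optionally negative, followed by markup such as a degree entity; TemperatureValidator and TryParseTemperature are hypothetical names, and the format assumption may not hold for every page.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original tip.
public static class TemperatureValidator
{
    // Succeeds only when the scraped text starts with a whole number of
    // one to three digits (optionally negative). Anything else suggests
    // the page layout has changed and the scrape needs attention.
    public static bool TryParseTemperature(string scraped, out int degrees)
    {
        degrees = 0;
        if (scraped == null)
        {
            return false;
        }

        Match match = Regex.Match(scraped, @"^\s*(-?[0-9]{1,3})");
        return match.Success
            && int.TryParse(match.Groups[1].Value, NumberStyles.AllowLeadingSign,
                            CultureInfo.InvariantCulture, out degrees);
    }
}
```

If the check fails, the page can show an error or log the problem instead of printing a stray fragment of HTML as the temperature.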

Keywords: [ .NET System.IO | .NET System.Net ]

Publication Date: 11/27/2006, Last Update: 3/22/2010