How to Make a Web Crawler in C#
A web crawler is an application that downloads pages from the web. Web crawlers have various applications in IT, business, and government agencies. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a website, such as checking links or validating HTML code. A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds.
Let's start:
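To make the idea of seeds a little more concrete, here is a minimal sketch of the loop a crawler typically runs: take a URL from the list, visit it, and queue any new links it contains. The names CrawlSketch, Crawl, getLinks and maxPages are made up for illustration, and getLinks is only a placeholder for "download the page and pull out its links", which this part of the tutorial does not cover.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // Visits pages breadth-first, starting from the seed URLs.
    // getLinks is a hypothetical callback that returns the URLs found on a page.
    public static List<string> Crawl(
        IEnumerable<string> seeds,
        Func<string, IEnumerable<string>> getLinks,
        int maxPages)
    {
        var frontier = new Queue<string>();   // URLs waiting to be visited
        var seen = new HashSet<string>();     // URLs already queued or visited
        var visited = new List<string>();     // visit order, kept for inspection

        foreach (string seed in seeds)
        {
            // HashSet.Add returns false for duplicates, so each seed is queued once.
            if (seen.Add(seed))
            {
                frontier.Enqueue(seed);
            }
        }

        while (frontier.Count > 0 && visited.Count < maxPages)
        {
            string url = frontier.Dequeue();
            visited.Add(url);

            foreach (string link in getLinks(url))
            {
                // Queue only links we have not come across before.
                if (seen.Add(link))
                {
                    frontier.Enqueue(link);
                }
            }
        }

        return visited;
    }
}
```

The maxPages limit is there only so the sketch always stops; a real crawler would normally also restrict which sites it follows.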
- Create a Windows Forms project in C#
- Drag a TextBox, a Label, and a Button onto the form
- The TextBox takes the URL from the user
- The Label will be used for the output
Now add the following using directives at the top of your code file:
using System.Collections;
using System.Net;
using System.Text.RegularExpressions;
Then go to the button's Click event handler and write the following code:
string url = textBox1.Text;
// WebClient is a class provided by the .NET Framework that does all the work needed to establish a connection
WebClient webClient = new WebClient();
// Use a string to hold all the data downloaded from the website
string strSource = webClient.DownloadString(url);
webClient.Dispose();
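The download step throws an exception if the user types something that is not a URL or if the site cannot be reached. Here is a rough sketch of one way to guard against that. PageDownloader and TryDownload are hypothetical names, not part of .NET, and accepting only http and https addresses is my own assumption. Also note that on recent .NET versions WebClient is marked obsolete in favor of HttpClient; it still works, but the compiler will warn about it.

```csharp
using System;
using System.Net;

public static class PageDownloader
{
    // Hypothetical helper: returns true and the page source on success,
    // false (with html left empty) if the URL is invalid or the request fails.
    public static bool TryDownload(string url, out string html)
    {
        html = string.Empty;

        // Accept only absolute http/https URLs before touching the network.
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            return false;
        }

        try
        {
            // The using block calls Dispose() even when DownloadString throws.
            using (var webClient = new WebClient())
            {
                html = webClient.DownloadString(uri);
                return true;
            }
        }
        catch (WebException)
        {
            // DNS failure, timeout, HTTP error status, and so on.
            return false;
        }
    }
}
```

In the click event you could then call PageDownloader.TryDownload with the text box value and show a short error message in the label when it returns false.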
Now this string contains all the HTML tags, JavaScript, and so on. To remove the unnecessary text from the string, use the following syntax:
// Remove <script> blocks, including the code inside them
strSource = Regex.Replace(strSource, "<script.*?</script>", string.Empty, RegexOptions.Singleline | RegexOptions.IgnoreCase);
// Remove <style> blocks, including the CSS inside them
strSource = Regex.Replace(strSource, "<style.*?</style>", string.Empty, RegexOptions.Singleline | RegexOptions.IgnoreCase);
// Remove every remaining HTML tag
strSource = Regex.Replace(strSource, @"<(.|\n)*?>", string.Empty);
label1.Text = strSource;
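If you want to try the three replacements without downloading anything, you can wrap them in a small method and run it on a hard-coded string. This is only a sketch: HtmlStripper and StripHtml are names I made up, and the sample HTML is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Applies the same three replacements as the click event code.
    public static string StripHtml(string source)
    {
        const RegexOptions options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        // Remove <script> and <style> blocks together with their contents.
        source = Regex.Replace(source, "<script.*?</script>", string.Empty, options);
        source = Regex.Replace(source, "<style.*?</style>", string.Empty, options);

        // Remove every remaining tag but keep the text between tags.
        source = Regex.Replace(source, @"<(.|\n)*?>", string.Empty);
        return source;
    }
}
```

For example, calling HtmlStripper.StripHtml on "<html><head><style>p { color: red; }</style><script>alert('hi');</script></head><body><p>Hello, <b>world</b>!</p></body></html>" should leave just "Hello, world!". Keep in mind that regular expressions are a rough tool for this job, and unusual or broken markup may not be cleaned up completely.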
Part 2 coming soon (how to extract URLs from web pages).
Feel free to ask questions or make suggestions, and if you run into any problems, kindly leave a comment.











