# Wednesday, 20 June 2018

Parsing information from a web page is not a trivial task. Fortunately, HTML has a defined structure and libraries exist to help us navigate that structure.

One such library for C# is HTML Agility Pack or HAP.

You can add this library to a project via NuGet. Simply right-click your project in the Visual Studio Solution Explorer, select "Manage NuGet Packages" (Fig. 1); then search for "HTML Agility Pack" and install the package (Fig. 2).

hap01-MangeNuGet
Fig. 1

hap02-NuGet
Fig. 2

Once the package is installed, you can load your document into an HtmlAgilityPack.HtmlDocument and begin working with it.

There are 4 ways to load a web page into an HtmlDocument: from a file on disk, from a string of HTML, from a URL, and from whatever document is loaded in a browser.

Below are examples of each (taken from the HAP web site).

// From File
var doc = new HtmlDocument();
doc.Load(filePath);

// From String
var doc = new HtmlDocument();
doc.LoadHtml(html);

// From Web
var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();
var doc = web.Load(url);

// From Browser
var web1 = new HtmlWeb();
var doc1 = web1.LoadFromBrowser(url, o =>
{
	var webBrowser = (WebBrowser) o;

	// WAIT until the dynamic text is set
	return !string.IsNullOrEmpty(webBrowser.Document.GetElementById("uiDynamicText").InnerText);
});
  

Listing 1

I am interested in this project because I have a web page that lists all my public presentations and I want to make this page data driven, so I won't have to update a text file every time I schedule a new presentation.

To populate a database with this information, I could either type in every presentation or grab it from my web page and parse out the relevant information in an HTML table. Since I have delivered nearly 500 presentations, re-typing each of these individually seemed like way too much work.

Here is a screenshot of my "Public Presentations" page:

image
Fig. 3

and here is a partial listing of the source HTML for that page.

<table id="talklist">
 <tr style="height:15.0pt">
  <th style="height:15.0pt;width:62pt" width="83">
   Date</th>
  <th width="458">
   Topic</th>
  <th width="311">
   Event</th>
  <th width="64">
	Location</th>
 </tr>
        
 <tr style="height:15.0pt">
 <td>
  Sep 20, 2018
 </td>
 <td>
  Adding Image and Voice Intelligence to Your Apps with Microsoft Cognitive Services
 </td>
 <td>
  VSLive
 </td>
 <td>
  Chicago, IL
 </td>
 </tr>

 <tr style="height:15.0pt">
 <td>
  Sep 20, 2018
 </td>
 <td>
  Effective Data Visualization
 </td>
 <td>
  VSLive
 </td>
 <td>
  Chicago, IL
 </td>
 </tr>

 <tr style="height:15.0pt">
 <td>
  Jun 22, 2018
 </td>
 <td>
  Effective Data Visualization
 </td>
 <td>
  Beer City Code
 </td>
 <td>
  Grand Rapids, MI
 </td>
 </tr>
  

Listing 2

I used the following code to load the data into an HtmlDocument:

var url = "http://www.davidgiard.com/Schedule.aspx";
Console.WriteLine("Getting data from {0}...", url);
var web = new HtmlWeb();
var doc = web.Load(url);
  

Listing 3

The HtmlDocument has a DocumentNode property, which returns the root element of the HTML document as a node. Most of the time, I found myself working with nodes and collections of nodes. Each node has a SelectSingleNode method, which returns a single node, and a SelectNodes method, which returns a collection of nodes. Both take an XPath expression as an argument; I was familiar with XPath from my days working with XML documents.

The following code retrieves a nodelist of all the <tr> row nodes within the "talklist" table, shown in Listing 2.

var documentNode = doc.DocumentNode;
var tableNode = documentNode
			.SelectSingleNode("//table[@id='talklist']");
var rowsNodesList = tableNode.SelectNodes("tr");
  

Listing 4

Finally, because each <tr> node contains 4 <td> cell nodes, I can iterate through the rows, find the cells in each row, and read the InnerText of each cell. For good measure, I stripped the line breaks and trimmed the surrounding whitespace. Because the title row contains <th> cells instead of <td> cells, I check that a row actually has <td> cells before extracting information.

The code for this is in Listing 5.

var rowCount = 1;
foreach (var row in rowsNodesList)
{
	var cells = row.SelectNodes("td");
	if (cells != null && cells.Count > 0)
	{
		var date = cells[0].InnerText;
		date = date.Replace("\r\n", "").Trim();
		var topic = cells[1].InnerText;
		topic = topic.Replace("\r\n", "").Trim();
		var eventName = cells[2].InnerText;
		eventName = eventName.Replace("\r\n", "").Trim();
		var location = cells[3].InnerText;
		location = location.Replace("\r\n", "").Trim();

		Console.WriteLine("Row: {0}", rowCount);
		Console.WriteLine("Date: {0}", date);
		Console.WriteLine("Topic: {0}", topic);
		Console.WriteLine("Event: {0}", eventName);
		Console.WriteLine("Location: {0}", location);
		Console.WriteLine("--------------------");
		rowCount++;
	}
}
  

Listing 5
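The clean-up in Listing 5 removes only "\r\n" pairs, so a page saved with bare "\n" line endings, or a cell containing an entity such as &amp;, would come through untouched. Below is a minimal sketch of a more general helper. The CellText class and its Clean method are hypothetical names introduced for illustration, not part of HAP; the sketch uses only the .NET base class library (WebUtility.HtmlDecode and Regex) and assumes that collapsing every run of whitespace into a single space is acceptable for this data.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CellText
{
    // Hypothetical helper (not part of HAP): decodes HTML entities and
    // collapses every run of whitespace into a single space.
    public static string Clean(string raw)
    {
        if (string.IsNullOrEmpty(raw))
        {
            return string.Empty;
        }

        // Turn entities such as &amp; back into the characters they represent.
        var decoded = WebUtility.HtmlDecode(raw);

        // \s+ matches spaces, tabs, and any mix of \r and \n.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        var raw = "\r\n  Chicago,\n\tIL &amp; suburbs \r\n";
        Console.WriteLine(CellText.Clean(raw)); // prints "Chicago, IL & suburbs"
    }
}
```

With a helper like this, each cell in the loop could be read as CellText.Clean(cells[0].InnerText) in place of the Replace and Trim pair.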

Here is the full code listing for my console app that retrieves the text of each cell.

using HtmlAgilityPack;
using System;

namespace TestHAP
{
    class Program
    {
        static void Main(string[] args)
        {
            var url = "http://www.davidgiard.com/Schedule.aspx";
            Console.WriteLine("Getting data from {0}...", url);
            var web = new HtmlWeb();
            var doc = web.Load(url);

            var documentNode = doc.DocumentNode;
            var tableNode = documentNode
                        .SelectSingleNode("//table[@id='talklist']");
            var rowsNodesList = tableNode.SelectNodes("tr");

            var rowCount = 1;
            foreach (var row in rowsNodesList)
            {
                var cells = row.SelectNodes("td");
                if (cells != null && cells.Count > 0)
                {
                    var date = cells[0].InnerText;
                    date = date.Replace("\r\n", "").Trim();
                    var topic = cells[1].InnerText;
                    topic = topic.Replace("\r\n", "").Trim();
                    var eventName = cells[2].InnerText;
                    eventName = eventName.Replace("\r\n", "").Trim();
                    var location = cells[3].InnerText;
                    location = location.Replace("\r\n", "").Trim();

                    Console.WriteLine("Row: {0}", rowCount);
                    Console.WriteLine("Date: {0}", date);
                    Console.WriteLine("Topic: {0}", topic);
                    Console.WriteLine("Event: {0}", eventName);
                    Console.WriteLine("Location: {0}", location);
                    Console.WriteLine("--------------------");
                    rowCount++;
                }
            }

            Console.ReadLine();
        }
    }
}
  

Listing 6

You can download this solution from my GitHub repository.

Although I don’t intend to use it for this, HAP also supports modifying the HTML you select with node methods like AppendChild(), InsertAfter(), and RemoveChild().

This tool will help me to retrieve and parse the hundreds of rows of data from my web page and insert them into a database.
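Before the rows go into a database, it may help to turn the four strings from each row into a typed record. The sketch below is one possible shape for that step, not the code I ended up using: the Presentation and PresentationParser names are hypothetical, and it assumes every date follows the "Sep 20, 2018" pattern seen in Listing 2.

```csharp
using System;
using System.Globalization;

// Hypothetical record type holding one row of the "talklist" table.
public sealed class Presentation
{
    public Presentation(DateTime date, string topic, string eventName, string location)
    {
        Date = date;
        Topic = topic;
        EventName = eventName;
        Location = location;
    }

    public DateTime Date { get; }
    public string Topic { get; }
    public string EventName { get; }
    public string Location { get; }
}

public static class PresentationParser
{
    // Assumes dates look like "Sep 20, 2018"; throws FormatException otherwise.
    public static Presentation FromCells(string date, string topic, string eventName, string location)
    {
        var parsedDate = DateTime.ParseExact(
            date.Trim(), "MMM d, yyyy", CultureInfo.InvariantCulture);

        return new Presentation(parsedDate, topic.Trim(), eventName.Trim(), location.Trim());
    }
}

public static class Program
{
    public static void Main()
    {
        var talk = PresentationParser.FromCells(
            "Sep 20, 2018", "Effective Data Visualization", "VSLive", "Chicago, IL");

        // prints "2018-09-20 | Effective Data Visualization | VSLive | Chicago, IL"
        Console.WriteLine("{0} | {1} | {2} | {3}",
            talk.Date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture),
            talk.Topic, talk.EventName, talk.Location);
    }
}
```

Parsing with an explicit format and the invariant culture keeps the result independent of the regional settings on the machine that runs the import.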

C# | HTML5 | Web
Wednesday, 20 June 2018 08:39:00 (GMT Daylight Time, UTC+01:00)
# Tuesday, 19 June 2018

Years ago, I created a list of my public speaking. In order to make it more readable, I added a bit of jQuery to style alternating rows with a light gray color. I used jQuery because, at that time, CSS did not support the ability to apply a style to every other row in a table.

I liked the results, but the code always struck me as awkward.

image

Eventually, CSS3 provided this ability, but not every browser immediately supported this style. So I waited. And then, I forgot about it. It was slow, but it worked.

It has been a few years and I had a bit of time, so today I removed the jQuery reference and JavaScript from this page and achieved the same results using only CSS.

Here is the table opening tag. I applied a class to it only to make it easy to find, in case there are other tables on the same page.

<table class="tiger-stripe">
  

Each row has no background-color styling, because I applied that later. Here is a sample row.

 <tr style="height:15.0pt">
  <td>
   Jun 11, 2018
  </td>
  <td>
   Building and Training your own Custom Image Recognition AI
  </td>
  <td>
   Norwegian Developers Conference
  </td>
  <td>
   Oslo, Norway
  </td>
 </tr>
  

Originally, I had all this within the page's <head>:

 <style type="text/css">
  table {width:90%; border:1px solid gray;}
  .oddrow {background-color:#E5E5E5;}
 </style>
 <script type="text/javascript" src="http://code.jquery.com/jquery-latest.min.js">
 </script>

 <script type="text/javascript">
  $(function(){
   $("table.tiger-stripe tr:even").addClass("oddrow");
 });
 </script>
  

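I was able to accomplish the same striped effect using the "even" argument of the "nth-child" selector. (jQuery's ":even" filter is zero-based and "nth-child" is one-based, so the shading lands on the opposite set of rows; "nth-child(odd)" would shade exactly the rows the script did.) I replaced the code within the <head> above with the following: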

<style type="text/css">
table {width:90%; border:1px solid gray;}
table#presentationlist tr:nth-child(even) {background-color:#E5E5E5;}
</style>
  

I decided to replace the table’s class (“tiger-stripe”) with an id (“presentationlist”), since the class only exists to identify this table. Strictly speaking, this was not necessary, but it felt cleaner this way. Here is the new table tag.

<table id="presentationlist">
  

By eliminating the custom JavaScript and the jQuery reference, the code is much simpler and the page loads faster. Apart from the optional switch from a class to an id on the table tag, I did not need to modify any HTML. CSS selectors take care of all the styling.

This is an example of using declarative programming instead of imperative programming to accomplish a task. In other words, the new code tells the browser the results I want without explicitly specifying how I want it done.

You can see the results here

HTML5 | Web
Tuesday, 19 June 2018 20:27:18 (GMT Daylight Time, UTC+01:00)
# Monday, 18 June 2018
Monday, 18 June 2018 03:46:00 (GMT Daylight Time, UTC+01:00)

Most of the Norway I saw is defined by towering cliffs, the result of glaciers gouging their way through the country thousands of years ago. Many of the resulting valleys filled with water to become the famous fjords of Norway.

This was the Norway I experienced when I made my first trip here this week.

I was invited to speak at the Norwegian Developers Conference ("NDC"), so I flew to Oslo after speaking at IT Camp in Romania. I arrived in Oslo on a rainy Sunday night after a week in Romania.

The first 2 days in Oslo, I worked on a Machine Learning project for Bane Nor - the Norwegian national railway. This was a great experience for me, as Microsoft flew in engineers from all over the world and I had an opportunity to learn about the train industry from the customer and about data science from several experts.

NDC began on Wednesday, so I arrived bright and early to experience it all. The conference was amazing. Hundreds of speakers from all over the world come to Oslo each year for some high-quality sessions. I knew some of the speakers and I had the opportunity to meet many more.

Wednesday evening, the conference organizers treated all the speakers to a boat ride around the islands near Oslo, which was a great chance to meet new people.

My presentation - Building and Training your own Custom Image Recognition AI - was the last one of the day. I was happy to get it over with on Day 1, but I spent much of Wednesday preparing for it. In the end, it went very well. The bright stage lights prevented me from seeing the audience, but I received several good questions afterward, so I think the audience enjoyed it as much as I did.

The day after the conference, I booked a trip to Bergen. Oslo is near the eastern border of Norway and Bergen is on the west coast, so this all-day trip took me across the entire country. It consisted of 3 trains, a bus, and a boat. The boat ride was the most impressive, as we traveled through the fjords of central Norway. The trip was designed to be more scenic than efficient and it took us from Oslo to Myrdal to Flåm to Gudvangen to Voss to Bergen. The fjord boat cruise took me to the northernmost point I have ever been, edging out my trip to Uppsala, Sweden 3 years ago.

I only had one day in Bergen and I was exhausted from 2 weeks on the road. But I did a lot of walking around the city, visited 2 art museums, drank some local beers, ate reindeer stew and whale steak, and sat by the harbor to watch the sun set at midnight. Scandinavian daylight lasts for over 20 hours this time of year, making it very difficult for me to pace myself.

I missed Father's Day in America (most European countries celebrate in March), but I will make time with my boys in the next few weeks.

In a few hours, I fly home, tired but content from 2 weeks abroad traveling thousands of miles. I feel like I need to return to Norway and see all the places I missed. Hopefully, NDC will make that happen next year.

Monday, 18 June 2018 00:21:00 (GMT Daylight Time, UTC+01:00)
# Saturday, 16 June 2018

There exists a competition in which a speaker must deliver a presentation in front of a slide deck that advances automatically every few seconds. The biggest challenge is knowing when the slide will change. Because most of us are not capable of simultaneously counting time and speaking, these talks often feature either awkward pauses waiting for the next slide or a rush to finish talking about a slide that disappeared a few seconds ago.

Prior to such a competition, one of the speakers asked me how to create a bar that would display across the bottom and gradually disappear as time expired. Here is how to do this:

Step 1: Create your slide

In PowerPoint, create your slides as you like and set each slide to advance automatically. Fig. 1 shows an example of such a slide.

pp01-Slide
Fig. 1

Step 2: Draw a rectangle

Select one of your slides and insert a short, wide rectangle shape at the bottom. From the Insert ribbon, select Shapes and click the rectangle shape, as shown in Fig. 2; then, drag your mouse along the bottom of the slide to draw the rectangle, as shown in Fig. 3. Make the rectangle exactly as wide as the slide.

pp02-InsertShape
Fig. 2

pp03-Shape
Fig. 3

Step 3: Animate the rectangle

Select the rectangle shape you just added. From the Animations ribbon, expand the list of animations and select "Wipe" from the "Exit" section, as shown in Fig. 4.

pp04-Animation
Fig. 4

By default, the "Wipe" animation will wipe the shape from the bottom. You want to wipe it from the right. Select the shape and, from the Animations ribbon, select Effect Options | From Right, as shown in Fig. 5.

pp05-EffectOptions
Fig. 5

Finally, set the timing of the animation. In the "Timing" section of the Animations ribbon, set the following:
Start: With Previous
Duration: Set to the same duration as the slide timing
Delay: 0

These are shown in Fig. 6.

pp06-Timing
Fig. 6

Step 4: Test your Transition

Press SHIFT+F5 to run this slide with the transition. You should see the rectangle slowly disappear from the right and completely disappear as the slide transitions to the next slide. Figures 7a, 7b, and 7c illustrate this.

pp07a-Results
Fig. 7a

pp07b-Results
Fig. 7b

pp07c-Results
Fig. 7c

Step 5: Copy Shape to Other Slides

When you are satisfied that the animation is working properly, copy / paste this shape to your other slides. The animations will copy along with the shape.

Saturday, 16 June 2018 04:59:34 (GMT Daylight Time, UTC+01:00)
# Thursday, 14 June 2018

GCast 2:

Azure Machine Learning Studio

Azure Machine Learning Studio is a graphical designer that allows you to quickly build Machine Learning solutions without writing a lot of code.

Thursday, 14 June 2018 02:47:24 (GMT Daylight Time, UTC+01:00)
# Monday, 11 June 2018
Monday, 11 June 2018 00:40:00 (GMT Daylight Time, UTC+01:00)
# Sunday, 10 June 2018

Romania

Achievement unlocked: I played Dungeons and Dragons last night for the first time in my life. And I did it in Transylvania!

I am writing this while sitting in the Cluj-Napoca airport, waiting for my flight to Bucharest and my connection to Oslo this evening.

This is the fifth consecutive year I have visited Romania to attend IT Camp. I enjoy it more each time I come - the conference, the people, and the country.

IT Camp has become like a family reunion for me. I look forward to seeing old friends from Romania and from Europe and America. Most of them I only get to see once a year, so it is a real treat for me to come here. And, as always, the Hotel Grand Italia spoils me with their excellent service.

The conference continues to grow. Attendance was 500-600 this year (about 10% more than last year) and the speaker list grew to over 40. Session times were shortened to 45 minutes this year in order to accommodate the larger number of sessions. I delivered 2 presentations: "Own Your Own Career – Advice from a Veteran Consultant" and "How Cloud Computing Empowers a Data Scientist". I had a packed room for the first session, with many people standing in the back. A number of people approached me during the conference to ask more questions about my topics and to tell me they enjoyed my talks, which is always a treat.

In between sessions, I met new people, re-connected with old friends, recorded 4 interviews, learned a few things, and Tudy taught me how to play Dungeons and Dragons, even though I was so tired I nearly fell asleep an hour into the game.

Some inclement weather and a need to prepare my presentations kept me close to the hotel during the conference; but IT Camp always includes a field trip the day after the conference. This year, they took us to Sighișoara, a small city in central Romania most famous as the alleged birthplace of Vlad the Impaler, the inspiration for Count Dracula.

I am grateful to Mihai and Diana and Tudor and Noemi and the many volunteers who work hard to make IT Camp a success and to make me feel welcome. I got a lot out of this trip.

And how many people can say their first game of Dungeons and Dragons took place in Transylvania?

Sunday, 10 June 2018 20:48:30 (GMT Daylight Time, UTC+01:00)
# Saturday, 09 June 2018

Silverthorn by Raymond Feist continues the Riftwar Saga, begun in Magician: Apprentice and Magician: Master.

The interplanetary war has ended, the rift between the worlds has closed, and Lyam has been named king.

But evil forces still lurk on Midkemia, and an assassination attempt on Anita, the betrothed of Lyam's brother Arutha, leaves her dying from a poison-dipped crossbow bolt. Most of this book follows Arutha and his companions in their quest to find the rare silverthorn plant, which promises to provide an antidote for the poison that is slowly killing Anita.

Pug, the title character of the first two novels, is relegated to a minor role until the last few chapters, when he returns to Kelewan to seek a way to battle an evil magician hoping to take over and destroy Midkemia. The journey is perilous because Pug is an outlaw in Kelewan, thanks to his disruption of the High Magicians and a powerful warlord.

Feist does a good job of keeping the action flowing and the reader engaged. This book's story is more straightforward than that of its 2-volume predecessor. It does not suffer from the weakness of the earlier volumes - having characters summarize certain events, rather than letting the reader experience them directly. We follow the characters as the action unfolds.

Silverthorn stands well on its own, but it is best enjoyed as part of the Riftwar Saga. I have already begun the next book in the series.

Saturday, 09 June 2018 17:54:01 (GMT Daylight Time, UTC+01:00)
# Friday, 08 June 2018

A warning: You are much more likely to enjoy Raymond Feist's "Magician: Master" if you first read his novel "Magician: Apprentice". This story was originally published as a single volume, titled simply "Magician"; but after the success of this novel, Feist convinced his publisher to allow him to expand the story into 2 books and release those separately. As a result, neither book stands well on its own, but together they form a solid story.

"Magician: Master" continues the story of Pug, who was captured by the Tsurani of Kelewan during the Riftwar and sold into slavery before rising to power via his magical abilities. He ultimately returns to his home world of Midkemia in an attempt to gain a truce; but he is confounded by evil forces and by a senile king.

This book and its predecessor hold up well as a high fantasy novel. Numerous characters are developed and cross paths through multiple subplots.

The greatest weakness of the story is Feist's tendency to skip significant events, then relay them later in retrospect. A character dies, but we don't hear of it until his girlfriend relays the news; Pug transforms into a master magician, but we have very little insight into how this happens; he goes away and somehow is more powerful than his teachers when he returns. Showing the reader these actions would almost certainly have more impact than telling us about their results.

Despite its weaknesses, I enjoyed the 2-volume story and I have already begun reading the next book in the series.

Friday, 08 June 2018 09:56:00 (GMT Daylight Time, UTC+01:00)