Simpson’s Paradox

One of the most interesting mathematical curiosities I know of is called Simpson’s Paradox. Briefly, data in a table can lead to two opposite conclusions depending on how the data is presented.

Suppose you are looking at job application and job offer rates at some company, organized by males and females, so you can decide if the company is discriminating against women, as of course they probably are. The company has two departments, Administration and Production.

The hiring data for the entire company is:

              Hired / Applicants
Male         205/650 = 0.32
Female       150/500 = 0.30

So 32% of males who applied were hired, and only 30% of females who applied were hired. The government looks at the data, sues the company, and executives immediately order all managers to take diversity and gender training, and to increase the hiring of women.

But the exact same data, broken down by department, is:

           Administration    Production
Male         5/50 = 0.10     200/600 = 0.33
Female    100/400 = 0.25      50/100 = 0.50

So, in Administration, the percentage of women who are hired is more than twice that of men. And in Production, the percentage of women hired is also much greater than that of men (50% to 33%).

In short, the aggregated data shows men are hired at a greater rate than women, but the same data broken down by department shows that women are hired at a greater rate than men in every department!

(For this example, the per-department data is the more accurate profile of the company: men are in fact at a hiring disadvantage.)
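
The reversal can be checked with a few lines of JavaScript. This is only a minimal sketch: the counts are copied from the tables above, and the names data, departmentRate and overallRate are made up for illustration.

```javascript
// Hiring counts copied from the tables above: [hired, applicants].
const data = {
  Male:   { Administration: [5, 50],    Production: [200, 600] },
  Female: { Administration: [100, 400], Production: [50, 100] }
};

// Hire rate for one gender in one department.
function departmentRate(gender, dept) {
  const [hired, applicants] = data[gender][dept];
  return hired / applicants;
}

// Hire rate for one gender with all departments pooled together.
function overallRate(gender) {
  let hired = 0;
  let applicants = 0;
  for (const [h, a] of Object.values(data[gender])) {
    hired += h;
    applicants += a;
  }
  return hired / applicants;
}

for (const gender of Object.keys(data)) {
  console.log(gender,
    "overall:", overallRate(gender).toFixed(2),
    "Administration:", departmentRate(gender, "Administration").toFixed(2),
    "Production:", departmentRate(gender, "Production").toFixed(2));
}
```

The pooled rate is a weighted average of the department rates. Here 400 of the 500 women applied to Administration, where hire rates are low for everyone, while 600 of the 650 men applied to Production, where hire rates are higher. That difference in where people applied is what flips the comparison.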

Posted in Miscellaneous | Leave a comment

Reading a Text File using JavaScript

I don’t work with Web applications very often. But recently I’ve been taking a close look at a pre-release version of the Bing Maps 8 library. As part of that investigation, I decided to look at reading a text file using JavaScript.


In the old days, you’d have to use an ActiveX control to read a file. But now reading a text file is quite easy.

The key JavaScript code is:

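function ReadFile() {
  var file = fileInput.files[0];   // first file chosen by the user
  var reader = new FileReader();
  reader.onload = function(e) {    // runs when the read has finished
    displayArea.innerText = reader.result;
  };
  reader.readAsText(file);         // start the asynchronous read
}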

Here ‘fileInput’ is the ID of an HTML input type=file tag, and ‘displayArea’ is the ID of an HTML pre tag.

Very nice.



Bing Maps 8 Page with Control Area

I’ve been looking at a pre-release version of Bing Maps 8. Version 8 will have new interactive features such as the ability to draw shapes on the map. This means that a Bing 8 map can more easily be the focus of an application, rather than being a semi-static map that supports Web page content (like a map that is associated with a restaurant).

So, I figured I’d code up an example HTML page that has a Bing map and a control area for user input and output. I’m no UI guy for sure. I just wanted something as simple as possible.


After some experimentation, I created an HTML page with a control area on the left and an area for the map on the right. Both are HTML div tags with float:left, which places the two areas side by side, followed by a magic br tag with style="clear: left;" so that anything after them starts below the two areas.

The controlPanel div contains a button input (to indicate typical user controls) and a textarea named “msgArea” for messages. Messages are programmatically displayed with:

function UpdateMsgArea(txt) {
  var existing = msgArea.value + "\n" + "\n";
  msgArea.value = existing + txt;
}

The UpdateMsgArea function uses the value property plus two “\n” characters for double-spacing. I originally tried to use innerHTML plus two br tags, but a textarea doesn’t work that way.

Good fun. Looking forward to the release of Bing Maps 8.



Bing Maps 8 Fetching a Drawn Shape

Bing Maps version 8 is scheduled to be released sometime this year. One new feature is a Drawing Control that allows users to draw a polygon on the map.

Drawing is fine but in order for the drawn shape to be useful, it’s necessary to know where the shape is on the map.


Because there’s no documentation yet, I don’t know if there will be a nice function that will return information about drawn shapes. So, I hacked together a demo that intercepts mouse-down events (while the user is drawing) and records the associated vertex of the shape.

It was a surprisingly difficult task because the JavaScript event model is kind of wacky. I really hope Bing Maps 8 will have built-in functions along the lines of a hypothetical “getDrawnShapes()”.



Bing Maps 8 Initialization Options

Bing Maps version 8 is scheduled to be released sometime this year. I’ve been looking at a pre-release version to get a feel for what’s new. Today I looked at map initialization options — different ways for loading a map into a Web page.


First, there are two loading modes, synchronous and asynchronous. To load synchronously, you put a script tag that points to the Bing Maps 8 library in the HTML head section. Then you write a GetMap() JavaScript function and put an onload="GetMap();" attribute in the HTML body tag.

To load asynchronously, you put a script tag with an attribute like this:

src='(URL)?callback=GetMap' async defer

at the end of the HTML body section (and the HTML body tag needs no onload attribute).

A program-defined function GetMap() looks much the same as version 7 code. For example:

var map = null;

function GetMap() {
  var options = {
    credentials: "AmUck2V2b_etc_jSCm",
    center: new Microsoft.Maps.Location(45.50, -122.50),
    mapTypeId: Microsoft.Maps.MapTypeId.road,
    zoom: 8,
    enableClickableLogo: false,
    showCopyright: false
  };

  var loc = document.getElementById("mapDiv");
  map = new Microsoft.Maps.Map(loc, options);
}

For some reason, I could not get the map height and width options to work. I’m not sure if they’ve been deprecated or the calling mechanism changed somehow. I’ll be glad when the documentation is released.



A Simple Math Bar Bet (Harmonic Mean)

What is the average of 20 miles per hour and 60 miles per hour? Well, it’s (20 + 60) / 2 = 40 miles-per-hour, right? Wrong.

This is one of my favorite little math problems. The average of 20 mph and 60 mph is 30 mph, not 40 mph. Suppose the distance between points A and B is 120 miles. You drive from A to B at 20 mph. It will take you 6 hours. Now you turn around and drive from B to A at 60 mph. It will take you 2 hours. So your total distance traveled is 240 miles and your time was 8 hours. Your average speed was 240 / 8 = 30 mph.

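To calculate the average of two rates that each apply over the same distance, use the harmonic mean instead of the ordinary arithmetic mean. The harmonic mean of x and y = 2 / (1/x + 1/y). For 20 and 60 this gives 2 / (1/20 + 1/60) = 2 / (4/60) = 30.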
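
Here is a minimal JavaScript sketch that checks the formula against the driving example. The function names harmonicMean and roundTripSpeed are hypothetical, chosen just for this illustration.

```javascript
// Harmonic mean of two rates: 2 / (1/x + 1/y).
function harmonicMean(x, y) {
  return 2 / (1 / x + 1 / y);
}

// Average speed for a round trip: total distance divided by total time.
function roundTripSpeed(distance, speedOut, speedBack) {
  const totalTime = distance / speedOut + distance / speedBack;
  return (2 * distance) / totalTime;
}

console.log(harmonicMean(20, 60).toFixed(2));        // prints 30.00
console.log(roundTripSpeed(120, 20, 60).toFixed(2)); // prints 30.00
```

The two functions agree because the distance cancels out of the round-trip calculation, so the answer is 30 mph whether A and B are 120 miles apart or any other distance.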

A neat little problem that many people don’t know about. Very useful as a bar bet!


Posted in Machine Learning | Leave a comment

The Secretary Problem

For some reason that I’m not quite sure of, I remembered an interesting mathematical probability problem I first read about many years ago. It’s called the Secretary Problem.

Suppose you have 100 job applicants for the position of your secretary. You don’t know which applicant is the best one, and you don’t want to interview all 100 applicants. After each interview, you must immediately decide to hire the applicant, or reject the applicant and move on to the next applicant. What is some reasonable rule you can use?

In the theory of this problem, a Candidate is an applicant who is the best one seen to date. Suppose you decide to interview until you’ve seen 2 Candidates, and then hire the next Candidate.

For example, suppose each applicant has (unknown to you) a rating of 1 through 100, with 1 being best. And suppose the order of the ratings of the applicant interviews is:

[5, 100 (worst), 3, 4, 2, 8, 1 (best), 7, 6, . . ] 

You would interview the first applicant. Their rating is 5, so they are Candidate #1, and you reject them. The second applicant gets a rating of 100, so they aren’t a Candidate and you move on. The third applicant gets a rating of 3, so they are Candidate #2, and you move on. You will now take the next Candidate encountered. The fourth applicant gets a rating of 4, is not a Candidate, and you move on. The fifth applicant gets a rating of 2, is a Candidate, and you hire them. (You never find out about the remaining applicants, including the best one, who would have come 2 interviews later.)
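
The walk-through above can be sketched in JavaScript. This is an illustration only: hireAfterCandidates is a hypothetical name, and the array holds just the nine ratings listed above, since the rest of the order isn't given.

```javascript
// Ratings are ranks: a lower number means a better applicant.
// Rule: note the first `candidatesToSee` Candidates, then hire the next one.
// Returns the 0-based index of the hired applicant, or -1 if nobody is hired.
function hireAfterCandidates(ratings, candidatesToSee) {
  let best = Infinity;   // best rating seen so far
  let seen = 0;          // number of Candidates seen so far
  for (let i = 0; i < ratings.length; i++) {
    if (ratings[i] < best) {       // this applicant is a Candidate
      best = ratings[i];
      seen++;
      if (seen > candidatesToSee) return i;
    }
  }
  return -1;
}

const order = [5, 100, 3, 4, 2, 8, 1, 7, 6];
const hired = hireAfterCandidates(order, 2);
// prints: hired applicant #5 with rating 2
console.log("hired applicant #" + (hired + 1) + " with rating " + order[hired]);
```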

There is a lot of fascinating math behind the Secretary Problem, and a lot of potential rules you can try. It turns out a good strategy is to reject the first n/e applicants and then take the next Candidate, where n is the number of applicants and e is the math constant, approximately 2.718. So for 100 applicants you’d reject the first 100/2.718 = 36.8, or about 37, applicants (while tracking the best one seen) and take the next Candidate.
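
A rough way to see how well the n/e rule does is to simulate it. The sketch below makes a few assumptions that are not in the text above: applicants arrive in a uniformly random order, the cutoff is n/e rounded to the nearest whole number, and if no Candidate shows up after the cutoff then nobody is hired. All function names are hypothetical.

```javascript
// Rule: reject the first round(n / e) applicants, then hire the next Candidate.
// Ratings are ranks, 1 = best. Returns the rating of the hired applicant,
// or null if no Candidate appears after the cutoff (an assumption of this sketch).
function hireWithCutoff(ratings) {
  const cutoff = Math.round(ratings.length / Math.E);
  let best = Infinity;
  for (let i = 0; i < ratings.length; i++) {
    if (ratings[i] < best) {             // this applicant is a Candidate
      if (i >= cutoff) return ratings[i];
      best = ratings[i];
    }
  }
  return null;
}

// Ranks 1..n in random order (Fisher-Yates shuffle).
function shuffledRanks(n) {
  const ranks = Array.from({ length: n }, (_, i) => i + 1);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [ranks[i], ranks[j]] = [ranks[j], ranks[i]];
  }
  return ranks;
}

// Fraction of random trials in which the rule hires the very best applicant.
function estimateSuccessRate(n, trials) {
  let wins = 0;
  for (let t = 0; t < trials; t++) {
    if (hireWithCutoff(shuffledRanks(n)) === 1) wins++;
  }
  return wins / trials;
}

console.log(Math.round(100 / Math.E));        // cutoff for 100 applicants: 37
console.log(estimateSuccessRate(100, 10000)); // varies from run to run
```

In theory the rule picks the single best applicant with probability close to 1/e, or roughly 37% of the time, so the simulated fraction should land somewhere near 0.37.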
