Some years ago it goes public that Professor Robert Hagner from the Carnegie Mellon University at Pittsburgh just canceled the freshmen lesson for Object Oriented Programming (OOP). His reason was, that OOP is from nature antimodular and antiparallel and therefore not suitable for modern curriculum of Computer Science Studies.
I thought: “Wow”. Just for generations the thinking of software programmers, analysts and architects and the whole industry was in “Objects” and now it’s simply canceled. But what is true is true. With the limitation of raising processor clock pulse to speed up computer, the trend is going to parallel work instead. With Windows Azure as example it’s easy to scale processor count used by an application from one to thousands, but the application must be able to make use of it. So, let’s start think functional by using Visual F#. Microsoft defines F# as “a multiparadigm language that supports functional programming in addition to traditional object-oriented programming and .NET concepts. It’s a first class member of the .NET Framework languages”.
There is no universally accepted definition of functional programming, but one of the most agreed attributes about the functional programming paradigm is, that programming is done with expressions or declarations instead of statements. To remember, a statement is doing something like assigning a value to a variable, jumping to a other line of the code or calling a subroutine, while an expression is any section of the code that evaluates to a value.
So, let’s start with giving names to values. In F# this will be done with the
let keyword. It’s used to declare and to use identifiers:
// Integer and string. let num = 10 let str = "F#" // Storing integers and strings in a list. let integerList = [ 1; 2; 3; 4; 5; 6; 7 ] let stringList = [ "one"; "two"; "three" ] // Perform a simple calculation and bind intSquared to the result. let intSquared = num * num
Beside the simplification of implementing parallel and asynchronous tasks, the greatest benefit for enterprises to use functional programming is, that algorithm can be defined shorter and simpler. That lowers the maintenance costs of applications significant.
For a sample algorithm we will use a daily life example: we are searching on our laptop for documents, in our intranet for information related to our customers or in google for news and other information. This is called information retrieval and algorithms for that a mainly based on calculation of similarity which will be found out by calculating the distance between the searched terms existing in documents. One beside many of such algorithm is the Jaccard similarity coefficient.
The Jaccard similarity coefficient, also known as the Jaccard index, is a statistic method used for comparing the similarity and diversity of finite sample sets. It is defined as the size of the intersection divided by the size of the union of the sample sets.
One point which should be mentioned is the difference between symmetric versus asymmetric nominal variables inside the data set.
A binary variable has two states, 0 and 1 and is called symmetric, when there is no preference for which outcome of the binary variable should be coded as 0 and which as 1. For example, the binary variable “gender” for a human has the possible states “female” and “male” Both are equally valuable and carry the same weight when a proximity measure is computed.
On the other side, a binary variable is asymmetric if the outcomes of the states are not equally important, such as the positive and negative outcomes of a disease test. It is usually the rares one by 1 (HIV positive) and the other by 0 (HIV negative).
Lets suppose that we have a bunch of sheets defining professional skills:
// A list of skills // This employee is a system and data integration expert let employee1Skills = [".NET"; "C#"; "WCF"; "WF"; "BizTalk"; "SOA"; "BPMN"; "EAI"; "ESB"] // This employee is a project manager let employee2Skills = ["PM"; "IPMA"; "PMI"; "PMP"; "PRINCE2"] // This employee is a system engineer let employee3Skills = ["SCCM"; "AD"; "DNS"; "DHCP"; "GPO"; "SAN"; "LAN"; "WAN"]
and further a set of skills desired by a vacant job of a company:
// A companies vacant job for a Network Project Manager let vacantJob = ["BPMN"; "WF"; "DNS"; "DHCP"; "LAN"; "PM"; "PMP"; "PRINCE2"]
The Jaccard coefficient is now a useful measure of the overlap of attributes of one of the employee skill sets (we will call ‘A’) and the vacant job skill set (we will call ‘B’).
// Prepare the common set operations needed by the Jaccard Coefficent function let jaccardCoeff (first : string list) (second : string list) = let all = Set.union (first |> Set.ofList) (second |> Set.ofList) |> Set.toList let firstMatches = all |> List.map (fun t -> first |> List.contains t) let secondMatches = all |> List.map (fun t -> second |> List.contains t) // Next, the total number if each combination will be calculated
Each attribute of a A and B can be either present (we will define ‘1’) or absent (we will define ‘0’). The total number of each combination of attributes for both A and B are specified as follows:
M11 represents the total number of attributes where A and B both have a value of 1.
M01 represents the total number of attributes where the attribute of A is 0 and the attribute of B is 1.
M10 represents the total number of attributes where the attribute of A is 1 and the attribute of B is 0.
M00 represents the total number of attributes where A and B both have a value of 0.
// Now, calculate the total number if each combination let zipped = List.zip firstMatches secondMatches let M11 = zipped |> List.filter (fun t -> fst t = true && snd t = true ) |> List.length let M01 = zipped |> List.filter (fun t -> fst t = false && snd t = true ) |> List.length let M10 = zipped |> List.filter (fun t -> fst t = true && snd t = false ) |> List.length let M00 = zipped |> List.filter (fun t -> fst t = false && snd t = false ) |> List.length
Each attribute must fall into one of these four categories, meaning that
M11 + M01 + M10 + M00 = n
The Jaccard similarity coefficient, J, is given as:
// Calculate Jaccard Similarity Coefficient let J = float M11 / float (M01 + M10 + M11) // Return the calculated value J
To use the
[<EntryPoint>] let main argv = let simJobEmp1 = jaccardCoeff employee1Skills vacantJob let simJobEmp2 = jaccardCoeff employee2Skills vacantJob let simJobEmp3 = jaccardCoeff employee3Skills vacantJob printfn "Emp1: %f\nEmp2: %f\nEmp3: %f" simJobEmp1 simJobEmp2 simJobEmp3 0 // Exitcode as int
The result will show, that the employee working as a System Engineer has the highest similarity followed by the employee working as a Project Manager:
Emp1: 0.125000 Emp2: 0.272727 Emp3: 0.307692