menu

Intro to OCaml

A multi-paradigm, general-purpose programming language

Official Documentation • OCaml for Windows • Pervasives • Atom • Nuclide
Symbol/Keyword/TermUsage
Calling functionsCalling OCaml functions can be done through [fun name] [args...]. Parentheses are only necessary to wrap a single argument/tuple together, and should not be used to wrap all the arguments. If you are getting unbounded name errors, you may be running in the toploop. In this case, you may need to add semicolons (;;) after function declarations to execute it.
let ... inA nested function must end with the 'in' keyword to show that its declaration is finished.
match e with ...OCaml's pattern matching: Given input 'e', match it with the expressions below. This is similar but more powerful than Java's switch system. 'Cases' are defined as '| c -> ...', where 'c' is the case. All cases start with '|', and each case is self containing (no break keyword). '_' may be used as a wildcard to accept all matches. Pattern matching is done in the order it is defined.
[1;2;3]Lists are wrapped with square brackets, and their items are separated with semicolons.
<>!= is used as the negation of ==, which tests by physical equality, not by value! Odds are you shouldn't be using this. The negation of '=', which tests for structural equality, is '<>'.
TuplesTuples are ways of combining multiple types into one type. For ints a b c, we may combine them by using (a, b, c), which results in a tuple of type int * int * int.
Note that construction involves commas, whereas type representation involves asterisks.
Also note that if you take in a tuple, you may directly deconnstruct it and name the elements within it (t: (int * int * int)) vs (a, b, c : (int * int * int)).
recA recursive function that calls upon itself must have this keyword; otherwise, the compiler will look for a previous definition of the given function name.
'aLike generic types in Java; we may use the notation (x: 'a) to denote that x can be of any type. 'a' in this case is arbitrary, but the apostrophe prefix specifies the 'any' type. We may specify that two arguments must have the same type by using (x: 'a) (y: 'a).
Note that adding apostrophes after a variable name (eg x') is perfectly fine and doesn't mean anything special.
head :: tailSplits a list into its first element, head, and the rest of the list, tail
a @ [b]'@' is used to append to lists. Note that both arguments must be of type list. In this example, a is a list and b is an item, which is wrapped in a list before concatenation.
optionSpecial type that wraps a variable with 'Some' and adds a valid return 'None'. A returned option may or may not return the inner type we wish to receive. This also exists in other languages, including Java.
Basic Example
Here is a sample that shows the structure of an ml file and how functions work
(* Basic OCaml program *)
(*
    Comments are denoted with this notation.
    There does not exist a shortened notatino for single line comments
*)

(*
    Create function 'add' that takes in a and b
    Both a and b are inferred to be floats.
    Note that basic operations with floats have a '.' after the operation,
    and that numbers are not automatically casted as it is in Java.
*)
let add a b = a +. b

(* Call the function like so. The outcome will be 5.0 *)
add 2.0 3.0

(* This call will fail. Numbers are not automatically converted *)
add 2 3

(* Type creation *)
(* Defines a type key that is exactly an int *)
type key = int
(* Defines a type btree. 'a represents a generic. Every other instance of 'a is of the same type *)
type 'a btree =
    (* Blank constructor to create an empty binary tree *)
    | Empty
    (* Node constructor that holds a btree (left), value, and btree (right) as a tuple *)
    | Node of 'a btree * (key * 'a) * 'a btree

(* Note that types are typically lower case, and constructors are upper casees *)
(* When match functions are called, it will check that all constructors are taken into account for the given type *)

(* Exceptions are created like so *)
exception DivideByZero
(* They may optionally wrap another type *)
exception ErrorMessage of string

(* functions are anonymous and only contain one input.
They automatically support matching, and is equivalent to
fun x -> match x with ... *)
(* The following takes two ints and divides them if the divisor is not 0 *)
let div d = function | 0 -> raise DivideByZero | x -> d / x
Recursion
A big part of functional programming rests on the ability to leverage recursive functions. It may be a bit odd to think recursively (stacked calls), as we are used to thinking with loops (flat calls). Fortunately, most looping functions can be converted to recursive functions by passing all mutable states into the method.
Consider the following question ( from TopCoder ):
A sequence of numbers is called a zig-zag sequence if the differences between successive numbers strictly alternate between positive and negative. The first difference (if one exists) may be either positive or negative. A sequence with fewer than two elements is trivially a zig-zag sequence.

For example, 1,7,4,9,2,5 is a zig-zag sequence because the differences (6,-3,5,-7,3) are alternately positive and negative. In contrast, 1,4,7,2,5 and 1,7,4,5,5 are not zig-zag sequences, the first because its first two differences are positive and the second because its last difference is zero.

Given a sequence of integers, sequence, return the length of the longest subsequence of sequence that is a zig-zag sequence. A subsequence is obtained by deleting some number of elements (possibly zero) from the original sequence, leaving the remaining elements in their original order.
A possible Java implementation is like so:
public static int getLongestZigZag(int[] inputs) {
    //Implement base cases
    if (inputs.length <= 2) return inputs.length;
    int marker = inputs[0];
    boolean isPeak = inputs[0] > inputs[1]; //set peak to accept first inputs
    int count = 1; //accept first number
    for (int i : inputs) {
        if (marker == i) continue;
        if (marker < i != isPeak) {
            count++;
            isPeak = !isPeak;
        }
        marker = i;
    }
    return count;
}
The logic is to increment the count if the next input changes directions, and to accept the next input if it increases the absolute value of the marker (as this will allow for more values to change the direction).

Notice the following:
  • The method takes in of an array of inputs
  • There are stateful variables (marker, isPeak, count)
  • There is an int return type
In recursive functions, you want to avoid stateful variables, as each call executes a new function in the stack. The basic solution is to pass everything the next function must know inside the function itself.

In other words:
  • The function takes in a list of inputs (ints),
    an int for a marker,
    an int for a count,
    and a bool for whether the zigzag is at its peak
  • The return value is still an int
With this in mind, we simply change the method signature, and implement the same method body. Note that we will keep the "base cases" out of the recursive function, as it is only checked once and isn't part of the loop.

Result:
let getLongestZigZag (inputs: int list) =
    let rec zigZag (inputs': int list) marker' count' isPeak' =
        match inputs' with
            | [] -> count' (* Done recursion, emit value *)
            | head::tail -> (let incr = marker' < head <> isPeak' in
              zigZag tail head (if incr && head <> marker' then count' + 1 else count') (if incr then not isPeak' else isPeak'))
    in
    match inputs with
    | a :: b :: tail -> zigZag ([b] @ tail) a 1 (a > b)
    | _ -> List.length inputs
Higher-Order Functions
In functional paradigms, functions are first class. They can be passed into other functions and used as return types. This is very similar to callback methods, or interfaces that have only one method in OOP.
(* List creator that takes in an initial value and an action, and creates the list until the stop condition is met *)
let rec unfold (f: 'seed -> ('a * 'seed)) (stop : 'seed -> bool) (b : 'seed) : 'a list =
  if stop b then []
  else let x, b' = f b in
    x :: (unfold f stop b')

(* Creates list of evens from 0 to 100 *)
(* Returns a function that takes in a stop condition and a start value *)
let even_generator = unfold (fun x -> (x, x + 2))

(* Expands on even_generator to give it a limit of 0 - 100 *)
(* Returns an int list *)
let even_0_to_100 = even_generator ((<) 100) 0

(* Short form functions *)

(* Returns int -> bool *)
let is_2 = ((=) 2)

(* Returns int -> int *)
(* Space required to avoid treating op as a comment *)
let times_3 = (( * ) 3)
States
OCaml can emulate states through the ref tag. Values created as references can be accessed and modified by other functions.
Creationlet x = ref 0
Modificationx := 3
Accessif !x = 3 then ...
Backtracking
Backtracking involves finding solutions incrementally by abandoning any candidate solution as soon as it is determined to be unsuccessful. One implementation is to throw exceptions when a candidate has failed, so the parent may try other solutions.
(* Given an amount, and a list of coins (int),
find the smallest number of coins necessary to supply the exact change.
Assume that the coins are ordered in descending order *)

exception Change

let rec change coins amt =
    if amt = 0 then []
    else match coins with
        | [] -> raise Change (* amt remains, but no more coin values *)
        | coin :: cs -> if coin > amt then change cs amt (* coin too big, skip *)
        (* Try to include current coint. If not possible, skip and try again *)
        else try coin :: (change coins (amt - coin)) with Change -> change cs amt
Modules
OCaml allows support for interface structures through modules. It helps control complexity by breaking down large programs into separate pieces. Module signatures help guarantee function headers, without giving away implementation.
(* Module types are defined using signatures *)
module type STACK =
    sig
        type stack
        type el
        val empty : unit -> stack
        val is_empty : stack -> bool
        val pop : stack -> stack option
        val push : el -> stack -> stack
        (* val push : int -> stack -> stack *)
        (* cannot be el -> stack -> stack as 1 isn't a Stack.el *)
        (* May use further abstractions through creation functions, but that wasn't covered in class *)
    end


(* Modules are implemented with structs; types can be specified using with *)
module Stack : (STACK with type el = int) =
    struct
        type el = int
        type stack = int list
        let empty () : stack = []
        let push i ( s : stack) = i :: s
        let is_empty s = match s with
            | [] -> true
            | _::_ -> false
        let pop s = match s with
            | [] -> None
            | _::t -> Some t
        let top s = match s with
            | [] -> None
            | h :: _ -> Some h
        let rec length s acc = match s with
            | [] -> acc
            | x::t -> length t 1 + acc
    end
Module structs can also take in one or more struct parameters and use them within their own definitions.
Continuation
Continuation represents the execution state of a program (eg a call stack) at a certain point in time. The state can be saved and restored at a later point To allow for tail-recursive functions, the header will contain an additional argument, a continuation, holding the entire state of the function. This will be passed to initialize or build upon the function.

Recursive map function example:
let map list mapper =
    (* takes in the list, mapper function, and continuation *)
    let rec map_tl = l f c = match l with
        | [] -> c []
        (* Prepend mapped header to rest of (to be) mapped list *)
        | h :: t -> map_tl t f (fun r -> c ((f h) :: r))
    in
    (* Return accumulated data *)
    map_tl list mapper (fun r -> r)
Lazy Evaluation
The concept of laziness stems from evaluation upon necessity, rather than evaluation upon creation. Languagues such as OCaml & Java typically evaluate any expression that is assigned to a value (variable). This is helpful in that the order is guaranteed, but unused variables may be unnecessarily evaluated. The goal of laziness is to avoid such situations, and to only return a value when it is required to compute something else. Laziness can be created by wrapping evaluations in functions of the form unit -> 'a, since we know that functions are not evaluated until execution.