When you are starting with Python it’s difficult to choose the right IDE or programming environment. This video explains some of the good options available.

# Blog

## Types of Machine Learning Systems

In the last video we looked at what Machine Learning is and why it is used.

In this video we will learn about the different categories of ML Systems.

- Supervised vs Unsupervised vs Semi-supervised vs Reinforcement Learning systems
- Online vs Batch Learning systems
- Instance-based vs Model-based systems

## What is Machine Learning

In this video you will learn what Machine Learning (ML) is and why we use it.

It provides a gentle introduction to ML using a simple email example. The task is to predict whether an email is spam.

## Python Programming Notes

Below I have listed some basic concepts that will help you to get started with programming in Python.

This list is by no means exhaustive, and familiarity with another programming language will help. I recommend keeping the official Python documentation handy.

**Comments** start with #; everything after it on the line is ignored by the interpreter.

**Basic data types** include numbers (int, float, complex), strings (str) and booleans (bool).

**Strings** are immutable in Python, which means you cannot modify a string in place; operations that seem to change a string actually return a new one.
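A short illustration of string immutability (the variable names are just for this example): item assignment on a string raises a TypeError, while a method such as replace() leaves the original untouched and returns a new string.

```python
s = "Jack"

try:
    s[0] = "B"            # item assignment on a str is not allowed
except TypeError as err:
    print("TypeError:", err)

t = s.replace("J", "B")   # replace() builds and returns a new string
print(s, t)               # the original s is unchanged
```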

**Format** method can be used to insert values into a string:

```
x1 = 'Jack'
print('{0} is new to the city'.format(x1))
```

**Python** does not allow special characters such as @, $, and % within identifiers. Python is a case-sensitive programming language.

**Python** is strongly object-oriented in the sense that everything is an object including numbers, strings and functions.

**How to indent**

Python does not use braces; blocks of code are defined by indentation. Use four spaces per indentation level.

Example:

```
a = 5
if a == 5:
    print('Indentation works')
```

**Functions**

Functions in Python let you create reusable pieces of code. A function contains a block of code that performs a specific task, such as checking whether a number is prime. Let's see a simple function which returns the sum of two numbers.

```
def getSum(a, b):
    return a + b
```

**VarArgs parameters**

Sometimes you may want to define a function that accepts any number of arguments (a variable number of arguments). This is done by prefixing a parameter with a star.

```
def getSum(*numbers):
    total = 0            # not named sum, which would shadow the built-in
    for x in numbers:
        total = total + x
    return total

print(getSum(3, 4, 5, 6))   # 18
```

A single * collects all the positional arguments into a tuple. With two stars (as in **kwargs), keyword arguments are collected into a dictionary instead.
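A minimal sketch showing both forms side by side; the function name describe and the argument values are hypothetical, chosen only to make the tuple and the dictionary visible.

```python
def describe(*args, **kwargs):
    # args is a tuple holding the positional arguments,
    # kwargs is a dict holding the keyword arguments
    return args, kwargs

positional, keywords = describe(3, 4, name="Jack", city="London")
print(positional)   # (3, 4)
print(keywords)     # {'name': 'Jack', 'city': 'London'}
```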

**DocStrings**

This allows us to provide documentation for a function. A string on the first logical line of a function is the docstring for that function.

```
def getSum(*numbers):
    """This function returns the sum of the numbers."""
    return sum(numbers)

print(getSum.__doc__)
```

**Modules**

To group a set of functions together into a library, we can create a module by putting the functions in a .py file. Modules can also be written in a language like C; once compiled, they can be used by the Python interpreter.

To use a module we need to import it. To import a specific function from a module we can use:

```
from math import sqrt
from test_module import getDiff   # test_module.py is a module of your own
```

**Packages**

Related modules can be grouped together into packages. A package is a directory of modules, traditionally marked by an `__init__.py` file.

**Data Structures in Python**

Data Structures are used to efficiently organise, manage and access data.

Python provides four built-in data structures: List, Tuple, Dictionary and Set.

**List Data Structure**

Elements in a List can be of different data types, and they are ordered. A List is mutable, meaning it can be modified after it is created.

```
myList = [2,3,'a']
print(myList)
```

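To iterate through a list by index, for example to double every element in place, we can use:

```
numbers = [1, 2, 3]
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
```

If we only need to read the values, the simpler form for x in numbers: is enough.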

**List Operations**

The + operator concatenates lists.

The * operator repeats a list a given number of times.

The slice operator also works on lists: print(myList[2:3])

We can use append() to add a value to the end of the list.

sort() sorts the values of the list in place.

To add up all the elements of a list we can use sum(myList); this only works when every element is a number.
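The operations above can be tried in a few lines. This is a small sketch; the list contents are arbitrary, and a separate all-numeric list is used for sort() and sum() because myList mixes numbers and a string.

```python
myList = [2, 3, 'a']

print([1, 2] + [3, 4])   # concatenation: [1, 2, 3, 4]
print([0] * 3)           # repetition: [0, 0, 0]
print(myList[2:3])       # slice from index 2 up to (not including) 3: ['a']

numbers = [5, 1, 4]
numbers.append(2)        # add to the end: [5, 1, 4, 2]
numbers.sort()           # sort in place: [1, 2, 4, 5]
print(numbers, sum(numbers))
```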

**Map, Reduce and Filter**

Map applies a function to every element of a list, producing a new sequence; for example, mapping each string to its upper-case form.

Reduce repeatedly combines the elements with a function until a single value remains; summing all the values of a list is an example.

Filter keeps only the elements that satisfy some condition, giving a subset of the list.
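One way to express all three with built-ins, as a sketch with made-up data. In Python 3, reduce lives in the functools module, and map and filter return lazy iterators, which is why their results are wrapped in list().

```python
from functools import reduce

words = ["spam", "ham", "eggs"]
numbers = [1, 2, 3, 4, 5]

upper = list(map(str.upper, words))                   # each word in upper case
total = reduce(lambda acc, x: acc + x, numbers)       # 1+2+3+4+5
evens = list(filter(lambda x: x % 2 == 0, numbers))   # keep only even numbers
print(upper, total, evens)
```

List comprehensions such as [w.upper() for w in words] are a common alternative to map and filter.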

**Delete from a List**

pop removes the element at a given index and returns it, e.g. d = myList.pop(3).

We can also use del with an index or a slice, e.g. del myList[1:2]

**Dictionaries Data Structure**

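A dictionary stores key-value pairs. Since Python 3.7, dictionaries preserve insertion order; in older versions, printing a dictionary could show the items in a different order.

To traverse the keys in sorted order, you can use the built-in function sorted.

Keys must be hashable, because the position of a key-value pair is determined by a hash value computed from the key. A list cannot be used as a key: it is mutable, so its contents (and any hash derived from them) could change after insertion.

```
names = {1: "Ram", 2: "Mike"}   # naming it dict would shadow the built-in dict type
```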
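A small sketch of the points above, using a hypothetical names dictionary: sorted() traverses the keys in order, a list key raises TypeError because lists are unhashable, and an immutable tuple works as a key.

```python
names = {2: "Mike", 1: "Ram"}

# Traverse the keys in sorted order with the built-in sorted()
for key in sorted(names):
    print(key, names[key])

# A list is unhashable, so it cannot be used as a key...
try:
    names[[1, 2]] = "Pair"
except TypeError as err:
    print("TypeError:", err)

# ...but an immutable tuple can
names[(1, 2)] = "Pair"
```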

**Tuples** are like Lists but are immutable.

t = (32, 34, 23)

The tuple() constructor creates an empty tuple, or builds one from any iterable, e.g. t = tuple([32, 34, 23]).

If we now try t[0] = 'g' we get TypeError: 'tuple' object does not support item assignment.

**Set Data Structure**

Python also supports sets. A set holds unique elements, and those elements are unordered.

a = {2,3,4}
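A short sketch of typical set usage; the second set b is made up for the example. Duplicates disappear when a set is built, and the usual mathematical operations are available as operators.

```python
a = {2, 3, 4}
b = {3, 4, 5}

unique = set([1, 1, 2, 2, 3])   # duplicates are dropped: {1, 2, 3}
union = a | b                   # {2, 3, 4, 5}
common = a & b                  # intersection: {3, 4}
only_a = a - b                  # difference: {2}
print(unique, union, common, only_a, 3 in a)   # membership tests are fast
```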

## Getting started with Python – Your first Program

In this video tutorial you will learn how to get started with Python and write your first Python program.

## Why learn Python?

"Why should I learn Python?" is a valid question for anyone, especially if you come from a Java, C or JavaScript background.

Below I try to answer this question.

- Python is a **simple** but **powerful** language that lets you focus on the problem at hand rather than on syntax.
- It's **easy** to learn. If you are familiar with an object-oriented language like Java or a language with functional features like JavaScript, you will get up to speed with Python quickly.
- It provides effective high-level data structures along with **object-oriented** features.
- It's **free** and open source.
- It's portable: it can be ported to many platforms.
- **Interpreted**: Python does not need to be compiled to a native binary ahead of time. The standard interpreter (CPython) compiles source code into an intermediate form called bytecode, which its virtual machine then executes.
- It supports both **procedural** and object-oriented programming.
- Python comes with a rich standard library that helps with databases, multithreading, regular expressions and much more. Beyond that, many other high-quality libraries are available.
- Many people find that Python makes programming easier for them.

Let’s look at some other factors which favour Python over low level languages like C.

**C vs Python**

While the best possible runtime performance can be achieved in a low-level C programming language, working in a high-level language such as Python usually reduces the development time and often results in more flexible and extensible code.

The computationally expensive parts of a Python library can be written in C and called from Python.

Today CPU-hours are cheap and are getting cheaper, but man-hours are expensive. This makes a strong case for minimising development time rather than the runtime of a computation by using a high-level programming language and environment such as Python and its scientific computing libraries.

Hence a solution that partially avoids the trade-off between high- and low-level languages is to use a high-level language for interface libraries and low-level languages for implementations.

Python excels at this type of integration: computationally expensive operations can be implemented in C, while Python provides the high-level interface. This is an important reason why Python is a popular language for numerical computing.

Some other features include:

- No braces needed. Statement grouping is done via indentation.
- No variable declaration is needed.
- High-level data types allow you to express complex operations in a single statement.

Python is quickly becoming the language of choice when it comes to Data Analysis and Machine Learning.

**Python for Engineering and Scientific Applications**

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering.

In particular, these are some of the core packages:

- NumPy
- SciPy
- Matplotlib
- IPython
- SymPy
- Pandas


## An introduction to NumPy

**What is NumPy?**

NumPy provides efficient data structures for working with arrays in Python, and these data structures form the core of the NumPy library. NumPy is primarily used for scientific computing. All elements in a NumPy array are of the same type.

**Why use NumPy?**

The core of NumPy is implemented in C, which makes it efficient. For numerical work, NumPy's data structures are typically much faster than equivalent code using plain Python lists. NumPy is a very important part of the scientific Python ecosystem.

**How Python List differs from NumPy Arrays?**

A Python list is a heterogeneous collection of elements, whereas all elements of a NumPy array share the same data type and the array has a fixed size. In addition, NumPy provides a large set of functions for working with its arrays.
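A quick comparison, as a sketch with arbitrary values: NumPy converts mixed numbers to one common dtype, and arithmetic on an array is applied element-wise, whereas the same operator on a list means something different.

```python
import numpy as np

py_list = [1, 2.5, "three"]     # a list can mix ints, floats and strings
arr = np.array([1, 2.5, 3])     # NumPy upcasts all elements to one dtype
print(arr.dtype)                # a float dtype (float64)

doubled = arr * 2               # element-wise: each value multiplied by 2
repeated = [1, 2.5, 3] * 2      # on a list, * repeats the list instead
print(doubled, repeated)
```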

**How to use NumPy?**

To get started, import the NumPy library using *import numpy as np*

**What is ndarray class in NumPy?**

It is the main class for representing a multidimensional array. In addition to the element values, it also stores metadata such as the data type, shape and size.

One way to create an ndarray is *Y = np.array([1, 2, 3, 4, 4, 5])*

Calling *help(np.ndarray)* (or *dir(np.ndarray)*) lists all of its attributes and methods.

Following attributes are provided as part of ndarray:

- shape: the size of each dimension, e.g. (2, 3) for 2 rows and 3 columns
- size: the total number of elements
- dtype: the data type of the elements in the array
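To see these attributes on a concrete array, here is a small sketch that reshapes the earlier example into 2 rows and 3 columns. The exact integer dtype (e.g. int64) depends on the platform, so it is not asserted here.

```python
import numpy as np

Y = np.array([1, 2, 3, 4, 4, 5]).reshape(2, 3)

print(Y.shape)   # (2, 3): 2 rows, 3 columns
print(Y.size)    # 6 elements in total
print(Y.ndim)    # 2 dimensions
print(Y.dtype)   # an integer dtype, platform dependent
```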

For numerical work the most important data types are int (for integers), float (for floating-point numbers), and complex (for complex floating-point numbers).

To create an array of float elements we can use *np.array([1, 2, 3], dtype=float)*. The old alias *np.float* was deprecated and has been removed from recent NumPy versions.

Working with the complex data type: *data = np.array([1, 2, 3], dtype=complex)*

We can either print this using *print(data)*, or get the real and imaginary parts using the *real* and *imag* attributes: *data.real* and *data.imag*.

In NumPy the default memory layout for multidimensional arrays is row-major (C order).

**How to create ndarray?**

Using np.array is a basic way to create an ndarray. But practically there may be requirements, like reading data from a file and creating an ndarray, which need to be handled differently. NumPy library provides a rich set of functions to handle this.

Let’s look at some of the functions

- np.array: create an array from, for example, a Python list. Ex. a = np.array([34,44,54]) creates a one-dimensional array.
- np.zeros: array filled with 0s
- np.ones: array filled with 1s
- np.fromfile: read data from a file (raw binary by default, or text when a separator is passed with sep). For text files, np.loadtxt and np.genfromtxt are usually more convenient.
- np.random.rand: generates an array of random numbers uniformly distributed in [0, 1). Other distributions are available in the np.random module.
- np.full: create an array filled with a given value. Ex. a = np.full(5, 2) gives [2, 2, 2, 2, 2].
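A sketch of the creation functions above. To keep it self-contained, io.StringIO stands in for a text file on disk (an assumption for the demo); np.loadtxt accepts any file-like object.

```python
from io import StringIO

import numpy as np

a = np.array([34, 44, 54])        # from a Python list
z = np.zeros((2, 3))              # 2x3 array of 0.0
o = np.ones(4, dtype=int)         # four 1s as integers
f = np.full(5, 2)                 # five 2s
r = np.random.rand(3)             # three floats drawn uniformly from [0, 1)

# Parse whitespace-separated text into a 2x2 float array
t = np.loadtxt(StringIO("1 2\n3 4"))
```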

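The NumPy library provides two functions to create a range of evenly spaced values.

For example, if we need a sequence like 2, 4, 6, there are two ways:

- np.arange(start, stop, step): the stop value is excluded
- np.linspace(start, stop, num): returns num values, and the stop value is included by default

np.logspace generates values that are evenly spaced on a logarithmic scale; its start and stop arguments are exponents (base 10 by default).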
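The same sequence built both ways, plus a logspace example, as a small sketch. Note the different meaning of the third argument: a step for arange, a count for linspace.

```python
import numpy as np

evens_a = np.arange(2, 8, 2)      # start=2, stop=8 (excluded), step=2 -> 2, 4, 6
evens_l = np.linspace(2, 6, 3)    # 3 values from 2 to 6 (included) -> 2.0, 4.0, 6.0
powers = np.logspace(0, 2, 3)     # 10**0, 10**1, 10**2 -> 1.0, 10.0, 100.0
print(evens_a, evens_l, powers)
```

With non-integer steps, arange can produce an unexpected number of elements because of floating-point rounding, which is why linspace is often preferred for float ranges.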

**What is a Meshgrid Array?**

For generating multidimensional coordinate grids we can use np.meshgrid.

Ex:

```
import numpy as np

X = np.array([2, 3, 4])
Y = np.array([5, 6, 7])
a, b = np.meshgrid(X, Y)
```

Output

```
a = [[2, 3, 4],
     [2, 3, 4],
     [2, 3, 4]]

b = [[5, 5, 5],
     [6, 6, 6],
     [7, 7, 7]]
```

np.empty allocates an array without initializing its values. This can save a little time, but the contents are arbitrary, so np.zeros is the safer choice unless every element will be overwritten.

**What is Slicing?**

Let's start with 1-D arrays. Slicing selects a range of elements. Negative integers count from the end of the array; for example, x = a[-2] is the second-to-last element.

Look at the following slice examples:

- a[m:n] selects elements in the array starting at m and ending at (n-1).
- a[m:n:2] selects elements starting at m and ending at or before (n-1), with a step of 2.
- a[::-1] selects all elements in reverse order.
- a[-5:] selects last 5 elements.

Example (assuming a = np.arange(11), i.e. the integers 0 to 10):

```
In : a[1:-1:2]

Out: array([1, 3, 5, 7, 9])
```
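The slice forms listed above, run on an assumed input a = np.arange(11). The last lines show a property worth knowing: a basic slice is a view of the original array, so writing through it changes the original.

```python
import numpy as np

a = np.arange(11)      # assumed input: the integers 0..10

print(a[2:5])          # indices 2, 3, 4
print(a[1:-1:2])       # every second element from index 1, stopping before the last
print(a[::-1])         # all elements in reverse order
print(a[-5:])          # the last five elements: 6, 7, 8, 9, 10
print(a[-2])           # negative index: the second-to-last element, 9

view = a[2:5]          # basic slices share memory with a
view[0] = 100
print(a[2])            # 100
```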

Now let's look at multidimensional arrays. Here we can apply the slicing operation on each axis. Let's use np.fromfunction with a lambda function to build a 6x6 array, where the element at row m and column n is n + 10 * m.

```
In : f = lambda m, n: n + 10 * m

In : a = np.fromfunction(f, (6, 6), dtype=int)

In : a

Out: array([[ 0,  1,  2,  3,  4,  5],
            [10, 11, 12, 13, 14, 15],
            [20, 21, 22, 23, 24, 25],
            [30, 31, 32, 33, 34, 35],
            [40, 41, 42, 43, 44, 45],
            [50, 51, 52, 53, 54, 55]])
```

Look at the following slice examples:

a[:,1] gives us the second column, i.e. [1, 11, 21, 31, 41, 51]

a[2:,:2] selects the rows from index 2 onwards and the first two columns:

```
[[20, 21],
 [30, 31],
 [40, 41],
 [50, 51]]
```

**What is Reshaping and Resizing?**

Sometimes rearranging an array is helpful, for example arranging an N x N matrix as a vector of length N².

Some of the functions that NumPy provides to reshape an ndarray are:

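- np.ndarray.flatten: returns a new 1-D array (always a copy) with all dimensions collapsed into one.
- np.reshape: gives an n-dim array a new shape without changing its data.
- np.squeeze: removes axes of length 1.
- np.ravel: similar to flatten, but returns a view of the original data when possible instead of a copy. Neither function changes the shape of the original array.
- np.transpose: reverses (or permutes) the axes, e.g. turning rows into columns.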

Example

```
In : data = np.array([[1, 2], [3, 4]])

In : np.reshape(data, (1, 4))    # returns a new array

Out: array([[1, 2, 3, 4]])

In : data.reshape(4)             # also returns a new array; data keeps its (2, 2) shape

Out: array([1, 2, 3, 4])
```

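Many NumPy operations come in two flavours: a function that returns a new array, such as np.sort(a), and a method that modifies the array in place, such as a.sort(). reshape is not one of these pairs: both np.reshape and ndarray.reshape return a new array object, which is often a view sharing the original data.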
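A sketch to check the copy/view behaviour described above. A freshly created array is contiguous in memory, so ravel can return a view here; for non-contiguous arrays it may have to copy.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

r = data.reshape(4)          # new array object of shape (4,); data keeps shape (2, 2)
flat_copy = data.flatten()   # always an independent copy
flat_view = data.ravel()     # a view of data when possible (it is possible here)

flat_copy[0] = 99            # does not affect data
flat_view[0] = -1            # writes through to data[0, 0]

# Function vs in-place method
v = np.array([3, 1, 2])
s = np.sort(v)               # returns a sorted copy, v is untouched
v.sort()                     # sorts v itself
```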

**Arithmetic Operations on Matrices**

We can perform standard arithmetic operations on Matrices.

Ex

```
In : x = np.array([[1, 2], [3, 4]])

In : y = np.array([[5, 6], [7, 8]])

In : x + y

Out: array([[ 6,  8],
            [10, 12]])
```

If we multiply an array by a scalar, the operation is applied to every element. Note that * between two arrays is also element-wise; the matrix product is written x @ y (or np.dot(x, y)).

Each arithmetic operation like this returns a new array. This increases the memory footprint and can degrade performance, so it is often better to use **in-place** operations.

For example:

```
x = x + y   # uses __add__ and creates a new array
x += y      # uses __iadd__, which updates x in place
```

Refer to: https://stackoverflow.com/questions/4772987/how-are-python-in-place-operator-functions-different-than-the-standard-operator

Mathematical functions such as the trigonometric and logarithmic functions behave the same way: they are applied element-wise (vectorized) and return a new array as the result.
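A sketch contrasting the two forms. The alias x_alias keeps a second name for the original array so we can tell whether x still refers to the same object. Float arrays are used because np.sin(..., out=x) needs an output array that can hold floats.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

x_alias = x                # second name for the same array object
x += y                     # __iadd__: result written into x's existing memory
print(x_alias is x)        # True

x = x + y                  # __add__: a new array is allocated and x is rebound
print(x_alias is x)        # False

elementwise = x * y        # element-wise product, not matrix multiplication
matrix = x @ y             # matrix product
np.sin(x, out=x)           # ufuncs accept out= to reuse an existing buffer
```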

**Aggregate Functions**

We can perform aggregate operations with NumPy, such as taking the sum of all array elements or finding the mean, median or standard deviation. We can also specify the axis along which the operation is performed.

For example, a.sum(axis=0) sums down the rows, giving one total per column.
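A small sketch of the axis argument on a 2x3 array with arbitrary values. Note that the median is computed with the function np.median; ndarray has no median method.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()             # 21: all six elements
col_sums = a.sum(axis=0)    # collapse axis 0 (the rows): one sum per column -> 5, 7, 9
row_sums = a.sum(axis=1)    # collapse axis 1 (the columns): one sum per row -> 6, 15
mean, median, std = a.mean(), np.median(a), a.std()   # 3.5, 3.5, about 1.708
```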

**Set Operations**

NumPy also provides functions for working with sets, i.e. collections of unique elements.

For example, np.unique returns the sorted unique elements of an array:

a = np.unique([1, 2, 3, 3])

We can also perform union, intersection and difference operations between two arrays with np.union1d, np.intersect1d and np.setdiff1d.
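The set routines in action, as a short sketch; the second array b is made up for the example. All of these return sorted 1-D arrays.

```python
import numpy as np

a = np.unique([1, 2, 3, 3])    # sorted unique values: 1, 2, 3
b = np.array([2, 3, 4])

u = np.union1d(a, b)           # 1, 2, 3, 4
i = np.intersect1d(a, b)       # 2, 3
d = np.setdiff1d(a, b)         # 1 (in a but not in b)
m = np.isin(a, b)              # membership mask for each element of a
```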

**Summary**

NumPy is a core library for computing with Python that provides a foundation for nearly all computational libraries for Python. Familiarity with the NumPy library and its usage patterns is a fundamental skill for using Python for scientific and technical computing.

## Locality Sensitive Hashing using Euclidean Distance

It's quite similar to Locality Sensitive Hashing (LSH) for Cosine Similarity, covered in the section on LSH using Cosine Similarity. I will refer to that approach here, so it's a good idea to go through it before proceeding.

The difference lies in the way we compute the hash value. As we have seen, we can divide the space using planes, and each resulting region contains some data points.

Follow these steps (refer to diagram):

1. Choose a set of lines (projection directions) and divide each of them into small segments of equal width.

2. Project each data point onto each of these lines.

3. For each data point, the segment its projection falls into on each line gives one part of the hash value.

The rest of the procedure, such as finding the nearest neighbours, is the same as for cosine similarity.
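A minimal sketch of the steps above, assuming the common p-stable LSH scheme: random Gaussian directions, a random offset per line, and segments of width w, so that each component of the hash is floor((a·x + b) / w). The names make_hasher, n_lines and w, and the random data, are hypothetical choices for illustration.

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)

def make_hasher(dim, n_lines=4, w=1.0):
    """Return h(x): a tuple with one segment index per random line."""
    A = rng.normal(size=(n_lines, dim))     # one random direction per row
    b = rng.uniform(0, w, size=n_lines)     # random shift of the segment borders

    def h(x):
        # Project x onto every line, shift, and find the width-w segment it falls in
        return tuple(int(v) for v in np.floor((A @ x + b) / w))

    return h

points = rng.normal(size=(100, 2))          # hypothetical 2-D data
h = make_hasher(dim=2)

buckets = defaultdict(list)                 # hash table: key -> point indices
for idx, p in enumerate(points):
    buckets[h(p)].append(idx)

query = points[0] + 0.01                    # a point very close to points[0]
candidates = buckets.get(h(query), [])      # likely, but not guaranteed, to contain 0
```

Nearby points usually share all segment indices, but a segment border can still fall between them; practical systems use several hash tables to reduce such misses.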

## Locality Sensitive Hashing using Cosine Similarity

The problem we are trying to solve is to predict the class of a new data point, given a dataset with pre-classified data points.

The two key ideas used here are the k-NN algorithm and LSH. If you are not familiar with these concepts, I suggest checking them out first.

**What is Cosine Similarity?**

At a high level cosine similarity can tell us how similar two points are. To do this we compute the vector representation for the two points and then find the angle between the two vectors.

The similarity between vectors a and b can be given by cosine of the angle between them.
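Written out, this is the standard definition: the dot product of the two vectors divided by the product of their lengths.

```latex
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
           = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2}\,\sqrt{\sum_{i} b_i^2}}
```

It ranges from 1 (same direction) through 0 (perpendicular) to -1 (opposite directions). For example, a = (1, 0) and b = (1, 1) give cos θ = 1/√2 ≈ 0.707, i.e. an angle of 45°.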

We can use this concept to calculate LSH values for data points. To do this, we divide the space using hyperplanes.

Refer to the image below to understand each of the points explained next.

For simplicity, consider a 2-D space with X and Y axes. We can divide this into 4 regions using 2 planes (lines) L1 and L2.

A data point "A" will reside in one of these regions. For each plane, we can find on which side the point "A" lies using the plane's normal vector: the sign of the dot product of A with the normal tells us the side.

This gives a value for each plane, either +1 or -1. Combining these values, for example by treating them as the bits of a binary number, gives the hash key.

Once the hash table is in place, we can use it to compute the key for a new data point and then find its nearest neighbours.

Say the new point lands in the bucket with key = 1. Then we know it is near the points A and B in that bucket. Next, apply k-NN to find its classification.
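A minimal sketch of the whole pipeline, assuming random hyperplanes through the origin (each represented by a random normal vector), a bit-packed bucket key, and a toy two-class dataset. The helper names hash_key, build_table and predict are hypothetical, as is the data.

```python
from collections import Counter, defaultdict

import numpy as np

rng = np.random.default_rng(42)

def hash_key(x, planes):
    # Side of each plane from the sign of the dot product with its normal:
    # positive side -> bit 1, otherwise bit 0; the bits are packed into an int
    bits = (planes @ x) > 0
    return sum(1 << i for i, bit in enumerate(bits) if bit)

def build_table(X, planes):
    table = defaultdict(list)                  # bucket key -> point indices
    for idx, x in enumerate(X):
        table[hash_key(x, planes)].append(idx)
    return table

def predict(query, X, y, planes, table, k=3):
    candidates = table.get(hash_key(query, planes), [])
    if not candidates:
        return None                            # empty bucket; real systems probe neighbours
    C = X[candidates]
    sims = C @ query / (np.linalg.norm(C, axis=1) * np.linalg.norm(query))
    top = [candidates[i] for i in np.argsort(-sims)[:k]]   # k most similar in the bucket
    return Counter(y[i] for i in top).most_common(1)[0][0]  # majority vote

# Toy data: class 0 points toward +x, class 1 toward +y
X = np.vstack([rng.normal([5, 0], 0.5, size=(20, 2)),
               rng.normal([0, 5], 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

planes = rng.normal(size=(2, 2))   # 2 random lines through the origin -> up to 4 regions
table = build_table(X, planes)
print(predict(np.array([4.0, 0.2]), X, y, planes, table))
```

Restricting k-NN to one bucket is what makes LSH fast: only a small fraction of the data is compared with the query.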

## What is a k-d tree

A k-d tree is a binary-tree-based data structure for organising points in a k-dimensional space.

**What do we mean by k-dimensional?**

You may have heard of 2-D and 3-D space. Similarly, we can have spaces of higher dimension: k is simply the number of coordinates needed to describe a point.

**Why do we need k-d tree?**

Using k-d tree we can partition a k-dimensional space into regions. This allows for efficient searching. It’s used in computer graphics and nearest neighbour searches.

**How does it work?**

Let us consider a 2-D dataset. A point in this can be represented as <X,Y>

**Constructing a k-d tree**

We follow these steps to construct the tree.

1. Select a dimension and project all points along that. Example: Project all points along X-axis.

2. Take the mean of the projected values along the X-axis. Let's call it X_M.

3. Split using the mean. If a point's coordinate X1 is < X_M, it goes to the left sub-tree; otherwise it goes to the right.

4. Repeat steps 1-3 recursively on each subset, cycling through the dimensions (X, then Y, then X again, and so on).

By constructing this tree we have partitioned the space into smaller regions. Given a new query point, we can traverse the tree to find the region it belongs to.
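A minimal sketch of building and querying a k-d tree. It differs from the steps above in one respect: it splits at the median point rather than the mean, a common variant that keeps the tree balanced and always makes progress even when many points share a coordinate. The nearest-neighbour search is an illustration of the traversal mentioned above; Node, build and nearest are hypothetical names.

```python
import numpy as np

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0):
    """Recursively build a k-d tree, cycling through the axes."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]              # X, then Y, then X again, ...
    points = points[points[:, axis].argsort()]  # sort along the current axis
    mid = len(points) // 2                      # median split
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return the stored point closest to query (Euclidean distance)."""
    if node is None:
        return best
    if best is None or np.linalg.norm(query - node.point) < np.linalg.norm(query - best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # The far side can only help if the splitting line is closer than the best match
    if abs(diff) < np.linalg.norm(query - best):
        best = nearest(far, query, best)
    return best

rng = np.random.default_rng(1)
pts = rng.uniform(0, 10, size=(50, 2))         # hypothetical 2-D points
tree = build(pts)
q = np.array([5.0, 5.0])
print(nearest(tree, q))
```

The pruning test is what makes the search efficient: whole sub-trees are skipped when they cannot contain a closer point.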