An introduction to NumPy

What is NumPy?

NumPy provides efficient data structures for working with arrays in Python, and these data structures form the core of the library. NumPy is primarily used for scientific computing. All elements of a NumPy array have the same type.

Why use NumPy?

The core of NumPy is implemented in C, which makes it efficient. For numerical work, using NumPy arrays instead of plain Python containers generally improves performance. NumPy is also a very important part of the scientific Python ecosystem.

How does a Python list differ from a NumPy array?

A Python list is a heterogeneous collection of elements, whereas all elements of a NumPy array have the same data type and the array has a fixed size. In addition, NumPy provides a large set of functions for working with these data structures.
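
The difference can be seen in a small sketch. The variable names below are illustrative only and do not come from the text above.

```python
import numpy as np

# A Python list can mix types freely.
mixed_list = [1, "two", 3.0]

# A NumPy array stores a single dtype; mixed numeric input is upcast.
arr = np.array([1, 2, 3.5])
print(arr.dtype)                          # float64

# Multiplying an array works element by element, while multiplying
# a list repeats it.
doubled_array = np.array([1, 2, 3]) * 2   # array([2, 4, 6])
doubled_list = [1, 2, 3] * 2              # [1, 2, 3, 1, 2, 3]
```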

How to use NumPy?

To get started, import the NumPy library using
import numpy as np

What is the ndarray class in NumPy?

It is the main class for representing a multidimensional array. In addition to the element values, it also stores metadata such as type, shape, and size.
One way to create an ndarray is to use the following code
Y = np.array([1, 2, 3, 4, 4, 5])

Calling help(np.ndarray) or dir(np.ndarray) lists all the attributes.

The following attributes are provided as part of ndarray:

  • shape : Length of the array along each dimension, e.g. (2, 3)
  • size : Total number of elements
  • dtype : The data type of the elements in the array.
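
As a quick illustration of these attributes, here is a minimal sketch on a 2x3 array. The name m is chosen only for this example, and the exact integer dtype depends on the platform.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.shape)   # (2, 3): two rows, three columns
print(m.size)    # 6: total number of elements
print(m.ndim)    # 2: number of dimensions
print(m.dtype)   # an integer type, typically int64
```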

For numerical work the most important data types are int (for integers), float (for floating-point numbers), and complex (for complex floating-point numbers).
To create an array of float elements we can write
np.array([1, 2, 3], dtype=float)

Working with the complex data type
data = np.array([1, 2, 3], dtype=complex)

We can either print this using
data

Or get the real and imaginary parts using the real and imag attributes
data.real
data.imag

By default, NumPy stores multidimensional arrays in row-major (C) order.
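
One way to see what row-major means is to flatten a small array in both orders. This is only a sketch, and the variable names are made up for the example.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# A freshly created array is laid out in C (row-major) order.
print(m.flags['C_CONTIGUOUS'])      # True

# Row-major order walks along each row first.
row_major = m.ravel(order='C')      # [1 2 3 4 5 6]
# Column-major (Fortran) order walks down each column first.
col_major = m.ravel(order='F')      # [1 4 2 5 3 6]
```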

How to create ndarray?

Using np.array is a basic way to create an ndarray. In practice, though, there are other requirements, such as reading data from a file into an ndarray, that need to be handled differently. NumPy provides a rich set of functions for this.

Let’s look at some of the functions

  • np.array : Creates an array from a Python list, for example. Ex. a = np.array([34,44,54]) creates a one-dimensional array.
  • np.zeros : Array filled with 0s
  • np.ones : Array filled with 1s
  • np.fromfile : Reads data from a binary or simply formatted text file.
  • np.random.rand : Generates an array of random numbers uniformly distributed between 0 and 1. Other distributions are also available in the np.random module.
  • np.full : Creates an array filled with a given value. Ex. a = np.full(5, 2)
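
A few of these functions in action, as a minimal sketch. The shapes and variable names are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))         # 2x3 array of 0.0
ones = np.ones(4, dtype=int)     # [1 1 1 1]
filled = np.full(5, 2)           # [2 2 2 2 2]

# Three random values drawn uniformly from [0, 1);
# the actual numbers differ on every run.
samples = np.random.rand(3)
```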


NumPy provides two functions for creating a range of evenly spaced values.

For example, to generate a sequence like 2, 4, 6 there are two ways:

  • np.arange(start, end, increment) : end is not included in the result
  • np.linspace(start, end, no_of_elements) : end is included by default

np.logspace can be used to distribute elements logarithmically.
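
The following sketch builds the sequence 2, 4, 6 both ways and adds a logarithmic example. The argument values are chosen only for illustration.

```python
import numpy as np

# arange excludes the end value, so 8 is needed to reach 6.
evens_arange = np.arange(2, 8, 2)        # [2 4 6]

# linspace includes the end value and returns floats.
evens_linspace = np.linspace(2, 6, 3)    # [2. 4. 6.]

# logspace takes exponents: 4 points from 10**0 to 10**3.
powers = np.logspace(0, 3, 4)            # [1. 10. 100. 1000.]
```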

What is a Meshgrid Array?

For generating multidimensional coordinate grids we can use np.meshgrid.

Ex:
X = np.array([2,3,4])
Y = np.array([5,6,7])
a,b = np.meshgrid(X,Y)

Output
a = ([
   [2,3,4],
   [2,3,4],
   [2,3,4]
])
b = ([
   [5,5,5],
   [6,6,6],
   [7,7,7]
])
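
A typical use of the two grids is to evaluate a function of two variables at every grid point. The sketch below uses the product x * y, chosen purely as an example function.

```python
import numpy as np

X = np.array([2, 3, 4])
Y = np.array([5, 6, 7])
a, b = np.meshgrid(X, Y)

# Evaluate f(x, y) = x * y at every grid point in one vectorized step.
products = a * b
print(products)
# [[10 15 20]
#  [12 18 24]
#  [14 21 28]]
```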

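np.empty is also handy if we just want to allocate an array without initializing it, which can save some time. The contents of such an array are arbitrary, however, so np.zeros is the safer choice unless every element will be overwritten before use.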

What is Slicing?

Let’s start with 1-D arrays. Slicing selects a range of elements. We can also use negative integers to index from the end of the array. For example, x = a[-2] picks the second-to-last element.

Look at the following slice examples:

  • a[m:n] selects elements in the array starting at m and ending at (n-1).
  • a[m:n:2] selects elements in the array starting at m and ending at (n-1) in increments of 2.
  • a[::-1] selects all elements in reverse order.
  • a[-5:] selects last 5 elements.
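
These forms can be tried on a small array. The sketch below assumes a holds the integers 0 to 9, an array chosen only for this illustration.

```python
import numpy as np

a = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]

middle = a[2:5]            # [2 3 4]
stepped = a[1:8:2]         # [1 3 5 7]
reversed_a = a[::-1]       # [9 8 7 6 5 4 3 2 1 0]
last_five = a[-5:]         # [5 6 7 8 9]
second_last = a[-2]        # 8
```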

Example :
In : a = np.arange(0, 11)
In : a[1:-1:2]
Out : array([1, 3, 5, 7, 9])


Now let’s look at multidimensional arrays. Here we can apply a slicing operation along each axis. Let’s use a lambda function with np.fromfunction to build a 6x6 array.

In : f = lambda m, n: n + 10 * m
In : a = np.fromfunction(f, (6, 6), dtype=int)
In : a
Out : array([[ 0,  1,  2,  3,  4,  5],
             [10, 11, 12, 13, 14, 15],
             [20, 21, 22, 23, 24, 25],
             [30, 31, 32, 33, 34, 35],
             [40, 41, 42, 43, 44, 45],
             [50, 51, 52, 53, 54, 55]])

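Look at the following slice examples:
a[:,1] gives us the second column, i.e. array([ 1, 11, 21, 31, 41, 51])
a[2:,:2] gives us the first two columns of every row from the third onwards, i.e.
array([[20, 21],
       [30, 31],
       [40, 41],
       [50, 51]])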
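
A few more slices of the same 6x6 array, as a minimal sketch. The array is rebuilt here so the example runs on its own, and the result names are illustrative.

```python
import numpy as np

a = np.fromfunction(lambda m, n: n + 10 * m, (6, 6), dtype=int)

third_row = a[2, :]        # [20 21 22 23 24 25]
block = a[1:3, 3:5]        # [[13 14]
                           #  [23 24]]
every_other = a[::2, ::2]  # [[ 0  2  4]
                           #  [20 22 24]
                           #  [40 42 44]]
```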

What is Reshaping and Resizing?

Sometimes rearranging arrays can be helpful, for example arranging an N*N matrix as a vector of size N^2.

Some of the functions that NumPy provides to reshape an ndarray are:

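  • np.ndarray.flatten : Creates a new 1-D array (a copy), collapsing all dimensions into one.
  • np.reshape : Reshapes an n-dimensional array.
  • np.squeeze : Removes axes of length 1.
  • np.ravel : Similar to flatten, but returns a view of the original data where possible instead of a copy.
  • np.transpose : Reverses or permutes the axes of an array. For a 2-D array it swaps rows and columns.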
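
The copy-versus-view behavior of flatten and ravel is easy to check with a small sketch. The variable names are made up for this example.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

flat_copy = data.flatten()   # always a copy
flat_view = data.ravel()     # a view here, since data is contiguous

flat_copy[0] = 99            # does not touch data
flat_view[1] = 77            # writes through to data[0, 1]
print(data)
# [[ 1 77]
#  [ 3  4]]

squeezed = np.squeeze(np.zeros((1, 3, 1)))   # shape (3,)
transposed = np.transpose(data)              # rows and columns swapped
```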

Example

In : data = np.array([[1, 2], [3, 4]])
In : np.reshape(data, (1, 4))  # function form; returns a reshaped array
Out : array([[1, 2, 3, 4]])
In : data.reshape(4)  # method form; data itself is not modified
Out : array([1, 2, 3, 4])

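Neither form changes data itself: both return a reshaped array, which is a view of the same data where possible. NumPy generally offers such operations both as a function (np.reshape) and as a method (data.reshape). A few methods, such as ndarray.resize and ndarray.sort, do modify the array in place.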
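
The sketch below checks which calls leave the original array alone. Sorting is used as the in-place example because np.sort returns a new array while the ndarray.sort method works in place. The variable names are illustrative.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])
reshaped = data.reshape(4)
print(data.shape)                         # (2, 2): data is unchanged
print(np.shares_memory(data, reshaped))   # True: reshaped is a view here

values = np.array([3, 1, 2])
sorted_copy = np.sort(values)    # new array; values is still [3 1 2]
values_before = values.copy()
values.sort()                    # in place; values is now [1 2 3]
```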

Arithmetic Operations on Matrices

We can perform standard arithmetic operations on matrices; they are applied element by element.

Ex
In : x = np.array([[1, 2], [3, 4]])
In : y = np.array([[5, 6], [7, 8]])
In : x + y
Out: array([[ 6,  8],
            [10, 12]])

If we multiply a matrix by a scalar, the multiplication is applied to every element of the matrix.
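
A minimal sketch of scalar and element-wise arithmetic, reusing the matrix x from the example above. The result names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

scaled = x * 3          # every element multiplied by 3
print(scaled)
# [[ 3  6]
#  [ 9 12]]

# The * operator between two arrays is also element-wise.
squared = x * x
print(squared)
# [[ 1  4]
#  [ 9 16]]
```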

An arithmetic operation returns a new array. For large arrays this increases the memory footprint and can degrade performance, so in such cases it is better to use an in-place operation.

Like
x = x + y  # uses __add__ and creates a new array
x += y  # uses __iadd__, the in-place operator

Refer to – https://stackoverflow.com/questions/4772987/how-are-python-in-place-operator-functions-different-than-the-standard-operator
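
The difference between the two forms can be observed by keeping a second reference to the array. This is a sketch only, and the name alias is introduced just for the demonstration.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
alias = x                 # second reference to the same array

x += y                    # in place: alias sees the change
same_after_iadd = alias is x      # True

x = x + y                 # new array: x is rebound, alias is not
same_after_add = alias is x       # False

print(alias)   # [[ 6  8] [10 12]]: result of the in-place step only
print(x)       # [[11 14] [17 20]]
```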

Other mathematical operations, such as trigonometric and logarithmic functions, behave similarly: they are applied in vectorized form, element by element, and return a new array.
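
For instance, np.sin and np.log accept a whole array at once. The input values below are chosen only so that the results are easy to recognize.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)        # approximately [0, 1, 0]

values = np.array([1, np.e, np.e ** 2])
logs = np.log(values)         # approximately [0, 1, 2]

# The inputs are left untouched; each call returns a new array.
```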

Aggregate Functions

We can perform aggregate operations with NumPy, such as taking the sum of all array elements or finding the mean, median, or standard deviation. We can also specify the axis along which the operation is performed.

For example, a.sum(axis=0) on a 2-D array sums over the rows, giving one total per column.
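
A short sketch of these aggregates on a 2x3 array. The array contents are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

total = a.sum()               # 21
column_sums = a.sum(axis=0)   # [5 7 9]
row_sums = a.sum(axis=1)      # [ 6 15]
mean = a.mean()               # 3.5
median = np.median(a)         # 3.5
std = a.std()                 # about 1.708
```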

Set Operations

NumPy also provides the capability to work with sets. A set stores unique, unordered elements, and NumPy provides functions that treat arrays as sets.
For example, np.unique returns the sorted unique elements of an array.
a = np.unique([1, 2, 3, 3])
We can also perform operations such as union (np.union1d) and intersection (np.intersect1d) between two given sets.
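
The set functions can be sketched as follows. The second array b is made up for the example.

```python
import numpy as np

a = np.unique([1, 2, 3, 3])        # [1 2 3]
b = np.array([3, 4, 5])

union = np.union1d(a, b)           # [1 2 3 4 5]
common = np.intersect1d(a, b)      # [3]
only_in_a = np.setdiff1d(a, b)     # [1 2]
```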

Summary

NumPy is a core library for computing with Python that provides a foundation for nearly all computational libraries for Python. Familiarity with the NumPy library and its usage patterns is a fundamental skill for using Python for scientific and technical computing.