
# Some Mini-Howtos of Interest, Chapter 7 - Python Tips and Templates

## 7.1 Creating a vector of random data

Created on April 12th, 2015.

The following snippet of IPython code first prints 10 normally distributed random values and then stores a vector of 200 such values, called vectorn.

import numpy as np

np.random.randn(10)
Out[2]:
array([-0.20490308, -0.39783301, -0.6802615 , -0.57939922, -0.10054472,
-0.20376277, -1.48068811,  0.98628113, -0.79514919, -0.09896364])

vectorn = np.random.randn(200)
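If the same random vector is needed on every run, a seeded generator can be used. The following is a minimal sketch, assuming NumPy 1.17 or later (where np.random.default_rng is available); the seed value 12345 is an arbitrary choice made for this illustration.

```python
import numpy as np

# Seeded generator; the seed 12345 is arbitrary and only chosen for illustration
rng = np.random.default_rng(12345)

# 200 samples from the standard normal distribution (mean 0, standard deviation 1)
vectorn = rng.standard_normal(200)

# A second generator created with the same seed reproduces the same vector
vectorn_again = np.random.default_rng(12345).standard_normal(200)
print(np.array_equal(vectorn, vectorn_again))
```

Note that the values obtained this way differ from those produced by np.random.randn, since the two interfaces use different underlying generators.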

## 7.2 Logical comparison between two boolean vectors

Created on May 06th, 2015.

The following snippet of IPython code defines two vectors, called vectorA and vectorB, each with 20 elements of normally distributed random data. Using the NumPy function logical_and, it then finds the positions where the corresponding elements of both vectors are larger than zero.

import numpy as np

vectorA = np.random.randn(20)
vectorB = np.random.randn(20)

boolvec = np.logical_and(vectorA > 0, vectorB > 0)

vectorA[boolvec]
Out[47]: array([ 0.39058535,  1.0062992 ])

vectorB[boolvec]
Out[48]: array([ 0.87795544,  0.59063525])
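As a possible complement, the sketch below builds the same mask with the & operator and counts the matching positions. The names boolvec_op, n_both and indices are introduced here for illustration and are not part of the original snippet.

```python
import numpy as np

vectorA = np.random.randn(20)
vectorB = np.random.randn(20)

# np.logical_and and the & operator give the same mask for boolean arrays;
# the parentheses are needed because & binds more tightly than >
boolvec = np.logical_and(vectorA > 0, vectorB > 0)
boolvec_op = (vectorA > 0) & (vectorB > 0)

# Number of positions where both elements are positive, and their indices
n_both = np.count_nonzero(boolvec)
indices = np.flatnonzero(boolvec)
print(n_both, indices)
```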

## 7.3 Creating a loop iterating on a list and the list index

Created on April 16th, 2015.

The following snippet of IPython code starts from a vector, called vn, with 20 elements of uniformly distributed random data in the interval $[0,1)$. In a loop that iterates over (index, value) pairs, we build a symmetric matrix A such that A_ij = vn_i*vn_j.

import numpy as np

vn = np.random.rand(20)

A_matrix = np.zeros((20,20))

for i, ival in enumerate(vn):
    for j, jval in enumerate(vn):
        A_matrix[i,j] = ival*jval

This can easily be optimized by computing only the upper or lower triangle of the matrix and adding its transpose (beware of double counting the diagonal elements).
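The triangle-plus-transpose idea can be sketched as follows. The names A_loop, A_upper, A_sym and A_outer are introduced here for illustration; the last lines also show np.outer as a vectorized alternative that avoids the explicit loops altogether.

```python
import numpy as np

vn = np.random.rand(20)

# Reference: full double loop over (index, value) pairs
A_loop = np.zeros((20, 20))
for i, ival in enumerate(vn):
    for j, jval in enumerate(vn):
        A_loop[i, j] = ival * jval

# Compute only the upper triangle, diagonal included
A_upper = np.zeros((20, 20))
for i, ival in enumerate(vn):
    for j in range(i, len(vn)):
        A_upper[i, j] = ival * vn[j]

# Adding the transpose counts the diagonal twice, so subtract it once
A_sym = A_upper + A_upper.T - np.diag(np.diag(A_upper))

# Vectorized alternative: outer product of vn with itself
A_outer = np.outer(vn, vn)

print(np.allclose(A_loop, A_sym), np.allclose(A_loop, A_outer))
```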

## 7.4 Adding a column to a Pandas dataframe

Created on April 21st, 2015.

Suppose we have a Pandas dataframe, for example the following one, called df0:

import numpy as np
import pandas as pd

vectorn = np.random.rand(20)
df0 = pd.DataFrame(data=vectorn, columns = ["s0"])

We can now add a second column of random data, labeled s1, using pd.Series and the index of df0:

vectors = np.random.rand(20)
df0["s1"] = pd.Series(data=vectors, index=df0.index)
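Two other common ways of adding a column are sketched below: assigning a NumPy array directly, and using DataFrame.assign. The column name s2 and the dataframe df1 are hypothetical names used only in this example.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame(data=np.random.rand(20), columns=["s0"])

# Direct assignment of an array: its length must match the number of rows
df0["s1"] = np.random.rand(20)

# assign returns a new DataFrame with the extra column and leaves df0 unchanged
df1 = df0.assign(s2=df0["s0"] + df0["s1"])

print(df1.head())
```

Passing a pd.Series, as in the snippet above, aligns the new values on the index, whereas a plain array is assigned by position.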

## 7.5 Getting the maximum component of a vector and its index in numpy

Created on April 12th, 2015.

The following snippet of IPython code computes a vector, called vectoru, with 200 elements of uniformly distributed random data in the interval $[0,1)$, and then gets its maximum value and the index of that value.

import numpy as np

vectoru = np.random.rand(200)

max_val, max_index = vectoru.max(), vectoru.argmax()

max_val
Out[87]: 0.99652709220203461

max_index
Out[88]: 117
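For arrays with more than one dimension, argmax returns an index into the flattened array. The sketch below, which uses a hypothetical 10x20 array called matrixu, converts that index into a (row, column) pair with np.unravel_index.

```python
import numpy as np

# Hypothetical 2-D example, not part of the original snippet
matrixu = np.random.rand(10, 20)

max_val = matrixu.max()
flat_index = matrixu.argmax()  # position in the flattened array

# Convert the flat position into (row, column) indices
row, col = np.unravel_index(flat_index, matrixu.shape)

print(max_val, row, col)
```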

## 7.6 Some easy examples of offset-aware times with pytz

Created on May 13th, 2015.

Dealing with time zones and the associated DST (daylight saving time) can cause serious headaches. Here we have some (very limited) examples of their use.

Let's assume that we have two strings, stdate1 = "20/03/2015 12:22" and stdate2 = "23/03/2015 22:22", and we want to parse them into datetime objects. This is done as follows:

from dateutil.parser import parse
stdate1 = "20/03/2015 12:22"
stdate2 = "23/03/2015 22:22"

date1 = parse(stdate1, dayfirst=True)

date2 = parse(stdate2, dayfirst=True)

date1
Out[6]: datetime.datetime(2015, 3, 20, 12, 22)

date2
Out[7]: datetime.datetime(2015, 3, 23, 22, 22)

date2-date1
Out[8]: datetime.timedelta(3, 36000)

At this point we have offset-naive times. If we want to attach a given time zone to them, e.g. CET, then we use

import pytz

cet_tz = pytz.timezone("CET")

cet_date1 = cet_tz.normalize(cet_tz.localize(date1))
cet_date2 = cet_tz.normalize(cet_tz.localize(date2))

cet_date1
Out[12]: datetime.datetime(2015, 3, 20, 12, 22, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)

cet_date2-cet_date1
Out[14]: datetime.timedelta(3, 36000)

We can now transform these time data to UTC

utc_tz = pytz.timezone('UTC')

utc_date1 = cet_date1.astimezone(utc_tz)
utc_date2 = cet_date2.astimezone(utc_tz)

utc_date2 - utc_date1
Out[22]: datetime.timedelta(3, 36000)

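We can also attach UTC directly to the initially parsed variables. Note that this interprets the naive times as UTC times, so the result differs by one hour from the CET dates converted to UTC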

UTC_date1 = utc_tz.normalize(utc_tz.localize(date1))

UTC_date1
Out[28]: datetime.datetime(2015, 3, 20, 12, 22, tzinfo=<UTC>)

UTC_date1 - utc_date1
Out[27]: datetime.timedelta(0, 3600)

These functions can be applied to lists using lambda functions.
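A minimal sketch of applying these steps to a list follows, shown both with list comprehensions and with map and a lambda function. To keep the example self-contained it parses the strings with datetime.strptime from the standard library instead of dateutil; the list names are introduced here for illustration.

```python
from datetime import datetime

import pytz

stdates = ["20/03/2015 12:22", "23/03/2015 22:22"]
cet_tz = pytz.timezone("CET")

# Parse the strings into offset-naive datetime objects (day first)
naive_dates = [datetime.strptime(s, "%d/%m/%Y %H:%M") for s in stdates]

# Attach the CET time zone and convert to UTC with list comprehensions
cet_dates = [cet_tz.localize(d) for d in naive_dates]
utc_dates = [d.astimezone(pytz.utc) for d in cet_dates]

# The same conversion using map and a lambda function
utc_dates_map = list(
    map(lambda d: cet_tz.localize(d).astimezone(pytz.utc), naive_dates)
)

print(utc_dates)
```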

## 7.7 Creating a panel array of plots with Matplotlib

Created on April 12th, 2015.

The following snippet of code uses a vector of length 200 with normally distributed random data (see Creating a vector of random data, Section 7.1) and plots in four panels the data, their cumulative sum, a histogram of the data, and the sum of the data and a quadratic function.

import numpy as np
from matplotlib import pyplot

fig,axes = pyplot.subplots(2,2) # Define plot of 2x2 panels

axes[0,0].plot(vectorn,"k-o")
Out[36]: [<matplotlib.lines.Line2D at 0x7f827af2a510>]

axes[0,1].plot(vectorn.cumsum(),"k--")
Out[37]: [<matplotlib.lines.Line2D at 0x7f827af2a1d0>]

axes[1,0].hist(vectorn,bins=30,color="r",alpha=0.3)
Out[38]:
(array([  3.,   0.,   2.,   2.,   4.,   2.,   6.,   4.,  10.,  12.,   8.,
10.,  13.,  14.,  16.,  14.,   9.,  10.,   9.,  13.,   8.,   5.,
8.,   3.,   5.,   6.,   0.,   1.,   1.,   2.]),
array([-2.41379287, -2.24330459, -2.0728163 , -1.90232801, -1.73183972,
-1.56135143, -1.39086315, -1.22037486, -1.04988657, -0.87939828,
-0.70891   , -0.53842171, -0.36793342, -0.19744513, -0.02695684,
0.14353144,  0.31401973,  0.48450802,  0.65499631,  0.8254846 ,
0.99597288,  1.16646117,  1.33694946,  1.50743775,  1.67792603,
1.84841432,  2.01890261,  2.1893909 ,  2.35987919,  2.53036747,
2.70085576]),
<a list of 30 Patch objects>)

axes[1,1].scatter(np.arange(200),0.01*np.arange(200)**2+10*vectorn)
Out[39]: <matplotlib.collections.PathCollection at 0x7f827aedae90>

pyplot.show()

## 7.8 Creating a panel array of plots with common axes using Matplotlib

Created on April 12th, 2015.

The following snippet of code adds a vector of length 100 with normally distributed random data (see Creating a vector of random data, Section 7.1), scaled by different factors, to a parabolic function (mimicking experimental errors in the free fall of an object) and plots the results in four panels with common abscissa and ordinate axes, controlling the spacing between the panels.

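import numpy as np
from matplotlib import pyplot

# time_grid is not defined in the original snippet; a grid of 100 points
# between 0 and 5 (seconds) is assumed here
time_grid = np.linspace(0, 5, 100)

vectorn = np.random.randn(100)

result1 = 0.5*9.8*time_grid**2 + 2*vectorn

result2 = 0.5*9.8*time_grid**2 + 4*vectorn

result3 = 0.5*9.8*time_grid**2 + 8*vectorn

result4 = 0.5*9.8*time_grid**2 + 16*vectorn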

fig,axes = pyplot.subplots(2,2,sharex=True,sharey=True)

axes[0,0].plot(result1,"k-o")
Out[85]: [<matplotlib.lines.Line2D at 0x7f827aa7ae10>]

axes[0,1].plot(result2,"k-o")
Out[86]: [<matplotlib.lines.Line2D at 0x7f827aa7ae90>]

axes[1,0].plot(result3,"k-o")
Out[87]: [<matplotlib.lines.Line2D at 0x7f827aa7a4d0>]

axes[1,1].plot(result4,"k-o")
Out[88]: [<matplotlib.lines.Line2D at 0x7f827aaaff10>]

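fig.subplots_adjust(wspace=0.05, hspace=0.05) # Spacing between panels (these values are an assumption)

pyplot.show()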

## 7.9 Combining several plots in a figure

Created on June 10th, 2015.

The following snippet of IPython code computes three vectors, called vector1, vector2, and vector3, with 100 elements of normally distributed random data with the same mean value (2) and different standard deviations (0.2, 0.4, and 0.8). We then plot the three vectors in a single graph, controlling the line styles and the font sizes of the labels and ticks.

import numpy as np
from matplotlib import pyplot

meanval = 2
vector1 = np.random.normal(loc = meanval, scale = 0.2, size = 100)
vector2 = np.random.normal(loc = meanval, scale = 0.4, size = 100)
vector3 = np.random.normal(loc = meanval, scale = 0.8, size = 100)

ax = pyplot.subplot(111)

ax.plot(vector1,"o--b",lw=3)
ax.plot(vector2,"x:r",lw=2)
ax.plot(vector3,"g",lw=2)

ax.set_xlabel(r'X axis Label (a.u.)',fontsize=16)
ax.set_ylabel(r'Y axis Label $v_1, v_2, v_3$',fontsize = 16)
pyplot.setp(ax.get_xticklabels(), fontsize=14)
pyplot.setp(ax.get_yticklabels(), fontsize=14)
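When several curves share one graph, a legend often helps to tell them apart. The following sketch, which is an addition and not part of the original recipe, passes a label to each plot call and then calls ax.legend. The Agg backend is selected only so that the example runs without a display; in an interactive session that line can be dropped and pyplot.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for a display-free run

import numpy as np
from matplotlib import pyplot

meanval = 2
vector1 = np.random.normal(loc=meanval, scale=0.2, size=100)
vector2 = np.random.normal(loc=meanval, scale=0.4, size=100)
vector3 = np.random.normal(loc=meanval, scale=0.8, size=100)

fig, ax = pyplot.subplots()

# Each curve gets a label that the legend will display
ax.plot(vector1, "o--b", lw=3, label=r"$v_1$")
ax.plot(vector2, "x:r", lw=2, label=r"$v_2$")
ax.plot(vector3, "g", lw=2, label=r"$v_3$")

# Build the legend from the labels, with the same font size as the ticks
legend = ax.legend(fontsize=14)
```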

## 7.10 Displaying in IPython all the components of a Pandas Series or DataFrame

Created on January 29th, 2016.

Working in IPython, only the head and tail of Series or DataFrame data structures are displayed. In order to display the full contents of a DataFrame named e.g. exp_vals_H2, the following commands can be used.

pd.set_option('display.max_rows', len(exp_vals_H2))
print(exp_vals_H2)
pd.reset_option('display.max_rows')

If this is repeatedly needed a function can be defined as follows

def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')
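An alternative sketch uses pd.option_context, which changes the option only inside a with block. Unlike reset_option, which goes back to the Pandas default, this restores whatever value was set before the call. The Series exp_vals below is hypothetical stand-in data.

```python
import numpy as np
import pandas as pd

def print_full(x):
    """Print every row of a Series or DataFrame, restoring the option afterwards."""
    with pd.option_context("display.max_rows", len(x)):
        print(x)

# Hypothetical stand-in data with more rows than are shown by default
exp_vals = pd.Series(np.arange(100))

rows_before = pd.get_option("display.max_rows")
print_full(exp_vals)
rows_after = pd.get_option("display.max_rows")
```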

## 7.11 Changing the number of lines to scroll in an IPython qtconsole session

Created on April 3rd, 2018.

Working in the IPython qtconsole, by default you can scroll back 500 lines. This number can be changed in several ways (see references), but the simplest way may be in the program invocation. For example, to increase this number to 1000 lines one should execute:

ipython qtconsole --IPythonWidget.buffer_size=1000

### 7.11.1 References


Some Mini-Howtos of Interest

Curro Perez-Bernal mailto:francisco.perez@dfaie.uhu.es