https://pandas.pydata.org/
https://pandas.pydata.org/docs/user_guide/10min.html
https://colab.research.google.com/notebooks/mlcc/intro_to_pandas.ipynb
https://www.w3schools.com/python/pandas/
If you already have Python, you can install pandas with:
ANACONDA: conda install pandas
PIP (in a virtual environment): pip install pandas
APT (for the entire computer): Debian packages python-pandas, python3-pandas, and their dependencies.
'pandas' is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. 'pandas' aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
Main features:
The two primary data structures of pandas are Series (1-dimensional, a single column) and DataFrame (2-dimensional). A DataFrame contains one or more Series and a name for each Series.
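A minimal sketch of the two structures. The column names 'name' and 'score' and their values are made up for illustration. Selecting one column of a DataFrame gives back a Series that shares the DataFrame's row index.

```python
import pandas as pd

# A Series: one-dimensional labelled values (default index 0, 1, 2).
scores = pd.Series([90, 85, 77])

# A DataFrame: named Series sharing one row index.
# 'name' and 'score' are illustrative column names only.
df = pd.DataFrame({'name': ['Ann', 'Bob', 'Cid'], 'score': scores})

col = df['score']              # selecting one column returns a Series
print(type(col).__name__)      # Series
print(df.shape)                # (3, 2): 3 rows, 2 columns
print(list(df.columns))        # ['name', 'score']
print(col.index.equals(df.index))  # True: the column keeps the row index
```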
DataFrame objects can be created by passing a dict mapping string column names to their respective Series. The Series are aligned on their index labels, so if they don't match in length, the missing entries are filled with the special NA/NaN value.
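A small sketch of the length mismatch described above, using two made-up Series. The shorter Series has no label 2, so that cell becomes NaN, and the column is stored as float64 because NaN is a floating-point value.

```python
import pandas as pd

a = pd.Series([1, 2, 3])   # index labels 0, 1, 2
b = pd.Series([10, 20])    # index labels 0, 1 (one element shorter)

# Columns are aligned on index labels; label 2 is missing from b.
df = pd.DataFrame({'a': a, 'b': b})
print(df)

print(df['b'].isna().tolist())  # [False, False, True]
print(df['b'].dtype)            # float64 (upcast so it can hold NaN)
```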
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # for plots

print(pd.__version__)  # 0.23.3 in Debian 10
#  pd.DataFrame  pd.Series
#        |         |
#        o         o
#         col1   col2   o-- columns described by keys
#       +------+------+      (default 0,1,...)
#  row1 | 'a1' |  11  |
#       +------+------+
#  row2 | 'a2' |  22  |
#       +------+------+
#  row3 | 'a3' |  33  |
#   o   +------+------+
#   |
#   rows described by indexes (default 0,1,...)
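As a sketch, the table in the diagram can be built directly by passing explicit column keys and an explicit `index`. When no labels are given, both axes fall back to the default 0, 1, ... labels the diagram mentions.

```python
import pandas as pd

# Rebuild the diagrammed table: columns 'col1'/'col2', rows 'row1'..'row3'.
df = pd.DataFrame(
    {'col1': ['a1', 'a2', 'a3'], 'col2': [11, 22, 33]},
    index=['row1', 'row2', 'row3'],
)
print(df.columns.tolist())      # ['col1', 'col2']
print(df.index.tolist())        # ['row1', 'row2', 'row3']
print(df.loc['row2', 'col2'])   # 22

# Without explicit labels, both axes default to 0, 1, ...
plain = pd.DataFrame([['a1', 11], ['a2', 22], ['a3', 33]])
print(plain.columns.tolist())   # [0, 1]
print(plain.index.tolist())     # [0, 1, 2]
```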
# Examples from Colab.
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento']) # strings
population = pd.Series([852469, 1015785, 485199]) # numbers (int)
# Constructing DataFrame from a dictionary.
df1 = pd.DataFrame({'City name': city_names, 'Population': population})
print(df1)
#        City name  Population   # column names are shown
# 0  San Francisco      852469   # row indexes are default (0,1,...)
# 1       San Jose     1015785
# 2     Sacramento      485199
# Access to Series using Python dict/list operations.
print(df1['City name'])
print(df1['Population'])
print(df1.Population) # only if the label is a valid Python identifier and doesn't clash with a DataFrame attribute or method
print(df1['Population'][1]) # 1015785
print(df1.Population[1]) # 1015785
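In the two lines above, `[1]` looks the value up by the index label 1, which matches position 1 only because df1 still has the default index. A minimal sketch of the explicit alternatives: `.loc` selects by label and `.iloc` by integer position, and the difference shows up once the rows are reordered.

```python
import pandas as pd

df1 = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# .loc selects by label, .iloc by integer position.
print(df1.loc[1, 'Population'])  # 1015785 (row label 1)
print(df1.iloc[1, 1])            # 1015785 (row position 1, column position 1)

# Reverse the rows: each label keeps its value, but positions change.
rev = df1['Population'].iloc[::-1]   # index is now 2, 1, 0
print(rev.loc[0])    # 852469: label 0 is San Francisco
print(rev.iloc[0])   # 485199: first position now holds label 2 (Sacramento)
```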
# New columns can be added (with new labels).
df1['Area square miles'] = pd.Series([46.87, 176.53, 97.92])
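A sketch of two follow-ups to adding a column, assuming the same df1 data. A new column can be computed elementwise from existing ones (the 'Population density' name is illustrative). Assigning a Series aligns it on index labels rather than positions, so labels it lacks become NaN; 'Rank' is a hypothetical column used only to show this.

```python
import pandas as pd

df1 = pd.DataFrame({'Population': [852469, 1015785, 485199]})
df1['Area square miles'] = pd.Series([46.87, 176.53, 97.92])

# Derived column: computed elementwise, row by row.
df1['Population density'] = df1['Population'] / df1['Area square miles']

# Hypothetical 'Rank' column: the Series has labels 1 and 0 (in that order)
# and no label 2, so it is matched by label and row 2 gets NaN.
df1['Rank'] = pd.Series({1: 1, 0: 2})
print(df1['Rank'].tolist())   # [2.0, 1.0, nan]
```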
# Calculations with numpy.
print(np.log(population)) # returns a new Series
# 0 13.655892
# 1 13.831172
# 2 13.092314
# dtype: float64
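A short sketch of what "returns a new Series" means here: a NumPy ufunc such as `np.log` is applied elementwise, the result is again a Series, and the original index is preserved. Plain arithmetic operators behave the same way; the conversion to millions is only an illustration.

```python
import numpy as np
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

logs = np.log(population)                    # elementwise, result is a Series
print(type(logs).__name__)                   # Series
print(logs.index.equals(population.index))   # True: index is preserved

# Ordinary arithmetic is elementwise too.
in_millions = population / 1_000_000
print(in_millions.round(2).tolist())         # [0.85, 1.02, 0.49]
```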
# Transforming series.
# population.apply(lambda item: item > 1000000) # returns a new boolean Series
print(population > 1000000) # numpy style (elementwise)
# 0 False
# 1 True
# 2 False
# dtype: bool
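A boolean Series like this is typically used as a mask to select rows. A minimal sketch with the df1 data: the comparison and the `apply` version give the same mask, but `apply` calls the Python lambda once per element, so the vectorised comparison is usually the faster choice.

```python
import pandas as pd

df1 = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

mask = df1['Population'] > 1000000       # boolean Series, same index as df1
print(df1[mask]['City name'].tolist())   # ['San Jose']

# Same mask via apply(): one Python call per element.
same = df1['Population'].apply(lambda item: item > 1000000)
print(same.equals(mask))                 # True
```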
# Indexes.
print(df1.index) # RangeIndex(start=0, stop=3, step=1)
# df1a = df1.reindex([2, 0, 1]) # changing order of rows
# df1b = df1.reindex(np.random.permutation(df1.index)) # random order of rows
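A runnable sketch of the two reindexing lines above. Reordering keeps each label attached to its row data, a random permutation still contains every original label, and a label that is not in the original index yields a row of NaN (label 5 below is chosen only to show this).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

df1a = df1.reindex([2, 0, 1])          # rows reordered; labels keep their data
print(df1a.index.tolist())             # [2, 0, 1]
print(df1a.loc[2, 'City name'])        # Sacramento

df1b = df1.reindex(np.random.permutation(df1.index))  # random row order
print(sorted(df1b.index.tolist()))     # [0, 1, 2]

# A label absent from the original index produces a row of NaN.
df1c = df1.reindex([0, 5])
print(df1c.loc[5].isna().all())        # True
```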