https://pandas.pydata.org/
https://pandas.pydata.org/docs/user_guide/10min.html
https://colab.research.google.com/notebooks/mlcc/intro_to_pandas.ipynb
https://www.w3schools.com/python/pandas/
If you already have Python, you can install Pandas with:

ANACONDA
conda install pandas

PIP (in a virtual environment)
pip install pandas

APT (for the entire computer)
Debian packages: python-pandas, python3-pandas, and dependencies.
'pandas' is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. 'pandas' aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
Main features:
The two primary data structures of pandas are Series (1-dimensional, a single column) and DataFrame (2-dimensional). A DataFrame contains one or more Series and a name for each Series.
DataFrame objects can be created by passing a dict mapping string column names to their respective Series. If the Series don't match in length, missing values are filled with special NA/NaN values.
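A minimal sketch of the length-mismatch behaviour described above. The names `short`, `long_` and `df` are illustrative and do not come from the original examples. It assumes both Series use the default integer index, so rows are aligned by index label.

```python
import pandas as pd

# Two Series of different length, both with the default index (0, 1, ...).
short = pd.Series([1.0, 2.0])          # index 0, 1
long_ = pd.Series(['a', 'b', 'c'])     # index 0, 1, 2

# The dict keys become the column names.
df = pd.DataFrame({'short': short, 'long': long_})
print(df)

# Row 2 has no entry in `short`, so that cell is filled with NaN.
missing = df['short'].isna()
print(missing.tolist())   # [False, False, True]

# Each column is a Series that carries its column label as its name.
print(df['long'].name)    # long
```

Alignment is done on index labels, not on position, so Series with non-default indexes may produce NaN values in other rows.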
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt   # for plots

print(pd.__version__)   # 0.23.3 in Debian 10
# pd.DataFrame
#    |   pd.Series
#    |      |
#    o      o
#        col1   col2    o-- columns described by keys
#      +------+------+      (default 0,1,...)
# row1 | 'a1' |  11  |
#      +------+------+
# row2 | 'a2' |  22  |
#      +------+------+
# row3 | 'a3' |  33  |
#  o   +------+------+
#  |
# rows described by indexes (default 0,1,...)
# Examples from Colab.
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])   # strings
population = pd.Series([852469, 1015785, 485199])   # numbers (int)

# Constructing DataFrame from a dictionary.
df1 = pd.DataFrame({'City name': city_names, 'Population': population})
print(df1)
#        City name  Population     # column names are shown
# 0  San Francisco      852469     # row indexes are default (0,1,...)
# 1       San Jose     1015785
# 2     Sacramento      485199

# Access to Series using Python dict/list operations.
print(df1['City name'])
print(df1['Population'])
print(df1.Population)   # only if labels are proper Python identifiers
print(df1['Population'][1])   # 1015785
print(df1.Population[1])   # 1015785

# New columns can be added (with new labels).
df1['Area square miles'] = pd.Series([46.87, 176.53, 97.92])

# Calculations with numpy.
print(np.log(population))   # returns a new series
# 0    13.655892
# 1    13.831172
# 2    13.092314
# dtype: float64

# Transforming series.
# population.apply(lambda item: item > 1000000)   # returns a new boolean series
print(population > 1000000)   # numpy style (elementwise)
# 0    False
# 1     True
# 2    False
# dtype: bool

# Indexes.
print(df1.index)   # RangeIndex(start=0, stop=3, step=1)
# df1a = df1.reindex([2, 0, 1])   # changing order of rows
# df1b = df1.reindex(np.random.permutation(df1.index))   # random order of rows
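The boolean Series and the `reindex` call shown above can be combined into a short, self-contained sketch. Using the boolean Series as a row filter (`df[mask]`) is an addition that does not appear in the original examples. The names `df`, `mask`, `big` and `reordered` are illustrative.

```python
import pandas as pd

# Same city data as above, built directly from lists.
df = pd.DataFrame({
    'City name': ['San Francisco', 'San Jose', 'Sacramento'],
    'Population': [852469, 1015785, 485199],
})

# Elementwise comparison gives a boolean Series with the same index.
mask = df['Population'] > 1000000

# Indexing a DataFrame with a boolean Series keeps only the True rows.
big = df[mask]
print(big['City name'].tolist())         # ['San Jose']

# reindex returns a new DataFrame with rows in the requested order.
reordered = df.reindex([2, 0, 1])
print(reordered.index.tolist())          # [2, 0, 1]
print(reordered['City name'].tolist())   # ['Sacramento', 'San Francisco', 'San Jose']
```

Neither operation modifies `df`. Both return new objects, so the original row order is preserved.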