Python Tutorial
We’ll be getting started with some Python material this week. It’ll help feed into the work you do in COMP704.
Python Basics
Python is a duck-typed (dynamically typed) language, and it is also strongly typed. Variables don’t have types in the way you might be familiar with from other languages, but values do. Python will not silently convert between incompatible types.
I’m not going to go into a great deal of Python basics in this session. You are master’s students, and the getting-started materials listed at the bottom of this guide are very well written. Instead, I’m going to point out some less well-known Python features that will be useful for keeping your code clean.
Typing
For example, take the following expression:
```pycon
>>> "hello" + 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
```
This also holds true if the values are stored in variables:
```pycon
>>> a = "hello"
>>> b = 42
>>> a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
>>> type(a)
<class 'str'>
```
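The following is a small illustrative sketch of both ideas. The function name count_lines and the sample values are made up for this example. Duck typing means a function works with any value that supports the operations it uses, here iteration. Strong typing means that mixing a str and an int needs an explicit conversion.

```python
import io


def count_lines(source) -> int:
    """Count lines in any object that can be iterated over line by line (illustrative helper)."""
    return sum(1 for _ in source)


# Duck typing: a file-like object and a plain list both work, because both are iterable
text = io.StringIO("first\nsecond\nthird\n")
print(count_lines(text))        # 3
print(count_lines(["a", "b"]))  # 2

# Strong typing: convert explicitly instead of relying on implicit coercion
a = "hello"
b = 42
print(a + str(b))  # hello42
```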
Type Annotations
Note: Modern versions of Python 3 support type hints (annotations). These let a programmer specify the types used. They can be checked by external tools, but Python does not enforce them when the code runs.
The following function takes a string argument called name and returns an int:
```python
def myFunc(name: str) -> int:
    return 42
```
You can learn more about type annotations in the Python Guide.
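As a quick illustration that annotations are not enforced at runtime, here is a sketch using a hypothetical function count_chars. It is annotated to take a str, but Python happily accepts a list. The annotations are still stored on the function and can be inspected. A static checker such as mypy would flag the second call.

```python
def count_chars(name: str) -> int:
    """Annotated as taking a str, but Python does not check this at runtime."""
    return len(name)


print(count_chars("hello"))         # 5
print(count_chars([1, 2, 3]))       # 3: a list is accepted, annotations are not enforced
print(count_chars.__annotations__)  # {'name': <class 'str'>, 'return': <class 'int'>}
```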
Data Classes
A standard library feature that makes good use of annotations is dataclasses. Dataclasses are very useful for keeping track of structured data (similar to a struct in C):
```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
```
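To show what the @dataclass decorator generates, here is a short usage sketch. The class is repeated so the block runs on its own, and the item values are made up. The decorator writes __init__, __repr__ and __eq__ from the annotated fields, and dataclasses.asdict converts an instance to a plain dict.

```python
from dataclasses import asdict, dataclass


@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


item = InventoryItem("widget", 3.0, 10)  # generated __init__ takes fields in order
print(item)               # InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)
print(item.total_cost())  # 30.0
print(item == InventoryItem("widget", 3.0, 10))  # True: generated __eq__ compares fields
print(asdict(item))       # {'name': 'widget', 'unit_price': 3.0, 'quantity_on_hand': 10}
print(InventoryItem("bolt", 0.5))  # quantity_on_hand falls back to its default of 0
```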
Batteries included
Python has an extensive standard library. Here are a few of the modules I use a lot in my day-to-day data processing work, which may prove very useful to you:
- collections provides additional data types that can help organise your code, including defaultdict and Counter.
- csv provides CSV support and can automatically quote values where required. It also makes it easy to write and read CSV rows as dictionaries (DictWriter and DictReader).
- subprocess allows running external commands. It is a little unusual to use, but can be very useful as part of automated pipelines (e.g. building code or running scripts).
- pathlib is similar to std::filesystem in C++, and provides an object-oriented interface for working with files and folders.
- os allows access to operating system features, most usefully environment variables.
- logging provides a way to output structured logging information. This is useful for checking that the code does what you expect and for making errors or issues easy to spot.
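The sketch below strings several of these modules together in a toy data-processing step. The sample rows, the results.csv file name and the DATA_DIR environment variable are made up for illustration. The script writes a small CSV file to a temporary folder, reads it back as dictionaries, tallies the results, and runs a child Python process.

```python
import csv
import logging
import os
import subprocess
import sys
import tempfile
from collections import Counter, defaultdict
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger(__name__)

rows = [  # made-up sample data
    {"player": "alice", "level": "1", "result": "win"},
    {"player": "bob", "level": "1", "result": "loss"},
    {"player": "alice", "level": "2", "result": "win"},
]

with tempfile.TemporaryDirectory() as tmp:
    # pathlib: join paths with "/" rather than string concatenation
    out_file = Path(tmp) / "results.csv"

    # csv.DictWriter writes dicts as rows; the csv docs recommend newline=""
    with out_file.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["player", "level", "result"])
        writer.writeheader()
        writer.writerows(rows)

    # csv.DictReader yields each row as a dict keyed by the header line
    with out_file.open(newline="") as f:
        loaded = list(csv.DictReader(f))

log.info("read %d rows", len(loaded))

# Counter tallies how often each value occurs
results = Counter(row["result"] for row in loaded)

# defaultdict(list) creates an empty list the first time a key is used
levels_by_player = defaultdict(list)
for row in loaded:
    levels_by_player[row["player"]].append(int(row["level"]))

print(results)                 # Counter({'win': 2, 'loss': 1})
print(dict(levels_by_player))  # {'alice': [1, 2], 'bob': [1]}

# os.environ: read an environment variable with a fallback (DATA_DIR is hypothetical)
data_dir = os.environ.get("DATA_DIR", "data")
log.info("data directory: %s", data_dir)

# subprocess.run: run a command and capture its output as text
proc = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True, text=True, check=True,
)
log.info("child said: %s", proc.stdout.strip())
```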
Libraries
Python has a range of data processing libraries that we will examine in COMP704. Here are a few if you want to read ahead:
- pandas: data processing and graphing (along with seaborn)
- scikit-learn: some more traditional ML algorithms
- Stable Baselines: a collection of reinforcement learning (RL) algorithms
Tooling
Python’s large collection of third-party packages can quickly make code hard to maintain, because people often forget to record which packages a project requires. I recommend a tool like pipenv, or a managed virtual environment (such as those PyCharm can create), to avoid these issues.
Pipenv
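```shell
# depending on your setup, you may need pip.exe or python.exe -m pip instead of pip
pip install --user pipenv
# go to your project directory, then:
pipenv install packagename  # records the package in the Pipfile and installs it
# run inside the project's virtual environment (./main.py needs a shebang line and
# execute permission; otherwise use: pipenv run python main.py)
pipenv run ./main.py
```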
Note: It is important to make sure that any requirements you depend on are documented. Do not assume that someone else trying to run or mark your code will have these packages installed. At a minimum, provide a requirements.txt file.
Script structure
Speaking of main.py, it’s worth talking about the structure of a Python script. Python scripts run from top to bottom, and importing a module also runs its top-level code. So if you intend to be able to import a module as well as run it as a script, you should use the following structure:
```python
#! /usr/bin/env python3
##
# Description of script
##


def main():
    pass


# this ensures main() only runs when the file is executed as a script (e.g. from the CLI),
# not when it is imported as a module
if __name__ == "__main__":
    main()
```
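To see the difference the __name__ check makes, the following sketch writes a tiny script to a temporary file and executes it twice with runpy.run_path. The file name demo_script.py is made up. The first run sets __name__ to "__main__", as python demo_script.py would. The second run uses a module name, which mimics import demo_script. Only the first run calls main().

```python
import contextlib
import io
import runpy
import tempfile
from pathlib import Path

SCRIPT = '''\
def main():
    print("main() ran")


if __name__ == "__main__":
    main()
'''


def run(path: Path, run_name: str) -> str:
    """Execute the file at path with __name__ set to run_name and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        runpy.run_path(str(path), run_name=run_name)
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"  # hypothetical file name
    script.write_text(SCRIPT)
    as_script = run(script, "__main__")     # like: python demo_script.py
    as_import = run(script, "demo_script")  # like: import demo_script

print(repr(as_script))  # 'main() ran\n'
print(repr(as_import))  # ''
```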
Folders as packages
If you want to be able to re-use Python code, you should structure it as a package (a folder of modules):
```
myproject/
    __init__.py  # this tells Python that this folder is a package
    main.py
    mylib.py
```
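From inside main.py, you can then use either import mylib or from . import mylib to make use of your library functions. This can help keep your code clean. The relative form (from . import mylib) only works when main.py runs as part of the package, for example with python -m myproject.main from the folder that contains myproject/. Running python main.py directly instead raises an ImportError. The plain import mylib form works when main.py is run directly, because Python puts the script’s own folder on the import path.
For more information, see the Package documentation.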
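The following sketch checks the relative-import behaviour by building the myproject layout in a temporary folder. The function double in mylib.py is hypothetical. It then runs main.py both as a module of the package and as a plain script. The first run prints 42. The second run fails with an ImportError, because a directly run file has no parent package.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

FILES = {
    "__init__.py": "",
    "mylib.py": "def double(x: int) -> int:\n    return 2 * x\n",  # hypothetical helper
    "main.py": (
        "from . import mylib\n\n"
        "def main():\n"
        "    print(mylib.double(21))\n\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "myproject"
    pkg.mkdir()
    for name, source in FILES.items():
        (pkg / name).write_text(source)

    # Run as a module of the package, from the parent folder: the relative import works
    ok = subprocess.run([sys.executable, "-m", "myproject.main"],
                        cwd=tmp, capture_output=True, text=True)
    # Run the file directly as a script: no parent package, so the relative import fails
    bad = subprocess.run([sys.executable, str(pkg / "main.py")],
                         cwd=tmp, capture_output=True, text=True)

print(ok.stdout.strip())     # 42
print(bad.returncode != 0)   # True
print("ImportError" in bad.stderr)  # True
```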
Structure
Here are a few tips on code structure:
Docblock comments
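It’s common to document functions and methods with docstrings, which are multi-line strings placed as the first statement of the function or class body. Consider the following example, which reads a spreadsheet with pandas and uses the NumPy docstring style: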
```python
import pandas as pd  # third-party; reading .xlsx files also needs an engine such as openpyxl


def get_spreadsheet_cols(file_loc, print_cols=False):
    """Gets and prints the spreadsheet's header columns

    Parameters
    ----------
    file_loc : str
        The file location of the spreadsheet
    print_cols : bool, optional
        A flag used to print the columns to the console (default is
        False)

    Returns
    -------
    list
        a list of strings that are the header columns
    """
    file_data = pd.read_excel(file_loc)
    col_headers = list(file_data.columns.values)
    if print_cols:
        print("\n".join(col_headers))
    return col_headers
```
See this Real Python guide, or PEP 257, for more information on these conventions.
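Docstrings are not just comments: Python keeps them on the function object, so help() and IDEs can display them. Here is a minimal sketch with a made-up area function in the same NumPy style. It reads the docstring back with inspect.getdoc, which also strips the indentation.

```python
import inspect


def area(width: float, height: float) -> float:
    """Return the area of a rectangle.

    Parameters
    ----------
    width : float
        Width of the rectangle
    height : float
        Height of the rectangle

    Returns
    -------
    float
        width multiplied by height
    """
    return width * height


doc = inspect.getdoc(area)  # the cleaned-up docstring, as help(area) would show it
print(doc.splitlines()[0])  # Return the area of a rectangle.
print(area(2.0, 3.0))       # 6.0
```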
Code formatting
Python has well-defined formatting conventions for source code, documented in PEP 8, and your Python code should follow them. There are many linters that can check this (for example, pylint). There are also automated code formatters, such as black, which is not completely PEP 8 compliant but is another de facto standard for Python source formatting.
In addition, you can configure your version control system to automatically check and apply formatting rules using hooks.
Further Reading
Last updated 2023-01-06