12 Python Practices For Better Functions
In this tutorial, we are going to learn about 12 Python practices for writing better functions.
Writing code can be a seriously difficult and challenging gig, especially when a problem is hard to solve and has multiple solutions, some wrong and some correct. It is not always easy to keep your code in tip-top shape.
My first tip for writing functions is to take a step back and consider what your function actually does. Ultimately, a function exists either to return something or to alter something. The question to ask in this scenario is:
“What will it take to get there?”
Without a clear direction, a function simply does not make sense to write. First, ask what we need to get out of the function: this is our output. Next, ask what we need to put into the function to get that result. Finally, fill in the arithmetic in the middle. In some cases, it can even be valuable to start writing the function with the return statement. Although the following example is relatively simple, it demonstrates the concept in a very basic way. Let us write a function to calculate the mean.
First, let us determine our output. In this case, we of course want the mean returned. Data like this is typically stored in a vector, or a list in Python, so we can presume this is our input:
def mean(x: list):
    # skeleton: mu is not computed yet, so calling this would raise a NameError
    return mu
Now, how do we get from a list to the mean? Simply fill in the blank and return the proper value:

def mean(x: list):
    mu = sum(x) / len(x)
    return mu
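You can also make the output-first order visible in the code itself by adding type hints. The sketch below does that. The annotations are my addition, and the built-in `list[float]` form assumes Python 3.9 or later.

```python
def mean(x: list[float]) -> float:
    # 1. Output: "-> float" states what we want back.
    # 2. Input: "x: list[float]" states what we need to put in.
    # 3. Middle: the arithmetic that gets us from the input to the output.
    mu = sum(x) / len(x)
    return mu


print(mean([5, 10, 15]))  # 10.0
```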
Another technique for writing functions that I can certainly vouch for is extraction, an essential component of clean code. Extraction simply means creating more functions to handle the separate tasks inside one function, so that the function does not become a scavenger hunt for the different values we need. Instead, we write a function to get each of those values. Functions should be simple and have short directives. As I have said in the past, more methods means better code! I have also written a full article all about extraction, which I really enjoyed writing.
Before showing a simplified Python example, let us look at a real-world Julia example of one of my functions to see how I employ this technique:
function OddFrame(file_path::String)
    # inner constructor; read_csv, length_check, _head, _drop, etc. live elsewhere in the package
    # Labels/Columns
    extensions = Dict("csv" => read_csv)
    extension = split(file_path, '.')[end]  # keep only the extension, e.g. "csv"
    labels, columns = extensions[extension](file_path)
    length_check(columns)
    name_check(labels)
    types, columns = read_types(columns)
    # Coldata
    coldata = generate_coldata(columns, types)
    # Head
    """dox"""
    head(x::Int64) = _head(labels, columns, coldata, x)
    head() = _head(labels, columns, coldata, 5)
    # Drop
    drop(x) = _drop(x, columns)
    drop(x::Symbol) = _drop(x, labels, columns, coldata)
    drop(x::String) = _drop(Symbol(x), labels, columns, coldata)
    dropna() = _dropna(columns)
    # the lambda argument is named l so that it no longer shadows x
    dtype(x::Symbol) = typeof(coldata[findall(l -> l == x, labels)])
    dtype(x::Symbol, y::Type) = _dtype(columns[findall(l -> l == x, labels)], y)
    # type
    self = new(labels, columns, coldata, head, drop, dropna, dtype)
    select!(self)
    return(self)
end
You will likely notice that anything in this function that cannot be done in fewer than three lines is extracted. If all of these functions were written into the same function, it would be incredibly long. It would also be nearly impossible to keep track of everything that is going on step by step. Another significant problem is stack-tracing: it is much more difficult to trace a bug that is buried inside a function a mile long. A stack trace lists each nested function call leading up to the point where the error occurred. With small functions, we can therefore see the exact call in each function that caused the error.
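The stack-trace point can be sketched in Python with the standard-library `traceback` module. The functions below (`parse_value`, `parse_row`, `load`, `failing_call_chain`) are hypothetical. They only illustrate how small extracted functions show up one by one in a traceback. I used explicit loops instead of comprehensions so that the recorded frames do not depend on the Python version.

```python
import traceback


def parse_value(text: str) -> int:
    # smallest extracted step: turn one field into an int
    return int(text)


def parse_row(row: str) -> list:
    # extracted step: split a comma-separated row and parse each field
    values = []
    for field in row.split(","):
        values.append(parse_value(field))
    return values


def load(rows: list) -> list:
    # top-level function: mostly calls other functions
    table = []
    for row in rows:
        table.append(parse_row(row))
    return table


def failing_call_chain(rows: list) -> list:
    """Call load() and return the function names recorded in the traceback."""
    try:
        load(rows)
    except ValueError as err:
        return [frame.name for frame in traceback.extract_tb(err.__traceback__)]
    return []


print(failing_call_chain(["1,2", "3,x"]))
# ['failing_call_chain', 'load', 'parse_row', 'parse_value']
```

Each name in the output is one small function. The last one, `parse_value`, points straight at the line that failed.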
I understand that the Julia example was a little much, especially for those who write Python. Let us create a normalization function (no, not a class) that uses extraction in a far simpler way, to make this concept easier to understand. As a side note, these functions are of course all available in libraries we could import. That is not the point, though: this is merely an example of implementing your own functions.
from numpy import sqrt

def norm(x: list):
    mu = sum(x) / len(x)
    x2 = [(i - mu) ** 2 for i in x]
    m = sum(x2) / len(x2)
    std = sqrt(m)
    return [(i - mu) / std for i in x]
Let us start with the first line of the function body. Given that this package calculates the norm of data, we can assume it is going to be statistics-oriented. That being said, we will probably use the mean in more places than just this function. Even though the mean is a one-line operation, we are likely to reuse it. It is also always better to keep the operations inside a main function like this to a minimum and primarily call other functions. Of course, this is not a terrible example, but let us see what follows.
Next, we calculate x2, the squared deviation (i - mu)² for each value i in x. This is simply a value we need in order to get our standard deviation later in the function. After we get it, we write out the arithmetic for the mean again, repeating the exact same code, this time on x2. Finally, we take the square root to get our standard deviation. We then return the standardized data: each value minus the mean, divided by the standard deviation. This function could most certainly use some extraction. Instead of putting the whole standard deviation’s arithmetic inside this function, we will make a function call for it. We will do the same for the mean, which together brings this function down to just a measly three lines.
from numpy import sqrt

def mean(x: list):
    return sum(x) / len(x)

def std(x: list):
    mu = mean(x)
    x2 = [(i - mu) ** 2 for i in x]
    return sqrt(mean(x2))

def betternorm(x: list):
    mu = mean(x)
    st = std(x)
    return [(i - mu) / st for i in x]
The only downside to this version is that the mean is calculated twice: once inside the scope of betternorm() and once inside the scope of std(). There is definitely a way to fix this, although it comes with a trade-off. The performance cost is so small that this version is going to be far more acceptable code to essentially anyone.
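One possible fix is sketched below. It assumes we are willing to add one more small helper, `std_about`, a hypothetical name for the standard deviation around an already-computed mean. With that helper, betternorm() computes the mean once and hands it on. The sketch uses `math.sqrt` instead of NumPy's `sqrt`, so it needs only the standard library.

```python
from math import sqrt


def mean(x: list) -> float:
    return sum(x) / len(x)


def std_about(x: list, mu: float) -> float:
    # population standard deviation of x around a mean mu that the caller already has
    return sqrt(mean([(i - mu) ** 2 for i in x]))


def std(x: list) -> float:
    # convenience wrapper for callers that do not have the mean yet
    return std_about(x, mean(x))


def betternorm(x: list) -> list:
    mu = mean(x)            # the mean is computed exactly once
    st = std_about(x, mu)   # and reused for the standard deviation
    return [(i - mu) / st for i in x]


print(betternorm([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The trade-off is one extra function in the module in exchange for skipping the second pass over the data.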
An often-neglected aspect of functions is the name given to them. Names are important because, in most cases, they should tell you the output of the function. For example, my function called mean returns the mean. From there, docstrings mainly describe how the input is meant to be formatted and what types we can pass in. For example, let us say the following function makes “Jerry” eat a pickle:
def pickle(n_pickles: int):
    pass
The function name “pickle” is not very specific. This matters because in a work setting you are often working with others. And who knows who is getting fed this pickle by this function? Instead, the function should be named something like:
def jerry_eat_pickle(n_pickles: int):
    pass
Of course, this is merely an example, and a silly one at that, but I think the point is quite obvious: functions need clear, descriptive names. I would even say it is nice when people can guess what your function is called. For instance, if you want to merge two Pandas DataFrames, the word in your mind is “merge”, and sure enough, the method is called merge(), not mer() or m(). It just makes sense and only takes a second. Another aspect of naming is following convention. Python (per PEP 8) uses lowercase function and method names, with words separated by underscores, so you do not want to use capitals for them. Capitalized (CapWords) names are reserved for types, that is, classes.
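Here is a small sketch that makes the convention concrete, built on the Jerry example. `PickleJar` and `jerry_eat_pickles` are hypothetical names. They show PEP 8's CapWords style for a class next to lowercase_with_underscores for a function whose name says what it does.

```python
class PickleJar:
    """A hypothetical type: class names use CapWords."""

    def __init__(self, n_pickles: int) -> None:
        self.n_pickles = n_pickles


def jerry_eat_pickles(jar: PickleJar, n_pickles: int) -> int:
    """Have Jerry eat n_pickles from jar and return how many are left."""
    # function names are lowercase, with words separated by underscores
    jar.n_pickles -= n_pickles
    return jar.n_pickles


jar = PickleJar(5)
print(jerry_eat_pickles(jar, 2))  # 3
```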
Another aspect of naming is labeling particular parts of one’s function. In my Julia example of the extraction method, you can see there is no over-commenting, which is one of my pet peeves. Of course, it is one thing if you are trying to explain something step by step, but comments like
# multiply 5 by x
5 * x
really grind my gears! Anyway, I labeled the individual sections of my function because they all serve a purpose towards the return, which in that particular case was a type. This is not to say you need to leave a comment for each section of your function, though it is helpful. For example, if someone working on the head function in my OddFrame type wants to change how it is built into the type, its section is easy to find. I also think that labeling sections encourages programmers to group related things together. This is far better for those reading the code, as they only need to read one thing at a time.
As I touched on earlier, repeating yourself is bad. In many instances, extraction solves this. Even so, you may still find circumstances where you seem to be rewriting the same code over and over again. This is problematic, so it is a good idea to find a way to avoid repeating that code.
Let us not forget that modules are loaded into memory: when a module is imported, Python reads and compiles every line of it. Repeated code makes modules larger. More importantly, it makes them harder to maintain, because a fix has to be applied to every copy. With that in mind, keeping repetition to a minimum is simply a great way to write better code.
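Here is a minimal sketch of removing repetition. Assume two hypothetical statistics helpers that both need the same empty-list check. Instead of copying the if-statement into each function, the check lives in one extracted helper, `require_nonempty` (also hypothetical), so any change to it happens in exactly one place.

```python
def require_nonempty(x: list) -> None:
    # the one shared copy of the check, instead of an identical if-statement per function
    if len(x) == 0:
        raise ValueError("expected a non-empty list")


def mean(x: list) -> float:
    require_nonempty(x)
    return sum(x) / len(x)


def data_range(x: list) -> float:
    require_nonempty(x)
    return max(x) - min(x)


print(mean([1, 2, 3]), data_range([3, 9, 1]))  # 2.0 8
```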
Less code is almost always superior to more code. Usually, less code also performs better. More importantly, less code is generally easier to read, which in many situations matters more than performance. Let us just consider that the language we are writing here is Python. Python is not exactly revered for its speed, but it is well known for being relatively easy and beginner-friendly.
The world of data science currently revolves around Python, and so does this article. It is therefore acceptable to let performance take a hit for the sake of readability. Even so, I think most of the time the more concise solution ends up being faster anyway.
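Here is a small illustration of that trade-off: two hypothetical functions that compute the same total. In CPython, the built-in `sum()` runs its loop in C, so the shorter version is typically both easier to read and faster.

```python
def total_verbose(x: list) -> float:
    # accumulate by hand, one Python-level step per element
    result = 0
    for value in x:
        result = result + value
    return result


def total_concise(x: list) -> float:
    # the built-in does the same job in one readable call
    return sum(x)


print(total_verbose([1, 2, 3, 4]), total_concise([1, 2, 3, 4]))  # 10 10
```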
A huge mistake that many beginners make when they develop their first Python modules and functions is not specifying the types of their arguments. There are many compelling reasons to do so, but let us start with the most basic. Consider the following function:
def add5(x):
    return x + 5
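If we were to pass a string to this, for example add5("5"), we would get an error like this one (the wording is from recent Python 3 versions):

TypeError: can only concatenate str (not "int") to str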
Some users might quickly understand this error and change that string to an integer, but others might find it confusing. This is especially likely because x is listed first in x + 5. The error is framed around the addition to our string, not around our integer, so some could read it and say “but I am providing a string!” Sure, that is a bad assumption. Still, you can avoid an entire GitHub issue, and the two-sentence answer to it, simply by making sure the end user knows to pass an integer and can spot the problem more easily. While this might matter less in Python, in other languages (especially my one and only, Julia) setting the type of your arguments is very important!
Another reason to set your arguments’ types is so that your tools know what type the function expects before a value is ever passed. In Python, the interpreter itself ignores type hints at runtime, so they will not speed up ordinary code on their own. However, static checkers such as mypy, as well as most editors, use them to catch mistakes before the code ever runs.
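Python does not enforce annotations. One way to give the end user the clear message described above is to pair the type hint with an explicit check, as in the sketch below. This is not the only approach: a static checker such as mypy would also flag a call like `add5("5")` before the code runs. Note that `isinstance()` also accepts `bool` here, because `bool` is a subclass of `int`.

```python
def add5(x: int) -> int:
    """Add five to an integer."""
    # The annotation alone is not enforced at runtime, so check explicitly
    # and raise an error that names the expected type.
    if not isinstance(x, int):
        raise TypeError(f"add5() expects an int, got {type(x).__name__}")
    return x + 5


print(add5(10))  # 15
```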
One of my favorite things on Earth to read, apparently, is documentation, probably because I spend nearly a quarter of my life doing just that. And with documentation come docstrings. Docstrings are absolutely vital; you simply cannot operate effectively without them. As a human, you are bound to forget things. There is no way you could remember what a module of 10,000 lines of code spread across 10 files does in every individual capacity.
With that in mind, docstrings are important not only for end users but also for developers. When working in collaboration, they are essential to writing functions that the entire team can work on and with throughout all of your code. Next time you write a function, do not leave the function’s head lonely:
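def add5(x: int):
    return x + 5

Instead, document! The docstring goes on the first line inside the function body, which is where Python picks it up as add5.__doc__:

def add5(x: int):
    """Adds five to an integer."""
    return x + 5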
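A fuller docstring can also describe the input and output. The sketch below uses the NumPy docstring style, which is one common convention among several (PEP 257 covers the basics). The text is available at runtime through `add5.__doc__` and `help(add5)`.

```python
def add5(x: int) -> int:
    """Add five to an integer.

    Parameters
    ----------
    x : int
        The integer to add five to.

    Returns
    -------
    int
        The value x + 5.
    """
    return x + 5


print(add5.__doc__.splitlines()[0])  # Add five to an integer.
```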
Another common rookie mistake in the programming world is too much nesting. Nesting refers to each new level of scope declared inside a given piece of software. For example, in Julia we write “module” to start a module, and this creates a first layer of nesting: our scope is no longer global, and we are now operating inside a lower scope. When we declare a function inside this module, we add another layer.
Usually, however, when programmers talk about nesting in a negative context, they are referring to nested loops and nested conditionals. Each loop or conditional placed inside another adds a level of nesting. (In Julia, loops also introduce their own local scope, although in Python they do not.) An example of a nested for loop would be the following:
for i in range(1, 5):
    for j in range(1, 3):
        print(i, j)
The inner loop now runs in full on every iteration of the outer loop. The total work is therefore the product of the two loop lengths (here 4 × 2 = 8 calls to print()). Needless to say, this can be incredibly problematic for the performance of our software once the loops get large. There are times when nesting is essentially unavoidable, but that is the only time it should be used! My recommendation is to write a nested for loop only for:
- solutions where every other option is exhausted
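A common way to remove a nested loop is to replace the inner search with a lookup. The sketch below uses two hypothetical functions that find the values two lists share, in the order they first appear in the first list. The second version assumes the values are hashable so they can go into a set. That turns the inner loop into a membership test that takes constant time on average.

```python
def common_nested(a: list, b: list) -> list:
    # nested loops: compares every pair, len(a) * len(b) comparisons
    result = []
    for x in a:
        for y in b:
            if x == y and x not in result:
                result.append(x)
    return result


def common_flat(a: list, b: list) -> list:
    # one pass over b to build a set, then one pass over a with set lookups
    lookup = set(b)
    added = set()
    result = []
    for x in a:
        if x in lookup and x not in added:
            result.append(x)
            added.add(x)
    return result


print(common_flat([1, 2, 3, 2], [2, 3, 4]))  # [2, 3]
```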
An underrated part of the Python programming language, in my opinion, is the ability to use decorators to completely alter the way a function or class works. They are so ridiculously underrated! Who would have thought one could completely change the nature of their code, or improve its efficiency and speed, just by adding a simple decorator on top of a function?!
To make the most of your functions, AND classes, you should definitely look into decorators when programming in Python! Decorators are incredibly easy to use and fantastically effective at all kinds of things, from improving the speed of your code to completely changing the paradigm of the Python language.
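Here is a short sketch of both uses mentioned above:

- `functools.lru_cache` is a real standard-library decorator. It speeds up a recursive function by caching its results.
- `count_calls` is a hypothetical decorator written for illustration. It changes what a function does (counting its calls) without touching its body.

`functools.wraps` keeps the original function's name and docstring.

```python
from functools import lru_cache, wraps


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # memoized: each fib(k) is computed once, later calls come from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def count_calls(func):
    # hypothetical decorator: records how many times func has been called
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add5(x: int) -> int:
    return x + 5


print(fib(30))  # 832040
```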
One mistake I see a lot, which I touched on earlier, is just how excessive some programmers make their commenting. If you have a code file with 500 lines of code and 500 lines of comments, you are probably using comments wrong. Avoid making obvious remarks, such as:
# get parameters
parameters = get_params(data)
First of all, the variable we are assigning here is named parameters, so most readers could assume that we are getting our parameters. Second, even if the variable had a random name, the function is called get_params. If people cannot read a simple assignment, they probably will not be reading your code! It looks ugly when a comment sits between every line of code. It is also a terrible practice that goes against the spirit of PEP 8, which calls inline comments that state the obvious unnecessary and distracting.
Another great tip for writing functions is to not write a function at all. Instead, you can write an expression using the lambda keyword. This turns your function into a one-line expression that can be mapped over lists, which is often more concise than a typical iterative approach. It is incredibly easy to use and can be done just like this example, with the mean function we wrote earlier:
mean = lambda x: sum(x) / len(x)
Now we can call it as if we wrote out the function:
mean([5, 10, 15])  # 10.0
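Of course, this example only shows how lambda can be used to write a function in one line. The real power comes when these expressions are passed directly to other functions, such as map(), filter(), or sorted(). In fact, PEP 8 recommends using a def statement rather than binding a lambda to a name as we did above.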
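Here is a sketch of lambdas used inline, on some made-up data. `map()` applies the mean expression to each inner list, and `sorted()` uses a lambda as its key. (For the sort, `key=len` would do the same job without a lambda.)

```python
data = [[5, 10, 15], [1, 2], [100]]

# lambda passed inline to map(): the mean of each inner list
means = list(map(lambda x: sum(x) / len(x), data))
print(means)  # [10.0, 1.5, 100.0]

# lambda passed inline as the sort key: order the lists by their length
by_length = sorted(data, key=lambda x: len(x))
print(by_length)  # [[100], [1, 2], [5, 10, 15]]
```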
My final tip for writing Python functions is to avoid keyword arguments. Of course, I do not believe that keyword arguments are bad, and there is obviously a reason we have them: they are incredibly useful, especially for parameters in plotting or machine-learning software. Needless to say, they have their place.