Python is a simple programming language, and it is the most widely used language in machine learning, largely because of its tooling: libraries such as NumPy can run complex mathematical operations over large arrays very quickly. Other languages such as Java or C++ can be used too, but Python is closer to an industry standard and is preferred by data scientists working with big data.
We don't need to dive deep into every part of Python. It is enough to cover the fundamental topics listed below, a few important libraries such as pandas, NumPy, and Matplotlib or seaborn, and how to define functions and import packages in our programs.
Since we work with datasets (in CSV, XLS/XLSX, or JSON format), we also need to know how to manipulate them using libraries like pandas.
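As a rough illustration of what manipulating a dataset with pandas looks like, here is a minimal sketch. The sample data, column names, and variable names are made up for this example; in a real project we would pass a file path (for example a hypothetical "data.csv") instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """name,age,city
Asha,29,Pune
Ben,33,Delhi
Chen,41,Pune
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Select rows with a boolean condition.
older = df[df["age"] > 30]

# Add a derived column.
df["age_in_months"] = df["age"] * 12

# Group rows and aggregate a column.
mean_age_by_city = df.groupby("city")["age"].mean()

# JSON data is loaded in much the same way.
json_text = '[{"name": "Dia", "age": 23}, {"name": "Eli", "age": 52}]'
df_json = pd.read_json(io.StringIO(json_text))

print(older)
print(mean_age_by_city)
print(df_json)
```

Excel files are read with pd.read_excel, which typically needs an extra engine package such as openpyxl to be installed.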
How much Python is actually required for machine learning projects?
It consists of 2 parts:
1) Python programming fundamentals
This means the following topics:
1. Interactive mode vs script mode, indentation rules in Python
2. Tokens, identifiers, keywords, literals, variables, operators, and delimiters in Python
3. Input, output, comments, and type casting in Python
4. Data types in Python
5. Conditional statements (if, if-else, if-elif-else, nested conditionals), loops (for, while, nested loops), and the ternary operator
6. Functions (user-defined and built-in functions; calling a function; arguments and parameters; variable scope: global and local; lambda functions; modules as groups of functions; built-in modules like math, random, statistics)
7. OOP in Python (classes and objects; polymorphism; inheritance; abstraction (abstract classes and methods); encapsulation; iterators)
8. Python packages (creating a package, __init__.py); modules (creating a Python module, importing modules, and the different ways of importing a module); and libraries (the Python standard library, plus third-party libraries such as pandas, NumPy, Matplotlib or seaborn, scikit-learn, TensorFlow, etc.)
This may seem like a lot, but we only need an overview of topics 1, 2, 3, and 7. Topics 4, 5, 6, and 8 need to be practiced in detail so that we can analyze datasets, which is part of the machine learning process.
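To give a feel for the level of fluency meant here, the following is a small sketch touching type casting, data types, conditionals, loops, the ternary operator, functions, lambdas, and built-in modules. All names and values (raw_scores, grade, pass_mark, and so on) are invented for illustration.

```python
import math
import statistics

# Type casting: values read from input() or text files arrive as strings.
raw_scores = ["72", "48", "95", "60"]  # hypothetical sample values
scores = [int(s) for s in raw_scores]


def grade(score, pass_mark=50):
    """User-defined function with a default parameter and if-elif-else."""
    if score >= 90:
        return "excellent"
    elif score >= pass_mark:
        return "pass"
    else:
        return "fail"


# for loop filling a dictionary (one of the core data types).
grades = {}
for score in scores:
    grades[score] = grade(score)

# while loop: count how often the top score can be halved before it drops below 10.
n, halvings = max(scores), 0
while n >= 10:
    n /= 2
    halvings += 1

# Ternary operator.
status = "all passed" if min(scores) >= 50 else "some failed"

# Lambda function used as a sort key (highest score first).
ranked = sorted(scores, key=lambda s: -s)

# Built-in modules: statistics and math.
average = statistics.mean(scores)
spread = math.sqrt(statistics.pvariance(scores))

print(grades, halvings, status, ranked, average, spread)
```

If every line of a snippet like this reads naturally, topics 4, 5, and 6 are in good enough shape to start working with datasets.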
Resources for learning the above topics:
For topics 1, 2, 3, and 4 (overview of data types only), I am attaching a PDF in this LinkedIn post that covers all of them: linkedin.com/posts/david-singh-96830324a_py..
For topics 4, 5, 6, and 8 in detail, refer to this free web book: "Python for Data Analysis" by Wes McKinney (primary creator of the pandas library): https://wesmckinney.com/book/
For a detailed Python tutorial: docs.python.org/3/tutorial
For a brief overview of the libraries in topic 8 (pandas, NumPy, TensorFlow, Keras, Matplotlib/seaborn, PyTorch), use the following links:
numpy.org/doc/stable/user/quickstart.html
https://pandas.pydata.org/docs/user_guide/10min.html
https://www.tensorflow.org/learn
https://keras.io/getting_started/
https://matplotlib.org/stable/users/explain/quick_start.html
https://pytorch.org/docs/stable/index.html
To learn about the above libraries in detail, follow the links below (will update later):
For TensorFlow, scikit-learn, and Keras in detail, the best book in my view is: https://github.com/Akramz/Hands-on-Machine-Learning-with-Scikit-Learn-Keras-and-TensorFlow
2) Specific Python libraries
As discussed above, we need to learn how to use Python libraries like pandas, NumPy, Matplotlib, and scikit-learn for:
Data preprocessing
Data cleaning
Data visualization
For these tasks, we need to learn the libraries mentioned above.
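A minimal sketch of cleaning and preprocessing with pandas and NumPy might look like the following. The data, column names, and the choice of median filling and standardization are assumptions made for this example, not a fixed recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: a missing value,
# a duplicated row, and columns on very different scales.
raw = pd.DataFrame({
    "height_cm": [170.0, 182.0, np.nan, 182.0, 158.0],
    "income": [30000, 52000, 41000, 52000, 38000],
})

# Data cleaning: drop exact duplicate rows, then fill the missing
# height with the column median.
clean = raw.drop_duplicates().reset_index(drop=True)
clean["height_cm"] = clean["height_cm"].fillna(clean["height_cm"].median())

# Data preprocessing: standardize each column to mean 0 and
# standard deviation 1 (population std, NumPy's default ddof=0).
values = clean.to_numpy(dtype=float)
scaled = (values - values.mean(axis=0)) / values.std(axis=0)
scaled_df = pd.DataFrame(scaled, columns=clean.columns)

print(clean)
print(scaled_df.round(2))
```

scikit-learn offers ready-made transformers for this kind of scaling (such as StandardScaler), and a cleaned DataFrame like this one can then be passed to Matplotlib or seaborn for visualization.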
So, this is how much Python is required for machine learning projects.