Last month, I wrote an article on building a data science roadmap using free courses offered by the Massachusetts Institute of Technology (MIT).
However, most of the courses I listed were heavily theoretical, with much of the focus on the mathematics and statistics underlying machine learning algorithms.
While the MIT roadmap will help you understand the principles of predictive modeling, it lacks hands-on practice with real data science projects.
After spending some time searching online, I found a set of free Harvard courses that cover the entire data science workflow, from programming to data analysis, statistics, and machine learning.
Once you complete all the courses in this learning path, you are also given a capstone project that lets you put everything you learned into practice.
In this article, I will list 9 free Harvard courses you can take to learn data science from scratch. Feel free to skip any of them if you already know the subject.
The first step in learning data science is learning to program. You can do this in the programming language of your choice, ideally Python or R.
If you want to learn R, Harvard offers an introductory R course created specifically for data science learners, called Data Science: R Basics.
This course will introduce you to R concepts such as variables, data types, vector arithmetic, and indexing. You will also learn to wrangle data with libraries such as dplyr and create plots to visualize data.
If you prefer Python, you can take CS50's Introduction to Programming with Python, which Harvard offers for free. In this course, you will learn concepts such as functions, arguments, variables, data types, conditional statements, loops, objects, methods, and more.
Both courses above are self-paced. However, the Python course is more detailed than the R program and takes more time to complete. In addition, the rest of the courses in this roadmap are taught in R, so it may be worth learning R.
Visualization is one of the most powerful ways to communicate your data-driven findings to another person.
In Harvard's Data Visualization program, you will learn to build visualizations using the ggplot2 library in R, along with the principles of communicating data-driven findings.
In this course, you will learn the essential probability concepts that are fundamental to conducting statistical tests on data. Topics taught include random variables, independence, Monte Carlo simulations, expected values, standard errors, and the central limit theorem.
The concepts above are introduced through a case study, which means you will be able to apply everything you learned to a real-world dataset.
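To give a feel for how these topics fit together, here is a minimal sketch of a Monte Carlo simulation. It is my own illustration rather than material from the course (which is taught in R), and the die-rolling setup and the function name `simulate_sample_means` are assumptions made for the example. It repeats an experiment many times and compares the spread of the sample means with the standard error predicted by theory.

```python
import random
import statistics


def simulate_sample_means(n_rolls, n_trials, seed=0):
    """Monte Carlo: repeat the experiment n_trials times, keeping each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.mean(rng.randint(1, 6) for _ in range(n_rolls))
        for _ in range(n_trials)
    ]


n_rolls = 50
means = simulate_sample_means(n_rolls=n_rolls, n_trials=5000)

# Theory: one fair die roll has expected value 3.5, and the standard error
# of the mean of n rolls is sigma / sqrt(n).
sigma = statistics.pstdev(range(1, 7))
theoretical_se = sigma / n_rolls ** 0.5

print("average of sample means:", statistics.mean(means))
print("simulated standard error:", statistics.stdev(means))
print("theoretical standard error:", theoretical_se)
```

The simulated values should land close to the theoretical ones, and a histogram of `means` would look roughly bell-shaped, which is what the central limit theorem predicts.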
After learning probability, you can take this course to learn the fundamentals of statistical inference and modeling.
This program will teach you to define population estimates and margins of error, introduce you to Bayesian statistics, and provide you with the fundamentals of predictive modeling.
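As a rough illustration of what a margin of error is, the sketch below uses the standard normal-approximation formula for a sample proportion. The poll numbers are hypothetical and the helper `margin_of_error` is my own, not something taken from the course.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 52% of 1,000 respondents favor a candidate.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)

print(f"estimate: {p_hat:.2f} +/- {moe:.3f}")
```

With these made-up numbers the margin of error is about 3 percentage points, so the interval includes 50% and the poll alone does not settle who is ahead.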
I included this project management course as optional, since it is not directly related to learning data science. Rather, you will be taught to use Unix/Linux for file management, Git and GitHub for version control, and to create reports in R.
The ability to do the above will save you a lot of time and help you better manage end-to-end data science projects.
Data Science: Productivity Tools
The next course on this list is called Data Wrangling, and it will teach you to prepare data and convert it into a format that machine learning models can easily use.
You will learn to import data into R, tidy data, process string data, parse HTML, work with date-time objects, and mine text.
As a data scientist, you often need to extract data that is publicly available in the form of a PDF document, an HTML web page, or a tweet. You will not always be presented with clean, formatted data in a CSV file or Excel sheet.
By the end of this course, you will know how to wrangle and clean data to extract the important features from it.
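The course itself works in R, but the idea carries over to any language. Here is a small sketch using only the Python standard library; the messy records and the helper `clean_row` are invented for illustration. It trims and normalizes strings, parses dates, and converts currency text to numbers.

```python
from datetime import datetime

# Hypothetical scraped records: inconsistent casing, stray spaces, currency symbols.
raw_rows = [
    {"city": "  boston ", "date": "2021-03-05", "price": "$1,200"},
    {"city": "CAMBRIDGE", "date": "2021-03-07", "price": "$950"},
]


def clean_row(row):
    """Return a copy of the row with tidy, correctly typed values."""
    return {
        "city": row["city"].strip().title(),
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "price": float(row["price"].replace("$", "").replace(",", "")),
    }


clean = [clean_row(row) for row in raw_rows]
for row in clean:
    print(row)
```

Real-world data usually needs more defensive handling (missing values, several date formats), so treat this as a starting point rather than a complete recipe.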
Linear regression is a machine learning technique used to model a linear relationship between two or more variables. It can also be used to identify and adjust for the effect of confounding variables.
This course will teach you the theory behind linear regression models, how to examine the relationship between two variables, and how to detect and remove confounding variables before building a machine learning algorithm.
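To make the idea concrete, here is a minimal sketch of simple linear regression with one predictor, fitted by ordinary least squares. The data points are made up and `fit_line` is a hypothetical helper written for this example; the course uses R and goes well beyond this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope estimates how much y changes for each one-unit increase in x, which is the quantity you interpret when examining the relationship between two variables.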
Finally, the course you have probably been waiting for! Harvard's Machine Learning program will teach you the basics of machine learning, techniques to mitigate overfitting, supervised and unsupervised modeling approaches, and recommendation systems.
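If overfitting is a new term for you, the toy sketch below may help. It is my own example, not course material: a 1-nearest-neighbor classifier memorizes a noisy, synthetic training set, so it scores perfectly on the data it has seen and noticeably worse on fresh data. All names and the noise level are assumptions chosen for the demonstration.

```python
import random


def make_data(n, rng, noise=0.2):
    """Points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # simulate label noise
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Return the label of the closest training point (pure memorization)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model labels correctly."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)


rng = random.Random(42)
train, test = make_data(200, rng), make_data(200, rng)

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two scores is the symptom of overfitting. Evaluating on held-out data, as done here, is the basic way to detect it.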
After completing all of the courses above, you can take Harvard's data science capstone project, where your skills in data visualization, probability, statistics, data wrangling, data organization, regression, and machine learning will be assessed.
With this final project, you will get the opportunity to bring together all the knowledge gained in the courses above and complete a hands-on data science project from scratch.
Note: All of the courses above are available on the edX online learning platform and can be audited for free. However, if you want a certificate of completion, you will have to pay for it.