{"id":2284,"date":"2020-10-07T06:11:26","date_gmt":"2020-10-07T06:11:26","guid":{"rendered":"http:\/\/sadievrenseker.com\/wp\/?p=2284"},"modified":"2021-01-15T06:29:31","modified_gmt":"2021-01-15T06:29:31","slug":"ece-549-advanced-data-science","status":"publish","type":"post","link":"https:\/\/sadievrenseker.com\/?p=2284","title":{"rendered":"ECE 549 Advanced Data Science"},"content":{"rendered":"<p><strong>Antalya University<\/strong><\/p>\n<p><strong>Course Name: Advanced Data Science<\/strong> <strong>Fall 2020<\/strong><\/p>\n<p><strong>Course Code:<\/strong>\u00a0ECE 549<\/p>\n<p><strong>Language of Course:<\/strong>\u00a0English<\/p>\n<p><strong>Credit:<\/strong>\u00a03<\/p>\n<p><strong>Course Coordinator \/ Instructor:<\/strong>\u00a0\u015eadi Evren \u015eEKER<\/p>\n<p><strong>Contact:<\/strong>\u00a0intrds@sadievrenseker.com<\/p>\n<p><strong>Schedule:<\/strong>\u00a0Wed 10.00 &#8211; 13.00<\/p>\n<p><strong>Location<\/strong>: Course will be online, via Discord (for server link please contact\u00a0Elif Su Y\u0130\u011e\u0130T &lt;elifsu.yigit@optiwisdom.com&gt; )<\/p>\n<p><strong>Course Description: \u00a0<\/strong>This course is an introduction level course to data science, specialized on machine learning, artificial intelligence and big data.<\/p>\n<ul>\n<li>The course starts with a top down approach to data science projects. The first step is covering data science project management techniques and we follow\u00a0<strong>CRISP-DM<\/strong>\u00a0methodology with 6 steps below:<\/li>\n<\/ul>\n<ul>\n<li><strong>Business Understanding :<\/strong>\u00a0We cover the types of problems and business processes in real life<\/li>\n<\/ul>\n<ul>\n<li><strong>Data Understanding:<\/strong>\u00a0We cover the data types and data problems. 
We also visualize the data to explore it.<\/li>\n<\/ul>\n<ul>\n<li><strong>Data Preprocessing:\u00a0<\/strong>We cover classical data problems and how to handle\u00a0<strong>noisy or dirty data and missing values.\u00a0<\/strong>We also cover row or column\u00a0<strong>filtering, data integration with concatenation and joins<\/strong>, and data transformations such as\u00a0<strong>discretization, normalization, or pivoting<\/strong>.<\/li>\n<\/ul>\n<ul>\n<li><strong>Machine Learning:<\/strong>\u00a0We cover classification algorithms such as\u00a0<em>Naive Bayes, Decision Trees, Logistic Regression or K-NN.<\/em>\u00a0We also cover prediction \/\u00a0regression algorithms such as linear regression, polynomial regression or decision tree regression, and unsupervised learning problems such as clustering (with k-means or hierarchical clustering) and association rule learning (with the Apriori algorithm). Finally, we cover\u00a0<strong>ensemble techniques<\/strong>\u00a0in Knime and Python on Big Data Platforms.<\/li>\n<\/ul>\n<ul>\n<li><strong>Evaluation:\u00a0<\/strong>In the final step of data science, we study success metrics: the confusion matrix, precision, recall, sensitivity and specificity for classification; purity and the Rand index for clustering; and\u00a0RMSE, RMAE, MSE and MAE for regression \/\u00a0prediction problems, with Knime and Python on Big Data Platforms.<\/li>\n<\/ul>\n<p><strong>Course Objective and Learning Outcomes:\u00a0<\/strong><\/p>\n<p>1.\u00a0\u00a0\u00a0\u00a0 Understanding of real-life cases about data<\/p>\n<p>2.\u00a0\u00a0\u00a0\u00a0 Understanding of real-life data-related problems<\/p>\n<p>3.\u00a0\u00a0\u00a0\u00a0 Understanding of data analysis methodologies<\/p>\n<p>4.\u00a0\u00a0\u00a0\u00a0 Understanding of some basic data operations such as preprocessing, transformation or manipulation<\/p>\n<p>5.\u00a0\u00a0\u00a0\u00a0 Understanding of new technologies like big data, NoSQL, cloud 
computing<\/p>\n<p>6.\u00a0\u00a0\u00a0\u00a0 Ability to use some trending software in the industry<\/p>\n<p>7.\u00a0\u00a0\u00a0\u00a0 Introduction to data-related problems and their applications<\/p>\n<p><strong>Tools:<\/strong><\/p>\n<p>List of course software:<\/p>\n<p>\u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Excel,<\/p>\n<p>\u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 KNIME,<\/p>\n<p>\u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Python programming with NumPy, pandas, scikit-learn, statsmodels or Dask<\/p>\n<p>This course is hands-on at every step, so attending with a laptop computer is necessary. The software listed above will be provided during the course, and the list is subject to updates.<\/p>\n<p><strong>Grading<\/strong><\/p>\n<p>One individual term project covering all the topics covered in the course: 100%<\/p>\n<p><strong>Project Requirements :<\/strong><\/p>\n<p>You are free to select a project topic. The only requirement is that you cover at least two topics from the following list by solving the same problem with two separate approaches from the list. You are also asked to compare your findings from these two alternative solutions:\u00a0KNN, SVM, XGBoost, LightGBM, CatBoost, Decision Trees, Random Forest, Linear Regression, Polynomial Regression, SVR, ARL (ARM), K-Means, DBSCAN, HC<\/p>\n<div id=\"attachment_2296\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/project_flow.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2296\" class=\"Sample Project Flow wp-image-2296 size-medium\" title=\"Sample Project Flow\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/project_flow-300x168.png\" alt=\"Sample Project Flow\" width=\"300\" height=\"168\" srcset=\"https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/project_flow-300x168.png 300w, 
https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/project_flow-768x431.png 768w, https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/project_flow-1024x574.png 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-2296\" class=\"wp-caption-text\">Sample Project Flow<\/p><\/div>\n<p><strong>Example project topic:<\/strong> You can search Kaggle for project ideas, and you can also find good data sets on such web sites.<\/p>\n<p><strong>Project proposal :<\/strong> Due by Apr 30. Please explain your project idea and your alternative solution approaches from the course content.<\/p>\n<p><strong>Project Deliverables:<\/strong> You are asked to submit the items below by e-mail by May 19, 2020.<\/p>\n<ol>\n<li>Presentation and Demo video: please shoot a video of your presentation and a demo of your project.<\/li>\n<li>Project Presentation: the slides you use during the presentation<\/li>\n<li>Project Report: a detailed explanation of your approaches, the difficulties you faced during the project implementation, a comparison of your two alternative approaches to the same problem (in terms of implementation difficulty, success rate, running performance etc.), and some critical parts of your algorithms. Also provide details about how you increased the success of your approach. Please answer all of the following questions in your project report: What did you do to handle unbalanced data, if your problem has any? What did you do to handle missing values and dirty or noisy data? Did you use a dimension transformation like PCA or LDA, and why? Did you check for possible underfitting or overfitting, and how did you get rid of it? Did you use any regularization? Did you implement segmentation \/ clustering before the classification or prediction steps, and why or why not? Which data science project management method did you use (e.g. SEMMA, CRISP-DM or KDD), and why did you pick this method? 
Which step was the most difficult and why? How did you optimize the parameters of your algorithms? What were the best parameters and why? How did you find these parameters, and do you think you can use the same parameters for other data sets in the future for the same problem?<\/li>\n<li>Running Code or Project: you are free to implement your solution on any platform \/ in any language. The only requirement is that you code the two alternative solutions on the same platform \/ in the same programming language (otherwise it will not be fair to compare them). Please also provide a manual for installing your platform and running your code.<\/li>\n<li>Interview: A personal interview will be held after the submissions. Each of you will be asked to provide a time slot of at least 30 minutes for your project. During this time, you will be asked to connect via an online platform, show your running demo and answer questions. Please also attach your available time slots to your submission.<\/li>\n<\/ol>\n<p><strong>Project Policies:<\/strong> Late submissions will not be accepted. A project that solves the problem with only one approach, and therefore cannot compare two approaches, will be graded out of a maximum of 35 points instead of 100. So, please push yourselves to submit two separate approaches for your problem. You are free to use any library in your project, with one exception: in the AI part of your project, you are not allowed to use a library or any code from the internet or written by anybody else. 
So, in other words, you have to write the two AI modules for your project yourself, using two different approaches from the course content; using somebody else\u2019s code in the AI module will result in a final grade of 0.<\/p>\n<p><strong>Course Content:<\/strong><\/p>\n<table class=\"wp-block-table\">\n<tbody>\n<tr>\n<td><strong>Week 1 : Introduction to Data, Problems and Real World Examples:<\/strong> Some useful information: DIKW Pyramid:\u00a0<a href=\"https:\/\/www.google.com.tr\/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=1&amp;cad=rja&amp;uact=8&amp;ved=0ahUKEwiIs6utyKDXAhXqIJoKHRIdDNAQFggnMAA&amp;url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FDIKW_pyramid&amp;usg=AOvVaw1ddCSlI29On5ZqRhf1vREE\">DIKW pyramid \u2013 Wikipedia<\/a> CRISP-DM:\u00a0<a href=\"https:\/\/www.google.com.tr\/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=4&amp;cad=rja&amp;uact=8&amp;ved=0ahUKEwjT37ecyKDXAhUoYpoKHVVtAlsQFgg3MAM&amp;url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FCross-industry_standard_process_for_data_mining&amp;usg=AOvVaw0_SytZPUTYDZLBCbanvkr0\">Cross-industry standard process for data mining \u2013 Wikipedia<\/a> Slides from first week: <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2017\/11\/week1-1.pdf\">week1<\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 2 : Introduction to Descriptive Analytics<\/strong> Repeating the first week for the majority of the class and introducing the concept of end-to-end data science projects.<\/p>\n<p>Installation of Knime from www.knime.com (a brief introduction document:\u00a0<a href=\"https:\/\/www.knime.com\/blog\/seven-things-to-do-after-installing-knime\">https:\/\/www.knime.com\/blog\/seven-things-to-do-after-installing-knime<\/a>)<\/p>\n<p>Weight and Height sample project and data set for the Knime workflow.<\/p>\n<p><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_abu_workflow.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2289\" 
src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_abu_workflow-300x195.png\" alt=\"\" width=\"300\" height=\"195\" srcset=\"https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_abu_workflow-300x195.png 300w, https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_abu_workflow-768x500.png 768w, https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_abu_workflow-1024x667.png 1024w, https:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_abu_workflow.png 1708w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2020\/10\/first_ABU_project.knwf_.zip\">download first workflow<\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 3 : Introduction to Data Manipulation<\/strong> Concept of Data and types of data : Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio). Basic Data Manipulation techniques with Knime: 1.Row Filter and Concept of Missing Values 2.Column Filter 3.Advanced Filters 4.Concatenate 5.Join 6. Group by , Aggregation 7. Formulas, String Replace 8. String Manipulation 9. Discrete, Quantized Data, Binning 10. 
Normalization 11.Splitting and Merging 12.Type Conversion (Numeric , String) <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/abu_preprocessing.knwf_.zip\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2169\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/abu_knime_preprocessing-1024x693.png\" sizes=\"auto, (max-width: 555px) 100vw, 555px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/abu_knime_preprocessing-1024x693.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/abu_knime_preprocessing-300x203.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/abu_knime_preprocessing-768x520.png 768w\" alt=\"\" width=\"555\" height=\"375\" \/><\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 4 : Introduction to Python Programming for Data Science and an end-to-end Python application for data science<\/strong> Brief review of python programming Introduction to data manipulation libraries: NumPY and Pandas Introduction to the Sci-Kit Learn library and a sample classification You can install anaconda and Spyder from the link below: <a href=\"http:\/\/www.anaconda.org\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2176\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/maxresdefault-3-300x169.jpg\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/maxresdefault-3-300x169.jpg 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/maxresdefault-3-768x432.jpg 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/maxresdefault-3-1024x576.jpg 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/maxresdefault-3.jpg 1280w\" alt=\"\" width=\"300\" height=\"169\" \/><\/a>Also we have covered below topics during the class:<\/p>\n<ul>\n<li>Data loading from external source using Pandas library (with read_excel or read_csv 
functions)<\/li>\n<li>DataFrame slicing and dicing (using the iloc indexer with lists of row and column positions)<\/li>\n<li>Column Filtering (with copying into a new data frame)<\/li>\n<li>Row Filtering (with copying into a new data frame)<\/li>\n<li>Advanced row filtering (such as keeping only the people whose height is an even number)<\/li>\n<li>Column-wise or row-wise formulas (we calculated the BMI for everybody)<\/li>\n<li>Quantization (discretization or binning): we applied condition-based binning<\/li>\n<li>Min \u2013 Max Normalization (we used MinMaxScaler from the scikit-learn library)<\/li>\n<li>Group By operation (we used the groupby method from the pandas library)<\/li>\n<\/ul>\n<p><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/anaconda_preprocessing.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2177\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/anaconda_preprocessing-300x198.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/anaconda_preprocessing-300x198.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/anaconda_preprocessing-768x507.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/anaconda_preprocessing-1024x676.png 1024w\" alt=\"\" width=\"300\" height=\"198\" \/><\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/03\/Archive-6.zip\">Click here to download the code from the class<\/a> For further information, I strongly suggest reading the documentation below:<\/p>\n<ul>\n<li>pandas Library :\u00a0<a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/\">https:\/\/pandas.pydata.org\/pandas-docs\/stable\/<\/a><\/li>\n<li>NumPy Library :\u00a0<a href=\"http:\/\/www.numpy.org\">http:\/\/www.numpy.org<\/a><\/li>\n<li>scikit-learn Library :\u00a0<a 
href=\"https:\/\/scikit-learn.org\/stable\/\">https:\/\/scikit-learn.org\/stable\/<\/a><\/li>\n<li>Pandas Data Frame (This is the main topic we have covered this week):\u00a0<a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.html\">https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.html<\/a><\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td><strong>Week 5 : Classification Algorithms<\/strong> concepts of classification algorithms, implementing the algorithms in Knime and coding in python. Algorithms covered are: K-NN Naive Bayes Decision Tree Logistic Regression Support Vector Machines <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu2ndweek.py_.zip\">2nd Python Code of the course for the classifications<\/a> <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_classification.knwf_.zip\">Knime Workflow for the classification algorithms<\/a><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2182\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Mar-19-21-54-02-300x196.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Mar-19-21-54-02-300x196.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Mar-19-21-54-02-768x501.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Mar-19-21-54-02-1024x667.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Mar-19-21-54-02.png 1562w\" alt=\"\" width=\"300\" height=\"196\" \/><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 6: Regression Algorithms<\/strong> concepts of prediction algorithms, implementing the algorithms in Knime and coding in python. 
Algorithms covered are: Linear Regression Polynomial Regression Support Vector Regressor Regression Trees and Decision Tree Regressor <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Archive-3.zip\">Python code for the Regression<\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_regression.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2196\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_regression-300x196.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_regression-300x196.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_regression-768x501.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_regression-1024x668.png 1024w\" alt=\"\" width=\"300\" height=\"196\" \/><\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_stockexchange.knwf_.zip\">Knime Workflow and the BIST 100 data set for the Regression Algorithms\u00a0<\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/stockexchange_bist_prediction.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2188\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/stockexchange_bist_prediction-300x213.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/stockexchange_bist_prediction-300x213.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/stockexchange_bist_prediction-768x546.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/stockexchange_bist_prediction-1024x728.png 1024w\" alt=\"\" width=\"300\" height=\"213\" \/><\/a>The Data Set obtained from : <a href=\"http:\/\/finance.yahoo.com\">finance.yahoo.com<\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 7 : Clustering Algorithms<\/strong> 
concepts of clustering algorithms, implementing the algorithms in Knime and coding in python. Algorithms covered are: K-Means DBScan Hierarchical Clustering <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering.knwf_.zip\">Knime Workflow<\/a><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2192\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/clustering-300x197.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/clustering-300x197.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/clustering-768x504.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/clustering-1024x672.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/clustering.png 1310w\" alt=\"\" width=\"300\" height=\"197\" \/><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering.py_.zip\">Python Code<\/a><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2194\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering_python-300x291.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering_python-300x291.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering_python-768x746.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering_python-1024x994.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_clustering_python.png 1318w\" alt=\"\" width=\"300\" height=\"291\" \/><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 8 : Association Rule Mining<\/strong> concepts of association rule mining (ARM) and association rule learning (ARL) algorithms, implementing the algorithms in Knime and coding in python. 
Algorithms covered are: A-Priori Algorithm <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/apyori.py_.zip\">Click Here To Download Apyroiri Library for the Python Codes<\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Archive-2.zip\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2201\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-10-05-28-46-1-300x173.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-10-05-28-46-1-300x173.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-10-05-28-46-1-768x442.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-10-05-28-46-1-1024x589.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-10-05-28-46-1.png 1394w\" alt=\"\" width=\"300\" height=\"173\" \/><\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Archive-2.zip\">click for python code\u00a0<\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/KNIME_project17.knwf_.zip\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2202\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_association-300x172.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_association-300x172.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_association-768x441.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_association-1024x588.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_association.png 1236w\" alt=\"\" width=\"300\" height=\"172\" \/><\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/KNIME_project17.knwf_.zip\">click 
for knime workflow<\/a> <a href=\"https:\/\/www.kaggle.com\/c\/instacart-market-basket-analysis\">Homework : Link for Kaggle, instacart<\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 9 : Concept of Error and Evaluation Techniques<\/strong> n-Fold Cross Validation, LOO and Split Validation; RMSE, MAE and R2 values for regression; Rand index, Silhouette and WCSS for clustering algorithms; accuracy, recall, precision, F-score, F1-score etc. for classification algorithms. We also got an introduction to dimension reduction with PCA (principal component analysis) and neural networks with MLP (multi-layer perceptron). Please don\u2019t forget to install Keras for next week. <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_rf.knwf_.zip\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2208\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-46-300x204.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-46-300x204.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-46-768x521.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-46-1024x695.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-46.png 1580w\" alt=\"\" width=\"300\" height=\"204\" \/><\/a><a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_randomforest.py_.zip\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2209\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-03-300x159.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-03-300x159.png 300w, 
http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-03-768x407.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/Screenshot-at-Apr-16-19-20-03-1024x542.png 1024w\" alt=\"\" width=\"300\" height=\"159\" \/><\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 10 : Collective Learning <\/strong>:<\/p>\n<p>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><strong>Week 11 : Collective Learning and Consensus Learning and Clustering Algorithms:\u00a0<\/strong>Ensemble Learning, Bagging, Boosting Techniques, Random Forest, GBM, XGBoost, LightGBM Some links useful for the class:<\/p>\n<ul>\n<li>Understanding the Boosting with a simple Decision tree:\u00a0<a href=\"https:\/\/towardsdatascience.com\/boosting-algorithm-gbm-97737c63daa3\">https:\/\/towardsdatascience.com\/boosting-algorithm-gbm-97737c63daa3<\/a><\/li>\n<li>Simplified version of GBM coding and visualization:\u00a0<a href=\"https:\/\/medium.com\/mlreview\/gradient-boosting-from-scratch-1e317ae4587d\">https:\/\/medium.com\/mlreview\/gradient-boosting-from-scratch-1e317ae4587d<\/a><\/li>\n<li>Kaggle Entry for the same GBM story (Also holds the scratch codes of DecisionTree class):\u00a0<a href=\"https:\/\/www.kaggle.com\/grroverpr\/gradient-boosting-simplified\/\">https:\/\/www.kaggle.com\/grroverpr\/gradient-boosting-simplified\/<\/a><\/li>\n<li>If you are curious about the splitting point and the std_agg or var_split functions :\u00a0<a href=\"https:\/\/towardsdatascience.com\/random-forests-and-decision-trees-from-scratch-in-python-3e4fa5ae4249\">https:\/\/towardsdatascience.com\/random-forests-and-decision-trees-from-scratch-in-python-3e4fa5ae4249<\/a><\/li>\n<\/ul>\n<p>Readings and resources:<\/p>\n<ul>\n<li>XGBoost Algorithm :<a href=\"https:\/\/xgboost.ai\">\u00a0https:\/\/xgboost.ai<\/a><\/li>\n<li>The very early resource for the XGBoost:\u00a0<a href=\"http:\/\/xgboost.readthedocs.io\">xgboost.readthedocs.io<\/a><\/li>\n<\/ul>\n<p>Python Codes from the class : Gradient Boosting: 
<a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/abu_boosting.py_.zip\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2215\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/gbm_abu-300x161.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/gbm_abu-300x161.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/gbm_abu.png 640w\" alt=\"\" width=\"300\" height=\"161\" \/><\/a>XGBoost (for running the code install XGBoost by the command prompt: conda install -c conda-forge xgboost Install XGBoost extension for Knime <a href=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/knime_xgboost.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2218\" src=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/knime_xgboost-300x116.png\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" srcset=\"http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/knime_xgboost-300x116.png 300w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/knime_xgboost-768x297.png 768w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/knime_xgboost-1024x396.png 1024w, http:\/\/sadievrenseker.com\/wp-content\/uploads\/2019\/02\/knime_xgboost.png 1568w\" alt=\"\" width=\"300\" height=\"116\" \/><\/a><\/td>\n<\/tr>\n<tr>\n<td><strong>Week 12 : Project Presentations First Group. <\/strong>Presentations will be picked randomly during the class and anybody absent will be considered as not presented. Project Deliveries (until May 6): Project Presentation, Project Report (explaining your project, your approach and methodologies, difficulties you have faced, solutions you have found, results you have achieved in your projects, links to your data sources). Knime Workflows (in .knwf format) and python codes (in .py format). 
<strong>Please <\/strong>put all these files in a single .zip or .rar archive and do not put more than 4 files in your archive.<\/td>\n<\/tr>\n<tr>\n<td><strong>Week 13 : Project Presentations Second Group<\/strong> If you have missed the project presentations in the first week, please contact me for further details.<\/td>\n<\/tr>\n<tr>\n<td>Week 14 (May 12): TBA<\/td>\n<\/tr>\n<tr>\n<td>Week 15 (May 19): TBA<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Antalya University Course Name: Advanced Data Science Fall 2020 Course Code:\u00a0ECE 549 Language of Course:\u00a0English Credit:\u00a03 Course Coordinator \/ Instructor:\u00a0\u015eadi Evren \u015eEKER Contact:\u00a0intrds@sadievrenseker.com Schedule:\u00a0Wed 10.00 &#8211; 13.00 Location: Course will be online, via Discord (for server link please contact\u00a0Elif Su Y\u0130\u011e\u0130T &lt;elifsu.yigit@optiwisdom.com&gt; ) Course Description: \u00a0This course is an introduction level course to data science, specialized on machine learning, artificial &hellip; <a href=\"https:\/\/sadievrenseker.com\/?p=2284\">Continue Reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2284","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=\/wp\/v2\/posts\/2284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2284"}],"version-history":[{"count":5,"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=\/wp\/v2\/posts\/2284\/revisions"}],"predecessor-version":[{"id":2297,"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=\/wp\/v2\/posts\/2284\/revisions\/2297"}],"wp:attachment":[{"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2284"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2284"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sadievrenseker.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}