CS 447 Introduction to Data Science

Antalya University

Course Name: Introduction to Data Science Fall 2022

Course Code: CS 447

Language of Course: English

Credit: 3

Course Coordinator / Instructor: Şadi Evren ŞEKER

Contact: intrds@sadievrenseker.com

Schedule: Friday 11.00 – 13.00

Course Description: This is an introductory course in data science, focusing on machine learning, artificial intelligence and big data.

  • The course takes a top-down approach to data science projects. We start with data science project management techniques and follow the steps of the CRISP-DM methodology listed below:
  • Business Understanding: We cover the types of problems and business processes in real life.
  • Data Understanding: We cover the data types and data problems. We also visualize the data to explore it.
  • Data Preprocessing: We cover the classical data problems and how to handle noisy or dirty data and missing values. We also cover row or column filtering, data integration with concatenation and joins, and data transformations such as discretization, normalization, or pivoting.
  • Machine Learning: We cover classification algorithms such as Naive Bayes, Decision Trees, Logistic Regression or K-NN. We also cover prediction / regression algorithms like linear regression, polynomial regression or decision tree regression, and unsupervised learning problems like clustering and association rule learning, with k-means or hierarchical clustering and the Apriori algorithm. Finally, we cover ensemble techniques in Knime and Python on Big Data Platforms.
  • Evaluation: In the final step of data science, we study the metrics of success: Confusion Matrix, Precision, Recall, Sensitivity and Specificity for classification; purity and Rand index for clustering; and RMSE, RMAE, MSE and MAE for regression / prediction problems, with Knime and Python on Big Data Platforms.
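As an illustration of the classification metrics named above, here is a minimal sketch in plain Python. The function names and the sample labels are made up for this example; in the course the same numbers would normally come from Knime nodes or library functions.

```python
def binary_confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Derive the usual success metrics from the confusion matrix counts."""
    tp, fp, fn, tn = binary_confusion(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # also called sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical labels, only for demonstration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall and sensitivity are the same quantity; specificity is the corresponding rate for the negative class.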

Course Objective and Learning Outcomes: 

1.     Understanding of real life cases about data

2.     Understanding of real life data related problems

3.     Understanding of data analysis methodologies

4.     Understanding of some basic data operations like: preprocessing, transformation or manipulation

5.     Understanding of new technologies like bigdata, nosql, cloud computing

6.     Ability to use some trending software in the industry

7.     Introduction to data related problems and their applications

Tools:

List of course software:

·       Excel,

·       KNIME,

·       Python Programming with Numpy, Pandas, SKLearn, StatsModel or DASK

This course follows a hands-on approach in all the steps, so attendance with a laptop computer is necessary. The software list above will be provided during the course and is subject to updates.

Grading: One individual term project and one individual homework track covering all the topics of the course: 50%. Homeworks submitted each week: 50%. Each homework is due one week after it is assigned, at the starting time of the class (Friday 11.00 a.m.). Submissions after Friday 11.00 a.m. are considered late and will not be graded. Any violation of the code of honor will result in disciplinary action and failing the class.

Project Requirements : You are free to select a project topic. The only requirement about the project is, you have to cover at least two topics from the following list and solve the same problem with two separate approaches from the list, you are also asked to compare your findings from these two alternative solutions : KNN, SVM, XGBoost, LightGBM, CatBoost, Decision Trees, Random Forest, Linear Regression, Polynomial Regression, SVR, ARL (ARM), K-Means, DBSCAN, HC

Example project topic: you can search Kaggle for some idea about the projects, you can also find some good data sets from these web sites.

Project proposal : until Apr 30 : please explain your project idea and alternative solution approaches from the course content, together with your data set and the outcomes you plan to achieve. Send it in an e-mail with "project proposal" in the subject. Your project is very important for the course, and possible problems might be related to the data set, algorithms, approaches or the purpose of your project. Late submissions or submissions with incorrect information will not get a reply, so you proceed at your own risk.

Project Deliverables: You are asked to submit the below items via mail until Dec 22, 2022.

  1. Presentation and Demo video: please shoot a video for your presentation and demo of your project.
  2. Project Presentation: slides you are using during the presentation
  3. Project Report: a detailed explanation of your approaches, the difficulties you faced during the project implementation, a comparison of your two alternative approaches to the same problem (from the perspectives of implementation difficulty, success rate, running performance etc.), and some critical parts of your algorithms. Also provide details about how you increased the success of your approach. Please answer all of the following questions in your project report: What did you do to handle unbalanced data, if your problem has any? What did you do to handle missing values and dirty or noisy data? Did you use a dimension transformation like PCA or LDA, and why? Did you check for underfitting or overfitting, and how did you get rid of it? Did you use any regularization? Did you implement segmentation / clustering before the classification or prediction steps, and why or why not? Which data science project management method did you use (e.g. SEMMA, CRISP-DM or KDD), and why did you pick it? Which step was the most difficult and why? How did you optimize the parameters of your algorithms? What were the best parameters and why? How did you find these parameters, and do you think you can use the same parameters on other data sets for the same problem in the future?
  4. Running Code or Project: you are free to implement your solution in any platform / language. The only requirement about your implementation is, you have to code the two alternative solution on the same platform / programming language (otherwise it will not be fair to compare them). Please also provide an installation manual for your platform and running your code.
  5. Interview: A personal interview will be held after the submissions. Each of you will be asked to provide a time slot of at least 30 minutes for your projects. During this time, you will be asked to connect via an online platform and show your running demo and answer the questions. Please also attach your available time slots to your submissions.

Project Policies: There is no late submission policy. If you solve your problem with only 1 approach, which also means you cannot compare two approaches, your project will be graded out of a maximum of 35 points instead of 100. So, please push yourselves to submit two separate approaches for your problem. You are free to use any library in your projects, except in the AI part: there you are not allowed to use a library or any code from the internet or written by anybody else. In other words, you have to write the two AI modules of your project yourself, with two different approaches from the course content, and using somebody else’s code in the AI module will result in 0 as the final grade.

Course Content:

Week 1 (Sep 30): Introduction to Data, Problems and Real World Examples. Some useful information: DIKW Pyramid: DIKW pyramid – Wikipedia; CRISP-DM: Cross-industry standard process for data mining – Wikipedia. Slides from first week: week1

Install Anaconda from anaconda.org before the next class.

Week 2 (Oct 7): Introduction to Data Manipulation and Data Preparation: Introduction to the Python, NumPy and Pandas libraries, and some basic operations for:

    1. Row Filter and Concept of Missing Values
    2. Column Filter
    3. Advanced Filters
    4. Concatenate
    5. Join
    6. Group by , Aggregation
    7. Formulas, String Replace
    8. String Manipulation
    9. Discrete, Quantized Data, Binning
    10. Normalization

Codes in python :

Click here to download codes and data file. 

Some useful links we referred during the class:

  • https://docs.python.org/3/tutorial/index.html
  • https://pandas.pydata.org/docs/reference/index.html
  • https://scikit-learn.org/stable/
  • https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html

Also useful resources from last years:

Introduction to Python Programming for Data Science and an end-to-end Python application for data science:

  • Brief review of Python programming
  • Introduction to data manipulation libraries: NumPy and Pandas
  • Introduction to the scikit-learn library and a sample classification

You can install Anaconda and Spyder from the link below:

We have also covered the topics below during the class:

  • Data loading from external source using Pandas library (with read_excel or read_csv methods)
  • DataFrame slicing and dicing (using the iloc property and the lists provided to the iloc method)
  • Column Filtering (with copying into a new data frame)
  • Row Filtering (with copying into a new data frame)
  • Advanced row filtering (like filtering the people with even number of heights)
  • Column or row wise formula (we have calculated the BMI for everybody)
  • Quantization (discretization or binning): where we have applied the condition based binning
  • Min – Max Normalization (we have implemented MinMaxScaler from the SKLearn library)
  • Group By operation (we have implemented the groupby method from pandas library)
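The operations listed above can be sketched as follows. This is only an illustration under stated assumptions: the data frame is made-up sample data rather than the class file, the BMI thresholds are the commonly used cut-offs, and the min-max step is written out by formula instead of using MinMaxScaler as was done in class.

```python
import pandas as pd

# Made-up sample data (an assumption); in class the data came from a file
# loaded with pd.read_csv or pd.read_excel.
df = pd.DataFrame({
    "name": ["Ali", "Ayse", "Can", "Deniz"],
    "gender": ["M", "F", "M", "F"],
    "height_cm": [180, 165, 172, 158],
    "weight_kg": [84.24, 54.45, 91.71, 44.94],
})

# Slicing with iloc: first two rows, columns 0 and 2.
subset = df.iloc[0:2, [0, 2]]

# Column filter and row filter, each copied into a new data frame.
names_and_heights = df[["name", "height_cm"]].copy()
even_height = df[df["height_cm"] % 2 == 0].copy()

# Column-wise formula: BMI = weight (kg) / height (m) squared.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2


def bmi_class(bmi):
    """Condition-based binning; thresholds are an assumption for this sketch."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


df["bmi_class"] = df["bmi"].apply(bmi_class)

# Min-max normalization by formula: (x - min) / (max - min).
height = df["height_cm"]
df["height_norm"] = (height - height.min()) / (height.max() - height.min())

# Group by: mean height per gender.
mean_height_by_gender = df.groupby("gender")["height_cm"].mean()
print(df)
print(mean_height_by_gender)
```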

Click here to download the codes from the class.

For further information, I strongly suggest you read the documentation below:

Homework 2 : create a new excel (or CSV) file for an imaginary classroom and put your own unique data where the columns will be studentID, name, midterm and final grades of students. The data file should contain at least 20 rows without missing data. Create another excel (or CSV) file and put studentID and project columns and fill the imaginary project grades for the students. Complete the below steps for your first homework:

  1. Create 2 dataframes for each of the files.
  2. Join both files into a single data frame.
  3. Get the name of the students with maximum grades for each midterm, final and projects.
  4. Find the average, maximum and minimum grades for midterm, final and projects.
  5. Normalize all grades with min-max normalization.
  6. Sort the dataframe lexicographically by the names of the students.

Submit your homeworks to the email of course in a zip file including your data files (excel or csv) and your python code together.

Week 3 (Oct 14): Introduction to Data Manipulation Concept of Data and types of data : Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio). Supervised / Unsupervised learning, Concept of Classification. Algorithm dominance (between rule based learning and decision tree as a sample), KNN Algorithm.
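A minimal from-scratch sketch of the KNN idea covered this week, assuming Euclidean distance and majority voting. The points and labels are made up, and this is not the class code linked below.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2.0, 2.0)))
print(knn_predict(train_X, train_y, (6.0, 6.5)))
```

An odd k avoids tied votes in a two-class problem; features on very different scales should be normalized first, since the distance would otherwise be dominated by the largest one.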

KNN Code

Some useful links covering the course content:

Homework 3: solve your problem with KNN algorithm and compare the outcomes of decision tree and KNN, comment about the success rates.

Week 4 (Oct. 21):  

Deadline for project proposals: Please prepare a couple of paragraphs to introduce your term project idea. Add the problem you want to solve, data set you want to use.

Machine Learning Algorithms: SVM, KNN (repeat), Decision Tree (repeat), Naive Bayes

Week 4 Coding for Classification algorithms (Mushroom Data Set From Kaggle) and evaluation

Week 4 coding for Iris Data Set

Concept Covered:

  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
  • https://scikit-learn.org/stable/modules/naive_bayes.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html

Homework 4 : Implement SVM, NB, Confusion Matrix and Accuracy Score calculations to your previous codes.

Week 5 (Oct 28): Classification Algorithms and Regression (Prediction Algorithms) concepts of classification algorithms, implementing the algorithms

Logistic Regression, Linear Regression, Multiple Linear Regression

Concepts Covered:

  • https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
  • https://scikit-learn.org/stable/modules/linear_model.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html
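To make the regression concepts above concrete, here is a small sketch of simple (one-variable) linear regression fitted with the closed-form least-squares formulas, followed by the MSE and MAE error metrics. The data points and function names are assumptions for illustration; the class codes use the scikit-learn classes linked above.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data lying roughly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
predictions = [slope * x + intercept for x in xs]
print(slope, intercept, mse(ys, predictions), mae(ys, predictions))
```

RMSE is the square root of the MSE, which brings the error back to the units of the target variable.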

Week 5 Codes 

Homework 5 : Implement Linear Regression and Logistic Regression algorithms to your previous codes.

 

Week 6 (Nov 4): Regression Algorithms. Concepts of prediction algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are:

  • Linear Regression
  • Polynomial Regression
  • Support Vector Regressor
  • Regression Trees and Decision Tree Regressor

Python code for the Regression

Concepts Covered:

  • https://scikit-learn.org/stable/modules/tree.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

Week 6 Codes

Homework 6: Implement DTR, RFR and SVR on your homework dataset, also play with the parameters of the algorithms and try to find the best MAE and RMSE values without overfit.

Week 7 (Nov 11): Clustering Algorithms concepts of clustering algorithms, 2 types of clustering approaches : Hard Clustering and Soft Clustering, 4 Types of clustering algorithms : Centroid, Density Based, Statistical Distribution, Hierarchical. Evaluation of clustering algorithms, WCSS, Silhouette, Elbow Technique.

K-Means Clustering algorithm and concept of clustering.

  • https://scikit-learn.org/stable/modules/clustering.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
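The following is a minimal from-scratch sketch of K-Means together with the WCSS value used in the elbow technique. It assumes a naive initialization (the first k points become the initial centroids) and made-up 2-D points; the KMeans class linked above uses a smarter initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain K-Means: assign points to the nearest centroid, then recompute."""
    centroids = [points[i] for i in range(k)]  # naive init (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Within-cluster sum of squared distances to the cluster centroid."""
    return sum(
        math.dist(p, c) ** 2 for c, cluster in zip(centroids, clusters) for p in cluster
    )


# Two well-separated groups of hypothetical points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

for k in (1, 2, 3):
    centroids, clusters = kmeans(points, k)
    print(k, wcss(centroids, clusters))
```

Plotting WCSS against k and looking for the point where the decrease flattens is the elbow technique mentioned above.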

Python Codes from this year

 

Python Code from last year :

 

Homework 7: Implement clustering algorithm on your homework scenario. You can implement clustering in the preprocessing phase for the data quality or you can benefit from clusters in the machine learning part.

Week 8 (Nov 18) : Clustering Algorithms (Cont.) DBSCAN, Hierarchical Clustering, Agglomerative Clustering and their implementations.

Homework 8: Implement DBSCAN, Hierarchical Clustering and Agglomerative Clustering algorithms, besides the K-Means algorithm from last week. Compare their success and execution performance.

Codes from 8th week

 

Week 9 (Nov 25): Association Rule Mining. Concepts of association rule mining (ARM) and association rule learning (ARL) algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: A-Priori Algorithm.

Click here to download the Apyori library for the Python codes. Click for Python code. Click for Knime workflow.

Homework: Link for Kaggle, instacart
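The level-wise idea behind the A-Priori algorithm can be sketched in a few lines of plain Python. This is a simplified illustration with a made-up basket data set and is not the library code used in class: it finds the frequent itemsets and then computes the confidence of one rule.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Candidates: unions of frequent itemsets that have exactly `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: every (size - 1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=0.6)
# Confidence of the rule diapers -> beer = support(both) / support(diapers).
confidence = frequent[frozenset({"diapers", "beer"})] / frequent[frozenset({"diapers"})]
print(len(frequent), confidence)
```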

Homework 9: Implement Association Rule Mining code with Apriori, FP-Growth or Eclat and compare their success and execution times.

 

Week 10 (Dec 2): Concept of Error and Evaluation Techniques

  • n-Fold Cross Validation, LOO, Split Validation
  • RMSE, MAE, R2 values for regression
  • Rand index, Silhouette, WCSS for clustering algorithms
  • Accuracy, Recall, Precision, F-Score, F1-Score etc. for classification algorithms
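As a sketch of how n-fold cross validation partitions the data, the generator below yields train and test index lists for each fold. The function name is hypothetical and no shuffling is done, which a real experiment would usually add. Leave-one-out (LOO) is the special case where the number of folds equals the number of samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(test_idx)

# Leave-one-out: as many folds as samples.
loo_folds = list(k_fold_indices(10, 10))
```

Each sample lands in the test set exactly once, so averaging the metric over the folds uses every row for both training and testing.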

We also got an introduction to dimension reduction with PCA (principal component analysis)

 

Homework 10: Re-implement all the classification and prediction codes in all your homeworks and try to increase the success rates by using preprocessing techniques. Also discuss the evaluation techniques while comparing the preprocessing techniques.

 

Week 11 (Dec 9): Collective Learning and Consensus Learning and Clustering Algorithms: Ensemble Learning, Bagging, Boosting Techniques, Random Forest, GBM, XGBoost, LightGBM.

Some links useful for the class:

Readings and resources:

Python Codes from the class : Gradient Boosting:

XGBoost (to run the code, install XGBoost from the command prompt: conda install -c conda-forge xgboost). Install the XGBoost extension for Knime.

 

Homework 11: Re-implement every classification and regression code with ensemble learning techniques. Compare the success rates and execution speeds in a table (put all the algorithms you have implemented until now to the rows and evaluation metrics to the columns and compare the outcomes in a table).

 

Week 12 (Dec 16): Deep Learning: 

 

Homework 12: Re-implement every classification and regression code with deep learning techniques. Compare the success rates and execution speeds in a table (put all the algorithms you have implemented until now to the rows and evaluation metrics to the columns and compare the outcomes in a table). Also compare the positive or negative effects of preprocessing steps and parameters into the table and discuss the outcomes.

Week 13 (Dec 23):

Project Presentations: Presentations will be picked randomly during the class and anybody absent will be considered as not having presented.

Project Deliveries (until Dec 22): Project Presentation, Project Report (explaining your project, your approach and methodologies, difficulties you have faced, solutions you have found, results you have achieved in your projects, links to your data sources), and Python codes (in .py format). Please put all these files in a single .zip or .rar archive and do not put more than 4 files in your archive.

There is no late submission policy!

Week 14 (May 12): TBA
Week 15 (May 19): TBA

CS 441 Artificial Intelligence


Classes: Friday 9.00 – 11.00 am

Location: A3-02 (Some meetings might be planned online)

Instructor: Dr. Şadi Evren ŞEKER (+9 0531 605 6726, Ezgi for contacts)

E-Mail: ai@sadievrenseker.com

Course Content:

  • History and Philosophy of the Artificial Intelligence (AI)
  • Classical AI approaches like search problems, machine learning, constraint satisfaction, graphical models, logic etc.
  • Learning how to model a complex real-world problem by the classical AI approach

Objectives:

  • Introduction to Artificial Intelligence Problems
  • Programming with a mathematical notation language
  • AI programming skills, building on the Data Structures, Automata and Theory of Algorithms courses.
  • Writing a real world application with an AI module (like a game)
  • Introducing sub-AI topics like neural computing, uncertainty and Bayesian networks, concept of learning (supervised / unsupervised) etc.

Texts:

  • S. Russell and P. Norvig Artificial Intelligence: A Modern Approach Prentice Hall
  • A must check: http://aima.cs.berkeley.edu
  • Some parts of the course are related to Machine Learning, Data Science, Data Mining, Pattern Recognition, Natural Language Processing, Statistics, Logic, Artificial Neural Networks and Fuzzy Logic, so you can read any [text] books about these topics.

Grading:

One individual term project and one individual homework track covering all the topics of the course: 50%. Homeworks submitted each week: 50%. Each homework is due one week after it is assigned, at the starting time of the class (11.00 a.m.). Submissions after Friday 11.00 a.m. are considered late and will not be graded. Any violation of the code of honor will result in disciplinary action and failing the class.

Project Details:

There will be a final project submission only. Final Project will be individual work, and expectations are :

  • Project Report : all the details explained during the project, including your approach to problem, design issues or problem solving details.
  • Project Presentation : presentation holding the key points of your project implementation and problem solving, you will also use this file for the presentation
  • Running Code in Python: please provide full list of required libraries and execution guide, necessary data files (if you have required for the execution) and all other required files and please put enough comments to your code for readability.

Projects should include at least 2 separate implementations from the following list (so, you will implement 2 different solutions for the same problem): Search / Heuristic, CSP, Game Trees, Logic, Fuzzy, Machine Learning / ANN

You will compare your 2 different solutions from the below perspectives, at least:

  • performance of computation / memory,
  • success rate,
  • implementation difficulties.
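One possible way to collect the computation and memory figures for such a comparison is sketched below with the standard library (time.perf_counter and tracemalloc). The helper name and the two placeholder "solutions" are hypothetical; you would replace them with your own two implementations.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


# Two placeholder solutions to the same toy problem (sum of squares).
def solution_with_list(n):
    squares = [i * i for i in range(n)]  # materializes every value in memory
    return sum(squares)


def solution_with_generator(n):
    return sum(i * i for i in range(n))  # keeps only one value at a time


for solution in (solution_with_list, solution_with_generator):
    result, elapsed, peak = measure(solution, 100_000)
    print(f"{solution.__name__}: result={result}, time={elapsed:.4f}s, peak={peak} bytes")
```

Timings from a single run are noisy, so repeating each measurement several times and reporting the average gives a fairer comparison table.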

Late Submission Policy: Submissions are due by Ma 19, midnight; any late submission will get a 10% penalty for each day. Presentations will be submitted digitally from any environment you prefer. The presentations will be open to all class members, so please share the connection information with everybody and keep the presentations online for at least 2 weeks during the final exams period, so everybody gets an opportunity to watch them.

After your submissions (including project report, presentation and running code), I can request a Q&A session from you and I will contact you for the schedule.

Course Outline:

  • Introduction and Agents (chapters 1,2)
  • Search (chapters 3,4,5,6)
  • Logic (chapters 7,8,9)
  • Planning (chapters 11,12)
  • Uncertainty (chapters 13,14)
  • Learning (chapters 18,20)
  • Natural Language Processing (chapters 22,23)

Schedule and Contents (Tentative):

  • Class 1: [PPT] Introduction: Course Demonstration Slides, Introduction Slides. Homework 1: Write an essay about what AI is and what you think about the limits and current achievements of AI. You can also mention the movies you have watched about AI.
  • Class 2: [PPT] Agents. Homework 2: Design an agent in any environment (from real life, or a digital world, game or metaverse etc.). Answer the P.E.A.S. for your agent and also give the environment types (parallel to the concepts in class). Write two Python programs for your agent: one reflex agent with if-else statements and one model-based agent with state machines. Your code should also include a test case and demo code to show that your agent can act in certain cases in the environment. Submit your codes and the report of your agent design (P.E.A.S. and brief information about your agent and your ideas) in a zip file to the course e-mail before the next class starts.
  • Class 3: [PPT] Search. Homework 3: Design the states of your agent in the environment. Draw and/or explain the state diagram and code a state transition graph for your agent (with states, transitions, costs and goal states). Implement DFS and BFS to reach the goal state of your agent.
  • Class 4: Deadline of Project Proposals, [PPT] Heuristic Search. Homework 4: Implement A* search for your problem, apply it on your search space and also indicate your heuristic function. Project proposal : until Apr 30 : please explain your project idea and alternative solution approaches from the course content, together with your data set and the outcomes you plan to achieve. Send it in an e-mail with "project proposal" in the subject. Your project is very important for the course, and possible problems might be related to the data set, algorithms, approaches or the purpose of your project. Late submissions or submissions with incorrect information will not get a reply, so you proceed at your own risk.
  • Class 5: [PPT] Constraint Satisfaction Problems. Homework 5: Implement CSP for your problem and apply the backtracking algorithm with any improvement (you can pick any improvement we have covered in the class).
  • Class 6: [PPT] Game Playing. Homework 6: For the first part of the homework, implement a game playing tree and a minimax or maximax tree depending on the story of your game. In the second part of the homework, use the heuristic function from homework 4 and try to shorten the game tree. In the third part of the homework, compare the results from the first and second parts.
  • Class 7: [PPT] Logic. Homework 7: Implement a logic agent with a rule based system, discuss the rules in your rule based system and compare the benefits of logical operators in your rule based system with the previous implementations.
  • Class 8: [PPT] First Order Logic, Inference in First Order Logic. Homework 8: Enhance your previous homework with first order logic and compare the success of the new operators in your implementation.
  • Class 9: [PPT] Uncertainty and Fuzzy Logic. Homework 9: Re-implement the logical rules with fuzzy rules and operators, compare the outcomes from all implementations in a table (your search, heuristic, game-tree, CSP, logic, FOL and fuzzy implementations) and discuss the results.
  • Class 10: Supervised / Unsupervised Learning and Classification / Clustering Problems, k-nn and k-means
  • Class 11: Supervised / Unsupervised Learning and Classification / Clustering Problems, k-nn and k-means
  • Class 12: [PPT] Artificial Neural Networks
  • Class 13: RL, Deep Learning
  • Class 14: [PPT] Genetic Algorithms
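The search topics of Classes 3 and 4 can be illustrated on a small state transition graph. The sketch below is only an example under stated assumptions: the graph, its costs and the heuristic values are made up (the heuristic was chosen so that it never overestimates the remaining cost), and the function names are hypothetical.

```python
from collections import deque
import heapq

# Hypothetical state graph: state -> {neighbor: step cost}; "G" is the goal.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"D": 2, "E": 5},
    "C": {"E": 1},
    "D": {"G": 6},
    "E": {"G": 2},
    "G": {},
}
# Made-up admissible heuristic: estimated remaining cost to reach "G".
HEURISTIC = {"A": 5, "B": 4, "C": 3, "D": 6, "E": 2, "G": 0}


def uninformed_search(start, goal, depth_first=False):
    """BFS (queue) or DFS (stack) over GRAPH; returns the path found, ignoring costs."""
    frontier = deque([[start]])
    visited = set()
    while frontier:
        path = frontier.pop() if depth_first else frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        for neighbor in GRAPH[state]:
            if neighbor not in visited:
                frontier.append(path + [neighbor])
    return None


def a_star(start, goal):
    """A* search: expand the path with the lowest cost-so-far + heuristic."""
    frontier = [(HEURISTIC[start], 0, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        state = path[-1]
        if state == goal:
            return path, cost
        for neighbor, step in GRAPH[state].items():
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                priority = new_cost + HEURISTIC[neighbor]
                heapq.heappush(frontier, (priority, new_cost, path + [neighbor]))
    return None, float("inf")


print(uninformed_search("A", "G"))
print(uninformed_search("A", "G", depth_first=True))
print(a_star("A", "G"))
```

BFS and DFS return some path to the goal without looking at the costs, while A* uses both the accumulated cost and the heuristic to return the cheapest path.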

Coding Practices (from aima.cs.berkeley.edu )

CHAPTER  MODULE       FILES      LINES  DESCRIPTION
1-2      AGENTS       .py          532  Implement Agents and Environments (Chapters 1-2).
3-4      SEARCH       .py .txt     735  Search (Chapters 3-4)
5        CSP          .py .txt     449  CSP (Constraint Satisfaction Problems) problems and solvers. (Chapter 5).
6        GAMES        .py          285  Games, or Adversarial Search. (Chapter 6)
7-10     LOGIC        .py .txt     887  Representations and Inference for Logic (Chapters 7-10)
11-12    PLANNING     .py            6  Planning (Chapters 11-12)
13-15    PROBABILITY  .py .txt     170  Probability models. (Chapters 13-15)
17       MDP          .py .txt     141  Markov Decision Processes (Chapter 17)
18-20    LEARNING     .py          585  Learn to estimate functions from examples. (Chapters 18-20)
21       RL           .py           14  Reinforcement Learning (Chapter 21)
22       NLP          .py .txt     169  A chart parser and some grammars. (Chapter 22)
23       TEXT         .py .txt     364  Statistical Language Processing tools. (Chapter 23)
         DOCTESTS     .py .txt      42  Run all doctests from modules on the command line. For each
         PY2HTML      .py          109  Pretty-print Python code to colorized, hyperlinked html.
         UTILS        .py .txt     713  Provide some widely useful utilities. Safe for “from utils import *”.
         Total                    5201

 

Collaboration Policy: You may freely use internet resources and your course notes in completing assignments and quizzes for this course. You may not consult any person other than the professor when completing quizzes or exams. (Clarifying questions should be directed to the professor.) On assignments you may collaborate with others in the course, so long as you personally prepare the materials submitted under your name, and they accurately reflect your understanding of the topic. Any collaborations should be indicated by a note submitted with the assignment.

Announcements

Please fill the knowledge card attached here, and send it back via email.

CS447 Introduction to Data Science

Antalya University

Course Name: Introduction to Data Science

Course Code: CS 447

Language of Course: English

Credit: 3

Course Coordinator / Instructor: Şadi Evren ŞEKER

Contact: intrds@sadievrenseker.com

Schedule: Tue 13.00 – 16.00

Location: Course will be online, via Discord (for server link please contact Ezgi Erdogan <ezgi.erdogan@optiwisdom.com> )

Courses will be available on YouTube channel (after a delay) : https://www.youtube.com/channel/UCeH53p3W2EJs7IlYyAyQCeg

 

Course Description: This is an introductory course in data science, focusing on machine learning, artificial intelligence and big data.

  • The course takes a top-down approach to data science projects. We start with data science project management techniques and follow the steps of the CRISP-DM methodology listed below:
  • Business Understanding: We cover the types of problems and business processes in real life.
  • Data Understanding: We cover the data types and data problems. We also visualize the data to explore it.
  • Data Preprocessing: We cover the classical data problems and how to handle noisy or dirty data and missing values. We also cover row or column filtering, data integration with concatenation and joins, and data transformations such as discretization, normalization, or pivoting.
  • Machine Learning: We cover classification algorithms such as Naive Bayes, Decision Trees, Logistic Regression or K-NN. We also cover prediction / regression algorithms like linear regression, polynomial regression or decision tree regression, and unsupervised learning problems like clustering and association rule learning, with k-means or hierarchical clustering and the Apriori algorithm. Finally, we cover ensemble techniques in Knime and Python on Big Data Platforms.
  • Evaluation: In the final step of data science, we study the metrics of success: Confusion Matrix, Precision, Recall, Sensitivity and Specificity for classification; purity and Rand index for clustering; and RMSE, RMAE, MSE and MAE for regression / prediction problems, with Knime and Python on Big Data Platforms.

Course Objective and Learning Outcomes: 

1.     Understanding of real life cases about data

2.     Understanding of real life data related problems

3.     Understanding of data analysis methodologies

4.     Understanding of some basic data operations like: preprocessing, transformation or manipulation

5.     Understanding of new technologies like bigdata, nosql, cloud computing

6.     Ability to use some trending software in the industry

7.     Introduction to data related problems and their applications

Tools:

List of course software:

·       Excel,

·       KNIME,

·       Python Programming with Numpy, Pandas, SKLearn, StatsModel or DASK

This course follows a hands-on approach in all the steps, so attendance with a laptop computer is necessary. The software list above will be provided during the course and is subject to updates.

Grading

One individual term project covering all the topics of the course: 100%

Project Requirements :

You are free to select a project topic. The only requirement about the project is, you have to cover at least two topics from the following list and solve the same problem with two separate approaches from the list, you are also asked to compare your findings from these two alternative solutions : KNN, SVM, XGBoost, LightGBM, CatBoost, Decision Trees, Random Forest, Linear Regression, Polynomial Regression, SVR, ARL (ARM), K-Means, DBSCAN, HC

Sample Project Flow


Example project topic: you can search Kaggle for some idea about the projects, you can also find some good data sets from these web sites.

Project proposal : until Apr 30 : please explain your project idea and alternative solution approaches from the course content.

Project Deliverables: You are asked to submit the below items via mail until May 19, 2020.

  1. Presentation and Demo video: please shoot a video for your presentation and demo of your project.
  2. Project Presentation: slides you are using during the presentation
  3. Project Report: a detailed explanation of your approaches, the difficulties you faced during the project implementation, a comparison of your two alternative approaches to the same problem (from the perspectives of implementation difficulty, success rate, running performance etc.), and some critical parts of your algorithms. Also provide details about how you increased the success of your approach. Please answer all of the following questions in your project report: What did you do to handle unbalanced data, if your problem has any? What did you do to handle missing values and dirty or noisy data? Did you use a dimension transformation like PCA or LDA, and why? Did you check for underfitting or overfitting, and how did you get rid of it? Did you use any regularization? Did you implement segmentation / clustering before the classification or prediction steps, and why or why not? Which data science project management method did you use (e.g. SEMMA, CRISP-DM or KDD), and why did you pick it? Which step was the most difficult and why? How did you optimize the parameters of your algorithms? What were the best parameters and why? How did you find these parameters, and do you think you can use the same parameters on other data sets for the same problem in the future?
  4. Running Code or Project: you are free to implement your solution in any platform / language. The only requirement about your implementation is, you have to code the two alternative solution on the same platform / programming language (otherwise it will not be fair to compare them). Please also provide an installation manual for your platform and running your code.
  5. Interview: A personal interview will be held after the submissions. Each of you will be asked to provide a time slot of at least 30 minutes for your projects. During this time, you will be asked to connect via an online platform and show your running demo and answer the questions. Please also attach your available time slots to your submissions.

Project Policies: There will be no late submission policy. If you solve the problem with only one approach, which also means you cannot compare two approaches, your project will be graded out of a maximum of 35 points instead of 100. So please push yourselves to submit two separate approaches for your problem. You are free to use any library in your project, with one exception: in the AI part of your project you are not allowed to use a library or any code found on the internet or written by anybody else. In other words, you have to write the two AI modules of your project yourself, with two different approaches from the course content; using somebody else’s code in the AI module will result in 0 as the final grade.

Course Content:

Week 1 : Introduction to Data, Problems and Real World Examples. Some useful information: DIKW Pyramid: DIKW pyramid – Wikipedia; CRISP-DM: Cross-industry standard process for data mining – Wikipedia. Slides from first week: week1
Week 2 : Introduction to Descriptive Analytics. Repeating the first week for the majority of the class and starting the concept of end-to-end data science projects. Installation of Knime from www.knime.com and a brief introduction document: https://www.knime.com/blog/seven-things-to-do-after-installing-knime

Weight and Height sample project and data set for the Knime workflow.

download first workflow

Week 3 : Introduction to Data Manipulation. Concept of data and types of data: Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio). Basic data manipulation techniques with Knime: 1. Row Filter and Concept of Missing Values 2. Column Filter 3. Advanced Filters 4. Concatenate 5. Join 6. Group By, Aggregation 7. Formulas, String Replace 8. String Manipulation 9. Discrete, Quantized Data, Binning 10. Normalization 11. Splitting and Merging 12. Type Conversion (Numeric, String)
Week 4 : Introduction to Python Programming for Data Science and an end-to-end Python application for data science. Brief review of Python programming. Introduction to the data manipulation libraries NumPy and Pandas. Introduction to the scikit-learn library and a sample classification. You can install Anaconda and Spyder from the link below. We also covered the topics below during the class:

  • Data loading from external source using Pandas library (with read_excel or read_csv methods)
  • DataFrame slicing and dicing (using the iloc indexer with lists of row and column positions)
  • Column filtering (with copying into a new data frame)
  • Row filtering (with copying into a new data frame)
  • Advanced row filtering (like filtering the people whose height is an even number)
  • Column-wise or row-wise formulas (we calculated the BMI for everybody)
  • Quantization (discretization or binning), where we applied condition-based binning
  • Min-max normalization (we used MinMaxScaler from the scikit-learn library)
  • Group-by operation (we used the groupby method from the pandas library)
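The operations listed above can be sketched roughly as follows. This is not the code from the class: the small table, the column names and the BMI cut-offs are hypothetical, and the data is built in memory instead of being loaded with read_csv or read_excel.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data; in class the data was loaded from a file
# with pd.read_csv(...) or pd.read_excel(...).
df = pd.DataFrame({
    "name": ["Ali", "Ayse", "Can", "Deniz"],
    "gender": ["M", "F", "M", "F"],
    "height_cm": [180, 165, 172, 158],
    "weight_kg": [85.0, 55.0, 90.0, 50.0],
})

# Slicing with iloc: first two rows, columns at positions 2 and 3.
subset = df.iloc[:2, [2, 3]]

# Column filtering: copy the selected columns into a new data frame.
body = df[["height_cm", "weight_kg"]].copy()

# Row filtering: people taller than 160 cm.
tall = df[df["height_cm"] > 160].copy()

# Advanced row filtering: people whose height is an even number.
even_height = df[df["height_cm"] % 2 == 0].copy()

# Column-wise formula: BMI = weight (kg) / height (m) squared.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2


def bmi_class(bmi):
    """Condition-based binning with illustrative cut-offs."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    return "overweight"


df["bmi_class"] = df["bmi"].apply(bmi_class)

# Min-max normalization: rescale each column to the range [0, 1].
scaled = MinMaxScaler().fit_transform(df[["height_cm", "weight_kg"]])
df["height_norm"] = scaled[:, 0]
df["weight_norm"] = scaled[:, 1]

# Group by: mean BMI per gender.
mean_bmi_by_gender = df.groupby("gender")["bmi"].mean()
print(df)
print(mean_bmi_by_gender)
```

The .copy() calls follow the "copying into a new data frame" idea from the list, so that later changes to the filtered frames do not touch the original one.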

Click here to download the code from the class. For further information, I strongly suggest reading the documentation below:

Week 5 : Classification Algorithms. Concepts of classification algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: K-NN, Naive Bayes, Decision Tree, Logistic Regression, Support Vector Machines. 2nd Python code of the course, for the classifications. Knime workflow for the classification algorithms.
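As a rough illustration of the classification topic (and of the "two approaches on the same problem" idea from the project requirements), the sketch below trains two of the listed algorithms on the same split and compares their accuracy. It is not the class code; the Iris data set bundled with scikit-learn, the 70/30 split and the random seeds are assumptions made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Example data set shipped with scikit-learn (an assumption, not the class data).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing; stratify keeps the class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Two alternative approaches to the same problem.
models = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)                # learn from the training part
    predictions = model.predict(X_test)        # predict the unseen part
    accuracy[name] = accuracy_score(y_test, predictions)

for name, value in accuracy.items():
    print(f"{name}: accuracy = {value:.3f}")
```

Because both models see exactly the same training and test rows, the two accuracy values are directly comparable.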
Week 6 : Regression Algorithms. Concepts of prediction algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: Linear Regression, Polynomial Regression, Support Vector Regressor, Regression Trees and Decision Tree Regressor. Python code for the regression. Knime workflow and the BIST 100 data set for the regression algorithms. The data set was obtained from: finance.yahoo.com
Week 7 : Clustering Algorithms. Concepts of clustering algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: K-Means, DBSCAN, Hierarchical Clustering. Knime workflow. Python code.
Week 8 : Association Rule Mining. Concepts of association rule mining (ARM) and association rule learning (ARL) algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: A-Priori Algorithm. Click here to download the Apyori library for the Python code. Click for Python code. Click for Knime workflow. Homework: Link for Kaggle, Instacart
Week 9 : Concept of Error and Evaluation Techniques. n-Fold Cross Validation, LOO, Split Validation; RMSE, MAE, R2 values for regression; Rand Index, Silhouette, WCSS for clustering algorithms; Accuracy, Recall, Precision, F-Score, F1-Score etc. for classification algorithms. We also got an introduction to dimension reduction with PCA (principal component analysis) and neural networks with MLP (multi-layer perceptron). Please don’t forget to install Keras for next week.
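The classification metrics named for Week 9 all derive from the four cells of a binary confusion matrix. The sketch below computes them from scratch, to make the definitions explicit. The function names and the sample labels are hypothetical; in practice, library functions such as those in sklearn.metrics would normally be used.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, specificity and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up labels: 4 positives and 6 negatives, with two mistakes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here there are 3 true positives, 1 false negative, 1 false positive and 5 true negatives, so accuracy is 8/10 while precision and recall are both 3/4.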
Week 10 : Collective Learning
Week 11 : Collective Learning and Consensus Learning and Clustering Algorithms: Ensemble Learning, Bagging, Boosting Techniques, Random Forest, GBM, XGBoost, LightGBM Some links useful for the class:

Readings and resources:

Python code from the class: Gradient Boosting: XGBoost (to run the code, install XGBoost from the command prompt: conda install -c conda-forge xgboost). Install the XGBoost extension for Knime.

Week 12 : Project Presentations First Group. Presentations will be picked randomly during the class and anybody absent will be considered as not presented. Project Deliveries (until May 6): Project Presentation, Project Report (explaining your project, your approach and methodologies, difficulties you have faced, solutions you have found, results you have achieved in your projects, links to your data sources). Knime Workflows (in .knwf format) and python codes (in .py format). Please make all these files a single .zip or .rar archive and do not put more than 4 files in your archive.
Week 13 : Project Presentations Second Group. If you have missed the project presentations in the first week, please contact me for further details.
Week 14( May 12): TBA
Week 15( May 19): TBA

CS 441 Artificial Intelligence , Spring 2021, Antalya University

CS441 Introduction to Artificial Intelligence

Classes: TUE 13.00 – 16.00

Location: Courses will be online on Discord (for server link please contact Ezgi Erdogan <ezgi.erdogan@optiwisdom.com> )

Courses will be available on YouTube channel (after a delay) : https://www.youtube.com/channel/UCeH53p3W2EJs7IlYyAyQCeg

Instructor: Dr. Şadi Evren ŞEKER (+9 0531 605 6726)

E-Mail: ai@sadievrenseker.com

Web Site: TBA

Course Content:

  • History and Philosophy of the Artificial Intelligence (AI)
  • Classical AI approaches like search problems, machine learning, constraint satisfaction, graphical models, logic etc.
  • Learning how to model a complex real-world problem by the classical AI approach

Objectives:

  • Introduction to Artificial Intelligence Problems
  • Programming with Python for solving Real Life problems with AI Algorithms
  • Writing a real world application with an AI module (like a game)
  • Introducing sub-AI topics like neural computing, uncertainty and Bayesian networks, concept of learning (supervised / unsupervised) etc.

Texts:

  • S. Russell and P. Norvig Artificial Intelligence: A Modern Approach Prentice Hall
  • A must check: http://aima.cs.berkeley.edu
  • Another useful link: https://www.cse.wustl.edu/~garnett/cse511a/
  • Some parts of the course are related to Machine Learning, Data Science, Data Mining, Pattern Recognition, Natural Language Processing, Statistics, Logic, Artificial Neural Networks and Fuzzy Logic, so you can read any [text] books about these topics.

Grading:

Final Exam (100%).

Requirements and grading details: You are asked to solve the same problem with 2 different AI approaches in same programming language (you can pick any programming language) and compare them. The grading details are listed below:

  • Cheating = 0%: if somebody else solves your problem or writes your code, or if you submit an already solved project without any effort of your own and without understanding the approach, strategy or code, you will be considered to have submitted somebody else’s code; you will get 0 and fail the course, without exception.
  • Grading from Approach 1 = Code for Approach 1 (Code will be questioned during demonstration) (25%) + Detailed Report about Approach 1 (5%) + Presentation about Approach 1 ( 5%) = 35%
  • Grading from Approach 2 = Code for Approach 2 (Code will be questioned during demonstration) (25%) + Detailed Report about Approach 2 (5%) + Presentation about Approach 2 ( 5%) = 35%
  • Grading from comparing your approaches = If you have coded two approaches and if both of them are working, then you are asked to compare their advantages and disadvantages, processing complexity, memory complexity, your comments about these approaches. (presentation 10% + report 10% + demonstration 10%).

Grading summary : Approach 1 (35%) + Approach 2 ( 35%) + Comparison (30%) = 100%

  • These percentages only represent the maximum score you can get from each part of the project; your grade may be lower, depending on the quality of your submission.
  • Grading will also be individual, so depending on your answers during the demonstration you can get a different grade than your teammates.

Important Dates about Projects:

  • 5th of Jan, until midnight, for the submissions of project code (submit all related libraries or data files to execute your code), report (word or pdf format) and presentation files (power point or pdf format).
  • 6th of Jan, until 14.00, project presentation video (submit a video link to the discord channel, so everybody in the class can watch your presentation, the video link can be a youtube video or any drive link)
  • Final Exam week , TBA, Demonstrations, you will be invited to a time slot of 30 minutes. Please check the discord channel for updates.

Final Exam date will be announced at the end of the semester, demonstrations will be by invitation, during the final exams week.

Projects:

Final Project will be group work (max 3 people in a group), and expectations are : Project Report, Project Presentation, Running Code in Python (updated: the programming language is up to you, but you have to use the same programming environment/language for the whole project).

Projects should include at least 2 implementations from the following list:  1) Search/ Heuristic, 2) CSP, 3) Game Trees, 4) Logic, 5) Fuzzy, 6) Machine Learning / ANN

Late Submission Policy: Any late submission will get a 10% penalty for each 24 hours. Demonstrations cannot be postponed; if you don’t appear at your demo time, you lose the code portion of your grade.

Course Outline:

  • Introduction and Agents (chapters 1,2)
  • Search (chapters 3,4,5,6)
  • Logic (chapters 7,8,9)
  • Planning (chapters 11,12)
  • Uncertainty (chapters 13,14)
  • Learning (chapters 18,20)
  • Natural Language Processing (chapters 22,23)

Schedule and Contents (Very Very Very Tentative):

  • Class 1: [PPT] Introduction: Course Demonstration Slides, Introduction Slides
  • Class 2: [PPT] Agents
  • Class 3: [PPT] Search
  • Class 4: No Class
  • Class 5: [PPT] Heuristic Search
  • Class 6: [PPT] Constraint Satisfaction Problems
  • Class 7: [PPT] Game Playing
  • Class 8: Constraint Satisfaction Problems (CSP)
  • Class 9: [PPT] Logic, [PPT] First Order Logic, Inference in First Order Logic, [PPT] Uncertainty and Fuzzy Logic
  • Class 10: Supervised / Unsupervised Learning and Classification / Clustering Problems, k-NN, Decision Tree, Random Forest, Logistic Regression
  • Class 11: Regression: Logistic Regression, Decision Tree Regression, Linear Regression, Polynomial Regression
  • Class 12: [PPT] Artificial Neural Networks
  • Class 13: Project Presentations
  • Class 15: Project Presentations
  • Class 16: No Class. [PPT] Genetic Algorithms
  • Final Exam : Date TBA

Collaboration Policy: You may freely use internet resources and your course notes in completing assignments and quizzes for this course. You may not consult any person other than the professor when completing quizzes or exams. (Clarifying questions should be directed to the professor.) On assignments you may collaborate with others in the course, so long as you personally prepare the materials submitted under your name, and they accurately reflect your understanding of the topic. Any collaborations should be indicated by a note submitted with the assignment.

Announcements

Please fill in the knowledge card attached here, and send it back via email.

DS 501 Introduction to Data Science

Antalya University

Course Name: Introduction to Data Science

Course Code: DS 501

Language of Course: English

Credit: 3

Course Coordinator / Instructor: Şadi Evren ŞEKER

Contact: intrds@sadievrenseker.com

Schedule: Tue 13.00 – 16.00

Location: Course will be online, via Discord (for server link please contact Meltem Koc <meltem.koc@optiwisdom.com> )

Courses will be available on YouTube channel (after a delay) : https://www.youtube.com/channel/UCeH53p3W2EJs7IlYyAyQCeg

 

Course Description:  This course is an introduction level course to data science, specialized on machine learning, artificial intelligence and big data.

  • The course starts with a top down approach to data science projects. The first step is covering data science project management techniques and we follow CRISP-DM methodology with 6 steps below:
  • Business Understanding : We cover the types of problems and business processes in real life
  • Data Understanding: We cover the data types and data problems. We also try to visualize data to discover.
  • Data Preprocessing: We cover the classical problems on data and also handling the problems like noisy or dirty data and missing values. Row or column filtering, data integration with concatenation and joins. We cover the data transformation such as discretization, normalization, or pivoting.
  • Machine Learning: we cover the classification algorithms such as Naive Bayes, Decision Trees, Logistic Regression or K-NN. We also cover prediction / regression algorithms like linear regression, polynomial regression or decision tree regression. We also cover unsupervised learning problems like clustering and association rule learning with k-means or hierarchical clustering, and a priori algorithms. Finally we cover ensemble techniques in Knime and Python on Big Data Platforms.
  • Evaluation: In the final step of data science, we study the metrics of success: Confusion Matrix, Precision, Recall, Sensitivity and Specificity for classification; purity and Rand index for clustering; and RMSE, RMAE, MSE and MAE for regression / prediction problems, with Knime and Python on Big Data Platforms.

Course Objective and Learning Outcomes: 

1.     Understanding of real life cases about data

2.     Understanding of real life data related problems

3.     Understanding of data analysis methodologies

4.     Understanding of some basic data operations like: preprocessing, transformation or manipulation

5.     Understanding of new technologies like bigdata, nosql, cloud computing

6.     Ability to use some trending software in the industry

7.     Introduction to data related problems and their applications

Tools:

List of course software:

·       Excel,

·       KNIME,

·       Python programming with NumPy, Pandas, scikit-learn, statsmodels or Dask

This course is hands-on in all steps, so attending with a laptop computer is necessary. The software listed above will be provided during the course, and the list is subject to updates.

Grading

One individual term project covering all the topics covered in the course: 100%

Project Requirements :

You are free to select a project topic. The only requirement is that you cover at least two topics from the following list, solve the same problem with two separate approaches from that list, and compare your findings from the two alternative solutions: KNN, SVM, XGBoost, LightGBM, CatBoost, Decision Trees, Random Forest, Linear Regression, Polynomial Regression, SVR, ARL (ARM), K-Means, DBSCAN, HC

Sample Project Flow

Example project topic: you can search Kaggle for project ideas; you can also find some good data sets on these web sites.

Project proposal : until Apr 30 : please explain your project idea and alternative solution approaches from the course content.

Project Deliverables: You are asked to submit the below items via mail until May 19, 2020.

  1. Presentation and Demo video: please shoot a video for your presentation and demo of your project.
  2. Project Presentation: slides you are using during the presentation
  3. Project Report: a detailed explanation of your approaches, the difficulties you faced during the implementation, a comparison of your two alternative approaches to the same problem (in terms of implementation difficulty, success rates, running performance, etc.), and some critical parts of your algorithms. Also describe what you did to increase the success of your approach. Please answer all of the following questions in your project report:
     • If your data was unbalanced, what did you do about it?
     • How did you handle missing values and dirty or noisy data?
     • Did you use a dimension transformation such as PCA or LDA? Why?
     • Did you check for underfitting or overfitting, and how did you get rid of it?
     • Did you use any regularization?
     • Did you apply segmentation / clustering before the classification or prediction steps? Why or why not?
     • Which data science project management method did you use (e.g. SEMMA, CRISP-DM or KDD), and why did you pick it?
     • Which step was the most difficult, and why?
     • How did you optimize the parameters of your algorithms? What were the best parameters, and why? How did you find them, and do you think the same parameters can be used on other data sets for the same problem in the future?
  4. Running Code or Project: you are free to implement your solution on any platform and in any language. The only requirement is that you code the two alternative solutions on the same platform / programming language (otherwise it will not be fair to compare them). Please also provide an installation manual for your platform and for running your code.
  5. Interview: A personal interview will be held after the submissions. Each of you will be asked to provide a time slot of at least 30 minutes for your projects. During this time, you will be asked to connect via an online platform and show your running demo and answer the questions. Please also attach your available time slots to your submissions.

Project Policies: There will be no late submission policy. If you solve the problem with only one approach, which also means you cannot compare two approaches, your project will be graded out of a maximum of 35 points instead of 100. So please push yourselves to submit two separate approaches for your problem. You are free to use any library in your project, with one exception: in the AI part of your project you are not allowed to use a library or any code found on the internet or written by anybody else. In other words, you have to write the two AI modules of your project yourself, with two different approaches from the course content; using somebody else’s code in the AI module will result in 0 as the final grade.

Course Content:

Week 1 : Introduction to Data, Problems and Real World Examples. Some useful information: DIKW Pyramid: DIKW pyramid – Wikipedia; CRISP-DM: Cross-industry standard process for data mining – Wikipedia. Slides from first week: week1
Week 2 : Introduction to Descriptive Analytics. Repeating the first week for the majority of the class and starting the concept of end-to-end data science projects.

Installation of Knime from www.knime.com and a brief introduction document: https://www.knime.com/blog/seven-things-to-do-after-installing-knime

Weight and Height sample project and data set for the Knime workflow.

download first workflow

Week 3 : Introduction to Data Manipulation. Concept of data and types of data: Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio). Basic data manipulation techniques with Knime: 1. Row Filter and Concept of Missing Values 2. Column Filter 3. Advanced Filters 4. Concatenate 5. Join 6. Group By, Aggregation 7. Formulas, String Replace 8. String Manipulation 9. Discrete, Quantized Data, Binning 10. Normalization 11. Splitting and Merging 12. Type Conversion (Numeric, String)
Week 4 : Introduction to Python Programming for Data Science and an end-to-end Python application for data science. Brief review of Python programming. Introduction to the data manipulation libraries NumPy and Pandas. Introduction to the scikit-learn library and a sample classification. You can install Anaconda and Spyder from the link below. We also covered the topics below during the class:

  • Data loading from external source using Pandas library (with read_excel or read_csv methods)
  • DataFrame slicing and dicing (using the iloc indexer with lists of row and column positions)
  • Column filtering (with copying into a new data frame)
  • Row filtering (with copying into a new data frame)
  • Advanced row filtering (like filtering the people whose height is an even number)
  • Column-wise or row-wise formulas (we calculated the BMI for everybody)
  • Quantization (discretization or binning), where we applied condition-based binning
  • Min-max normalization (we used MinMaxScaler from the scikit-learn library)
  • Group-by operation (we used the groupby method from the pandas library)

Click here to download the code from the class. For further information, I strongly suggest reading the documentation below:

Week 5 : Classification Algorithms. Concepts of classification algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: K-NN, Naive Bayes, Decision Tree, Logistic Regression, Support Vector Machines. 2nd Python code of the course, for the classifications. Knime workflow for the classification algorithms.
Week 6 : Regression Algorithms. Concepts of prediction algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: Linear Regression, Polynomial Regression, Support Vector Regressor, Regression Trees and Decision Tree Regressor. Python code for the regression. Knime workflow and the BIST 100 data set for the regression algorithms. The data set was obtained from: finance.yahoo.com
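To make the difference between linear and polynomial regression concrete, here is a minimal sketch that fits both to the same data and compares their RMSE. It is not the class code: it uses NumPy's polyfit on a small synthetic, noise-free quadratic series rather than the BIST 100 data, and the error is measured on the training points only.

```python
import numpy as np

# Synthetic data (an assumption for illustration): an exact quadratic trend.
x = np.arange(10, dtype=float)
y = 2.0 * x**2 - 3.0 * x + 5.0


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Least-squares fits: degree 1 is linear regression, degree 2 is polynomial.
linear_coeffs = np.polyfit(x, y, deg=1)
quadratic_coeffs = np.polyfit(x, y, deg=2)

rmse_linear = rmse(y, np.polyval(linear_coeffs, x))
rmse_quadratic = rmse(y, np.polyval(quadratic_coeffs, x))

print(f"linear RMSE:    {rmse_linear:.3f}")
print(f"quadratic RMSE: {rmse_quadratic:.3f}")
```

The straight line cannot follow the curvature, so its error stays large, while the degree-2 fit recovers the generating curve almost exactly. On real, noisy data the comparison should be made on held-out points, since a higher degree always lowers the training error.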
Week 7 : Clustering Algorithms. Concepts of clustering algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: K-Means, DBSCAN, Hierarchical Clustering. Knime workflow. Python code.
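A minimal K-Means sketch with scikit-learn is given below, together with two of the clustering quality measures used in this course (WCSS and the silhouette score). The six points, the choice of two clusters and the random seed are assumptions chosen only so that the result is easy to check by eye.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated groups of made-up 2D points.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3],
])

# n_init restarts the algorithm several times and keeps the best run.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

wcss = model.inertia_                          # within-cluster sum of squares
silhouette = silhouette_score(points, labels)  # close to 1 means well separated

print("labels:", labels)
print(f"WCSS: {wcss:.3f}, silhouette: {silhouette:.3f}")
```

The cluster numbers themselves are arbitrary (the first group may be labelled 0 or 1); only the grouping matters.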
Week 8 : Association Rule Mining. Concepts of association rule mining (ARM) and association rule learning (ARL) algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are: A-Priori Algorithm. Click here to download the Apyori library for the Python code. Click for Python code. Click for Knime workflow. Homework: Link for Kaggle, Instacart
Week 9 : Concept of Error and Evaluation Techniques. n-Fold Cross Validation, LOO, Split Validation; RMSE, MAE, R2 values for regression; Rand Index, Silhouette, WCSS for clustering algorithms; Accuracy, Recall, Precision, F-Score, F1-Score etc. for classification algorithms. We also got an introduction to dimension reduction with PCA (principal component analysis) and neural networks with MLP (multi-layer perceptron). Please don’t forget to install Keras for next week.
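The three validation schemes named for Week 9 (split validation, n-fold cross validation and leave-one-out) can be compared side by side with scikit-learn. The sketch below is only an illustration: the bundled Iris data set, the K-NN model with 5 neighbours, the 70/30 split and 5 folds are all assumptions, not the settings used in class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

# Split validation: a single train/test partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
split_accuracy = model.fit(X_train, y_train).score(X_test, y_test)

# n-fold cross validation (n = 5): every row is used for testing exactly once.
fold_scores = cross_val_score(model, X, y, cv=5)

# Leave-one-out (LOO): as many folds as rows, each test set has one row.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

print(f"split validation accuracy: {split_accuracy:.3f}")
print(f"5-fold mean accuracy:      {fold_scores.mean():.3f}")
print(f"LOO mean accuracy:         {loo_scores.mean():.3f}")
```

Split validation is the cheapest but depends on one particular partition; cross validation averages over several partitions, and LOO is the extreme case that needs one model fit per row.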
Week 10 : Collective Learning :

 

Week 11 : Collective Learning and Consensus Learning and Clustering Algorithms: Ensemble Learning, Bagging, Boosting Techniques, Random Forest, GBM, XGBoost, LightGBM Some links useful for the class:

Readings and resources:

Python code from the class: Gradient Boosting: XGBoost (to run the code, install XGBoost from the command prompt: conda install -c conda-forge xgboost). Install the XGBoost extension for Knime.

Week 12 : Project Presentations First Group. Presentations will be picked randomly during the class and anybody absent will be considered as not presented. Project Deliveries (until May 6): Project Presentation, Project Report (explaining your project, your approach and methodologies, difficulties you have faced, solutions you have found, results you have achieved in your projects, links to your data sources). Knime Workflows (in .knwf format) and python codes (in .py format). Please make all these files a single .zip or .rar archive and do not put more than 4 files in your archive.
Week 13 : Project Presentations Second Group. If you have missed the project presentations in the first week, please contact me for further details.
Week 14( May 12): TBA
Week 15( May 19): TBA

ECE 550 Advanced AI

Classes: Mon 13.00 – 16.00

Location: Courses will be online on Discord (for server link please contact Meltem Koc <meltem.koc@optiwisdom.com> )

Courses will be available on YouTube channel (after a delay) : https://www.youtube.com/channel/UCeH53p3W2EJs7IlYyAyQCeg

Instructor: Dr. Şadi Evren ŞEKER (+9 0531 605 6726)

E-Mail: ai@sadievrenseker.com

Web Site: TBA

Course Content:

  • History and Philosophy of the Artificial Intelligence (AI)
  • Classical AI approaches like search problems, machine learning, constraint satisfaction, graphical models, logic etc.
  • Learning how to model a complex real-world problem by the classical AI approach

Objectives:

  • Introduction to Artificial Intelligence Problems
  • Programming with Python for solving Real Life problems with AI Algorithms
  • Writing a real world application with an AI module (like a game)
  • Introducing sub-AI topics like neural computing, uncertainty and Bayesian networks, concept of learning (supervised / unsupervised) etc.

Texts:

  • S. Russell and P. Norvig Artificial Intelligence: A Modern Approach Prentice Hall
  • A must check: http://aima.cs.berkeley.edu
  • Another useful link: https://www.cse.wustl.edu/~garnett/cse511a/
  • Some parts of the course are related to Machine Learning, Data Science, Data Mining, Pattern Recognition, Natural Language Processing, Statistics, Logic, Artificial Neural Networks and Fuzzy Logic, so you can read any [text] books about these topics.

Grading:

Final Exam (100%).

Requirements and grading details: You are asked to solve the same problem with 2 different AI approaches in same programming language (you can pick any programming language) and compare them. The grading details are listed below:

  • Cheating = 0%: if somebody else solves your problem or writes your code, or if you submit an already solved project without any effort of your own and without understanding the approach, strategy or code, you will be considered to have submitted somebody else’s code; you will get 0 and fail the course, without exception.
  • Grading from Approach 1 = Code for Approach 1 (Code will be questioned during demonstration) (25%) + Detailed Report about Approach 1 (5%) + Presentation about Approach 1 ( 5%) = 35%
  • Grading from Approach 2 = Code for Approach 2 (Code will be questioned during demonstration) (25%) + Detailed Report about Approach 2 (5%) + Presentation about Approach 2 ( 5%) = 35%
  • Grading from comparing your approaches = If you have coded two approaches and if both of them are working, then you are asked to compare their advantages and disadvantages, processing complexity, memory complexity, your comments about these approaches. (presentation 10% + report 10% + demonstration 10%).

Grading summary : Approach 1 (35%) + Approach 2 ( 35%) + Comparison (30%) = 100%

  • These percentages only represent the maximum score you can get from each part of the project; your grade may be lower, depending on the quality of your submission.
  • Grading will also be individual, so depending on your answers during the demonstration you can get a different grade than your teammates.

Important Dates about Projects:

  • 5th of Jan, until midnight, for the submissions of project code (submit all related libraries or data files to execute your code), report (word or pdf format) and presentation files (power point or pdf format).
  • 6th of Jan, until 14.00, project presentation video (submit a video link to the discord channel, so everybody in the class can watch your presentation, the video link can be a youtube video or any drive link)
  • Final Exam week , TBA, Demonstrations, you will be invited to a time slot of 30 minutes. Please check the discord channel for updates.

Final Exam date will be announced at the end of the semester, demonstrations will be by invitation, during the final exams week.

Projects:

Final Project will be group work (max 3 people in a group), and expectations are : Project Report, Project Presentation, Running Code in Python (updated: the programming language is up to you, but you have to use the same programming environment/language for the whole project).

Projects should include at least 2 implementations from the following list:  1) Search/ Heuristic, 2) CSP, 3) Game Trees, 4) Logic, 5) Fuzzy, 6) Machine Learning / ANN

Late Submission Policy: Any late submission will get a 10% penalty for each 24 hours. Demonstrations cannot be postponed; if you don’t appear at your demo time, you lose the code portion of your grade.

Course Outline:

  • Introduction and Agents (chapters 1,2)
  • Search (chapters 3,4,5,6)
  • Logic (chapters 7,8,9)
  • Planning (chapters 11,12)
  • Uncertainty (chapters 13,14)
  • Learning (chapters 18,20)
  • Natural Language Processing (chapters 22,23)

Schedule and Contents (Very Very Very Tentative):

  • Class 1: [PPT] Introduction: Course Demonstration Slides, Introduction Slides
  • Class 2: [PPT] Agents
  • Class 3: [PPT] Search
  • Class 4: No Class
  • Class 5: [PPT] Heuristic Search
  • Class 6: [PPT] Constraint Satisfaction Problems
  • Class 7: [PPT] Game Playing
  • Class 8: Constraint Satisfaction Problems (CSP)
  • Class 9: [PPT] Logic, [PPT] First Order Logic, Inference in First Order Logic, [PPT] Uncertainty and Fuzzy Logic
  • Class 10: Supervised / Unsupervised Learning and Classification / Clustering Problems, k-NN, Decision Tree, Random Forest, Logistic Regression
  • Class 11: Regression: Logistic Regression, Decision Tree Regression, Linear Regression, Polynomial Regression
  • Class 12: [PPT] Artificial Neural Networks
  • Class 13: Project Presentations
  • Class 15: Project Presentations
  • Class 16: No Class. [PPT] Genetic Algorithms
  • Final Exam : Date TBA

Collaboration Policy: You may freely use internet resources and your course notes in completing assignments and quizzes for this course. You may not consult any person other than the professor when completing quizzes or exams. (Clarifying questions should be directed to the professor.) On assignments you may collaborate with others in the course, so long as you personally prepare the materials submitted under your name, and they accurately reflect your understanding of the topic. Any collaborations should be indicated by a note submitted with the assignment.

Announcements

Please fill in the knowledge card attached here and send it back via email.

ECE 549 Advanced Data Science

Antalya University

Course Name: Advanced Data Science Fall 2020

Course Code: ECE 549

Language of Course: English

Credit: 3

Course Coordinator / Instructor: Şadi Evren ŞEKER

Contact: intrds@sadievrenseker.com

Schedule: Wed 10.00 – 13.00

Location: Course will be online, via Discord (for server link please contact Elif Su YİĞİT <elifsu.yigit@optiwisdom.com> )

Course Description:  This is an introductory-level data science course focused on machine learning, artificial intelligence, and big data.

  • The course takes a top-down approach to data science projects. It starts with data science project management techniques and follows the six-step CRISP-DM methodology; the steps covered are listed below:
  • Business Understanding: We cover the types of problems and business processes found in real life.
  • Data Understanding: We cover data types and data problems. We also visualize the data to explore it.
  • Data Preprocessing: We cover classical data problems and how to handle noisy or dirty data and missing values; row and column filtering; data integration with concatenation and joins; and data transformations such as discretization, normalization, and pivoting.
  • Machine Learning: We cover classification algorithms such as Naive Bayes, Decision Trees, Logistic Regression, and K-NN. We also cover prediction / regression algorithms such as linear regression, polynomial regression, and decision tree regression, and unsupervised learning problems such as clustering (k-means, hierarchical clustering) and association rule learning (the Apriori algorithm). Finally, we cover ensemble techniques in Knime and Python on Big Data Platforms.
  • Evaluation: In the final step, we study success metrics: Confusion Matrix, Precision, Recall, Sensitivity, and Specificity for classification; purity and Rand index for clustering; and RMSE, RMAE, MSE, and MAE for regression / prediction problems, with Knime and Python on Big Data Platforms.
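The classification metrics named in the Evaluation step all derive from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the course's own code; the labels and the function names `confusion_counts` and `classification_metrics` are made up for illustration, and the sketch assumes every denominator is non-zero.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]

def classification_metrics(y_true, y_pred, positive):
    """Derive the usual metrics from the confusion-matrix cells."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # recall is also called sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels: 4 actual "yes" cases and 6 actual "no" cases
y_true = ["yes"] * 4 + ["no"] * 6
y_pred = ["yes", "yes", "yes", "no", "yes", "yes", "no", "no", "no", "no"]
metrics = classification_metrics(y_true, y_pred, positive="yes")
print(metrics)
```

Precision and recall differ here (0.6 versus 0.75) because the two kinds of error, false positives and false negatives, are counted against different denominators.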

Course Objective and Learning Outcomes: 

1.     Understanding of real life cases about data

2.     Understanding of real life data related problems

3.     Understanding of data analysis methodologies

4.     Understanding of some basic data operations like: preprocessing, transformation or manipulation

5.     Understanding of new technologies like bigdata, nosql, cloud computing

6.     Ability to use some trending software in the industry

7.     Introduction to data related problems and their applications

Tools:

List of course software:

·       Excel,

·       KNIME,

·       Python Programming with Numpy, Pandas, SKLearn, StatsModel or DASK

This course is hands-on at every step, so attending with a laptop computer is necessary. The software listed above will be provided during the course, and the list is subject to updates.

Grading

One individual term project covering all the topics covered in the course: 100%

Project Requirements :

You are free to select a project topic. The only requirement is that you cover at least two topics from the following list, solving the same problem with two separate approaches from the list. You are also asked to compare your findings from these two alternative solutions: KNN, SVM, XGBoost, LightGBM, CatBoost, Decision Trees, Random Forest, Linear Regression, Polynomial Regression, SVR, ARL (ARM), K-Means, DBSCAN, HC

Sample Project Flow

Example project topics: you can search Kaggle for project ideas; you can also find some good data sets on these web sites.

Project proposal: due by Apr 30. Please explain your project idea and the alternative solution approaches from the course content.

Project Deliverables: You are asked to submit the items below by e-mail by May 19, 2020.

  1. Presentation and Demo video: please shoot a video for your presentation and demo of your project.
  2. Project Presentation: slides you are using during the presentation
  3. Project Report: a detailed explanation of your approaches, the difficulties you faced during the implementation, a comparison of your two alternative approaches to the same problem (in terms of implementation difficulty, success rates, running performance, etc.), and some critical parts of your algorithms. Also provide details about how you increased the success of your approach. Please answer all of the following questions in your project report: What did you do about unbalanced data, if your problem has it? What did you do about missing values and dirty or noisy data? Did you use a dimension transformation such as PCA or LDA, and why? Did you check for underfitting or overfitting, and how did you get rid of it? Did you use any regularization? Did you apply segmentation / clustering before the classification or prediction steps, and why or why not? Which data science project management method did you use (e.g. SEMMA, CRISP-DM, or KDD), and why did you pick it? Which step was the most difficult, and why? How did you optimize the parameters of your algorithms? What were the best parameters, and why? How did you find these parameters, and do you think you can use the same parameters on other data sets for the same problem in the future?
  4. Running Code or Project: you are free to implement your solution on any platform / language. The only requirement is that you code the two alternative solutions on the same platform / programming language (otherwise it would not be fair to compare them). Please also provide a manual for installing your platform and running your code.
  5. Interview: A personal interview will be held after the submissions. Each of you will be asked to provide a time slot of at least 30 minutes for your projects. During this time, you will be asked to connect via an online platform and show your running demo and answer the questions. Please also attach your available time slots to your submissions.

Project Policies: There will be no late submission policy. A project that solves the problem with only one approach, and therefore cannot compare two approaches, will be graded out of a maximum of 35 points (of 100). So please push yourselves to submit two separate approaches for your problem. You are free to use any library in your project, except in the AI part: there you are not allowed to use a library or any code from the internet or written by anybody else. In other words, you have to write the two AI modules for your project yourself, with two different approaches from the course content; using somebody else’s code in the AI module will result in a final grade of 0.

Course Content:

Week 1 : Introduction to Data, Problems and Real World Examples:
Some useful information:
DIKW Pyramid: DIKW pyramid – Wikipedia
CRISP-DM: Cross-industry standard process for data mining – Wikipedia
Slides from first week: week1

Week 2 : Introduction to Descriptive Analytics Repeating the first week for majority of the class and starting the concept of end to end data science projects.

Installation of Knime from www.knime.com, and a brief introduction document: https://www.knime.com/blog/seven-things-to-do-after-installing-knime

Weight and Height sample project and data set for the Knime workflow.

download first workflow

Week 3 : Introduction to Data Manipulation
Concept of Data and types of data : Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio).
Basic Data Manipulation techniques with Knime:
1. Row Filter and Concept of Missing Values
2. Column Filter
3. Advanced Filters
4. Concatenate
5. Join
6. Group by, Aggregation
7. Formulas, String Replace
8. String Manipulation
9. Discrete, Quantized Data, Binning
10. Normalization
11. Splitting and Merging
12. Type Conversion (Numeric, String)

Week 4 : Introduction to Python Programming for Data Science and an end-to-end Python application for data science
Brief review of Python programming
Introduction to data manipulation libraries: NumPy and Pandas
Introduction to the Sci-Kit Learn library and a sample classification

You can install Anaconda and Spyder from the link below:

We also covered the topics below during the class:

  • Data loading from external source using Pandas library (with read_excel or read_csv methods)
  • DataFrame slicing and dicing (using the iloc property and the lists provided to the iloc method)
  • Column Filtering (with copying into a new data frame)
  • Row Filtering (with copying into a new data frame)
  • Advanced row filtering (for example, keeping only the people whose height is an even number)
  • Column or row wise formula (we have calculated the BMI for everybody)
  • Quantization (discretization or binning): where we have applied the condition based binning
  • Min – Max Normalization (we have implemented MinMaxScaler from the SKLearn library)
  • Group By operation (we have implemented the groupby method from pandas library)
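The operations listed above can be sketched on a small in-memory table. This is an illustrative sketch, not the code distributed in class: the column names, the sample values, and the BMI cut points are assumptions, and the min-max step is written out by hand rather than through the `MinMaxScaler` class mentioned above (it computes the same rescaling to the 0 to 1 range).

```python
import pandas as pd

# Hypothetical sample data; in class the data came from read_csv / read_excel
people = pd.DataFrame({
    "name": ["Ali", "Ayse", "Can", "Deniz", "Elif", "Fatma"],
    "gender": ["m", "f", "m", "f", "f", "f"],
    "height_cm": [180, 165, 174, 159, 170, 162],
    "weight_kg": [85.0, 55.0, 70.0, 50.0, 80.0, 60.0],
})

# Slicing with iloc: first three rows, columns 2 and 3 (height, weight)
first_rows = people.iloc[:3, [2, 3]]

# Column filtering and row filtering, each copied into a new data frame
body = people[["height_cm", "weight_kg"]].copy()
tall = people[people["height_cm"] >= 170].copy()

# Advanced row filtering: keep people whose height is an even number
even_height = people[people["height_cm"] % 2 == 0].copy()

# Column-wise formula: BMI = weight (kg) / height (m) squared
people["bmi"] = people["weight_kg"] / (people["height_cm"] / 100) ** 2

# Condition-based binning of BMI (cut points are illustrative)
people["bmi_class"] = pd.cut(
    people["bmi"],
    bins=[0, 18.5, 25, 30, float("inf")],
    labels=["underweight", "normal", "overweight", "obese"],
)

# Min-max normalization written out by hand: (x - min) / (max - min)
h = people["height_cm"]
people["height_scaled"] = (h - h.min()) / (h.max() - h.min())

# Group by: mean BMI per gender
mean_bmi = people.groupby("gender")["bmi"].mean()
print(people)
print(mean_bmi)
```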

Click here to download the codes from the class

For further information, I strongly suggest reading the documentation below:

Week 5 : Classification Algorithms
Concepts of classification algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are:
K-NN
Naive Bayes
Decision Tree
Logistic Regression
Support Vector Machines

2nd Python Code of the course for the classifications

Knime Workflow for the classification algorithms

Week 6 : Regression Algorithms
Concepts of prediction algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are:
Linear Regression
Polynomial Regression
Support Vector Regressor
Regression Trees and Decision Tree Regressor

Python code for the Regression

Knime Workflow and the BIST 100 data set for the Regression Algorithms

The Data Set obtained from : finance.yahoo.com

Week 7 : Clustering Algorithms
Concepts of clustering algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are:
K-Means
DBScan
Hierarchical Clustering

Knime Workflow

Python Code

Week 8 : Association Rule Mining
Concepts of association rule mining (ARM) and association rule learning (ARL) algorithms, implementing the algorithms in Knime and coding in Python. Algorithms covered are:
A-Priori Algorithm

Click Here To Download Apriori Library for the Python Codes

click for python code

click for knime workflow

Homework : Link for Kaggle, instacart

Week 9 : Concept of Error and Evaluation Techniques
n-Fold Cross Validation, LOO, Split Validation
RMSE, MAE, R2 values for regression
Rand index, Silhouette, WCSS for clustering algorithms
Accuracy, Recall, Precision, F-Score, F1-Score etc. for classification algorithms

We also got an introduction to dimension reduction with PCA (principal component analysis) and neural networks with MLP (multi layer perceptron)

Please don’t forget to install Keras for next week.

Week 10 : Collective Learning :

 

Week 11 : Collective Learning and Consensus Learning and Clustering Algorithms: Ensemble Learning, Bagging, Boosting Techniques, Random Forest, GBM, XGBoost, LightGBM Some links useful for the class:

Readings and resources:

Python Codes from the class :

Gradient Boosting:

XGBoost (to run the code, install XGBoost from the command prompt):

conda install -c conda-forge xgboost

Install XGBoost extension for Knime

Week 12 : Project Presentations First Group. Presentations will be picked randomly during the class and anybody absent will be considered as not presented. Project Deliveries (until May 6): Project Presentation, Project Report (explaining your project, your approach and methodologies, difficulties you have faced, solutions you have found, results you have achieved in your projects, links to your data sources). Knime Workflows (in .knwf format) and python codes (in .py format). Please make all these files a single .zip or .rar archive and do not put more than 4 files in your archive.
Week 13 : Project Presentations Second Group. If you have missed the project presentations in the first week, please contact me for further details.
Week 14( May 12): TBA
Week 15( May 19): TBA

CS441 Introduction to Artificial Intelligence

Classes: Wednesday 14.00 – 17.00

Location: Courses will be online on Discord (for server link please contact Elif Su YİĞİT <elifsu.yigit@optiwisdom.com> )

Instructor: Dr. Şadi Evren ŞEKER (+9 0531 605 6726)

E-Mail: ai@sadievrenseker.com

Web Site: TBA

 

Course Content:

  • History and Philosophy of the Artificial Intelligence (AI)
  • Classical AI approaches like search problems, machine learning, constraint satisfaction, graphical models, logic etc.
  • Learning how to model a complex real-world problem by the classical AI approach

Objectives:

  • Introduction to Artificial Intelligence Problems
  • Programming with Python for solving Real Life problems with AI Algorithms
  • Writing a real world application with an AI module (like a game)
  • Introducing AI subtopics like neural computing, uncertainty and Bayesian networks, concept of learning (supervised / unsupervised) etc.

Texts:

  • S. Russell and P. Norvig Artificial Intelligence: A Modern Approach Prentice Hall
  • A must check: http://aima.cs.berkeley.edu
  • Another useful link: https://www.cse.wustl.edu/~garnett/cse511a/
  • Some parts of the course are related to Machine Learning, Data Science, Data Mining, Pattern Recognition, Natural Language Processing, Statistics, Logic, Artificial Neural Networks and Fuzzy Logic, so you can read any [text] books about these topics.

Grading:

Final Exam (100%).

Requirements and grading details: You are asked to solve the same problem with 2 different AI approaches in same programming language (you can pick any programming language) and compare them. The grading details are listed below:

  • Cheating = 0%. If somebody else solves your problem or writes your code, or if you submit an already solved project without any effort of your own and without understanding the approach, strategy, or code, then you will be considered to have submitted somebody else’s code; you will get 0 and fail the course, without exception.
  • Grading from Approach 1 = Code for Approach 1 (Code will be questioned during demonstration) (25%) + Detailed Report about Approach 1 (5%) + Presentation about Approach 1 ( 5%) = 35%
  • Grading from Approach 2 = Code for Approach 2 (Code will be questioned during demonstration) (25%) + Detailed Report about Approach 2 (5%) + Presentation about Approach 2 ( 5%) = 35%
  • Grading from comparing your approaches = If you have coded two approaches and if both of them are working, then you are asked to compare their advantages and disadvantages, processing complexity, memory complexity, your comments about these approaches. (presentation 10% + report 10% + demonstration 10%).

Grading summary : Approach 1 (35%) + Approach 2 ( 35%) + Comparison (30%) = 100%

  • These percentages represent only the maximum score you can get for each part of the project; your grade may be lower, depending on the quality of your submission.
  • Grading is individual, so depending on your answers during the demonstration you can get a different grade than your teammates.

Important Dates about Projects:

  • 5th of Jan, until midnight, for the submissions of project code (submit all related libraries or data files to execute your code), report (word or pdf format) and presentation files (power point or pdf format).
  • 6th of Jan, until 14.00, project presentation video (submit a video link to the discord channel, so everybody in the class can watch your presentation, the video link can be a youtube video or any drive link)
  • Final Exam week , TBA, Demonstrations, you will be invited to a time slot of 30 minutes. Please check the discord channel for updates.

Final Exam date will be announced at the end of the semester, demonstrations will be by invitation, during the final exams week.

Projects:

Final Project will be group work (max 3 people in a group), and expectations are : Project Report, Project Presentation, Running Code in Python (updated: the programming language is up to you, but you have to use the same programming environment/language for the whole project).

Projects should include at least 2 implementations from the following list:  1) Search / Heuristic, 2) CSP, 3) Game Trees, 4) Logic, 5) Fuzzy, 6) Machine Learning / ANN

Late Submission Policy: Each late submission is penalized 10% for every 24 hours of delay. Demonstrations cannot be postponed; if you do not appear at your demonstration time, you lose the code portion of your grade.

Course Outline:

  • Introduction and Agents (chapters 1, 2)
  • Search (chapters 3, 4, 5, 6)
  • Logic (chapters 7, 8, 9)
  • Planning (chapters 11, 12)
  • Uncertainty (chapters 13, 14)
  • Learning (chapters 18, 20)
  • Natural Language Processing (chapters 22, 23)

Schedule and Contents (Very Very Very Tentative):

  • Class 1: [PPT] Introduction: Course Demonstration Slides, Introduction Slides
  • Class 2: [PPT] Agents
  • Class 3: [PPT] Search
  • Class 4: No Class
  • Class 5: [PPT] Heuristic Search
  • Class 6: [PPT] Constraint Satisfaction Problems
  • Class 7: [PPT] Game Playing
  • Class 8: Constraint Satisfaction Problems (CSP)
  • Class 9: [PPT] Logic, [PPT] First Order Logic, Inference in First Order Logic, [PPT] Uncertainty and Fuzzy Logic
  • Class 10: Supervised / Unsupervised Learning and Classification / Clustering Problems, k-NN, Decision Tree, Random Forest, Logistic Regression
  • Class 11: Regression: Logistic Regression, Decision Tree Regression, Linear Regression, Polynomial Regression
  • Class 12: [PPT] Artificial Neural Networks
  • Class 13: Project Presentations
  • Class 15: Project Presentations
  • Class 16: No Class. [PPT] Genetic Algorithms
  • Final Exam : Date TBA

Collaboration Policy: You may freely use internet resources and your course notes in completing assignments and quizzes for this course. You may not consult any person other than the professor when completing quizzes or exams. (Clarifying questions should be directed to the professor.) On assignments you may collaborate with others in the course, so long as you personally prepare the materials submitted under your name, and they accurately reflect your understanding of the topic. Any collaborations should be indicated by a note submitted with the assignment.

Announcements

Please fill in the knowledge card attached here and send it back via email.

CS447 Introduction to Data Science

Antalya Science University

Course Name: Introduction to Data Science

Course Code: CS 447

Language of Course: English

Credit: 3

Course Coordinator / Instructor: Şadi Evren ŞEKER

Contact: intrds@sadievrenseker.com

Schedule: Tuesday 15.00 – 18.00

Course Description:  This is an introductory-level data science course focused on machine learning, artificial intelligence, and big data.

  • The course takes a top-down approach to data science projects. It starts with data science project management techniques and follows the six-step CRISP-DM methodology; the steps covered are listed below:
  • Business Understanding: We cover the types of problems and business processes found in real life.
  • Data Understanding: We cover data types and data problems. We also visualize the data to explore it.
  • Data Preprocessing: We cover classical data problems and how to handle noisy or dirty data and missing values; row and column filtering; data integration with concatenation and joins; and data transformations such as discretization, normalization, and pivoting.
  • Machine Learning: We cover classification algorithms such as Naive Bayes, Decision Trees, Logistic Regression, and K-NN. We also cover prediction / regression algorithms such as linear regression, polynomial regression, and decision tree regression, and unsupervised learning problems such as clustering (k-means, hierarchical clustering) and association rule learning (the Apriori algorithm). Finally, we cover ensemble techniques in Knime and Python on Big Data Platforms.
  • Evaluation: In the final step, we study success metrics: Confusion Matrix, Precision, Recall, Sensitivity, and Specificity for classification; purity and Rand index for clustering; and RMSE, RMAE, MSE, and MAE for regression / prediction problems, with Knime and Python on Big Data Platforms.

Course Objective and Learning Outcomes: 

1.     Understanding of real life cases about data

2.     Understanding of real life data related problems

3.     Understanding of data analysis methodologies

4.     Understanding of some basic data operations like: preprocessing, transformation or manipulation

5.     Understanding of new technologies like bigdata, nosql, cloud computing

6.     Ability to use some trending software in the industry

7.     Introduction to data related problems and their applications

Tools:

List of course software:

·       Excel,

·       KNIME,

·       Python Programming with Numpy, Pandas, SKLearn, StatsModel or DASK

This course is hands-on at every step, so attending with a laptop computer is necessary. The software listed above will be provided during the course, and the list is subject to updates.

Grading

Reading, Attendance and Discussions: 30%

Homework: 30%

Project: 40%

Course Content:

Week 1 (Feb 19): Introduction to Data, Problems and Real World Examples:
Some useful information:
DIKW Pyramid: DIKW pyramid – Wikipedia
CRISP-DM: Cross-industry standard process for data mining – Wikipedia
Slides from first week: week1

Week 2 (Feb 26): Introduction to Descriptive Analytics
Repeating the first week for the majority of the class and starting the concept of end-to-end data science projects. Weight and Height sample project and data set for the Knime workflow. Brief introduction to algorithms: K-NN, Naive Bayes, Decision Trees, Linear Regression

Week 3 (Mar 5): Introduction to Data Manipulation
Concept of Data and types of data : Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio).
Basic Data Manipulation techniques with Knime:
1. Row Filter and Concept of Missing Values
2. Column Filter
3. Advanced Filters
4. Concatenate
5. Join
6. Group by, Aggregation
7. Formulas, String Replace
8. String Manipulation
9. Discrete, Quantized Data, Binning
10. Normalization
11. Splitting and Merging
12. Type Conversion (Numeric, String)
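Several of the Knime nodes above have close counterparts in pandas. The sketch below is only an illustration of that correspondence under assumed data: the three tables and their column names are hypothetical and do not come from the course workflow.

```python
import pandas as pd

# Hypothetical tables; the column names are illustrative only
customers_a = pd.DataFrame({"id": [1, 2], "age": [25, 41]})
customers_b = pd.DataFrame({"id": [3, 4], "age": [33, None]})
purchases = pd.DataFrame({"id": [1, 1, 3], "amount": [20.0, 35.0, 12.5]})

# Concatenate: stack two tables that share the same columns
customers = pd.concat([customers_a, customers_b], ignore_index=True)

# Row filter on missing values: drop rows where age is missing
known_age = customers.dropna(subset=["age"])

# Join: an inner join keeps only ids present in both tables
joined = known_age.merge(purchases, on="id", how="inner")

# Group by with aggregation: total amount per customer
totals = joined.groupby("id")["amount"].sum()

# Type conversion: numeric to string
joined["id_text"] = joined["id"].astype(str)
print(joined)
print(totals)
```

Customer 2 disappears after the join because it has no purchases, and customer 4 is removed earlier by the missing-value filter.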

Week 4 (Mar. 12): Introduction to Python Programming for Data Science and an end-to-end Python application for data science
Brief review of python programming
Introduction to data manipulation libraries: NumPY and Pandas
Introduction to the Sci-Kit Learn library and a sample classification

You can install anaconda and Spyder from the link below:

Also we have covered below topics during the class:

  • Data loading from external source using Pandas library (with read_excel or read_csv methods)
  • DataFrame slicing and dicing (using the iloc property and the lists provided to the iloc method)
  • Column Filtering (with copying into a new data frame)
  • Row Filtering (with copying into a new data frame)
  • Advanced row filtering (for example, keeping only the people whose height is an even number)
  • Column or row wise formula (we have calculated the BMI for everybody)
  • Quantization (discretization or binning): where we have applied the condition based binning 
  • Min – Max Normalization (we have implemented MinMaxScaler from the SKLearn library)
  • Group By operation (we have implemented the groupby method from pandas library)


Click here to download the codes from the class

For further information, I strongly suggest reading the documentation below:

Week 5 (Mar 19): Classification Algorithms
concepts of classification algorithms, implementing the algorithms in Knime and coding in python. Algorithms covered are:
K-NN
Naive Bayes
Decision Tree
Logistic Regression
Support Vector Machines

2nd Python Code of the course for the classifications

Knime Workflow for the classification algorithms
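To make the first algorithm on this list concrete, here is a minimal from-scratch sketch of K-NN classification in plain Python. It is not the class code: the sample points, the labels, and the function name `knn_predict` are made up for illustration, and the distance is assumed to be Euclidean.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (height in cm, weight in kg) samples with made-up labels
points = [(150, 45), (155, 50), (160, 52), (175, 80), (180, 85), (185, 90)]
labels = ["small", "small", "small", "large", "large", "large"]

prediction_a = knn_predict(points, labels, (158, 51))
prediction_b = knn_predict(points, labels, (178, 82))
print(prediction_a, prediction_b)
```

Because K-NN relies on raw distances, features on very different scales would normally be normalized first, for example with the min-max normalization covered in Week 4.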

Week 6 (Mar 26): Regression Algorithms
concepts of prediction algorithms, implementing the algorithms in Knime and coding in python. Algorithms covered are:
Linear Regression
Polynomial Regression
Support Vector Regressor
Regression Trees and Decision Tree Regressor

Python code for the Regression

Knime Workflow and the BIST 100 data set for the Regression Algorithms 

The Data Set obtained from : finance.yahoo.com
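As a small illustration of the first algorithm above, simple linear regression has a closed-form least-squares solution. The sketch below uses made-up numbers, not the BIST 100 data, and the function name `fit_line` is hypothetical. Polynomial regression applies the same least-squares idea to powers of x.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data roughly following y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(xs, ys)

# Predict the value at an unseen x
forecast = slope * 6 + intercept
print(slope, intercept, forecast)
```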

Week 7 (Apr 2): Clustering Algorithms
concepts of clustering algorithms, implementing the algorithms in Knime and coding in python. Algorithms covered are:
K-Means
DBScan
Hierarchical Clustering

Knime Workflow

Python Code
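K-Means, the first algorithm above, alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The following is a minimal sketch of that loop in plain Python, not the class code; the points and the starting centroids are assumptions chosen so the result is easy to check by hand.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

In practice the starting centroids are usually chosen at random, so different runs can end in different clusterings; fixing them here keeps the example deterministic.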

Week 8 (Apr 9): Association Rule Mining
concepts of association rule mining (ARM) and association rule learning (ARL) algorithms, implementing the algorithms in Knime and coding in python. Algorithms covered are:
A-Priori Algorithm

Click Here To Download Apriori Library for the Python Codes

click for python code 

click for knime workflow

Homework : Link for Kaggle, instacart
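The core idea of the A-Priori algorithm is that an itemset can only be frequent if all of its subsets are frequent, so larger candidates are built only from smaller itemsets that passed the support threshold. The sketch below is a from-scratch illustration in plain Python, not the library linked above; the baskets and the function names `frequent_itemsets` and `confidence` are made up.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Support = share of transactions that contain the candidate
        level = {}
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Build larger candidates only from survivors (Apriori pruning)
        size += 1
        survivors = sorted({item for s in level for item in s})
        candidates = [
            frozenset(c) for c in combinations(survivors, size)
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
    return frequent

def confidence(frequent, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    return frequent[antecedent | consequent] / frequent[antecedent]

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(baskets, min_support=0.5)
rule_conf = confidence(frequent, frozenset({"bread"}), frozenset({"milk"}))
print(frequent)
print(rule_conf)
```

With these baskets, the pair butter and milk falls below the threshold, so the three-item set is never counted at all.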

Week 9 (Apr 16): Concept of Error and Evaluation Techniques
n-Fold Cross Validation , LOO, Split Validation
RMSE, MAE, R2 values for regression
Rand index, Silhouette, WCSS for clustering algorithms
Accuracy, Recall, Precision, F-Score, F1-Score etc. for classification algorithms
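Two of the ideas above, n-fold cross validation and the regression error measures, are short enough to write out by hand. This is a hedged sketch in plain Python with made-up numbers; the function names are hypothetical, and the folds are taken in order without shuffling to keep the output predictable.

```python
from math import sqrt

def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

folds = list(k_fold_indices(n_samples=7, n_folds=3))
error_rmse = rmse([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 9.0, 9.0])
error_mae = mae([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 9.0, 9.0])
print(folds)
print(error_rmse, error_mae)
```

Leave-one-out (LOO) is the special case where `n_folds` equals `n_samples`, so every test set holds exactly one sample.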

We also got an introduction to dimension reduction with PCA (principal component analysis) and Neural networks with MLP (multi layer perceptron)

Please don’t forget to install Keras for next week.

Week 10 (Apr 23): Collective Learning

This content was moved to the previous week because of the holiday.

Week 11 (Apr 30): Collective Learning and Consensus Learning and Clustering Algorithms: Ensemble Learning, Bagging, Boosting Techniques, Random Forest, GBM, XGBoost, LightGBM

Some links useful for the class:

Readings and resources: 

Python Codes from the class :

Gradient Boosting:

XGBoost (to run the code, install XGBoost from the command prompt):

conda install -c conda-forge xgboost

Install XGBoost extension for Knime
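The consensus idea behind ensemble classifiers can be shown without any library: several imperfect models vote, and the majority label wins. The sketch below is only an illustration of hard voting with hand-written predictions; it does not train any model, and the three prediction lists are made up.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by hard voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical predictions: each model is wrong on a different sample
y_true = [1, 0, 1, 1, 0, 0]
model_a = [0, 0, 1, 1, 0, 0]   # wrong on sample 0
model_b = [1, 1, 1, 1, 0, 0]   # wrong on sample 1
model_c = [1, 0, 1, 0, 0, 0]   # wrong on sample 3
ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(y_true, model_a), accuracy(y_true, ensemble))
```

The vote corrects every individual mistake here only because the models err on different samples; models that make the same mistakes gain nothing from voting.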

 

Week 12 (May 7): Project Presentations First Group.
Presentations will be picked randomly during the class and anybody absent will be considered as not presented.
Project Deliveries (until May 6): Project Presentation, Project Report (explaining your project, your approach and methodologies, difficulties you have faced, solutions you have found, results you have achieved in your projects, links to your data sources). Knime Workflows (in .knwf format) and python codes (in .py format). Please make all these files a single .zip or .rar archive and do not put more than 4 files in your archive.

Week 13 (May 14): Project Presentations Second Group

If you have missed the project presentations in the first week, please contact me for further details.

 
 
 

Business Analytics 2018

İstanbul City University

Course Name: Business Analytics

Course Code: MBA 548

Language of Course: English

Credit: 3

Course Coordinator / Instructor: Şadi Evren ŞEKER

Contact: businessA@sadievrenseker.com

Schedule: Sat 13.00 – 16.00

Course Description:  This is an introductory-level data analysis course focused on business processes and real-life cases.

This course will expose you to the data analytics practices used in the business world. We will explore key areas such as the analytical process; how data is created, stored, and accessed; and how the organization works with data and creates the environment in which analytics can flourish. What you learn in this course will give you a strong foundation in all the areas that support analytics and will help you better position yourself for success within your organization. You will develop skills and a perspective that will make you more productive faster and allow you to become a valuable asset to your organization. This course also provides a basis for going deeper into advanced investigative and computational methods, which you will have a chance to explore in future courses of the Data Analytics for Business specialization.

This course is designed to have wide appeal across many types of learners. Anyone who is looking to gain an understanding of how business analytics is really performed in real organizations will benefit. The course is primarily aimed at professionals who have a bachelor’s degree or some exposure to the business world. Those with technical degrees or more advanced business degrees such as an MBA will find certain areas easier to absorb, and may get the most value from the course. However, even undergraduates in non-technical fields or advanced high-school students pursuing internships will be able to follow most concepts and get value from the course. Finally, even professionals who have deep experience with these techniques will likely find value in this course.

Course Objective: 

1.     Understanding of real life cases about data

2.     Understanding of real life data related problems

3.     Understanding of data analysis methodologies

4.     Understanding of some basic data operations like: preprocessing, transformation or manipulation

5.     Understanding of new technologies like bigdata, nosql, cloud computing

6.     Ability to use some trending software in the industry

7.     Introduction to data related problems and their applications

Method:

List of course software:

·       Excel,

·       KNIME,

·       RapidMiner

·       MS-SQL, SSAS, SSIS

·       Oracle Database, ODI, BI

·       Apache Cassandra

This course is hands-on at every step, so attending with a laptop computer is necessary. The software listed above will be provided during the course, and the list is subject to updates.

Grading

 

Reading, Attendance and Discussions: 30%

Homework: 30%

Project: 40%

Course Content:

Week 1: Introduction to Data, Problems and Real World Examples:

Some useful information:

DIKW Pyramid: DIKW pyramid – Wikipedia

CRISP-DM: Cross-industry standard process for data mining – Wikipedia

Slides from first week:week1

Week 2: Introduction to Descriptive Analytics

Splitting data into sets : Training, Test and Validation Sets

First Problem Type: Classification

Slides from second week: week 2
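The training / validation / test split mentioned for Week 2 can be sketched in a few lines of plain Python. This is an illustration only: the 60 / 20 / 20 shares, the fixed seed, and the function name `split_dataset` are assumptions, not values prescribed by the course.

```python
import random

def split_dataset(rows, train_share=0.6, validation_share=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_train = round(len(shuffled) * train_share)
    n_validation = round(len(shuffled) * validation_share)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test

rows = list(range(20))  # stand-in for 20 customer records
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
```

The model is fitted on the training set, tuned against the validation set, and judged once on the test set, which it has never seen.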

Homework #1 (Due Date: Nov. 23, 2017) : Download the data set of customers (click to download). In the data set you can see, each record is holding the salary and age of the customer and their action in the store (buy: they buy a product, notbuy: they don’t buy any product). Create your own Knime data flow and predict the salary of people below:

Salary Age
51
22
33

Write a brief explanation for your submission (which algorithm did you use, what are the results you have achieved and how)

Analytical Problems and Analysis
Business Model, conceptualization and frameworks
Information – Action Value Chain
Data Capturing and data sources: Thinking in Data
Analytical Technologies: Data Storage
Analytical Technologies: Big Data, Cloud and Evolution of Web
Analytical Technologies: Relational Databases
Analytical Technologies: Virtualization, In Memory and NoSQL
Analytical Technologies: Introduction to SQL: Simple Queries
Analytical Technologies: SQL – 2: Multiple Tables, Sub Queries
Data Mining and Data Science Basics 1: Classification Problems
Data Mining and Data Science Basics 2: Regression and Prediction
Understanding Error
Business Intelligence Tools and Applications

 Knime WorkFlows:

Ensemble Fusion Workflow (click to download data and knime workflow)

 

Announcement about Projects

Presentations are on 22 December.

  1. Project proposal (2 paragraphs): briefly describe the project idea and the data, and state whether you want to work in a group; if so, provide the names and IDs of the group members. Send it to the course email by next week, 1st of December.
  2. Deliverables (21st of December): Knime workflow, report (including your approach and solution together with the project definition and data description), presentation
  3. Presentations (22nd of December) will be about 5 – 10 minutes: explain your problem, describe your data set, and demonstrate your solution and its benefits from the business perspective (a minimum of 3 slides and a working Knime workflow).
  4. businessA@sadievrenseker.com