Text Mining - Course Syllabus

Content

Basic Information

Course Overview and Learning Objectives

Tentative Schedule

Resources

Requirements

Grading and Evaluation

Academic Integrity

Students with Special Needs

Basic Information

Teaching Team

Course Instructor: Derya Ipek Eroglu

Email: deryaipek[at]vt[dot]edu

Office: Pamplin 2004

Office Hours: I am always available for email conversations and for in-person or Zoom meetings. To schedule a meeting, do not hesitate to email me for an appointment! I will also stay for 15 minutes before and after each class to answer your questions.

Teaching Assistant: TBA

Email: TBA

Office: TBA

(Back to Top)

Course Overview and Learning Objectives

With rapid digitization, we are not only more connected but also connect faster and generate more content, all of which is stored in clouds and data warehouses for various reasons. There is a huge amount of data to be analyzed, millions of questions to be answered, and many algorithms still to be developed to analyze this data better. Current Machine Learning (ML) and Deep Learning (DL) practices largely focus on handling this data and extracting insights from it. Among the various data types, perhaps the most challenging is text. The challenge comes from how differently individuals use language and from its flexibility in expressing concepts, objects, and emotions; analyzing text is, at its core, a linguistic analysis. In this class, we not only discuss ML/DL applications in text mining but also focus on its interpretive nature. We also cover examples of bad algorithms and discuss their societal impacts.

Learning Objectives

  1. Help students develop a deeper understanding of ML/DL applications on text data
  2. Introduce students to recent text mining practices and research methodologies
  3. Help students understand the challenges and pitfalls of text mining by examining bad text mining applications
  4. Help students become familiar enough with the field to understand and use its research papers

(Back to Top)

Tentative Schedule

| Week | Topic | Resource | Activity/Deliverable |
|------|-------|----------|----------------------|
| 1 | Intro I - Text as Data | Resource | Personal Introduction |
| 2 | Intro II - Text Mining Applications | Resource | Project Ideas |
| 3 | Text Processing | Resource | Homework 1 |
| 4 | Document Similarity | Resource | Project Teams |
| 5 | Document Classification I - Naive Bayes and Classification Tree | Resource | Project Proposal |
| 6 | Document Representation - Embeddings I | Resource | - |
| 7 | Document Representation - Embeddings II | Resource | Homework 2 |
| 8 | Document Clustering I - Topic Modeling with LDA | Resource | Midsemester Feedback |
| 9 | Document Clustering II - Topic Modeling with Doc2Vec | Resource | Progress Report & Cluster Interpretation |
| 10 | Document Classification II - Neural Networks | Resource | Homework 3 |
| 11 | Document Classification III - Deep Learning | Resource | - |
| 12 | Text Mining and Societal Issues | Resource | Homework 4 |
| 13 | Text Mining and Societal Issues | Resource | Final Exam |
| 14 | Project Presentation | Resource | Final Paper |

(Back to Top)

Resources

There are no required textbooks for this class. To keep the material up to date, resources for each week will be released before that week's class.

Reference Books

  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.
  • Manning, C., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.
  • Aggarwal, C. C., & Zhai, C. (Eds.). (2012). Mining text data. Springer Science & Business Media.

(Back to Top)

Requirements

There are no formal prerequisites for this course. However, to perform at your best, you will need a certain level of Machine Learning knowledge. You should have a basic understanding of conventional ML methods such as Naive Bayes, Classification Trees, and Neural Networks. Additionally, we expect experience with Python (especially the numpy, matplotlib, and scikit-learn libraries). If you do not have that background but want to take this class, please contact the teaching team. We can provide you with some tutorials so that you can catch up with the course content.

This class has a 20% attendance requirement. While I would like to provide you with the flexibility to follow the class within your availability, I also want to get to know all of you! Therefore, there is an attendance requirement.

(Back to Top)

Grading and Evaluation

  • Homework Assignments: 40% (10% each)
  • Final Exam: 20%
  • Project Proposal: 5%
  • Progress Report: 15%
  • Final Paper: 20%
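
As an illustration of how these weights combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the course grade is a plain weighted sum; the names `WEIGHTS` and `final_grade` are hypothetical and not part of any course tooling.

```python
# Weights in percent, taken from the grading list above.
# Assumption: the four homework assignments are weighted equally (10% each).
WEIGHTS = {
    "homework_1": 10,
    "homework_2": 10,
    "homework_3": 10,
    "homework_4": 10,
    "final_exam": 20,
    "project_proposal": 5,
    "progress_report": 15,
    "final_paper": 20,
}


def final_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a score in [0, 100].
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weights are whole percentages, so divide by 100 once at the end.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, scoring 80 on every homework, 90 on the final exam, 100 on the proposal, 70 on the progress report, and 85 on the final paper would give 82.5 under these assumptions.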

(Back to Top)

Academic Integrity

All work other than the project in this course is to be done on an individual basis. Discussion among students is encouraged for homework assignments; however, your submissions should be entirely your own work and should not involve a partial or complete copy of another student's work. Similarly, a project team may not use a partial or complete copy of another project team's work. The Graduate Honor System will be in effect for all aspects of this course.

(Back to Top)

Students with Special Needs

Students who need to make arrangements for special circumstances should make an appointment to speak with us. We are strong advocates of an equal-opportunity learning environment for all students, and we have accommodations that we can implement to ensure it. Please contact Services for Students with Disabilities to get assistance with the documentation process. For any other questions, please contact the teaching team.

(Back to Top)