Quantitative Textual Analysis: Introduction to Text Mining
QAC 386
Spring 2022 | Section: 01 |
Course Cluster and Certificates: Applied Data Science Certificate |
We encounter computerized processing of text in almost every area of life. Google tries to infer the meaning of our search queries, online review engines try to extract information about which products are popular with users, and scholars across many fields analyze text for insights into the processes and phenomena they study. This course will introduce you to the skills necessary to mine text for information and knowledge. You will learn how to use R to retrieve text from a variety of sources, how to use regular expressions to identify which pieces of text are useful to your study, and how to use data mining techniques to analyze the processed text, both to extract information and to perform classification and prediction. |
Credit: 1 |
Gen Ed Area Dept: SBS QAC |
Course Format: Laboratory Course | Grading Mode: Graded |
Level: UGRD |
Prerequisites: COMP112 OR QAC155 OR QAC239 OR QAC305 OR QAC385 |
|
Fulfills a Requirement for: (CADS)(DATA-MN)(PSYC) |
|
Past Enrollment Probability: 75% - 89% |
SECTION 01 | Special Attributes: CQC |
Major Readings: Wesleyan RJ Julia Bookstore
Selected chapters from:
Jurafsky, Dan and James H. Martin, SPEECH AND LANGUAGE PROCESSING, draft of 3rd edition. Available online at http://web.stanford.edu/~jurafsky/slp3/
Aggarwal, Charu and ChengXiang Zhai, eds., MINING TEXT DATA. Springer Verlag, 2012. Available online through Wesleyan library: http://link.springer.com/book/10.1007%2F978-1-4614-3223-4
Jockers, Matthew L., TEXT ANALYSIS WITH R FOR STUDENTS OF LITERATURE. Springer Verlag, 2014. Available online through Wesleyan library: http://link.springer.com/book/10.1007%2F978-3-319-03164-4
Articles:
Hajek, Petr and Vladimir Olej, Word categorization of annual reports for bankruptcy prediction by machine learning methods, in Text, Speech, and Dialogue, Springer International Publishing, 2015, http://dx.doi.org/10.1007/978-3-319-24033-6_14
Wei, Chih-Ping et al., Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora, Journal of the Association for Information Science and Technology, Vol. 65 No. 3, 2014, http://dx.doi.org/10.1002/asi.22995
Mosteller, Frederick and David Wallace, Inference in an Authorship Problem, Journal of the American Statistical Association, Vol. 58 No. 302, 1963, http://www.jstor.org/stable/2283270
Liu, Xiaohua et al., Recognizing named entities in tweets, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, 2011
Huang, James et al., Improving restaurants by extracting subtopics of Yelp reviews, online, 2013
|
Examinations and Assignments:
Several homework assignments, two take-home midterms, and a final project. Part of the grade depends on in-class participation/preparedness. |
Additional Requirements and/or Comments:
The course includes a strong lab component, and programming is a significant part of the course. Programming experience in Python and/or a strong background in R is required. In addition, students should have at least an introductory statistics/data analysis background, which is why QAC211 and ECON 300 are listed as formal prerequisites. Prerequisite overrides will be approved by the professor for students who satisfy these basic requirements through other coursework. |
Instructor(s): Oleinikov, Pavel V | Times: ..T.R.. 02:50PM-04:10PM; Location: ALLB204;
Total Enrollment Limit: 19 | | SR major: 0 | JR major: 0 |   |   |
Seats Available: -2 | GRAD: 1 | SR non-major: 7 | JR non-major: 7 | SO: 4 | FR: 0 |
Drop/Add Enrollment Requests | | | | | |
Total Submitted Requests: 0 | 1st Ranked: 0 | 2nd Ranked: 0 | 3rd Ranked: 0 | 4th Ranked: 0 | Unranked: 0 |