Quantitative Textual Analysis: Introduction to Text Mining
QAC 386
Spring 2018
| Section: 01 |
Certificates: Applied Data Science |
Course Cluster: Data Analysis Minor |
Computerized text processing touches almost every area of life. Google tries to infer the meaning of our search queries, online review engines try to extract information about which products are popular with users, and scholars across fields analyze text for insights into the processes and phenomena they study. This course will introduce you to the skills necessary to mine text for information and knowledge. You will learn how to use R to retrieve text from a variety of sources, how to use regular expressions to identify which pieces of text are useful to your study, and how to use data mining techniques to analyze the processed text for information extraction, classification, and prediction. |
Credit: 1 |
Gen Ed Area Dept: SBS QAC |
Course Format: Laboratory Course | Grading Mode: Graded |
Level: UGRD |
Prerequisites: QAC211 OR ECON300 OR [GOVT367 or QAC302] |
Fulfills a Requirement for: (CADS)(DATA-MN)(PSYC) |
Past Enrollment Probability: 75% - 89% |
SECTION 01 | Special Attributes: CQC |
Major Readings:
Selected Chapters from:
Jurafsky, Dan and James H. Martin, SPEECH AND LANGUAGE PROCESSING, draft of 3rd edition, available online at http://web.stanford.edu/~jurafsky/slp3/
Aggarwal, Charu and ChengXiang Zhai, eds., MINING TEXT DATA. Springer Verlag, 2012. Available online through Wesleyan library: http://link.springer.com/book/10.1007%2F978-1-4614-3223-4
Jockers, Matthew L., TEXT ANALYSIS WITH R FOR STUDENTS OF LITERATURE. Springer Verlag, 2014. Available online through Wesleyan library: http://link.springer.com/book/10.1007%2F978-3-319-03164-4
Articles:
Hajek, Petr and Vladimir Olej, Word categorization of annual reports for bankruptcy prediction by machine learning methods, in Text, Speech, and Dialogue, Springer International Publishing, 2015, http://dx.doi.org/10.1007/978-3-319-24033-6_14
Wei, Chih-Ping et al., Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora, Journal of the Association for Information Science and Technology, Vol. 65 No. 3, 2014, http://dx.doi.org/10.1002/asi.22995
Mosteller, Frederick and David Wallace, Inference in an Authorship Problem, Journal of the American Statistical Association, Vol. 58(302), 1963, http://www.jstor.org/stable/2283270
Liu, Xiaohua et al., Recognizing named entities in tweets, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, 2011
Huang, James et al., Improving restaurants by extracting subtopics of Yelp reviews, online, 2013
Examinations and Assignments: Several homework assignments, two take-home midterms, and a final project. Part of the grade depends on in-class participation/preparedness. |
Additional Requirements and/or Comments: The course requires an introductory statistics/data analysis background, which is why QAC211, ECON300, and GOVT367 are listed as formal prerequisites. The Professor will approve prerequisite overrides for students who satisfy this basic requirement through other coursework. The course includes a strong lab component, and programming in R is a significant part of the coursework. |
Instructor(s): Oleinikov, Pavel V | Times: ..T.R.. 02:50PM-04:10PM; Location: ALLB204 |
Total Enrollment Limit: 19 | SR major: 0 | JR major: 0 |
Seats Available: -5 | GRAD: 1 | SR non-major: 7 | JR non-major: 7 | SO: 4 | FR: 0 |
Drop/Add Enrollment Requests | | | | | |
Total Submitted Requests: 2 | 1st Ranked: 1 | 2nd Ranked: 0 | 3rd Ranked: 0 | 4th Ranked: 1 | Unranked: 0 |