CEMAT - Publication

Publications > Articles or Chapters in Edited Books

Statistical Methods for Word Association in Text Mining

Anacleto Correia; Teodoro, Filomena; Victor Lobo

Recent Studies in Risk Analysis and Statistical Modeling. Teresa Oliveira, Christos Kitsos, Amilcar Oliveira and Luís M. Grilo (eds.). Contributions to Statistics series, Springer, (2018), 375-384
https://doi.org/10.1007/978-3-319-76605-8_27

Text data has grown dramatically in recent years, mainly due to advances in web-related technologies that enable people to produce an overwhelming amount of data. Much knowledge about the world is encoded in text data available through blogs, tweets, web pages, articles and books. This paper introduces some general techniques for text data mining, based on text retrieval models, that are applicable to any text in any natural language. The techniques target problems that require minimal or no human effort. These techniques, which can be used in many applications, allow the measurement of the similarity of contexts, as well as of the co-occurrence of terms, in text data at different levels of granularity.
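
The two measurements named in the abstract, co-occurrence of terms and similarity of contexts, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the raw-count weighting and the toy segments are all assumptions made for illustration. A "segment" stands for whatever unit of granularity is chosen (sentence, paragraph or whole document).

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(segments):
    """Count how many segments contain each unordered pair of distinct terms.

    Hypothetical helper: pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for segment in segments:
        terms = sorted(set(segment.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def context_vector(term, segments):
    """Bag of words sharing a segment with `term` (the term itself excluded)."""
    vector = Counter()
    for segment in segments:
        tokens = segment.lower().split()
        if term in tokens:
            vector.update(t for t in tokens if t != term)
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (assumed): each string is one segment at the chosen granularity.
segments = [
    "my cat eats fish",
    "my dog eats meat",
    "the cat chases the dog",
]

pairs = cooccurrence_counts(segments)
cat_context = context_vector("cat", segments)
dog_context = context_vector("dog", segments)
similarity = cosine_similarity(cat_context, dog_context)
```

Changing what counts as a segment changes the granularity of both measures. Raw counts are used here only for brevity; a retrieval-style weighting such as TF-IDF could be substituted in `context_vector` without altering the rest of the sketch.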

CEMAT - Center for Computational and Stochastic Mathematics

Instituto Superior Técnico
Faculdade de Ciências
Universidade de Lisboa