Statistical Methods for Word Association in Text Mining
Anacleto Correia; Filomena Teodoro; Victor Lobo
To be published in Recent Studies in Risk Analysis and Statistical Modeling. Teresa Oliveira, Christos Kitsos, Amilcar Oliveira and Luís M. Grilo (eds.). Contributions to Statistics series, Springer.
Text data has grown dramatically in recent years, mainly due to advances in web-related technologies that enable people to produce an overwhelming amount of data. Much knowledge about the world is encoded in text data available through blogs, tweets, web pages, articles and books. This paper introduces some general techniques for text data mining, based on text retrieval models, that are applicable to any text in any natural language. The techniques are targeted at problems requiring minimal or no human effort. These techniques, which can be used in many applications, allow the measurement of the similarity of contexts, as well as the co-occurrence of terms, in text data at different levels of granularity.