Combining semantic and term frequency similarities for text clustering (2019)

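  • Authors: SOARES, Victor Hugo Andrade; CAMPELLO, Ricardo José Gabrielli Barreto; NOURASHRAFEDDIN, Seyednaser; MILIOS, Evangelos; NALDI, Murilo Coelho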
  • USP affiliated authors: CAMPELLO, RICARDO JOSÉ GABRIELLI BARRETO - ICMC ; SOARES, VICTOR HUGO ANDRADE - ICMC
  • Units: ICMC; ICMC
  • DOI: 10.1007/s10115-018-1278-7
  • Subjects: DATA MINING; TEXT RECOGNITION; KNOWLEDGE DISCOVERY
  • Keywords: Document clustering; Similarity measure; Semantic similarity; Text mining
  • Funding agencies:
  • Language: English
  • Imprint: London, Springer, 2019
  • Source: Knowledge and Information Systems, v. 61, n. 3, p. 1485-1516, 2019
  • Online source access: DOI
    Information about the DOI: 10.1007/s10115-018-1278-7 (Source: oaDOI API)
    • This journal is subscription-based
    • This article is NOT open access
    • Open access color: closed

    How to cite
    The citation is generated automatically and may not fully comply with the standards

    • ABNT

      SOARES, Victor Hugo Andrade; CAMPELLO, Ricardo José Gabrielli Barreto; NOURASHRAFEDDIN, Seyednaser; MILIOS, Evangelos; NALDI, Murilo Coelho. Combining semantic and term frequency similarities for text clustering. Knowledge and Information Systems, London, Springer, v. 61, n. 3, p. 1485-1516, 2019. Available at: <http://dx.doi.org/10.1007/s10115-018-1278-7>. DOI: 10.1007/s10115-018-1278-7.
    • APA

      Soares, V. H. A., Campello, R. J. G. B., Nourashrafeddin, S., Milios, E., & Naldi, M. C. (2019). Combining semantic and term frequency similarities for text clustering. Knowledge and Information Systems, 61(3), 1485-1516. doi:10.1007/s10115-018-1278-7
    • NLM

      Soares VHA, Campello RJGB, Nourashrafeddin S, Milios E, Naldi MC. Combining semantic and term frequency similarities for text clustering [Internet]. Knowledge and Information Systems. 2019;61(3):1485-1516. Available from: http://dx.doi.org/10.1007/s10115-018-1278-7
    • Vancouver

      Soares VHA, Campello RJGB, Nourashrafeddin S, Milios E, Naldi MC. Combining semantic and term frequency similarities for text clustering [Internet]. Knowledge and Information Systems. 2019;61(3):1485-1516. Available from: http://dx.doi.org/10.1007/s10115-018-1278-7


Digital Library of Intellectual Production of Universidade de São Paulo     2012 - 2020