Export bibliographic record


Theoretical learning guarantees applied to acoustic modeling (2019)

  • Authors: Shulby, Christopher D; Ferreira, Martha D; Mello, Rodrigo Fernandes de; Aluísio, Sandra Maria
  • USP affiliated authors: MELLO, RODRIGO FERNANDES DE - ICMC ; ALUISIO, SANDRA MARIA - ICMC
  • Units: ICMC; ICMC
  • DOI: 10.1186/s13173-018-0081-3
  • Subjects: NEURAL NETWORKS; MACHINE LEARNING; SPEECH RECOGNITION
  • Keywords: Acoustic modeling; Convolutional neural networks; Shallow learning; Speech recognition; Statistical learning theory; Support vector machines
  • Funding agencies:
  • Language: English
  • Imprint:
  • Source:
  • Published Version / Online source access / DOI
    DOI information: 10.1186/s13173-018-0081-3 (Source: oaDOI API; see the sketch after this list)
    • This journal is open access
    • This article is open access
    • Open access URL
    • Open access color: gold
    • License: cc-by
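
    The open access fields above are reported by the oaDOI API. As an informal illustration only, the sketch below shows how such fields could be retrieved for this record's DOI. It assumes the public Unpaywall v2 REST endpoint (the successor of oaDOI), its required email parameter, and its documented JSON field names; none of these are part of this record, and the library's own backend may work differently.

    # Minimal sketch (assumption: Unpaywall v2 API, successor of the oaDOI
    # service named above). Field names follow the public Unpaywall docs and
    # may not match this library's internal pipeline.
    import requests

    DOI = "10.1186/s13173-018-0081-3"
    EMAIL = "you@example.com"  # hypothetical contact address required by the API

    resp = requests.get(f"https://api.unpaywall.org/v2/{DOI}", params={"email": EMAIL})
    resp.raise_for_status()
    record = resp.json()

    print("Journal is open access:", record.get("journal_is_oa"))
    print("Article is open access:", record.get("is_oa"))
    print("Open access color:", record.get("oa_status"))    # e.g. "gold"
    best = record.get("best_oa_location") or {}
    print("License:", best.get("license"))                   # e.g. "cc-by"
    print("Open access URL:", best.get("url"))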

    Full text download

    Type               File name     Link
    Published Version  2922698.pdf   Direct link
    How to cite
    The citation is generated automatically and may not fully comply with the citation standards (one way to generate such citations is sketched after the examples below).

    • ABNT

      SHULBY, Christopher D; FERREIRA, Martha D; MELLO, Rodrigo Fernandes de; ALUÍSIO, Sandra Maria. Theoretical learning guarantees applied to acoustic modeling. Journal of the Brazilian Computer Society, Heidelberg, SpringerOpen, v. 25, p. 1-12, 2019. Disponível em: <http://dx.doi.org/10.1186/s13173-018-0081-3>. DOI: 10.1186/s13173-018-0081-3.
    • APA

      Shulby, C. D., Ferreira, M. D., Mello, R. F. de, & Aluísio, S. M. (2019). Theoretical learning guarantees applied to acoustic modeling. Journal of the Brazilian Computer Society, 25, 1-12. doi:10.1186/s13173-018-0081-3
    • NLM

      Shulby CD, Ferreira MD, Mello RF de, Aluísio SM. Theoretical learning guarantees applied to acoustic modeling [Internet]. Journal of the Brazilian Computer Society. 2019;25:1-12. Available from: http://dx.doi.org/10.1186/s13173-018-0081-3
    • Vancouver

      Shulby CD, Ferreira MD, Mello RF de, Aluísio SM. Theoretical learning guarantees applied to acoustic modeling [Internet]. Journal of the Brazilian Computer Society. 2019;25:1-12. Available from: http://dx.doi.org/10.1186/s13173-018-0081-3
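
    The formatted citations above are produced automatically by the library from the record metadata. As one possible way to generate similar citations, the sketch below uses DOI content negotiation (Crossref serves the text/x-bibliography media type for this DOI). The CSL style identifiers are assumptions taken from the public CSL style repository, and the output will not necessarily match the library's own formatting.

    # Minimal sketch: request ready-formatted citations for this DOI via DOI
    # content negotiation. Assumes Crossref's text/x-bibliography support; the
    # CSL style names below are assumed from the public CSL repository.
    import requests

    DOI = "10.1186/s13173-018-0081-3"
    STYLES = {
        "ABNT": "associacao-brasileira-de-normas-tecnicas",  # assumed CSL style id
        "APA": "apa",
        "Vancouver": "vancouver",
    }

    for label, style in STYLES.items():
        resp = requests.get(
            f"https://doi.org/{DOI}",
            headers={"Accept": f"text/x-bibliography; style={style}"},
        )
        resp.raise_for_status()
        print(f"{label}: {resp.text.strip()}")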

    References cited in the work
    Witt SM (2012) Automatic error detection in pronunciation training: where we are and where we need to go. Proc IS ADEPT 6:1–8.
    Li K, Qian X, Meng H (2017) Mispronunciation detection and diagnosis in l2 english speech using multidistribution deep neural networks. IEEE/ACM Trans Audio Speech Lang Process 25(1):193–207.
    Hinton G, Deng L, Yu D, Dahl GE, Mohamed A-R, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, et al (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Proc Mag 29(6):82–97.
    Chan A (2005) 10 Common Pitfalls of using SphinxTrain. http://www.cs.cmu.edu/~archan/10CommonPitfallsST.html. Accessed: 12 Oct 2016.
    Cieri C, Miller D, Walker K (2004) The Fisher corpus: a resource for the next generations of speech-to-text In: LREC, vol.4, 69–71.
    Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on public domain audio books In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5206–5210.. IEEE.
    Chen X, Eversole A, Li G, Yu D, Seide F (2012) Pipelined back-propagation for context-dependent deep neural networks In: Interspeech, 26–29, Portland.
    May T (2017) Robust speech dereverberation with a neural network-based post-filter that exploits multi-conditional training of binaural cues In: IEEE/ACM Trans Audio, Speech, and Lang Process.
    Kim TY, Han CW, Kim S, Ahn D, Jeong S, Lee JW (2016) Korean LVCSR system development for personal assistant service In: Consumer Electronics (ICCE), 2016 IEEE International Conference On, 93–96.. IEEE, Las Vegas.
    Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2016) Understanding deep learning requires rethinking generalization In: CoRR. https://arxiv.org/abs/1611.03530.
    LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324.
    Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks In: Advances in Neural Information Processing Systems, 1097–1105.
    Ladefoged P, Disner SF (2012) Vowels and consonants. 3rd. Wiley-Blackwell, Malden, MA.
    Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160(1):106–154.
    Abdel-Hamid O, Mohamed A-R, Jiang H, Deng L, Penn G, Yu D (2014) Convolutional neural networks for speech recognition. IEEE/ACM Trans Audio Speech Lang Process 22(10):1533–1545.
    LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. Handb Brain Theory Neural Netw 3361(10):1995.
    Vapnik V (2013) The nature of statistical learning theory, 2nd paperback edn. Springer, New York.
    Shulby CD, Ferreira MD, de Mello RF, Aluísio SM (2017) Acoustic modeling using a shallow CNN-HTSVM architecture In: 2017 Brazilian Conference on Intelligent Systems (BRACIS), 85–90.. IEEE, Uberlândia.
    Waibel A, Hanazawa T, Hinton G, Shikano K, Lang KJ (1989) Phoneme recognition using time-delay neural networks. IEEE Trans Acoust Speech Signal Proc 37(3):328–339.
    Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks In: Advances in Neural Information Processing Systems, 1096–1104.
    Hau D, Chen K (2011) Exploring hierarchical speech representations with a deep convolutional neural network In: UKCI 2011 Accepted Papers, 37.
    Abdel-Hamid O, Mohamed A-R, Jiang H, Penn G (2012) Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4277–4280.. IEEE, Kyoto.
    Sainath TN, Mohamed A-R, Kingsbury B, Ramabhadran B (2013) Deep convolutional neural networks for LVCSR In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8614–8618.. IEEE, Vancouver.
    Mohamed A-R, Dahl GE, Hinton G (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22.
    Graves A, Jaitly N (2014) Towards end-to-end speech recognition with recurrent neural networks In: ICML. vol. 14, 1764–1772.
    Maas AL, Hannun AY, Jurafsky D, Ng AY (2014) First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNS In: CoRR. https://arxiv.org/abs/1408.2873.
    Tóth L (2015) Phone recognition with hierarchical convolutional deep maxout networks. EURASIP J Audio Speech Music Proc 2015(1):25.
    Dekel O, Keshet J, Singer Y (2004) An online algorithm for hierarchical phoneme classification In: International Workshop on Machine Learning for Multimodal Interaction, 146–158.. Springer, Martigny.
    Karpagavalli S, Chandra E (2015) A hierarchical approach in tamil phoneme classification using support vector machine. Indian J Sci Technol 8(35):57–63.
    Driaunys K, Rudžionis V, Žvinys P (2015) Implementation of hierarchical phoneme classification approach on LTDIGITS corpora. Inf Technol Control 38(4):303–310.
    Amami R, Ellouze N (2015) Study of phonemes confusions in hierarchical automatic phoneme recognition system In: CoRR. https://arxiv.org/abs/1508.01718.
    Schiel F, Draxler C, Baumann A, Ellbogen T, Steffen A (2012) The production of speech corpora. epub uni-muenchen.
    Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X, Moore G, Odell J, Ollason D, Povey D, et al (2002) The HTK book. Cambridge Univ Eng Dept 3:175.
    Yuan J, Liberman M (2008) Speaker identification on the SCOTUS corpus. J Acoust Soc Am 123(5):3878.
    Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals.
    Hifny Y, Renals S (2009) Speech recognition using augmented conditional random fields. IEEE Trans Audio Speech Lang Process 17(2):354–365.
    Graves A, Jaitly N, Mohamed A-R (2013) Hybrid speech recognition with deep bidirectional lstm In: Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop On, 273–278.. IEEE, Olomouc.
    Lombart J, Miguel A, Lleida E (2013) Articulatory feature extraction from voice and their impact on hybrid acoustic models In: Advances in Speech and Language Technologies for Iberian Languages, 138–147.. Springer, Las Palmas de Gran Canaria.
    Lopes C, Perdigão F, et al (2009) Phonetic recognition improvements through input feature set combination and acoustic context window widening In: 7th Conference on Telecommunications, Conftele, 449–452.. Citeseer, Porto.
    Garofolo J, Lamel L, Fisher W, Fiscus J, Pallett D, Dahlgren N (1990) The DARPA TIMIT acoustic-phonetic continuous speech corpus, NTIS speech disc. NTIS order number PB91-100354.
    Lee K-F, Hon H-W (1989) Speaker-independent phone recognition using hidden Markov models. IEEE Trans Acoust Speech Signal Process 37(11):1641–1648.
    Bagwell C (2018) Sox(1) - Linux man page. https://linux.die.net/man/1/sox. Accessed: 01 Mar 2018.
    LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
    Goodfellow I, Bengio Y, Courville A (2016) Deep learning. The MIT Press, Cambridge. http://goodfeli.github.io/dlbook/. Accessed 18 Mar 2018.
    Ferreira MD, Corrêa DC, Nonato LG, de Mello RF (2018) Designing architectures of convolutional neural networks to solve practical problems In: Expert Systems with Applications 94(Supplement C), 205–217. https://doi.org/10.1016/j.eswa.2017.10.052.
    Chollet F, et al (2015) Keras. https://keras.io. Accessed 18 Mar 2018.
    Bromberg I, Qian Q, Hou J, Li J, Ma C, Matthews B, Moreno-Daniel A, Morris J, Siniscalchi M, Tsao Y, Wang Y (2017) Detection-based ASR in the automatic speech attribute transcription project In: Proceedings of The Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, 1829–1832.. ISCA, Stockholm, Sweden. https://doi.org/10.21437/Interspeech.2017.
    von Luxburg U, Schölkopf B (2011) Statistical learning theory: models, concepts, and results, vol. 10. Elsevier, North Holland, Amsterdam, Netherlands. Max-Planck-Gesellschaft.
    Chang Y-W, Hsieh C-J, Chang K-W, Ringgaard M, Lin C-J (2010) Training and testing low-degree polynomial data mappings via linear svm. J Mach Learn Res 11(Apr):1471–1490.
    Goldberg Y, Elhadad M (2008) splitSVM: fast, space-efficient, non-heuristic, polynomial kernel computation for NLP applications In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, 237–240.. Association for Computational Linguistics, Columbus.
    Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357.
    MacLean K (2018) Tutorial: create acoustic model - manually. http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/triphones/step-10. Accessed: 1 Mar 2018.
    Vertanen K (2018) HTK acoustic models. https://www.keithv.com/software/htk/us/. Accessed: 1 Mar 2018.
    Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830.
    Hinton GE (2011) Connectionist learning procedures. Artif Intell 40(1-3):185–234. https://doi.org/10.1016/0004-3702(89)90049-0.
    Lopes C, Perdigão F (2012) Phone recognition on the TIMIT database In: Speech Technologies. https://doi.org/10.5772/17600.
    de Mello RF, Ferreira MD, Ponti MA (2017) Providing theoretical learning guarantees to deep learning networks In: CoRR. https://arxiv.org/abs/1711.10292.
    de Mello FR, Antonelli Ponti M, Grossi Ferreira CH (2018) Computing the shattering coefficient of supervised learning algorithms. ArXiv e-prints. http://arxiv.org/abs/1805.02627.
    Hoffmann S, TIK E (2009) Automatic phone segmentation. Corpora 3:2–1.
    Yang HH, Van Vuuren S, Sharma S, Hermansky H (2000) Relevance of time–frequency features for phonetic and speaker-channel classification. Speech Comm 31(1):35–50.

Digital Library of Intellectual Production of Universidade de São Paulo, 2012-2020