Silberztein M. Linguistic Resources for Natural Language Processing...2024

Textbook in PDF format

Empirical (data-driven, neural network-based, probabilistic, and statistical) methods seem to be the modern trend. Recently, OpenAI’s ChatGPT, Google’s Bard and Microsoft’s Sydney chatbots have been garnering a lot of attention for their detailed answers across many knowledge domains. As a consequence, most AI researchers are no longer interested in trying to understand what common intelligence is or how intelligent agents construct scenarios to solve various problems. Instead, they now develop systems that extract solutions from massive databases used as cheat sheets. In the same manner, Natural Language Processing (NLP) software that relies on training corpora and empirical methods is trendy: most researchers in NLP today use large training corpora, always to the detriment of the development of formalized dictionaries and grammars.
Without questioning the intrinsic value of many software applications based on empirical methods, this volume aims to rehabilitate the linguistic approach to NLP. In the introduction, the editor uncovers several limitations and flaws of using training corpora to develop NLP applications, even the simplest ones, such as automatic taggers. The first part of the volume shows how carefully handcrafted linguistic resources can be used to enhance current NLP software applications. The second part presents two representative cases where data-driven approaches cannot be implemented, simply because not enough data is available for low-resource languages. The third part addresses the problem of how to treat multiword units in NLP software, which is arguably the weakest point of NLP applications today but has a simple and elegant linguistic solution.
Nowadays, most Natural Language Processing software applications use empirical “black box” methods associated with training corpora to analyze texts written in natural languages. To analyze a sequence of text, they look for similar sequences in a corpus, select the most similar one according to some statistical measurement or some neural-network-based optimization state, and then return its analysis as the analysis of the new sequence. Here, I first show that the limited size of the corpora used and their questionable quality explain why most NLP applications produce unreliable results. Next, I examine the principles that underlie corpus-based methods and uncover their linguistic naïveté. Finally, I dispute the scientific validity of empirical approaches and propose solutions to various problems that are based on carefully handcrafted linguistic methods and resources.
I. Introduction
II. Developing Linguistic-Based NLP Software
III. Linguistic Resources for Low-Resource Languages
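
The corpus-lookup procedure summarized in the description above (find the most similar sequence in a training corpus, then reuse its analysis) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the book: the three-sentence toy corpus, the tag names, the function names, and the choice of `difflib.SequenceMatcher.ratio()` as a stand-in for the "statistical measurement".

```python
from difflib import SequenceMatcher

# Hypothetical toy "training corpus": token sequences with hand-assigned tags.
CORPUS = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["dogs", "bark", "loudly"], ["NOUN", "VERB", "ADV"]),
]


def similarity(a, b):
    """Token-level similarity in [0, 1]; an assumed stand-in for any statistical score."""
    return SequenceMatcher(None, a, b).ratio()


def analyze(tokens):
    """Return the tags of the corpus sequence most similar to `tokens`."""
    best_tokens, best_tags = max(
        CORPUS, key=lambda entry: similarity(tokens, entry[0])
    )
    return best_tags


if __name__ == "__main__":
    # The new sequence shares "the" and "barks" with the first corpus entry,
    # so that entry's tags are copied over as the analysis.
    print(analyze(["the", "cat", "barks"]))
```

The sketch never consults a dictionary or a grammar: the output depends entirely on which sequences happen to be in the corpus and on how they were tagged, which is the dependence the description criticizes.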
