Large language models (LLM) - Asian Massive Crew Community 2002/2020
Large language models (LLM)
Posted by BulletProofYogi (3 weeks ago)

Link: https://www.edps.europa.eu/data-prot...-models-llm_en

Author: Xabier Lareo

Quote:
Language models are artificial intelligence (AI) systems designed to learn grammar, syntax and semantics of one or more languages to generate coherent and context-relevant language. Language models have been developed using neural networks since the 1990s, but the results were modest.

The evolution to large language models (LLMs) was made possible by technical developments that improved the performance and efficiency of AI systems.

These developments included the advent of large-scale pre-trained models, the development of transformers (which learn context and meaning by tracking relationships in sequential data), and self-attention mechanisms (which allow models to weigh the importance of different elements in an input sequence and dynamically adjust their influence on the output).
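The scaled dot-product self-attention described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration of the mechanism only, not any particular model's implementation; the sequence length, dimensions and random weights are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise relevance of each token to every other
    weights = softmax(scores, axis=-1)   # each row sums to 1: weighted importance of inputs
    return weights @ V                   # output mixes value vectors by those weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a context-dependent blend of all input positions, which is the "tracking relationships in sequential data" the article refers to.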

As a type of generative AI system, LLMs create new content in response to user prompts based on their training data. They are trained on huge amounts of text (from billions to trillions of words) drawn from a variety of sources, including publicly available ones, and their size can be measured by the number of parameters used.

They are also considered a type of 'foundation model': a model trained on large amounts of data (usually using large-scale self-supervision) that can be adapted to a variety of applications, including text generation, summarisation, translation, question answering, and more.

The number of parameters in LLMs has increased over time: while version 2 of the Generative Pre-trained Transformer (GPT-2) had 1.5 billion parameters, the Pathways Language Model (PaLM) reached 540 billion parameters. At a certain point, the development of competitive high-performance LLMs seemed to be something that only the best-resourced technology companies, such as Google, Meta or OpenAI, could achieve.

However, two developments changed that trend and made LLM development more broadly accessible. The first was the publication of research showing that there is an optimal balance between computing power, model size and training dataset size. The second was the appearance of parameter-efficient fine-tuning techniques (e.g. LoRA), which have greatly reduced the resources needed to train an LLM. PaLM 2 already follows this trend: although it appears to have been trained on a much larger dataset, it has fewer parameters than its predecessor (340 billion against PaLM's 540 billion).
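The saving from parameter-efficient fine-tuning can be shown with back-of-the-envelope arithmetic. The Python sketch below counts trainable parameters for a full update of one weight matrix versus a LoRA-style low-rank update; the layer size and rank are hypothetical, chosen only to show the order of magnitude.

```python
# LoRA replaces a full update of a d_out x d_in weight matrix W with a
# low-rank update W + B @ A, where A is (r x d_in) and B is (d_out x r),
# and only A and B are trained. Sizes below are illustrative, not taken
# from any real model.
d_in, d_out, r = 4096, 4096, 8

full_update_params = d_out * d_in          # full fine-tuning of this one matrix
lora_params = r * d_in + d_out * r         # LoRA trains only A and B

print(full_update_params)                  # 16777216
print(lora_params)                         # 65536
print(full_update_params / lora_params)    # 256.0
```

Per layer, the low-rank update here trains roughly 0.4% of the parameters that full fine-tuning would touch, which is why such techniques put fine-tuning within reach of smaller organisations.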

Some LLM service providers have made their models publicly available (subject to prior registration and, in several cases, a subscription) through web interfaces that allow users to enter prompts and view the output generated by the models. Publicly accessible models are sometimes presented as research previews or test versions that might produce erroneous or harmful output. LLM service providers also tend to offer access to their models (usually for a fee) through an application programming interface (API), which allows their LLM to be embedded into customers' IT systems.
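As a rough sketch of what embedding an LLM via an API looks like, the Python code below builds an HTTP request for a hypothetical provider endpoint. The URL, model name, payload fields and authentication header are all illustrative assumptions — every real provider defines its own schema — and the request is only constructed here, not sent.

```python
import json
from urllib import request

# Hypothetical example of calling an LLM from a customer's IT system.
# Endpoint, model name and payload schema are invented for illustration.
def build_llm_request(prompt,
                      api_url="https://api.example-llm-provider.com/v1/generate",
                      api_key="YOUR_API_KEY",
                      model="example-model"):
    payload = json.dumps({"model": model, "prompt": prompt,
                          "max_tokens": 100}).encode()
    return request.Request(
        api_url,
        data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_llm_request("Summarise this customer complaint: ...")
print(req.get_method())  # POST
```

In a real integration the request would be sent with `urllib.request.urlopen` (or an HTTP client library) and the generated text parsed from the JSON response.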

LLMs are currently being used or tested for a wide variety of tasks in different domains, including translation; customer care (e.g. chatbots); education (e.g. language training); natural language processing (e.g. named entity recognition or summarisation); supporting the generation of images from a given prompt; preparing programming code; or even creating artistic works.

As LLMs continue to evolve, they offer both opportunities and important challenges for privacy and data protection.

Positive impacts foreseen on data protection:
LLMs could be used to support certain privacy activities in very specific scenarios, if designed, developed and deployed in a responsible and trustworthy manner, respecting the principles of data protection, privacy, human control and transparency.

For example:

Detection of personal data
Identifying personal data in unstructured data, such as free-text fields, is relatively easy for humans but difficult to automate using simple rules. Human review, however, does not scale well and becomes impractical or unfeasible for large text files or web-scraped datasets. The natural language processing capabilities of LLMs could help detect and better manage personal data in unstructured information (e.g. a text field containing a family history). LLMs could also help reduce the personal data included in their own training datasets by automatically identifying, redacting or obfuscating personal data.
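To see why simple rules fall short, here is a minimal rule-based baseline in Python. The two regex patterns (for email addresses and phone-like numbers) are illustrative; note that the contextual personal data in the example (a name, a reference to family history) slips straight past such rules — exactly the gap the article suggests LLM-based detection could help close.

```python
import re

# A toy rule-based redactor for personal data in free text.
# Patterns are illustrative and deliberately simple.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text):
    """Replace each matched pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Contact Jane at jane.doe@example.org or +44 20 7946 0958 about her family history."
print(redact(note))
# Contact Jane at [EMAIL] or [PHONE] about her family history.
```

The name and the health-related context survive redaction untouched, because no fixed pattern describes them; recognising them requires understanding the text, which is what LLMs bring.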

Negative impacts foreseen on data protection:
Training LLMs is a data-intensive activity, which can include personal data
The vast majority of the data used to train state-of-the-art LLMs are texts scraped from publicly available Internet resources (e.g. the latest Common Crawl dataset, which contains data from more than 3 billion pages). These web-scraped datasets contain personal data of public figures, but also of other individuals. Personal data contained in these datasets could be accurate or inaccurate. These datasets could also contain plain misinformation. Implementing controls to address the data protection risks posed by the use of these datasets is very challenging. Moreover, if not properly secured, LLM output might reveal sensitive or private information included in the datasets used for training, leading to potential or real data breaches.

“Hallucinations”, data accuracy and bias
LLMs sometimes suffer from so-called ‘hallucinations’, meaning they produce erroneous information that appears to be correct. When hallucinating, an LLM can produce false or misleading information about individuals. Inaccurate information can affect individuals not only because it can damage their public image, but also because it can lead to decisions that affect them. LLMs, if trained on biased data, could perpetuate or even amplify biases present in their training data. This might lead to unfair or discriminatory outputs, potentially violating the principle of fair processing of personal data.

Implementing data subjects’ rights is difficult
LLMs store what they learn as the values of billions or trillions of parameters, rather than in a traditional database. For this reason, rectifying, deleting or even requesting access to personal data learned by LLMs, whether accurate or made up of “hallucinations”, may be difficult or impossible.

Suggestions for further reading:
Vaswani, Ashish, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin. “Attention is All you Need.”, 2017, https://doi.org/10.48550/arXiv.1706.03762
Kaplan, Jared, Sam McCandlish, T. J. Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeff Wu and Dario Amodei. “Scaling Laws for Neural Language Models.”, 2020, https://doi.org/10.48550/arXiv.2001.08361
Hu, Edward J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. "LoRA: Low-rank adaptation of large language models", 2021, https://arxiv.org/abs/2106.09685v2.
Naveed, Humza, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Nick Barnes, and Ajmal Mian. "A comprehensive overview of large language models.", 2023, https://doi.org/10.48550/arXiv.2307.06435
Global Privacy Assembly Resolution on Generative Artificial Intelligence Systems, 2023, https://edps.europa.eu/system/files/...systems_en.pdf

