Hallucination

Vol. 22 No. 4 – July/August 2024

GPTs and Hallucination:
Why do large language models hallucinate?

The findings in this experiment support the hypothesis that GPTs based on LLMs perform well on prompts that are more popular and have reached a general consensus yet struggle on controversial topics or topics with limited data. The variability in the applications' responses underscores that the models depend on the quantity and quality of their training data, paralleling the system of crowdsourcing that relies on diverse and credible contributions. Thus, while GPTs can serve as useful tools for many mundane tasks, their engagement with obscure and polarized topics should be interpreted with caution. LLMs' reliance on probabilistic models to produce statements about the world ties their accuracy closely to the breadth and quality of the data they're given.

by Jim Waldo, Soline Boussard

Confidential Computing Proofs:
An alternative to cryptographic zero-knowledge

Proofs are powerful tools for integrity and privacy, enabling the verifier to delegate a computation and still verify its correct execution, and enabling the prover to keep the details of the computation private. Both CCP and ZKP can achieve soundness and zero-knowledge but with important differences. CCP relies on hardware trust assumptions, which yield high performance and additional confidentiality protection for the prover but may be unacceptable for some applications. CCP is also often easier to use, notably with existing code, whereas ZKP comes with a large prover overhead that may be impractical for some applications.

by Mark Russinovich, Cédric Fournet, Greg Zaverucha, Josh Benaloh, Brandon Murdoch, Manuel Costa

Assessing IT Project Success: Perception vs. Reality:
We would not be in the digital age if it were not for the recurrent success of IT projects.

This study has significant implications for practice, research, and education by providing new insights into IT project success. It expands the body of knowledge on project management by reporting project success (and not exclusively project management success), grounded in several objective criteria such as deliverables usage by the client in the post-project stage, hiring of project-related support/maintenance services by the client, contracting of new projects by the client, and vendor recommendation by the client to potential clients. Researchers can find a set of criteria they can use when studying and reporting the success of IT projects, thus expanding the current perspective on evaluation and contributing to more accurate conclusions. For practitioners, this study provides a rich set of criteria that can be used for evaluating their projects, as well as strong evidence of the importance of considering not only project execution, but also post-project outcomes and impacts in the evaluation.

by João Varajão, António Trigo

Questioning the Criteria for Evaluating Non-cryptographic Hash Functions:
Maybe we need to think more about non-cryptographic hash functions.

Although cryptographic and non-cryptographic hash functions are everywhere, there seems to be a gap in how they are designed. Lots of criteria exist for cryptographic hashes motivated by various security requirements, but on the non-cryptographic side there is a certain amount of folklore that, despite the long history of hash functions, has not been fully explored. While targeting a uniform distribution makes a lot of sense for real-world datasets, achieving it can be a challenge when a hash function is confronted by a dataset with particular patterns.

by Catherine Hayes, David Malone

Program Merge: What's Deep Learning Got to Do with It?:
A discussion with Shuvendu Lahiri, Alexey Svyatkovskiy, Christian Bird, Erik Meijer and Terry Coatta

If you regularly work with open-source code or produce software for a large organization, you're already familiar with many of the challenges posed by collaborative programming at scale. Some of the most vexing of these tend to surface as a consequence of the many independent alterations inevitably made to code, which, unsurprisingly, can lead to updates that don't synchronize. Difficult merges are nothing new, of course, but the scale of the problem has gotten much worse. This is what led a group of researchers at MSR (Microsoft Research) to take on the task of complicated merges as a grand program-repair challenge, one they believed might be addressed at least in part by machine learning.

by Shuvendu Lahiri, Alexey Svyatkovskiy, Christian Bird, Erik Meijer, Terry Coatta