AI in Computer Games:
Smarter games are making for a better user experience. What does the future hold?
If you’ve been following the game development scene, you’ve probably heard remarks such as: "The main role of graphics in computer games will soon be over; artificial intelligence is the next big thing!" Although you shouldn’t buy into such statements wholesale, there is some truth in them. The quality of AI (artificial intelligence) ranks high among the features game fans weigh in their purchase decisions, and it is an area with incredible potential to increase players’ immersion and fun.
AI Gets a Brain:
New technology allows software to tap real human intelligence.
In the 50 years since John McCarthy coined the term artificial intelligence, much progress has been made toward identifying, understanding, and automating many classes of symbolic and computational problems that were once the exclusive domain of human intelligence. Much work remains in the field because humans still significantly outperform the most powerful computers at completing such simple tasks as identifying objects in photographs - something children can do even before they learn to speak.
Natural Language Translation at the Intersection of AI and HCI:
Old questions being answered with both AI and HCI
The fields of artificial intelligence (AI) and human-computer interaction (HCI) are influencing each other like never before. Widely used systems such as Google Translate, Facebook Graph Search, and RelateIQ hide the complexity of large-scale AI systems behind intuitive interfaces. But relations were not always so auspicious. The two fields emerged at different points in the history of computer science, with different influences, ambitions, and attendant biases. AI aimed to construct a rival, and perhaps a successor, to the human intellect. Early AI researchers such as McCarthy, Minsky, and Shannon were mathematicians by training, so theorem-proving and formal models were attractive research directions.
The Chess Player who Couldn’t Pass the Salt:
AI: Soft and hard, weak and strong, narrow and general
The problem inherent in almost all nonspecialist work in AI is that humans actually don’t understand intelligence very well in the first place. Now, computer scientists often think they understand intelligence because they have so often been the "smart" kid, but that’s got very little to do with understanding what intelligence actually is. In the absence of a clear understanding of how the human brain generates and evaluates ideas, which may or may not be a good basis for the concept of intelligence, we have introduced numerous proxies for intelligence, the first of which is game-playing behavior.
Making Money Using Math:
Modern applications are increasingly using probabilistic machine-learned models.
A big difference between human-written code and learned models is that the latter are usually not represented by text and hence are not understandable by human developers or manipulable by existing tools. The consequence is that traditional software engineering techniques for conventional programs (such as code reviews, source control, and debugging) no longer apply. Incomprehensibility, however, is not unique to learned code, so that aspect is not the concern here.
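To make the contrast concrete, here is a minimal sketch, not drawn from the article, that puts a hand-written rule next to a learned model whose behavior lives entirely in fitted numeric parameters; the toy data and the scikit-learn model are illustrative assumptions.

```python
# A minimal sketch (not from the article) contrasting a hand-written rule
# with a learned model whose behavior lives in fitted numeric weights.
from sklearn.linear_model import LogisticRegression

# Hand-written rule: readable, reviewable, and diff-able in source control.
def approve_by_rule(income, debt):
    return income > 3 * debt

# Learned rule: trained on toy (income, debt) -> approved data; its "source"
# is the fitted coefficients, not text a reviewer can read.
X = [[50, 10], [20, 15], [80, 5], [30, 40]]
y = [1, 0, 1, 0]
model = LogisticRegression().fit(X, y)

print(approve_by_rule(50, 10))            # True, and the reason is visible
print(model.predict([[50, 10]])[0])       # a prediction, but no readable reason
print(model.coef_, model.intercept_)      # the closest thing to its "code"
```

A reviewer can read and diff the first function; the second offers only an array of coefficients.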
Prediction-Serving Systems:
What happens when we wish to actually deploy a machine learning model to production?
This installment of Research for Practice features a curated selection from Dan Crankshaw and Joey Gonzalez, who provide an overview of machine-learning serving systems. What happens when we wish to actually deploy a machine learning model to production, and how do we serve predictions with high accuracy and high computational efficiency? Dan and Joey’s picks offer a thoughtful tour of cutting-edge techniques spanning database-level integration, video processing, and prediction middleware.
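As a rough illustration of the "prediction middleware" idea the selections cover, here is a minimal sketch of a trained model wrapped behind a stable HTTP endpoint; the Flask service and toy model are assumptions for illustration, not any of the surveyed systems.

```python
# A minimal sketch of prediction serving: a thin HTTP layer that hides a
# trained model behind a stable interface. Toy model and routes are
# illustrative assumptions only.
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Toy model standing in for whatever framework produced the real one.
model = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. {"features": [[1.5]]}
    return jsonify({"predictions": model.predict(features).tolist()})

if __name__ == "__main__":
    app.run(port=8080)
```

Real serving systems layer caching, batching, and model versioning on top of this basic shape to meet latency and accuracy targets.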
The Mythos of Model Interpretability:
In machine learning, the concept of interpretability is both important and slippery.
Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world?
Knowledge Base Construction in the Machine-learning Era:
Three critical design points: Joint-learning, weak supervision, and new representations
More information is accessible today than at any other time in human history. From a software perspective, however, the vast majority of this data is unusable because it is locked away in text, PDFs, web pages, images, and other hard-to-parse formats. The goal of knowledge base construction is to extract structured information automatically from this "dark data," so that it can be used in downstream applications for search, question answering, link prediction, visualization, modeling, and much more.
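As a toy illustration of that goal, and not of the machinery the article describes, the sketch below pulls (subject, relation, object) tuples out of free text with a single hand-written pattern; a real knowledge-base-construction system would rely on learned extractors and weak supervision instead.

```python
# A toy sketch of knowledge base construction: turn unstructured "dark data"
# (free text) into structured (subject, relation, object) tuples.
import re

documents = [
    "Marie Curie was born in Warsaw.",
    "Alan Turing was born in London.",
]

# One hand-written pattern stands in for the learned extractors a real
# KBC system would use.
pattern = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) was born in (?P<city>[A-Z]\w+)"
)

knowledge_base = []
for doc in documents:
    for match in pattern.finditer(doc):
        knowledge_base.append((match["person"], "born_in", match["city"]))

print(knowledge_base)
# [('Marie Curie', 'born_in', 'Warsaw'), ('Alan Turing', 'born_in', 'London')]
```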
Troubling Trends in Machine Learning Scholarship:
Some ML papers suffer from flaws that could mislead the public and stymie future research.
Flawed scholarship threatens to mislead the public and stymie future research by compromising ML’s intellectual foundations. Indeed, many of these problems have recurred cyclically throughout the history of AI and, more broadly, in scientific research. In 1976, Drew McDermott chastised the AI community for abandoning self-discipline, warning prophetically that "if we can’t criticize ourselves, someone else will save us the trouble." The current strength of machine learning owes much to a large body of rigorous research to date, both theoretical and empirical. By promoting clear scientific thinking and communication, our community can sustain the trust and investment it currently enjoys.
The Effects of Mixing Machine Learning and Human Judgment:
Collaboration between humans and machines does not necessarily lead to better outcomes.
Based on the theoretical findings from the existing literature, some policymakers and software engineers contend that algorithmic risk assessments such as the COMPAS software can alleviate the incarceration epidemic and the occurrence of violent crimes by informing and improving decisions about policing, treatment, and sentencing. The findings reported here, however, indicate that collaboration between humans and machines does not necessarily lead to better outcomes, and human supervision does not sufficiently address problems when algorithms err or demonstrate concerning biases.
Putting Machine Learning into Production Systems:
Data validation and software engineering for machine learning
Breck et al. share details of the pipelines used at Google to validate petabytes of production data every day. With so many moving parts, it’s important to be able to detect and investigate changes in data distributions before they can impact model performance. "Software Engineering for Machine Learning: A Case Study" shares lessons learned at Microsoft as machine learning started to pervade more and more of the company’s systems, moving from specialized machine-learning products to simply being an integral part of many products and services.
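The sketch below is a deliberately tiny stand-in for such pipelines: it compares a serving batch against training-time statistics and flags features whose means have drifted; the feature names, data, and threshold are illustrative assumptions, far simpler than Google's production checks.

```python
# A minimal data-validation sketch: compare serving data against
# training-time statistics and alert on drift before it hurts the model.
from statistics import mean, stdev

def summarize(values):
    return {"mean": mean(values), "stdev": stdev(values)}

def drift_alerts(training_stats, serving_values, threshold=3.0):
    """Flag features whose serving mean drifts too far from training."""
    alerts = []
    for feature, stats in training_stats.items():
        observed = mean(serving_values[feature])
        z = abs(observed - stats["mean"]) / (stats["stdev"] or 1.0)
        if z > threshold:
            alerts.append(f"{feature}: mean {observed:.2f} is {z:.1f} stdevs from training")
    return alerts

training_stats = {"latency_ms": summarize([10, 12, 11, 13, 9, 10, 12])}
serving_batch = {"latency_ms": [48, 52, 50, 47, 49]}   # upstream change?
print(drift_alerts(training_stats, serving_batch))
```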
Biases in AI Systems:
A survey for practitioners
This article organizes the various kinds of biases that can occur in the AI pipeline, from dataset creation and problem formulation through data analysis and evaluation. It highlights the challenges associated with designing bias-mitigation strategies, and it outlines some best practices suggested by researchers. Finally, a set of guidelines is presented that could aid ML developers in identifying potential sources of bias, as well as in avoiding the introduction of unwanted biases. The work is meant to serve as an educational resource for ML developers in handling and addressing issues related to bias in AI systems.
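As one concrete, intentionally simple example of the kind of check such guidelines might suggest, the sketch below compares a model's positive-prediction rate across two groups; the data and the parity criterion are illustrative assumptions, not the article's taxonomy.

```python
# A minimal bias probe: compare a model's positive-prediction rate across
# groups (a demographic-parity style check). Toy data for illustration.
from collections import defaultdict

def positive_rates(predictions, groups):
    counts, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        counts[group] += 1
        positives[group] += pred
    return {g: positives[g] / counts[g] for g in counts}

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = positive_rates(preds, groups)
print(rates)                                      # {'A': 0.75, 'B': 0.25}
print(max(rates.values()) - min(rates.values()))  # a large gap warrants a closer look
```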
Declarative Machine Learning Systems:
The future of machine learning will depend on it being in the hands of the rest of us.
The people training and using ML models today are typically experienced developers with years of study, working within large organizations, but the next wave of ML systems should allow a substantially larger number of people, potentially without any coding skills, to perform the same tasks. These new ML systems will not require users to fully understand all the details of how models are trained and used to obtain predictions, but will provide them with a more abstract interface that is less demanding and more familiar.
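A hypothetical sketch of what such an abstract interface might look like: the user declares which columns are inputs and which are outputs, and the system handles the rest. The config keys and the train() helper below are invented for illustration and do not belong to any particular system.

```python
# A hypothetical declarative-ML sketch: the user states *what* to predict
# from *which* columns; the training details stay behind the abstraction.
model_config = {
    "input_features":  [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "sentiment",   "type": "category"}],
}

def train(config, dataset_path):
    # Stand-in entry point for a declarative ML system (hypothetical API).
    inputs = [f["name"] for f in config["input_features"]]
    outputs = [f["name"] for f in config["output_features"]]
    print(f"training a model on {dataset_path}: {inputs} -> {outputs}")

train(model_config, "reviews.csv")
```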
Interpretable Machine Learning:
Moving from mythos to diagnostics
The emergence of machine learning as a society-changing technology in the past decade has triggered concerns about people's inability to understand the reasoning of increasingly complex models. The field of IML (interpretable machine learning) grew out of these concerns, with the goal of empowering various stakeholders to tackle use cases such as building trust in models, performing model debugging, and generally informing real human decision-making.
Steampunk Machine Learning:
Victorian contrivances for modern data science
Fitting models to data is all the rage nowadays but has long been an essential skill of engineers. Veterans know that real-world systems foil textbook techniques by interleaving routine operating conditions with bouts of overload and failure; to be practical, a method must model the former without distortion by the latter. Surprisingly effective aid comes from an unlikely quarter: a simple and intuitive model-fitting approach that predates the Babbage Engine. The foundation of industrial-strength decision support and anomaly detection for production datacenters, this approach yields accurate yet intelligible models without hand-holding or fuss.
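This excerpt does not name the technique, so the sketch below uses one classical stand-in: a brute-force least-absolute-deviations (L1) line fit, an outlier-resistant method old enough to predate Babbage's engines. It is offered only as an illustration of robust fitting that resists bouts of overload, not as the column's own method.

```python
# Least-absolute-deviations line fit by brute force. For a simple line with
# an intercept, an optimal L1 fit passes through two of the data points, so
# for small datasets we can just try every pair.
from itertools import combinations

def l1_fit(xs, ys):
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
        if best is None or cost < best[0]:
            best = (cost, slope, intercept)
    return best[1], best[2]

# Routine load grows roughly linearly with demand, except for one overload spike.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 4.0, 6.2, 7.9, 10.1, 60.0]   # last point is an outlier
print(l1_fit(xs, ys))                    # slope stays close to 2 despite the spike
```

A least-squares fit on the same data would be dragged sharply upward by the single overload point; the L1 fit models the routine operating conditions without distortion.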
Taking Flight with Copilot:
Early insights and opportunities of AI-powered pair-programming tools
Over the next five years, AI-powered tools will likely be helping developers with many diverse tasks. For example, such models may be used to improve code review, directing reviewers to parts of a change where review is most needed or even directly providing feedback on changes. Models such as Codex may suggest fixes for defects in code, build failures, or failing tests. These models are able to write tests automatically, helping to improve code quality and the downstream reliability of distributed systems. This study of Copilot shows that developers spend more time reviewing code than actually writing code.
Designing a Framework for Conversational Interfaces:
Combining the latest advances in machine learning with earlier approaches
Wherever possible, business logic should be described by code rather than training data. This keeps our system's behavior principled, predictable, and easy to change. Our approach to conversational interfaces allows them to be built much like any other application, using familiar tools, conventions, and processes, while still taking advantage of cutting-edge machine-learning techniques.
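A minimal sketch of that division of labor, with hypothetical names rather than the article's framework: a learned model guesses the user's intent, while the business logic that acts on the intent stays in plain, reviewable code.

```python
# Sketch of the stated principle: ML at the edges, business logic in code.
def classify_intent(utterance: str) -> str:
    """Stand-in for a machine-learned intent classifier."""
    return "check_balance" if "balance" in utterance.lower() else "unknown"

# Business logic lives in ordinary code: easy to test, diff, and change.
def check_balance(user_id: str) -> str:
    balances = {"alice": 42.50}                  # toy data store
    return f"Your balance is ${balances[user_id]:.2f}"

HANDLERS = {"check_balance": check_balance}

def handle(utterance: str, user_id: str) -> str:
    intent = classify_intent(utterance)
    handler = HANDLERS.get(intent)
    return handler(user_id) if handler else "Sorry, I didn't understand that."

print(handle("What's my balance?", "alice"))     # Your balance is $42.50
```

Swapping in a better classifier changes nothing about how the balance is computed; retraining the model cannot silently alter the business rules.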
Cargo Cult AI:
Is the ability to think scientifically the defining essence of intelligence?
Evidence abounds that the human brain does not innately think scientifically; however, it can be taught to do so. The same species that forms cargo cults around widespread and unfounded beliefs in UFOs, ESP, and anything read on social media also produces scientific luminaries such as Sagan and Feynman. Today's cutting-edge LLMs are also not innately scientific. But there is good reason to believe that, unlike the human brain, they never will be unless new algorithmic paradigms are developed.
Echoes of Intelligence:
Textual interpretation and large language models
We are now in the presence of a new medium disguised as good old text, but that text has been generated by an LLM, without authorial intention, an aspect that, if known beforehand, completely changes the expectations and response a human should have to a piece of text. Should our interpretation capabilities be engaged? If yes, under what conditions? The rules of the language game should be spelled out; they should not be passed over in silence.
Improving Testing of Deep-learning Systems:
A combination of differential and mutation testing results in better test data.
We used differential testing to generate test data that improves the diversity of data points in the test dataset, and then used mutation testing to check the quality of that test data in terms of diversity. Combining differential and mutation testing in this fashion improves the mutation score, a test-data quality metric, indicating an overall improvement in testing effectiveness and in the quality of the test data when testing deep-learning systems.
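The sketch below compresses the two ideas into a toy setting, with simple threshold functions standing in for real deep-learning models: differential testing collects inputs on which two model variants disagree, and mutation testing then scores how many deliberately injected faults that data detects. The thresholds and fault set are illustrative assumptions, not the paper's setup.

```python
# Toy combination of differential and mutation testing.
import random

def model(x):                 # stand-in for the model under test
    return 1 if x > 0.5 else 0

def variant(x):               # retrained/alternative model for differential testing
    return 1 if x > 0.6 else 0

# Differential testing: keep candidate inputs on which the variants disagree.
random.seed(0)
candidates = [random.random() for _ in range(1000)]
test_data = [x for x in candidates if model(x) != variant(x)]

# Mutation testing: inject faults (shifted thresholds) and check whether the
# collected test data detects ("kills") each mutant.
mutants = [lambda x, t=t: (1 if x > t else 0) for t in (0.3, 0.55, 0.9)]
killed = sum(any(m(x) != model(x) for x in test_data) for m in mutants)
print(f"{len(test_data)} disagreement inputs, mutation score = {killed}/{len(mutants)}")
```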
Is There Another System?:
Computer science is the study of what can be automated.
One of the easiest tests to determine if you are at risk is to look hard at what you do every day and see if you, yourself, could code yourself out of a job. Programming involves a lot of rote work: templating, boilerplate, and the like. If you can see a way to write a system to replace yourself, either do it, don't tell your bosses, and collect your salary while reading novels in your cubicle, or look for something more challenging to work on.
Resolving the Human-subjects Status of Machine Learning's Crowdworkers:
What ethical framework should govern the interaction of ML researchers and crowdworkers?
In recent years, machine learning (ML) has relied heavily on crowdworkers both for building datasets and for addressing research questions requiring human interaction or judgment. The diversity of both the tasks performed and the uses of the resulting data renders it difficult to determine when crowdworkers are best thought of as workers versus human subjects. These difficulties are compounded by conflicting policies, with some institutions and researchers regarding all ML crowdworkers as human subjects and others holding that they rarely constitute human subjects. Notably, few ML papers involving crowdwork mention IRB oversight, raising the prospect of non-compliance with ethical and regulatory requirements.
Toward Effective AI Support for Developers:
A survey of desires and concerns
The journey of integrating AI into the daily lives of software engineers is not without its challenges. Yet, it promises a transformative shift in how developers can translate their creative visions into tangible solutions. As we have seen, AI tools such as GitHub Copilot are already reshaping the code-writing experience, enabling developers to be more productive and to spend more time on creative and complex tasks. The skepticism around AI, from concerns about job security to its real-world efficacy, underscores the need for a balanced approach that prioritizes transparency, education, and ethical considerations.
Virtual Machinations: Using Large Language Models as Neural Computers:
LLMs can function not only as databases, but also as dynamic, end-user programmable neural computers.
We explore how large language models (LLMs) can function not just as databases, but as dynamic, end-user-programmable neural computers. The native programming language for this neural computer is a logic-programming-inspired declarative language that formalizes and externalizes chain-of-thought reasoning as it might happen inside a large language model.
GPTs and Hallucination:
Why do large language models hallucinate?
The findings in this experiment support the hypothesis that GPTs based on LLMs perform well on prompts that are more popular and have reached a general consensus, yet struggle on controversial topics or topics with limited data. The variability in the applications' responses underscores that the models depend on the quantity and quality of their training data, paralleling the system of crowdsourcing that relies on diverse and credible contributions. Thus, while GPTs can serve as useful tools for many mundane tasks, their engagement with obscure and polarized topics should be interpreted with caution.