60 episodes • Length: 25 min • Weekly: Saturdays
Listen to resources from the AI Safety Fundamentals: Governance course! https://aisafetyfundamentals.com/governance
The podcast AI Safety Fundamentals: Governance is created by BlueDot Impact. The podcast and the artwork on this page are embedded using the public podcast feed (RSS).
In the fall of 2023, the US Bipartisan Senate AI Working Group held a series of AI Insight Forums with global leaders. Participants included the leaders of major AI labs and tech companies, major organizations adopting and implementing AI throughout the wider economy, union leaders, academics, advocacy groups, and civil society organizations. This document, released on March 15, 2024, is the culmination of those discussions. It provides a roadmap that US policy is likely to follow as the US Senate begins to create legislation.
Original text:
https://www.politico.com/f/?id=0000018f-79a9-d62d-ab9f-f9af975d0000
Author(s):
Majority Leader Chuck Schumer, Senator Mike Rounds, Senator Martin Heinrich, and Senator Todd Young
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This paper explores the under-discussed strategies of adaptation and resilience to mitigate the risks of advanced AI systems. The authors present arguments supporting the need for societal AI adaptation, create a framework for adaptation, offer examples of adapting to AI risks, outline the concept of resilience, and provide concrete recommendations for policymakers.
Original text:
https://drive.google.com/file/d/1k3uqK0dR9hVyG20-eBkR75_eYP2efolS/view?usp=sharing
Author(s):
Jamie Bernardi, Gabriel Mukobi, Hilary Greaves, Lennart Heim, and Markus Anderljung
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
In this paper from CSET, Ben Buchanan outlines a framework for understanding the inputs that power machine learning. Called "the AI Triad", it focuses on three inputs: algorithms, data, and compute.
Original text:
https://cset.georgetown.edu/wp-content/uploads/CSET-AI-Triad-Report.pdf
Author(s):
Ben Buchanan
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This document from the OECD is split into two sections: principles for responsible stewardship of trustworthy AI & national policies and international co-operation for trustworthy AI. 43 governments around the world have agreed to adhere to the document. While originally written in 2019, updates were made in 2024 which are reflected in this version.
Original text:
https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
Author(s):
The Organisation for Economic Co-operation and Development
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This summary of UNESCO's Recommendation on the Ethics of AI outlines four core values, ten core principles, and eleven actionable policies for responsible AI governance. The full text was agreed to by all 193 member states of UNESCO.
Original text:
https://unesdoc.unesco.org/ark:/48223/pf0000385082
Author(s):
The United Nations Educational, Scientific and Cultural Organization
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report by the UK's Department for Science, Innovation and Technology outlines a regulatory framework for UK AI policy. Per the report, "AI is helping to make our jobs safer and more satisfying, conserve our wildlife and fight climate change, and make our public services more efficient. Not only do we need to plan for the capabilities and uses of the AI systems we have today, but we must also prepare for a near future where the most powerful systems are broadly accessible and significantly more capable."
Original text: https://www.gov.uk/government/consultations/ai-regulation-a-pro-innovation-approach-policy-proposals/outcome/a-pro-innovation-approach-to-ai-regulation-government-response#executive-summary
Author(s): UK Department for Science, Innovation and Technology
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This statement was released by the UK Government as part of its Global AI Safety Summit in November 2023. It notes that frontier models pose unique risks and calls for international cooperation, finding that "many risks arising from AI are inherently international in nature, and so are best addressed through international cooperation." It was signed by a number of governments, including the US, India, and China, as well as the EU.
Original text:
https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023
Author(s):
UK Government
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This yearly report from Stanford's Institute for Human-Centered AI (HAI) tracks AI governance actions and broader trends in policies and legislation by governments around the world in 2023. It includes a summary of major policy actions taken by different governments, as well as analyses of regulatory trends, the volume of AI legislation, and different focus areas governments are prioritizing in their interventions.
Original text:
https://aiindex.stanford.edu/wp-content/uploads/2024/04/HAI_AI-Index-Report-2024_Chapter_7.pdf
Authors:
Nestor Maslej et al.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This high-level overview by CISA summarizes major US policies on AI at the federal level. Important items worth further investigation include Executive Order 14110, the voluntary commitments, the AI Bill of Rights, and Executive Order 13859.
Original text:
https://www.cisa.gov/ai/recent-efforts
Author(s):
The US Cybersecurity and Infrastructure Security Agency
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This fact sheet from The White House summarizes President Biden's AI Executive Order from October 2023. The President's AI EO represents the most aggressive approach to date from the US executive branch on AI policy.
Original text:
https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/
Author(s):
The White House
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This primer by the Future of Life Institute highlights core elements of the EU AI Act. It includes a high-level summary alongside explanations of different restrictions on prohibited AI systems, high-risk AI systems, and general-purpose AI.
Original text:
https://artificialintelligenceact.eu/high-level-summary/
Author(s):
The Future of Life Institute
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report from the Carnegie Endowment for International Peace summarizes Chinese AI policy as of mid-2023. It also analyzes the factors motivating Chinese AI governance. We're providing a more structured analysis of Chinese AI policy than of other governments' policies because we expect learners will be less familiar with the Chinese policy process.
Original text:
https://carnegieendowment.org/2023/07/10/china-s-ai-regulations-and-how-they-get-made-pub-90117
Author(s):
Matt Sheehan
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report by the Center for Security and Emerging Technology first analyzes the tensions and tradeoffs between three strategic technology and national security goals: driving technological innovation, impeding adversaries’ progress, and promoting safe deployment. It then identifies different direct and enabling policy levers, assessing each based on the tradeoffs they make.
While this document is designed for US policymakers, most of its findings are broadly applicable.
Original text:
https://cset.georgetown.edu/wp-content/uploads/The-Policy-Playbook.pdf
Authors:
Jack Corrigan, Melissa Flagg, and Dewey Murdick
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report from the Centre for Emerging Technology and Security and the Centre for Long-Term Resilience identifies different levers as they apply to different stages of the AI life cycle. They split the AI lifecycle into three stages: design, training, and testing; deployment and usage; and longer-term deployment and diffusion. It also introduces a risk mitigation hierarchy to rank different approaches in decreasing preference, arguing that “policy interventions will be most effective if they intervene at the point in the lifecycle where risk first arises.”
While this document is designed for UK policymakers, most of its findings are broadly applicable.
Original text:
https://cetas.turing.ac.uk/sites/default/files/2023-08/cetas-cltr_ai_risk_briefing_paper.pdf
Authors:
Ardi Janjeva, Nikhil Mulani, Rosamund Powell, Jess Whittlestone, and Shahar Avin
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report by the Nuclear Threat Initiative focuses primarily on how AI's integration into the biosciences could advance biotechnology but also pose potentially catastrophic biosecurity risks. It’s included as a core resource this week because the assigned pages offer a valuable case study of an under-discussed lever for AI risk mitigation: building resilience.
Resilience in a risk reduction context is defined by the UN as “the ability of a system, community or society exposed to hazards to resist, absorb, accommodate, adapt to, transform and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions through risk management.” As you’re reading, consider other areas where policymakers might be able to build a more resilient society to mitigate AI risk.
Original text:
https://www.nti.org/wp-content/uploads/2023/10/NTIBIO_AI_FINAL.pdf
Authors:
Sarah R. Carter, Nicole E. Wheeler, Sabrina Chwalek, Christopher R. Isaac, and Jaime Yassif
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This excerpt from CAIS’s AI Safety, Ethics, and Society textbook provides a deep dive into the CAIS resource from session three, focusing specifically on the challenges of controlling advanced AI systems.
Original Text:
https://www.aisafetybook.com/textbook/1-5
Author:
The Center for AI Safety
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
To prevent AI systems from going rogue, we’ll have to align them. In this article, Adam Jones of BlueDot Impact introduces the concept of aligning AIs. He defines alignment as “making AI systems try to do what their creators intend them to do.”
Original text:
https://aisafetyfundamentals.com/blog/what-is-ai-alignment/
Author:
Adam Jones
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This article from the Center for AI Safety provides an overview of ways that advanced AI could cause catastrophe. It groups catastrophic risks into four categories: malicious use, AI race, organizational risk, and rogue AIs. The article summarizes a longer paper by the same authors.
Original text:
https://www.safe.ai/ai-risk
Authors:
Dan Hendrycks, Thomas Woodside, Mantas Mazeika
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report from the UK’s Government Office for Science envisions five potential risk scenarios from frontier AI. Each scenario includes information on the AI system’s capabilities, ownership and access, safety, level and distribution of use, and geopolitical context. It provides key policy issues for each scenario and concludes with an overview of existential risk. If you have extra time, we’d recommend you read the entire document.
Original text:
https://assets.publishing.service.gov.uk/media/653bc393d10f3500139a6ac5/future-risks-of-frontier-ai-annex-a.pdf
Author:
The UK Government Office for Science
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This resource, written by Adam Jones at BlueDot Impact, provides a comprehensive overview of the existing and anticipated risks of AI. As you're going through the reading, consider what different futures might look like should different combinations of risks materialize.
Original text:
https://aisafetyfundamentals.com/blog/ai-risks/
Author:
Adam Jones
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This blog post from Holden Karnofsky, Open Philanthropy’s Director of AI Strategy, explains how advanced AI might overpower humanity. It summarizes superintelligent takeover arguments and provides a scenario where human-level AI disempowers humans without achieving superintelligence. As Holden summarizes: “if there's something with human-like skills, seeking to disempower humanity, with a population in the same ballpark as (or larger than) that of all humans, we've got a civilization-level problem.”
Original text:
https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/#the-standard-argument-superintelligence-and-advanced-technology
Author:
Holden Karnofsky
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This blog post by Sam Altman, the CEO of OpenAI, provides insight into what AI company leaders are saying and thinking about their reasons for pursuing advanced AI. It lays out how Altman thinks the world will change because of AI and what policy changes he believes we will need to make.
As you’re reading, consider Altman’s position and how it might affect the way he discusses this technology or his policy recommendations.
Original text:
https://moores.samaltman.com
Author:
Sam Altman
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This paper by Ross Gruetzemacher and Jess Whittlestone examines the concept of transformative AI, which significantly impacts society without necessarily achieving human-level cognitive abilities. The authors propose three categories of transformation: Narrowly Transformative AI, affecting specific domains like the military; Transformative AI, causing broad changes akin to general-purpose technologies such as electricity; and Radically Transformative AI, inducing profound societal shifts comparable to the Industrial Revolution.
Note: this resource uses “GPT” to refer to general purpose technologies, which the authors define as “a technology that initially has much scope for improvement and eventually comes to be widely used.” Keep in mind that this usage is distinct from the generative pre-trained transformer (GPT), a type of large language model used in systems like ChatGPT.
Original text:
https://arxiv.org/pdf/1912.00747.pdf
Authors:
Ross Gruetzemacher and Jess Whittlestone
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This insight report from the World Economic Forum summarizes some positive AI outcomes. Some proposed futures include AI enabling shared economic benefit, creating more fulfilling jobs, or allowing humans to work less – giving them time to pursue more satisfying activities like volunteering, exploration, or self-improvement. It also discusses common problems that prevent people from making good predictions about the future.
Note: this report was released before ChatGPT, which seems to have shifted expert predictions about when AI systems might be broadly capable of completing most cognitive labor (see Section 3, Exhibit 6 of the McKinsey resource below). Keep this in mind when reviewing section 1.1.
Original text:
https://www3.weforum.org/docs/WEF_Positive_AI_Economic_Futures_2021.pdf
Authors:
Stuart Russell, Daniel Susskind
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This report from McKinsey discusses the huge potential for economic growth that generative AI could bring, examining key drivers and exploring potential productivity boosts in different business functions. While reading, evaluate how realistic its claims are, and how this might affect the organization you work at (or organizations you might work at in the future).
Original text:
https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
Authors:
Michael Chui et al.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
A single sentence can summarize the complexities of modern artificial intelligence: Machine learning systems use computing power to execute algorithms that learn from data. Everything that national security policymakers truly need to know about a technology that seems simultaneously trendy, powerful, and mysterious is captured in those 13 words. They specify a paradigm for modern AI—machine learning—in which machines draw their own insights from data, unlike the human-driven expert systems of the past.
The same sentence also introduces the AI triad of algorithms, data, and computing power. Each element is vital to the power of machine learning systems, though their relative priority changes based on technological developments.
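To make the triad concrete, here is a minimal sketch (our illustration, not from the report) in which a handful of labeled examples stand in for data, gradient descent on a linear model stands in for the algorithm, and the number of update steps stands in for compute:

```python
# A toy illustration of the "AI triad": data, an algorithm, and compute.
# Nothing here comes from the CSET report; it is just a minimal sketch.

# Data: a few (x, y) pairs sampled from the rule y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Algorithm: fit y = w*x + b by gradient descent on squared error.
w, b = 0.0, 0.0
learning_rate = 0.01

# Compute: the number of update steps we can afford to run.
steps = 5000

for _ in range(steps):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}  (true values: 2 and 1)")
```

Scaling any one of the three inputs (more data, a better learning rule, or more update steps) changes what the system ends up learning.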
Source:
https://cset.georgetown.edu/wp-content/uploads/CSET-AI-Triad-Report.pdf
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Despite the current popularity of machine learning, I haven’t found any short introductions to it which quite match the way I prefer to introduce people to the field. So here’s my own. Compared with other introductions, I’ve focused less on explaining each concept in detail, and more on explaining how they relate to other important concepts in AI, especially in diagram form. If you're new to machine learning, you shouldn't expect to fully understand most of the concepts explained here just after reading this post - the goal is instead to provide a broad framework which will contextualise more detailed explanations you'll receive from elsewhere. I'm aware that high-level taxonomies can be controversial, and also that it's easy to fall into the illusion of transparency when trying to introduce a field; so suggestions for improvements are very welcome!
The key ideas are contained in a summary diagram (not reproduced in this narration). First, some quick clarifications:
None of the boxes are meant to be comprehensive; we could add more items to any of them. So you should picture each list ending with “and others”.
The distinction between tasks and techniques is not a firm or standard categorisation; it’s just the best way I’ve found so far to lay things out.
The summary is explicitly from an AI-centric perspective. For example, statistical modeling and optimization are fields in their own right; but for our current purposes we can think of them as machine learning techniques.
Original text:
https://www.alignmentforum.org/posts/qE73pqxAZmeACsAdF/a-short-introduction-to-machine-learning
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
The field of AI has undergone a revolution over the last decade, driven by the success of deep learning techniques. This post aims to convey three ideas using a series of illustrative examples:
I’ll focus on four domains: vision, games, language-based tasks, and science. The first two have more limited real-world applications, but provide particularly graphic and intuitive examples of the pace of progress.
Original article:
https://medium.com/@richardcngo/visualizing-the-deep-learning-revolution-722098eb9c5
Author:
Richard Ngo
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
If you thought the pace of AI development had sped up since the release of ChatGPT last November, well, buckle up. Thanks to the rise of autonomous AI agents like Auto-GPT, BabyAGI and AgentGPT over the past few weeks, the race to get ahead in AI is just getting faster. And, many experts say, more concerning.
Source:
Narrated for AI Safety Fundamentals by TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome. We have all had experiences with specification gaming, even if not by this name. Readers may have heard the myth of King Midas and the golden touch, in which the king asks that anything he touches be turned to gold - but soon finds that even food and drink turn to metal in his hands. In the real world, when rewarded for doing well on a homework assignment, a student might copy another student to get the right answers, rather than learning the material - and thus exploit a loophole in the task specification.
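As a toy illustration (ours, not DeepMind's), the sketch below shows an agent that maximizes a literal reward specification ("the camera sees no mess, with as little effort as possible") and ends up hiding the mess rather than cleaning it. The action names and reward numbers are invented for this example:

```python
# A toy illustration of specification gaming: a cleaning agent is rewarded for
# the *literal* objective "the camera sees no mess", rather than the intended
# objective "the room is actually clean". All details are hypothetical.

ACTIONS = ["do_nothing", "clean_up", "cover_mess_with_rug"]

def outcome(action):
    """What actually happens in the (toy) world after each action."""
    room_clean = (action == "clean_up")
    mess_visible_to_camera = (action == "do_nothing")
    effort = {"do_nothing": 0.0, "clean_up": 1.0, "cover_mess_with_rug": 0.1}[action]
    return room_clean, mess_visible_to_camera, effort

def literal_reward(action):
    """The reward as specified: penalise visible mess and effort."""
    _, mess_visible, effort = outcome(action)
    return (0.0 if mess_visible else 1.0) - effort

best = max(ACTIONS, key=literal_reward)
print("action chosen by reward-maximising agent:", best)
print("is the room actually clean?", outcome(best)[0])
# The agent hides the mess instead of cleaning it: the literal specification
# is satisfied, but the intended outcome is not.
```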
Original article:
https://www.deepmind.com/blog/specification-gaming-the-flip-side-of-ai-ingenuity
Authors:
Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, Shane Legg
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Developments in AI could exacerbate long-running catastrophic risks, including bioterrorism, disinformation and resulting institutional dysfunction, misuse of concentrated power, nuclear and conventional war, other coordination failures, and unknown risks. This document compiles research on how AI might raise these risks. (Other material in this course discusses more novel risks from AI.) We draw heavily from previous overviews by academics, particularly Dafoe (2020) and Hendrycks et al. (2023).
Source:
https://aisafetyfundamentals.com/governance-blog/overview-of-ai-risk-exacerbation
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This page gives an overview of the alignment problem. It describes our motivation for running courses about technical AI alignment. The terminology should be relatively broadly accessible (not assuming any previous knowledge of AI alignment or much knowledge of AI/computer science).
This piece describes the basic case for AI alignment research, which is research that aims to ensure that advanced AI systems can be controlled or guided towards the intended goals of their designers. Without such work, advanced AI systems could potentially act in ways that are severely at odds with their designers’ intended goals. Such a situation could have serious consequences, plausibly even causing an existential catastrophe.
In this piece, I elaborate on five key points to make the case for AI alignment research.
Source:
https://aisafetyfundamentals.com/alignment-introduction
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
I’ve previously argued that machine learning systems often exhibit emergent capabilities, and that these capabilities could lead to unintended negative consequences. But how can we reason concretely about these consequences? There are two principles I find useful for reasoning about future emergent capabilities:
Using these principles, I’ll describe two specific emergent capabilities that I’m particularly worried about: deception (fooling human supervisors rather than doing the intended task), and optimization (choosing from a diverse space of actions based on their long-term consequences).
Source:
https://bounded-regret.ghost.io/emergent-deception-optimization/
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
You may have seen arguments (such as these) for why people might create and deploy advanced AI that is both power-seeking and misaligned with human interests. This may leave you thinking, “OK, but would such AI systems really pose catastrophic threats?” This document compiles arguments for the claim that misaligned, power-seeking, advanced AI would pose catastrophic risks.
We’ll see arguments for the following claims, which are mostly separate/independent reasons for concern:
Source:
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Observing from afar, it’s easy to think there’s an abundance of people working on AGI safety. Everyone on your timeline is fretting about AI risk, and it seems like there is a well-funded EA-industrial-complex that has elevated this to their main issue. Maybe you’ve even developed a slight distaste for it all—it reminds you a bit too much of the woke and FDA bureaucrats, and Eliezer seems pretty crazy to you.
That’s what I used to think too, a couple of years ago. Then I got to see things more up close. And here’s the thing: nobody’s actually on the friggin’ ball on this one!
There’s no secret elite SEAL team coming to save the day. This is it. We’re not on track.
Source:
https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
In previous pieces, I argued that there’s a real and large risk of AI systems’ developing dangerous goals of their own and defeating all of humanity - at least in the absence of specific efforts to prevent this from happening. A young, growing field of AI safety research tries to reduce this risk, by finding ways to ensure that AI systems behave as intended (rather than forming ambitious aims of their own and deceiving and manipulating humans as needed to accomplish them).
Maybe we’ll succeed in reducing the risk, and maybe we won’t. Unfortunately, I think it could be hard to know either way. This piece is about four fairly distinct-seeming reasons that this could be the case - and that AI safety could be an unusually difficult sort of science.
This piece is aimed at a broad audience, because I think it’s important for the challenges here to be broadly understood. I expect powerful, dangerous AI systems to have a lot of benefits (commercial, military, etc.), and to potentially appear safer than they are - so I think it will be hard to be as cautious about AI as we should be. I think our odds look better if many people understand, at a high level, some of the challenges in knowing whether AI systems are as safe as they appear.
Source:
https://www.cold-takes.com/ai-safety-seems-hard-to-measure/
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Much has been written framing and articulating the AI governance problem from a catastrophic risks lens, but these writings have been scattered. This page aims to provide a synthesized introduction to some of these already prominent framings. This is just one attempt at suggesting an overall frame for thinking about some AI governance problems; it may miss important things.
Some researchers think that unsafe development or misuse of AI could cause massive harms. A key contributor to some of these risks is that catastrophe may not require all or most relevant decision makers to make harmful decisions. Instead, harmful decisions from just a minority of influential decision makers—perhaps just a single actor with good intentions—may be enough to cause catastrophe. For example, some researchers argue, if just one organization deploys highly capable, goal-pursuing, misaligned AI—or if many businesses (but a small portion of all businesses) deploy somewhat capable, goal-pursuing, misaligned AI—humanity could be permanently disempowered.
The above would not be very worrying if we could rest assured that no actors capable of these harmful actions would take them. However, especially in the context of AI safety, several factors are arguably likely to incentivize some actors to take harmful deployment actions:
Misjudgment: Assessing the consequences of AI deployment may be difficult (as it is now, especially given the nature of AI risk arguments), so some organizations could easily get it wrong—concluding that an AI system is safe or beneficial when it is not.
“Winner-take-all” competition: If the first organization(s) to deploy advanced AI is expected to get large gains, while leaving competitors with nothing, competitors would be highly incentivized to cut corners in order to be first—they would have less to lose.
Original text:
https://www.agisafetyfundamentals.com/governance-blog/global-vulnerability
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This primer introduces various aspects of safety standards and regulations for industrial-scale AI development: what they are, their potential and limitations, some proposals for their substance, and recent policy developments. Key points are:
Source:
https://aisafetyfundamentals.com/governance-blog/standards-and-regulations-primer
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through “dangerous capability evaluations”) and the propensity of models to apply their capabilities for harm (through “alignment evaluations”). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Source:
https://arxiv.org/pdf/2305.15324.pdf
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term “frontier AI” models — highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model’s capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models. Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them. We consider several options to this end, including granting enforcement powers to supervisory authorities and licensure regimes for frontier AI models. Finally, we propose an initial set of safety standards. These include conducting pre-deployment risk assessments; external scrutiny of model behavior; using risk assessments to inform deployment decisions; and monitoring and responding to new information about model capabilities and uses post-deployment. We hope this discussion contributes to the broader conversation on how to balance public safety risks and innovation benefits from advances at the frontier of AI development.
Source:
https://arxiv.org/pdf/2307.03718.pdf
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Some are concerned that regulating AI progress in one country will slow that country down, putting it at a disadvantage in a global AI arms race. Many proponents of AI regulation disagree; they have pushed back on the overall framework, pointed out serious drawbacks and limitations of racing, and argued that regulations do not have to slow progress down.
Another disagreement is about whether countries are in fact in a neck and neck arms race; some believe that the United States and its allies have a significant lead which would allow for regulation even if that does come at the cost of slowing down AI progress. [1]
This overview uses simple metrics and indicators to illustrate and discuss the state of frontier AI development in different countries — and relevant factors that shape how the picture might change.
Source:
https://aisafetyfundamentals.com/governance-blog/state-of-ai-in-different-countries
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
If governments could regulate the large-scale use of “AI chips,” that would likely enable governments to govern frontier AI development—to decide who does it and under what rules.
In this article, we will use the term “AI chips” to refer to cutting-edge, AI-specialized computer chips (such as NVIDIA’s A100 and H100 or Google’s TPUv4).
Frontier AI models like GPT-4 are already trained using tens of thousands of AI chips, and trends suggest that more advanced AI will require even more computing power.
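As a rough illustration of why training runs of this kind need so many chips, here is a back-of-envelope sketch. All of the numbers are hypothetical assumptions, and the 6 x parameters x tokens figure is only a common approximation for dense-model training compute, not something taken from the article:

```python
# A rough, back-of-envelope sketch (not from the article) of why frontier
# training runs need so many AI chips. All numbers below are illustrative
# assumptions, and the 6 * parameters * tokens rule is only a common
# approximation for dense transformer training compute.

params = 500e9        # assumed model size: 500 billion parameters
tokens = 7e12         # assumed training data: 7 trillion tokens
train_flop = 6 * params * tokens   # ~2.1e25 FLOP under the rule of thumb

chip_flop_per_s = 1e14   # assumed sustained throughput per AI chip (FLOP/s)
training_days = 90       # assumed length of the training run

flop_per_chip = chip_flop_per_s * training_days * 24 * 3600
chips_needed = train_flop / flop_per_chip
print(f"total training compute: {train_flop:.1e} FLOP")
print(f"chips needed under these assumptions: {chips_needed:,.0f}")
```

Under these made-up assumptions the run needs on the order of tens of thousands of chips, which is why controlling access to large quantities of AI chips is seen as a governance lever.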
Source:
https://aisafetyfundamentals.com/governance-blog/primer-on-ai-chips
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
On October 7, 2022, the Biden administration announced a new export controls policy on artificial intelligence (AI) and semiconductor technologies to China. These new controls—a genuine landmark in U.S.-China relations—provide the complete picture after a partial disclosure in early September generated confusion. For weeks the Biden administration has been receiving criticism in many quarters for a new round of semiconductor export control restrictions, first disclosed on September 1. The restrictions block leading U.S. AI computer chip designers, such as Nvidia and AMD, from selling their high-end chips for AI and supercomputing to China. The criticism typically goes like this: China’s domestic AI chip design companies could not win customers in China because their chip designs could not compete with Nvidia and AMD on performance. Chinese firms could not catch up to Nvidia and AMD on performance because they did not have enough customers to benefit from economies of scale and network effects.
Source:
https://www.csis.org/analysis/choking-chinas-access-future-ai
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Push AI forward too fast, and catastrophe could occur. Too slow, and someone else less cautious could do it. Is there a safe course?
Source:
https://www.cold-takes.com/racing-through-a-minefield-the-ai-deployment-problem/
Crossposted from the Cold Takes Audio podcast.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Historically, progress in the field of cryptography has been enormously consequential. Over the past century, for instance, cryptographic discoveries have played a key role in a world war and made it possible to use the internet for business and private communication. In the interest of exploring the impact the field may have in the future, I consider a suite of more recent developments. My primary focus is on blockchain-based technologies (such as cryptocurrencies) and on techniques for computing on confidential data (such as secure multiparty computation). I provide an introduction to these technologies that assumes no mathematical background or previous knowledge of cryptography. Then, I consider several speculative predictions that some researchers and engineers have made about the technologies’ long-term political significance. This includes predictions that more “privacy-preserving” forms of surveillance will become possible, that the roles of centralized institutions ranging from banks to voting authorities will shrink, and that new transnational institutions known as “decentralized autonomous organizations” will emerge. Finally, I close by discussing some challenges that are likely to limit the significance of emerging cryptographic technologies. On the basis of these challenges, it is premature to predict that any of them will approach the transformativeness of previous technologies. However, this remains a rapidly developing area well worth following.
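As a small illustration of one idea behind computing on confidential data, the sketch below shows additive secret sharing, a basic building block of secure multiparty computation. It is our simplified example, not code or notation from the paper:

```python
# A minimal sketch of additive secret sharing, one building block of secure
# multiparty computation. This toy code is illustrative only.
import random

P = 2**61 - 1  # a large prime; all arithmetic is modulo P

def share(secret, n=3):
    """Split a secret into n random shares that sum to it modulo P."""
    parts = [random.randrange(P) for _ in range(n - 1)]
    parts.append((secret - sum(parts)) % P)
    return parts

# Alice and Bob each split their private salary across three compute parties.
alice_shares = share(52_000)
bob_shares = share(61_000)

# Each compute party adds the two shares it holds and publishes the result;
# no single party ever sees either salary.
partial_sums = [(a + b) % P for a, b in zip(alice_shares, bob_shares)]

total = sum(partial_sums) % P
print("combined total:", total)  # 113000, without revealing either input
```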
Source:
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
As advanced machine learning systems’ capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other’s compliance with potential future international agreements on advanced ML development. This work analyzes one mechanism to achieve this, by monitoring the computing hardware used for large-scale NN training. The framework’s primary goal is to provide governments high confidence that no actor uses large quantities of specialized ML chips to execute a training run in violation of agreed rules. At the same time, the system does not curtail the use of consumer computing devices, and maintains the privacy and confidentiality of ML practitioners’ models, data, and hyperparameters. The system consists of interventions at three stages: (1) using on-chip firmware to occasionally save snapshots of the neural network weights stored in device memory, in a form that an inspector could later retrieve; (2) saving sufficient information about each training run to prove to inspectors the details of the training run that had resulted in the snapshotted weights; and (3) monitoring the chip supply chain to ensure that no actor can avoid discovery by amassing a large quantity of un-tracked chips. The proposed design decomposes the ML training rule verification problem into a series of narrow technical challenges, including a new variant of the Proof-of-Learning problem [Jia et al. ’21].
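To give a flavor of the first intervention only, here is a heavily simplified sketch of committing to occasional weight snapshots so an inspector can later check a developer's claims against them. The "training update", snapshot interval, and data formats below are invented stand-ins, not the paper's protocol:

```python
# A toy sketch of the snapshot-commitment idea: "firmware" occasionally commits
# to the weights in device memory so an inspector can later check them against
# a claimed training history. This is a heavy simplification, and the weights
# and training update here are stand-ins.
import hashlib
import json

def commit(weights, step):
    """Hash a weight snapshot together with the step at which it was taken."""
    payload = json.dumps({"step": step, "weights": weights}).encode()
    return hashlib.sha256(payload).hexdigest()

# "Firmware" side: snapshot the (toy) weights every 250 steps.
weights = [0, 0]
log = []
for step in range(1, 1001):
    weights = [w + 1 for w in weights]   # stand-in for a training update
    if step % 250 == 0:
        log.append({"step": step, "digest": commit(weights, step)})

# "Inspector" side: check that the weights the developer claims to have had
# at step 500 match the logged commitment.
claimed = [500, 500]
entry = next(e for e in log if e["step"] == 500)
print("claim matches snapshot:", commit(claimed, 500) == entry["digest"])
```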
Source:
https://arxiv.org/pdf/2303.11341.pdf
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
The following excerpts summarize historical case studies that are arguably informative for AI governance. The case studies span nuclear arms control, militaries’ adoption of electricity, and environmental agreements. (For ease of reading, we have edited the formatting of the following excerpts and added bolding.)
Source:
https://aisafetyfundamentals.com/governance-blog/historical-case-studies
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
International institutions may have an important role to play in ensuring advanced AI systems benefit humanity. International collaborations can unlock AI’s ability to further sustainable development, and coordination of regulatory efforts can reduce obstacles to innovation and the spread of benefits. Conversely, the potential dangerous capabilities of powerful and general-purpose AI systems create global externalities in their development and deployment, and international efforts to further responsible AI practices could help manage the risks they pose. This paper identifies a set of governance functions that could be performed at an international level to address these challenges, ranging from supporting access to frontier AI systems to setting international safety standards. It groups these functions into four institutional models that exhibit internal synergies and have precedents in existing organizations: 1) a Commission on Frontier AI that facilitates expert consensus on opportunities and risks from advanced AI, 2) an Advanced AI Governance Organization that sets international standards to manage global threats from advanced models, supports their implementation, and possibly monitors compliance with a future governance regime, 3) a Frontier AI Collaborative that promotes access to cutting-edge AI, and 4) an AI Safety Project that brings together leading researchers and engineers to further AI safety research. We explore the utility of these models and identify open questions about their viability.
Source:
https://arxiv.org/pdf/2307.04699.pdf
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
We’ve created OpenAI LP, a new “capped-profit” company that allows us to rapidly increase our investments in compute and talent while including checks and balances to actualize our mission.
The original text contained 1 footnote which was omitted from this narration.
---
Source:
https://openai.com/blog/openai-lp
Narrated by TYPE III AUDIO.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Our Charter describes the principles we use to execute on OpenAI’s mission.
---
Source:
https://openai.com/charter
Narrated by TYPE III AUDIO.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
I’ve been writing about tangible things we can do today to help the most important century go well. Previously, I wrote about helpful messages to spread and how to help via full-time work.
This piece is about what major AI companies can do (and not do) to be helpful. By “major AI companies,” I mean the sorts of AI companies that are advancing the state of the art, and/or could play a major role in how very powerful AI systems end up getting used.
This piece could be useful to people who work at those companies, or people who are just curious.
Generally, these are not pie-in-the-sky suggestions - I can name more than one AI company that has at least made a serious effort at each of the things I discuss below (beyond what it would do if everyone at the company were singularly focused on making a profit).
I’ll cover:
Source:
https://www.cold-takes.com/what-ai-companies-can-do-today-to-help-with-the-most-important-century/
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
If you fear that someone will build a machine that will seize control of the world and annihilate humanity, then one kind of response is to try to build further machines that will seize control of the world even earlier without destroying it, forestalling the ruinous machine’s conquest. An alternative or complementary kind of response is to try to avert such machines being built at all, at least while the degree of their apocalyptic tendencies is ambiguous.
The latter approach seems to me like the kind of basic and obvious thing worthy of at least consideration, and also in its favor, fits nicely in the genre ‘stuff that it isn’t that hard to imagine happening in the real world’. Yet my impression is that for people worried about extinction risk from artificial intelligence, strategies under the heading ‘actively slow down AI progress’ have historically been dismissed and ignored (though ‘don’t actively speed up AI progress’ is popular).
Source:
https://www.lesswrong.com/posts/uFNgRumrDTpBfQGrs/let-s-think-about-slowing-down-ai
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
About two years ago, I wrote that “it’s difficult to know which ‘intermediate goals’ [e.g. policy goals] we could pursue that, if achieved, would clearly increase the odds of eventual good outcomes from transformative AI.” Much has changed since then, and in this post I give an update on 12 ideas for US policy goals.[1] Many […]
The original text contained 7 footnotes which were omitted from this narration.
---
Source:
https://www.openphilanthropy.org/research/12-tentative-ideas-for-us-ai-policy
Narrated by TYPE III AUDIO.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
I carried out a short project to better understand talent needs in AI governance. This post reports on my findings.
How this post could be helpful:
Source:
https://aisafetyfundamentals.com/governance-blog/some-talent-needs-in-ai-governance
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
People who want to improve the trajectory of AI sometimes think their options for object-level work are (i) technical safety work and (ii) non-technical governance work. But that list misses things; another group of arguably promising options is technical work in AI governance, i.e. technical work that mainly boosts AI governance interventions. This post provides a brief overview of some ways to do this work—what they are, why they might be valuable, and what you can do if you’re interested. I discuss:
Source:
https://aisafetyfundamentals.com/governance-blog/ai-governance-needs-technical-work
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
(Last updated August 31, 2022) One potential way to improve the impacts of AI is helping various actors figure out good AI strategies—that is, good high-level plans focused on AI. To support people who are interested in that, we compile some relevant career i…
---
Source:
https://aisafetyfundamentals.com/governance-blog/ai-strategy-careers
Narrated by TYPE III AUDIO.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Expertise in China and its relations with the world might be critical in tackling some of the world’s most pressing problems. In particular, China’s relationship with the US is arguably the most important bilateral relationship in the world, with these two countries collectively accounting for over 40% of global GDP. These considerations led us to publish a guide to improving China–Western coordination on global catastrophic risks and other key problems in 2018. Since then, we have seen an increase in the number of people exploring this area.
China is one of the most important countries developing and shaping advanced artificial intelligence (AI). The Chinese government’s spending on AI research and development is estimated to be on the same order of magnitude as that of the US government, and China’s AI research is prominent on the world stage and growing.
Because of the importance of AI from the perspective of improving the long-run trajectory of the world, we think relations between China and the US on AI could be among the most important aspects of their relationship. Insofar as the EU and/or UK influence advanced AI development through labs based in their countries or through their influence on global regulation, the state of understanding and coordination between European and Chinese actors on AI safety and governance could also be significant.
That, in short, is why we think working on AI safety and governance in China and/or building mutual understanding between Chinese and Western actors in these areas is likely to be one of the most promising China-related career paths. Below we provide more arguments and detailed information on this option.
If you are interested in pursuing a career path described in this profile, contact 80,000 Hours’ one-on-one team and we may be able to put you in touch with a specialist advisor.
Source:
https://80000hours.org/career-reviews/china-related-ai-safety-and-governance-paths/
Narrated for AGI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This is a quickly written post listing opportunities for people to apply for funding from funders that are part of the EA community. …
---
First published:
October 26th, 2021
Source:
https://forum.effectivealtruism.org/posts/DqwxrdyQxcMQ8P2rD/list-of-ea-funding-opportunities
Narrated by TYPE III AUDIO.
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
This post summarizes the way I currently think about career choice for longtermists. I have put much less time into thinking about this than 80,000 Hours, but I think it’s valuable for there to be multiple perspectives on this topic out there.
Edited to add: see below for why I chose to focus on longtermism in this post.
While the jobs I list overlap heavily with the jobs 80,000 Hours lists, I organize them and conceptualize them differently. 80,000 Hours tends to emphasize “paths” to particular roles working on particular causes; by contrast, I emphasize “aptitudes” one can build in a wide variety of roles and causes (including non-effective-altruist organizations) and then apply to a wide variety of longtermist-relevant jobs (often with options working on more than one cause). Example aptitudes include: “helping organizations achieve their objectives via good business practices,” “evaluating claims against each other,” “communicating already-existing ideas to not-yet-sold audiences,” etc.
(Other frameworks for career choice include starting with causes (AI safety, biorisk, etc.) or heuristics (“Do work you can be great at,” “Do work that builds your career capital and gives you more options.”) I tend to feel people should consider multiple frameworks when making career choices, since any one framework can contain useful insight, but risks being too dogmatic and specific for individual cases.)
For each aptitude I list, I include ideas for how to explore the aptitude and tell whether one is on track. Something I like about an aptitude-based framework is that it is often relatively straightforward to get a sense of one’s promise for, and progress on, a given “aptitude” if one chooses to do so. This contrasts with cause-based and path-based approaches, where there’s a lot of happenstance in whether there is a job available in a given cause or on a given path, making it hard for many people to get a clear sense of their fit for their first-choice cause/path and making it hard to know what to do next. This framework won’t make it easier for people to get the jobs they want, but it might make it easier for them to start learning about what sort of work is and isn’t likely to be a fit.
Source:
https://forum.effectivealtruism.org/posts/bud2ssJLQ33pSemKH/longtermist-career-choice
Narrated for AI Safety Fundamentals by TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
People who want to improve the trajectory of AI sometimes think their options for object-level work are (i) technical safety work and (ii) non-technical governance work. But that list misses things; another group of arguably promising options is technical work in AI governance, i.e. technical work that mainly boosts AI governance interventions. This post provides a brief overview of some ways to do this work—what they are, why they might be valuable, and what you can do if you’re interested. I discuss:
Engineering technical levers to make AI coordination/regulation enforceable (through hardware engineering, software/ML engineering, and heat/electromagnetism-related engineering)
Information security
Forecasting AI development
Technical standards development
Grantmaking or management to get others to do the above well
Advising on the above
Original text:
https://forum.effectivealtruism.org/posts/BJtekdKrAufyKhBGw/ai-governance-needs-technical-work
Narrated for AI Safety Fundamentals by TYPE III AUDIO.
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.