50 Critically Evaluating GenAI Outputs
Imagine you’re rushing to finish a history essay and ask AI to write it, only to find it’s filled with fake quotes. Or you’re in a job, trusting AI to find legal cases, then discovering they don’t exist. These are real risks that play out daily when people use generative AI (GenAI) tools like ChatGPT, which produce extremely convincing, but often flawed, answers. As a college student, you’ll face many situations that force you to decide whether to seek AI assistance. These are powerful technologies, but they can be unwieldy; they’re not like library databases that deliver consistent results based on precise keyword searches. As a separate chapter in this textbook covered, they generate text based on probabilistic patterns, which can lead to errors, biases, or generic responses that miss your needs.
This chapter introduces the ABC-E framework (Accuracy, Bias, Context, Ethics) to help you evaluate AI outputs critically, ensuring your work is responsible, vetted for trustworthiness, and respects academic integrity.
A brief look at two scenarios
Let’s look closely at two common situations where students might use GenAI:
Scenario 1: The Philosophy Assignment
Your philosophy instructor assigns a discussion prompt for the week: “What makes life meaningful to you? What would Aristotle say about that?” Pressed for time, you decide to consult an AI tool. Here’s the resulting conversation:
You: Write a two-paragraph reflection on what makes life meaningful to me and what Aristotle would say about it.
AI: Life finds its deepest meaning through personal growth. For me, meaning emerges when I pursue knowledge and develop my inborn talents. There’s something profound about then leveraging these talents when contributing to something larger than myself, whether through creative work, community service, or participating in causes that align with my values. This sense of purpose transforms ordinary experiences into extraordinary opportunities for growth and impact.
Aristotle would largely endorse this view through his concept of eudaimonia or “flourishing.” In his Nicomachean Ethics (Book I, Chapter 7), Aristotle argues that the highest good comes from living virtuously and developing one’s potential. He wrote, “The function of man is to live a certain kind of life, and this activity implies a rational principle, and the function of a good man is the good and noble performance of these.” Aristotle would appreciate my focus on personal development but would emphasize that true happiness comes specifically from exercising rational capacities and moral virtues in a balanced life, what he termed the “golden mean” between excess and deficiency.
Scenario 2: The Legal Brief Disaster
In a few years, at your first job as a paralegal at a law firm, you’re assisting an attorney who needs to file a brief by tomorrow. The attorney asks you to find relevant case precedents for an aviation injury case. Here’s the resulting conversation:
You: Find legal precedents for cases where airlines were held liable for injuries caused by turbulence when the seatbelt sign was off.
AI: Here are several relevant legal precedents regarding airline liability for turbulence injuries when the seatbelt sign was off:
- Varghese v. China Southern Airlines (2019) – The Second Circuit Court held that the airline was liable for a passenger’s spinal injury during unexpected turbulence when the seatbelt sign had been turned off 10 minutes earlier. The court ruled that the flight crew should have anticipated the turbulence based on weather reports.
- Martinez v. Delta Airlines (2015) – The Ninth Circuit established a “reasonable foreseeability” standard, finding Delta liable when a passenger suffered a concussion during clear air turbulence. The court cited the airline’s failure to adequately warn passengers despite meteorological data suggesting turbulence risk.
- Blackwood v. American Airlines (2021) – The court determined that while the seatbelt sign was off, the cabin crew’s verbal warnings about possible turbulence were insufficient, making the airline liable for a passenger’s broken arm.
These cases establish that airlines maintain a duty of care regarding turbulence warnings even when seatbelt signs are off, particularly when turbulence could have been reasonably anticipated through available forecasting tools.
Think carefully: How would you evaluate these two scenarios? Are the AI responses good or bad? How do you know?
Evaluating AI interactions with the ABC-E Framework
When using AI tools like ChatGPT for your assignments or work tasks, it’s important to think critically about the answers they give you. AI systems aren’t like searching on Google or using a library database that finds verified facts. They’re prediction engines that create text based on patterns they’ve learned. This makes them powerful but also risky and often flawed.
Below is a simple framework, ABC-E, that can help you evaluate AI outputs and use these tools more effectively and responsibly. We’ll use the philosophy assignment and legal brief scenarios you just read about to show each part in action.
A: Accuracy
Key question: Is this information accurate and truthful?
GenAI tools can produce convincing answers, but they sometimes (or very often!) fabricate information that sounds real but is completely false, a phenomenon called “hallucination” (sometimes “confabulation” or “fabrication”). Hallucination happens because AI predicts text based on patterns from the training data, leading it to generate plausible-sounding details that aren’t grounded in reality. The responses sound incredibly persuasive despite being misleading or completely false. For example, the chatbot might invent a historical event, a scientific fact, or a person’s name, presenting it with confidence that can mislead you if you don’t verify it. Hallucinations can take several forms: fabricating entire sources or quotes, creating false explanations that seem logical, or mixing accurate and inaccurate details to make errors hard to spot. In a college setting, these mistakes could derail your assignments, like a history paper citing nonexistent battles or a lab report with made-up data. Studying from AI-generated errors could also hurt your performance on an exam. Strong evaluation catches these errors by checking facts against trusted sources.
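To make the mechanism concrete, here is a deliberately tiny toy “language model” in Python. It is only an illustrative sketch, not how real chatbots work (they use far larger neural networks trained on enormous datasets), and the miniature training corpus, the starting word, and all variable names are invented for this example. The toy picks each next word based only on which words followed it in its training text, which is enough to produce sentences that sound plausible word by word yet state something false.

```python
import random

# A tiny invented "training corpus" made entirely of true statements.
corpus = (
    "aristotle wrote the nicomachean ethics . "
    "plato wrote the republic . "
    "aristotle studied under plato ."
).split()

# Record which words followed each word in the training text.
following = {}
for current_word, next_word in zip(corpus, corpus[1:]):
    following.setdefault(current_word, []).append(next_word)

# Generate a "new" sentence by repeatedly picking a plausible next word.
word = "plato"
sentence = [word]
while word != "." and len(sentence) < 10:
    word = random.choice(following[word])
    sentence.append(word)

print(" ".join(sentence))
# One possible output: "plato wrote the nicomachean ethics ."
# Every step matches a pattern seen in training, yet the sentence as a
# whole is false -- a miniature "hallucination."
```

Real systems are vastly more sophisticated, but the underlying issue is the same: they are built to continue text plausibly, not to check each claim against reality, which is why your own verification matters.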
What to look for:
- Facts that can be verified (names, dates, statistics, quotes)
- Claims that seem too perfect or convenient
- Information that seems detailed but lacks specific sources
Example from Scenario 1 (Philosophy Assignment)
The AI wrote: “In his Nicomachean Ethics (Book I, Chapter 7), Aristotle argues that the highest good comes from living virtuously and developing one’s potential. He wrote, ‘The function of man is to live a certain kind of life, and this activity implies a rational principle, and the function of a good man is the good and noble performance of these.’”
This sounds impressive and specific; it even cites Book I, Chapter 7! But this exact quote doesn’t exist in Aristotle’s work. The AI made it up even though it sounds real. It has the Aristotle vibe, you might say, without actually being from Aristotle.
Example from Scenario 2 (Legal Brief)
The AI listed cases like “Varghese v. China Southern Airlines (2019)” and “Martinez v. Delta Airlines (2015)” with specific details about court rulings. These cases don’t exist! The AI invented them completely, which could get a legal professional in serious trouble.
This actually happened. In 2023, attorneys were sanctioned for submitting legal briefs that cited non-existent cases generated by ChatGPT; they hadn’t realized the tool would fabricate them. The judge fined them $5,000 and required them to notify their clients.
How to check accuracy:
- Cross-check facts with trusted sources like textbooks, academic journals, or verified websites (see the sketch after this list for one way to run a quick automated check)
- For quotes, find the original source
- Be extra careful with statistics, legal cases, and historical claims
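As a concrete illustration of cross-checking, here is a minimal Python sketch that looks up a claimed citation in Crossref, a free public index of scholarly publications, to see whether anything like it actually exists. Treat it as a sketch under assumptions: the claimed_title value is a made-up example, Crossref covers scholarly works rather than court cases (legal citations need a legal research database), and even a title match doesn’t prove the source says what the AI claims, so you still need to read the original.

```python
import json
import urllib.parse
import urllib.request

# Title of the source the chatbot cited (hypothetical example).
claimed_title = "Nicomachean Ethics"

# Ask Crossref's public REST API for the closest bibliographic matches.
url = (
    "https://api.crossref.org/works?rows=3&query.bibliographic="
    + urllib.parse.quote(claimed_title)
)
with urllib.request.urlopen(url, timeout=10) as response:
    items = json.load(response)["message"]["items"]

if not items:
    print("No match found -- treat the citation as unverified.")
for item in items:
    title = (item.get("title") or ["(untitled)"])[0]
    year = item.get("issued", {}).get("date-parts", [[None]])[0][0]
    print(f"Possible match: {title} ({year})")
```

The same habit works without any code: paste the title or case name into Google Scholar, your library catalog, or a legal database, and treat anything you cannot locate as fabricated until proven otherwise.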
B: Bias
Key question: Whose perspectives are represented or missing?
AI systems learn from vast amounts of internet text, which often embeds societal biases, causing AI to favor certain viewpoints or groups while sidelining others. These biases can include racial bias, where AI might prioritize perspectives from certain racial groups. Gender bias can manifest as AI defaulting to male-centric narratives or stereotypes, such as describing engineers as men or emphasizing caregiving roles for women. Viewpoint bias (leaning politically left, right, libertarian, or otherwise) can skew AI’s take on issues, like favoring progressive solutions for climate change or conservative stances on economic policy, depending on its training data. Worldview bias might tilt toward secular or religious perspectives, such as presenting scientific explanations without acknowledging spiritual beliefs or assuming a secular definition of morality. These biases can misrepresent complex topics in your assignments, like a sociology paper or history presentation, leading to one-sided arguments that miss the diversity of thought needed for college work. Students who outsource their thinking to these models may end up promoting beliefs and worldviews that conflict with their own.
What to look for:
- One-sided perspectives
- Cultural assumptions that might not match your experience
- Missing viewpoints from certain groups or traditions
Example from Scenario 1 (Philosophy Assignment)
The AI’s response focused on a very individualistic view of what makes life meaningful (personal growth, developing talents). It didn’t mention other cultural perspectives that might value family connections, spiritual harmony, or community contribution above individual achievement.
Example from Scenario 2 (Legal Brief)
The AI only provided examples where airlines were found liable for injuries. It didn’t mention any cases where airlines were NOT held responsible, creating a one-sided view of how courts typically rule on these matters.
How to address bias:
- Ask yourself: “What perspectives might be missing here?”
- Specifically request diverse viewpoints when using AI
- Consider how your own background and experiences differ from what the AI presents
C: Context/Relevance
Key question: Does this directly address my specific needs?
Context and relevance mean ensuring that the AI’s response fits the unique requirements of your task, reflecting the specific circumstances and goals you’re working toward. AI doesn’t automatically understand your personal situation, assignment details, or professional environment unless you provide clear guidance. Relevant contexts often include the rhetorical situation: the genre (e.g., essay, presentation), audience (e.g., professor, classmates), and purpose (e.g., persuade, inform). These elements shape how your work should sound and what it should achieve. Depending on the task, context may involve personal details, like your life experiences or values, or the course you’re taking, such as a biology class requiring scientific terms or a history class needing historical accuracy. For employees, context could mean the business they’re working at, like a marketing firm needing brand-specific language or a law office requiring precise legal standards. A response that ignores these contexts risks being generic, off-topic, or mismatched. Strong evaluation tailors AI’s output to your task; accepting vague answers wastes time and weakens your work.
What to look for:
- Generic responses that could apply to anyone (specificity remains a sticky problem for chatbots)
- Examples that don’t fit your field of study
- Language that’s too complex or simple for your needs
Example from Scenario 1 (Philosophy Assignment)
The AI created a generic reflection about life’s meaning that could apply to anyone. It wasn’t authentically personal, even though it says things like “For me, meaning emerges when I pursue knowledge…” The AI doesn’t know you or what actually gives your life meaning, of course, so it generated a realistic but generic response.
Example from Scenario 2 (Legal Brief)
The AI didn’t understand the specific details of the actual case the paralegal was working on, the jurisdiction it was in, or what legal standards applied. It provided general information rather than precisely what was needed for this particular brief.
How to improve context:
- Be very specific in your questions to AI
- Provide relevant details related to your task requirements (upload PDF files, policy documents, etc.)
- Ask for examples relevant to your field of study
- Customize AI responses to fit your unique situation
E: Ethics (including Privacy and Security)
Key question: Am I using AI responsibly and transparently?
Using GenAI requires you to navigate multiple ethical layers that go beyond simply getting the right answer. It’s never simply about increasing your productivity. There are many layers here:
- Academic integrity requires ensuring that your work reflects your own ideas and effort, avoiding misconduct by not passing off AI-generated content as yours.
- Professional (and academic) responsibility demands accountability for AI outputs, especially in workplaces where errors could harm clients or credibility.
- Skill development is crucial: over-relying on AI can weaken your critical thinking or writing abilities, which are essential for college and career success. You should consider the long-term impact of outsourcing certain tasks.
- Privacy involves protecting sensitive information. Many AI tools process data in the cloud, potentially exposing personal details like your name, health, or academic work. How comfortable are you with sharing certain information? What information would others want you to keep private?
- Security means safeguarding data from breaches or misuse, particularly when using AI for confidential tasks, like a business report or legal document.
These layers matter in every AI interaction, whether you’re drafting a history essay or preparing a marketing pitch. Ignoring them risks academic penalties, professional setbacks, or data vulnerabilities, as seen in the chapter’s scenarios. The stakes can be high.
What to consider:
- Are you presenting AI work as your own?
- Does your use of AI align with your instructor’s or workplace policies?
- Are you still developing your own skills and understanding?
- What information should you keep private?
- How does your use of certain information with AI platforms impact others?
Example from Scenario 1 (Philosophy Assignment)
If the student submitted the AI-written reflection as their own work without mentioning they used AI, this would likely violate academic integrity policies. The reflection wasn’t their own thinking about what makes life meaningful, and the quote attributed to Aristotle wasn’t real.
Example from Scenario 2 (Legal Brief)
The paralegal would be professionally irresponsible if they included the made-up legal cases in a brief without verifying them. As mentioned above, this actually happened in real life in 2023, when lawyers were sanctioned for submitting fake AI-generated citations in court. This affected the outcome of the case and hurt their clients.
How to use AI ethically and responsibly:
- Be transparent about your use of AI when submitting work
- Make sure you understand and can explain any AI-generated content you use
- Follow your school’s or workplace’s policies on AI use
Putting It All Together: A 5-Step Guide to Smarter AI Use
This final section suggests a workflow based on the framework above that you can apply to any task, such as preparing a lab report. GenAI can support your academic and professional work, but it’s not a replacement for your own thinking. To ensure outputs are accurate, nuanced, relevant, and ethical, we recommend using the ABC-E Check within a workflow split into three phases: before, during, and after AI interaction. These guiding questions help you critically evaluate AI in educational (e.g., assignments, research) and professional (e.g., reports, presentations) contexts.
Phase 1: Pre-Chatbot Interaction (Prepare Thoughtfully)
Step 1: Define Your Goals and Ethical Situation
Guiding Questions:
- What specific goal am I using AI to achieve (e.g., brainstorming ideas, summarizing sources)?
- What are the AI policies in my school or workplace, and how will I comply?
- How will I ensure AI supports my work without replacing my own effort or voice?
- Am I sharing sensitive data (e.g., personal or confidential information) that could risk privacy or security?
Action: Outline your task’s purpose and ethical boundaries. Write a clear prompt specifying your needs and context, and commit to disclosing AI use transparently.
Phase 2: Chatbot Interaction (Engage Critically)
Step 2: Craft Targeted Prompts
Guiding Questions:
- Is my prompt specific enough to get relevant, useful results?
- Have I included context (e.g., academic level, field, or professional setting) to tailor the output?
- Am I asking for diverse perspectives to avoid one-sided responses?
- Does my prompt align with my task’s requirements and goals?
Action: Test your prompt and refine it if the output is vague or off-topic. Request sources or clarifications to strengthen the response.
Step 3: Monitor for Issues
Guiding Questions:
- Do the AI’s claims or sources seem questionable or too good to be true?
- Is the response leaning toward a single perspective or missing key viewpoints?
- Does the output feel generic or mismatched to my needs?
- Are there signs of fabricated details (e.g., overly specific but unverified facts)?
Action: Ask follow-up questions (e.g., “What’s the source?” or “Include other perspectives”) to probe reliability. Flag dubious outputs for later verification.
Phase 3: Post-Chatbot Interaction (Evaluate and Refine)
Step 4: Verify Accuracy and Balance
Guiding Questions:
- Can I confirm the AI’s facts using trusted sources (e.g., journals, databases, manuals)?
- Is the output free of fabricated information, like invented sources or data?
- Does it represent diverse perspectives, or are certain viewpoints missing?
- How can I supplement the output to ensure a balanced, comprehensive result?
Action: Cross-check facts with authoritative sources. Seek additional resources to address biases or gaps in the output.
Step 5: Ensure Relevance and Ethics
Guiding Questions:
- Does the output align with my task’s specific requirements and context?
- Have I used AI as a tool to support, not replace, my own analysis and writing?
- Am I following my school’s or workplace’s AI policies, including disclosing AI use?
- Does relying on AI risk weakening my skills or undermining my learning goals?
Action: Rewrite AI-informed content in your own words to reflect your understanding. Disclose AI use per guidelines, and ensure the final work is relevant, original, and supports your growth.
The ABC-E Check empowers you to use AI responsibly in any academic or professional setting. By preparing with clear goals, prompting critically, monitoring for issues, verifying outputs, and prioritizing your own voice, you’ll be better able to produce work that’s accurate and responsible.