Beyond Text: How AI Tools and Multimodal Capabilities Are Transforming Student Workflows
Introduction
Picture two students preparing for a midterm in their first-year college biology course:
Student 1, Maya, relies on traditional note-taking strategies. During lectures, she sits near the front, her notebook open, diligently handwriting key points from the professor’s slides and explanations. She uses colored pens to highlight important terms, like “mitosis” or “photosynthesis,” and draws diagrams to visualize complex processes. After class, Maya reviews her notes, condensing them into concise summaries on physical index cards that she uses as flashcards for the midterm and final. She organizes her study sessions with a paper planner, blocking out time to review chapters and practice problems from the textbook. Her approach is tactile, structured, and deliberate, built on methods she’s honed since high school.
Student 2, Liam, leverages a suite of AI tools to manage his workflow. During the same biology lecture, sitting across from Maya, he records the session using an app powered by OpenAI’s Whisper, which transcribes the lecture in real time into searchable text. After class, he uploads the transcript to Google’s NotebookLM, which generates a summary and organizes key concepts into a study guide. To plan his study sessions, he asks ChatGPT to review his schedule, suggest the best times for studying, and create an updated version. Liam also uses a custom chatbot he built with ChatGPT to quiz himself on biology terms, tailoring questions to his weak areas and receiving feedback on how thorough his responses are. For additional exam prep, he uploads all of his course readings (as PDF, PPTX, and DOC files) into NotebookLM and asks it to create flashcards. Liam’s workflow is mostly digital and AI-infused, allowing him to process information quickly and adaptively.
Which student is going to learn more? How do you know that?
This chapter isn’t going to argue that Liam’s approach is better or more productive than Maya’s. Both students are effective in their own ways, but they face different choices and tradeoffs. Maya’s traditional methods offer simplicity and deep engagement with the material, but they can be time-intensive and less flexible for managing large volumes of information. Liam’s AI-driven workflow saves time and enhances accessibility, but it requires technical know-how, critical evaluation of AI outputs (Maya doesn’t have to worry about AI hallucinations!), and careful navigation of academic integrity policies. What unites both students, however, is the need to effectively manage information and tools, whether pen and paper, digital apps, or AI models. As AI technologies reshape learning and work, students (and workers) increasingly face decisions about how to integrate these tools into their processes while balancing efficiency and ethics. More than ever, how to learn is itself a choice students must make.
Here’s what this chapter will cover:
- Generative vs. Assistive AI: Learn the difference between AI that creates new content (generative) and AI that supports your work (assistive) and why this matters for academic integrity and effective study habits.
- Speech-to-Text (STT): Explore how STT tools, like transcribing lectures into searchable notes, make learning more accessible and support brainstorming, with tips for finding reliable solutions.
- Text-to-Speech (TTS): Discover how TTS turns text into natural-sounding audio for studying on the go or improving accessibility, including where to find these tools.
- Multimodal AI: See how AI platforms combine text, images, audio, and data to enhance tasks like analyzing charts or creating visuals.
- AI Research Tools: Examine AI-powered search engines and research assistants, including Deep Research modes for complex projects, and learn to verify sources for credible work.
- GenAI in Productivity Suites: Explore how AI in Microsoft 365 and Google Workspace aids writing, data analysis, and presentations, but why these features may not always be the best choice.
- Ethical and Productivity Guide: Get practical steps to choose AI tools thoughtfully, balancing academic integrity, privacy, and learning impact to support your academic success.
Throughout, your skills in critical evaluation (covered in a separate chapter) remain essential. These tools offer powerful ways to enhance learning, but they demand informed, ethical engagement. Again, this chapter will not encourage or discourage their use! The goal is to inform you about technologies reshaping education and the workplace so you can make thoughtful choices.
Generative AI vs. Assistive AI
When you think of AI, tools like ChatGPT probably come to mind, models that can write essays or tackle complex questions. While those generative AI tools are impressive, there’s another type of AI that has been quietly changing how students (and professionals) learn and work for more than a decade: assistive AI. Unlike generative AI, assistive AI doesn’t create new content from scratch. Instead, it helps you manage and interact with information in smarter ways. A great example is speech-based AI, which can transform how you take notes or study.
To understand speech-based AI, it helps to know the difference between the two main types of AI tools. Generative AI (GenAI), like ChatGPT, creates new content, such as drafting an email, generating ideas for a paper, or writing code based on your prompts. Assistive AI (AAI), on the other hand, supports you by working with existing information. It might check your grammar, organize your notes, or, as we’ll explore, handle speech or vision-related tasks.
Assistive AI has been embedded in productivity tools for over a decade, helping students refine their work. Grammarly, launched in 2009, uses AI to analyze your writing and suggest improvements, like fixing grammar errors, clarifying sentences, or adjusting tone. For example, it might catch a run-on sentence or suggest a stronger word to make your essay sound more polished. Microsoft Word has offered assistive AI features since the early 2000s, starting with basic spell-check and grammar tools powered by early AI algorithms. Today, Word’s Editor feature uses advanced AI to provide detailed suggestions, such as improving readability or flagging overly complex phrases. Google Docs, introduced in 2006, also incorporates assistive AI, like its grammar and spelling checker, which suggests real-time corrections as you type, and features like Smart Compose, which predicts and offers sentence completions to speed up drafting. These tools don’t generate new ideas or content like ChatGPT; instead, they work with your writing to make it clearer and more effective, acting like a digital writing coach.
Speech-to-Text (STT)
Imagine sitting in a lecture and having every word the professor says turned into a text document you can search, review, or study later. That’s what modern Speech-to-Text (STT) technology can do. Unlike the somewhat clunky dictation tools from the 1990s, like Dragon NaturallySpeaking, today’s STT systems, such as OpenAI’s Whisper (released in 2022), are incredibly accurate. They can handle different accents, technical terms, and even background noise.
Whisper is an open-source tool, meaning developers have built it into apps you might already use, like Zoom for live captions or note-taking tools like Otter.ai and Notion. With these apps, you can record a lecture and get a written transcript to study from. You can search for specific terms, like “cell division” in a biology class, or even use the transcript with other AI tools to create summaries (just be sure to double-check those summaries for accuracy). Many Whisper-based tools can work offline on your device, keeping your recordings private, which is especially important for sensitive discussions. Since it’s open source, you can even download Whisper to your own computer and run it entirely offline, with no data ever leaving your device.
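If you’re comfortable with a command line, running Whisper locally is a realistic option. The commands below are a minimal sketch, assuming Python and ffmpeg are already installed; `lecture_recording.mp3` is a hypothetical file name standing in for your own audio.

```shell
# Install the open-source Whisper package (one time)
pip install -U openai-whisper

# Transcribe a recording entirely on your own machine.
# --model small balances speed and accuracy; larger models
# (medium, large) are more accurate but slower.
whisper lecture_recording.mp3 --model small --language en --output_format txt
```

The transcript is saved as a plain text file alongside the recording, and nothing is uploaded anywhere.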
Making learning more accessible: STT tools can make classes more accessible. For example, a student with dyslexia might use an app powered by Whisper (see below) to get a real-time transcript of a lecture. Reading along as the professor speaks can make it easier to follow complex ideas. Later, they can search the transcript for key concepts they want to review. Some apps even allow you to chat with the transcript while recording (“What did I miss?”). A student with a hearing impairment can use STT to get instant captions for videos or audio that lack official subtitles, ensuring they don’t miss out on course content.
Note: Institutions’ disability services typically include guidance around these types of accessibility uses and can help you use AI tools ethically and securely.
Brainstorming out loud: STT can also be a tool for working through your own ideas. Instead of staring at a blank page, you can talk through your ideas for an essay or project and an STT app will turn your words into text. This can feel more natural than typing, letting your thoughts flow freely and capturing your authentic voice. For example, you might verbally outline your response to a history essay prompt about the Civil War, and the STT tool will create a rough draft you can build on (this would not violate academic integrity, since the tool is transcribing your ideas, not replacing them). Keep in mind that these transcripts are just a starting point. You’ll likely need to revise them carefully to meet academic or professional standards.
STT technology opens up new ways to study and create, but it’s not perfect. Always review transcripts for mistakes, especially with technical terms or unclear speech.
How to find STT: You can find STT tools in many places, often in apps or platforms you’re already using. Zoom, Microsoft Teams, and Google Meet offer live captioning powered by STT for virtual classes or meetings. Note-taking apps like Otter.ai, Microsoft OneNote, or Apple Notes can transcribe recordings or live speech. Your smartphone likely has built-in STT features, like dictation in the Notes app on iOS or Google Voice Typing in Docs. There are also a number of dedicated apps, such as Superwhisper, that many now use for STT workflows. Superwhisper and similar apps can run transcription locally on your device, so your information remains private and secure (always read the privacy disclosures).
Text-to-Speech (TTS)
Remember those old-school GPS voices that sounded like a robot? That’s an older version of Text-to-Speech (TTS) technology, which turns written words into spoken audio. Today’s TTS is very different. The voices sound much more human, with natural tone and even a hint (or more than a hint) of emotion. Modern AI systems, like those from Microsoft, Amazon, and ElevenLabs, have transformed TTS into a powerful tool for students, making studying more flexible and accessible. Keep in mind these TTS systems do not use the same architecture as ChatGPT (LLMs) and therefore have different benefits and limitations.
TTS is being used by students in practical ways:
- Accessibility: For students with visual impairments or dyslexia, TTS can have significant benefits. Instead of struggling to read dense textbook pages, they can listen to articles or notes read aloud in clear, natural voices. This makes learning easier and less tiring.
- Study on the Go: Turn your class notes, textbook chapters, or research articles into audio. Listen while commuting, working out, or taking a break from screens.
- Creative Projects: Some TTS tools, like those built into AI chatbots, let you hear responses to your questions out loud. You could even use TTS for fun projects, like scripting a debate between historical figures (say, Einstein vs. Newton) and having the AI read it in distinct voices for a class presentation.
How to find TTS: You’re probably already using tools with TTS built in. Microsoft Word has a “Read Aloud” feature (check the Review tab) that reads your documents aloud. Google Docs offers similar accessibility options to listen to your text. There are also standalone apps that tend to offer more robust solutions, like ElevenLabs’ ElevenReader, which can read articles, PDFs, or books in high-quality voices. Plenty of browser extensions let you turn web pages into audio with just a click.
Assistive AI (AAI) and Academic Integrity
Universities are increasingly formulating policies that differentiate between AI-assisted and AI-generated work. Many honor codes now require students to indicate if large language models contributed substantive generated content to an assignment, since that can raise issues of originality and proper attribution. On the other hand, using AI as a spellchecker or minor rephraser is often permitted without special disclosure (treated similarly to getting writing center feedback). Knowing whether a tool’s output is considered “AI-generated” or just “AI-guided” can thus determine if and how a student cites or credits it. Some academic journals make a similar distinction, banning AI-generated text but allowing authors to use assistive AI in editing. This means students should learn to identify what type of help they are getting from AI.
Multimodal AI
The assistive AI tools we’ve explored, like Speech-to-Text (STT) and Text-to-Speech (TTS), help you turn lectures into notes or listen to readings, focusing on audio and text. Multimodal AI takes this further, combining multiple types of input and output (text, audio, images, and data) in a single tool. Instead of just typing a question, you can speak to an AI, show it a picture, or upload a dataset, and get a response in text, speech, or both.
Multimodal AI means a single tool can handle different formats at once, unlike older chatbots that only worked with text. For example, you could upload a biology graph to ChatGPT, ask, “What does this show?” and hear a spoken explanation, blending image analysis, text, and speech. This shift, seen in tools like ChatGPT, Google’s Gemini, and others, makes AI more intuitive for students, but it also presents additional challenges and risks.
How Multimodal AI Can Help Students
Understanding Images: Multimodal AI can analyze visuals, saving you time on complex tasks. Upload a chart from a lab report, and the AI might describe trends.
Creating Images: Multimodal AI lets you generate custom images for projects, turning text prompts into visuals. Tools like OpenAI’s DALL·E or Midjourney (and most of the big platforms now) can create images and infographics, such as a diagram to help visualize a process you’re studying or a “futuristic city” for a creative writing assignment. These tools let you interact with information in different ways that may benefit some learners, BUT the results can be inconsistent or highly flawed.
Speaking and Listening: Multimodal AI supports spoken conversations, great for hands-free use or language practice. Ask, “How do I cite a website in APA style?” while cooking dinner and get a spoken reply. Language learners can practice Spanish by talking to the AI and hearing natural responses. These interactions feel human-like, but audio errors, like misheard words, can occur, especially with accents or background noise. Test the AI’s responses to ensure they make sense. Even better, test the responses with someone who specializes in that area (perhaps your instructor?) to make sure the responses are useful. The sophistication of the model matters here: ChatGPT’s advanced voice mode will be better than many freely available alternatives. If you need high accuracy, be selective.
Creating Audio: You can also use multimodal AI to produce audio, enhancing study materials or projects. Google’s NotebookLM can turn notes or PDFs into podcast-style audio overviews, like a dialogue about the Civil War from your history notes. Descript AI lets you create narrations from text, such as a voiceover for a presentation. These audio outputs may mispronounce terms or include factual errors, especially for complex topics. Listen carefully and compare your sources to catch mistakes.
Working with Data: Some multimodal AIs handle datasets, like spreadsheets, letting you ask questions without typing out the data. Upload a CSV file of survey results and the AI might summarize trends, such as student stress levels. This is handy for research projects, but the AI could misread data formats or draw faulty conclusions. Always check the dataset and the AI’s analysis to catch mistakes. Statistical inferences are an area that LLMs currently struggle with, ironically. Do not be persuaded by their confident tone.
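A good habit is to spot-check the AI’s analysis by computing a statistic or two yourself. The sketch below uses only Python’s standard library; the inline survey data is a made-up stand-in for whatever CSV file you would actually upload.

```python
import csv
import io
import statistics

# Hypothetical survey export: each row is one student's
# self-reported stress level on a 1-10 scale. In practice,
# you would open your real CSV file instead of this sample.
sample_csv = """student,stress_level
A,7
B,4
C,9
D,5
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
levels = [int(row["stress_level"]) for row in rows]

# Compare these numbers against whatever the AI reported.
print("responses:", len(levels))               # 4
print("mean stress:", statistics.mean(levels))    # 6.25
print("median stress:", statistics.median(levels))  # 6.0
```

If the AI’s summary claims an average that doesn’t match a quick calculation like this, that’s your cue to dig into how it read the file.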
AI Research Tools
Research is a key part of college work. AI research tools are changing how students find information, going beyond traditional search engines like Google by providing direct answers with citations. Tools like Perplexity AI, Elicit, and search features in chatbots like ChatGPT, Microsoft Copilot, and Google’s Gemini are reshaping how students (and faculty) interact with information and databases.
Perplexity AI: Quick Answers with Sources
Perplexity AI, launched in 2022, was early to connect LLMs to internet search. Ask something like, “What are the effects of microplastics in drinking water?” and it gives you a short answer, something like, “microplastics may cause inflammation,” with footnotes linking to articles or studies. Click the footnotes to see exactly where the information came from. This makes it easy to verify facts and build a reading list for your paper. Perplexity even has an “Academic” mode that draws on Semantic Scholar’s database of research papers, better suited for digging into scholarly sources.
Elicit: Academic Literature Assistant
Elicit is designed for students and academics (and others) who focus on research. Ask a specific question, like “What causes stress in college students?” and Elicit searches academic databases to find relevant studies. It presents a table with each paper’s title, year, and a snippet from the abstract that might answer your question. It’s somewhat akin to an AI-powered version of Google Scholar that highlights key findings for you. Elicit is often used for literature reviews or to get an overview of how others have discussed a topic.
Chatbot Search Features
Many AI chatbots you might know, like ChatGPT, Google’s Gemini, Microsoft Copilot, and xAI’s Grok, now include web search features. Ask a question and they’ll scour the internet, read articles or reports, and give you a conversational answer with links or footnotes to sources. For example, you could ask, “What are the latest trends in renewable energy?” and get a summary of recent developments, like advances in solar panels, with links to news or research sites. These chatbot search options have become very similar to Perplexity AI.
These tools are flexible, handling everything from current events to literary analysis, but be careful about source quality. A chatbot might cite a trusted journal one moment and a random blog the next. Without precise prompting, these platforms tend to grab everything they find without always checking if it’s reliable. Build source expectations into your prompt (“focus on academic sources”), but you will still need to evaluate every source in high-stakes situations: click the links, read the originals, and decide if they’re credible enough for academic work.
Deep Research Options
Some AI tools offer “Deep Research” modes for tackling complex projects, going beyond quick answers to create detailed reports. You can find this mode in Gemini, Grok, Perplexity AI, ChatGPT, and others.
What’s the difference between “Search” and “Deep Research”? Standard “Search” modes in these tools provide quick answers with citations, perfect for simple questions like “What’s the GDP of Venezuela?” You ask, get a summary, and check the sources. Done. Deep Research modes, however, tend to be used for more involved queries, like a term paper on “How have views about AI in education changed over time?” Instead of one answer, they create multi-page reports by combining information from many sources. The results can run to 20 pages or more. Behind the scenes, these tools use AI agents, specialized programs that act like mini-researchers. These agents search the web, read articles, cross-check data, and organize findings into a report, often in minutes (sometimes taking up to 30 minutes). Deep Research saves hours but can include less reliable sources or miss nuances, so always verify the citations and refine the AI’s focus if the report feels too broad.
When they’re helpful: Deep Research can be useful for getting background on an idea or concept you want to learn more about, almost like building a custom Wikipedia page.
Tips for using AI Research tools
AI research tools can make finding information faster, but they’re only as good as your ability to check their work. Here are some tips:
- Always Verify Sources: In high-stakes situations, never copy the AI’s answer directly. Use the citations to read the original sources and confirm the information is accurate and relevant. Hallucinations will likely be nested in there somewhere!
- Treat AI Like a Starting Point: View AI answers as a quick overview, like a Wikipedia page. Use them to identify key ideas or sources, then go to primary or secondary sources for your paper.
- Watch for Bias or Gaps: AI might present information as fact when there’s actually debate. For example, asking “Is recycling effective?” might get a one-sided answer that skips criticisms. Ask follow-up questions like, “What are the drawbacks of recycling?” to get a fuller picture (or better, ask for multiple perspectives in your prompt). Recall that LLMs produce the most probable response; when doing high-quality research, the most common answer is not always the most truthful or accurate one (in fact, bad actors can exploit this probability by flooding the web with false or misleading information).
- Citations and Source Quality: One benefit of AI search engines is that they encourage citing sources by providing them. Always preserve those citations in your notes. Some students might be tempted to just use the AI’s prose in their paper. Treat AI answers as you would Wikipedia: a tertiary source for getting an overview, not something to copy. Instead, use them to find primary and secondary sources, then cite those directly after reading them. And always check citations! AI-infused search can easily mix up or fabricate sources.
GenAI in Productivity Suites (Microsoft 365, Google Workspace, etc.)
GenAI is increasingly embedded in tools like Microsoft 365 and Google Workspace, offering features such as Microsoft Copilot and Google AI (powered by Gemini) to assist with writing, data analysis, presentations, and communication. These AI assistants are integrated into software you may already use, like Word, Excel, or Gmail, aiming to support academic tasks without requiring a separate AI platform. However, these features can be inconsistent; in fact, dedicated AI tools like ChatGPT (or Gemini, etc.) often provide better results. Just because an AI option appears in your productivity suite doesn’t mean it’s the most reliable or effective solution!
Here’s a look at how these tools function, their potential, and the cautions to keep in mind.
For writing, Microsoft Word includes a Copilot feature that can draft text, such as an essay introduction on climate change’s impact on agriculture, based on your prompt. Google Docs offers a similar Gemini-powered “Help me write” tool, which might also summarize a PDF from your Drive if you allow access. These tools can suggest grammar fixes, rephrase sentences, or shorten long drafts, acting like an editor. However, their drafts may lack depth or include errors, and they’re not always better than writing your own first draft or using a dedicated AI platform.
In data tasks, Microsoft Excel’s Copilot can analyze data, such as identifying trends in a business project’s sales figures, and may generate charts or formulas. Google Sheets’ Gemini feature offers similar analysis and can organize data into tables for project plans. While these tools aim to simplify complex tasks, they’re not foolproof. For instance, Copilot in Excel might suggest a chart that misrepresents your data or a formula that doesn’t fit your needs. Google Sheets’ AI could misclassify survey responses, requiring manual fixes. Always verify AI-generated data outputs, as errors or “hallucinations” can lead to incorrect conclusions.
For presentations, Microsoft includes a Copilot feature in PowerPoint that can create slides from a prompt or essay, but that doesn’t necessarily mean it’s the best solution. The slides may have generic content or poor layouts. Google Slides’ Gemini feature offers similar slide generation, and both tools can create images, like a solar panel illustration, to add to your slides. However, these AI-generated images may not be better than the Creative Commons images you can find online, and the design suggestions may not suit your presentation’s needs. Remain aware of your audience when deciding whether to use AI-generated images.
In communication, Microsoft Outlook and Teams, along with Google Gmail and Meet, use AI to draft emails or summarize meetings. For example, Outlook’s Copilot can write an email thanking a professor for an extension, while Google Meet’s Gemini can transcribe discussions or provide translation captions. These features sound convenient, but they come with risks. AI-drafted emails might sound robotic or inappropriate, raising ethical concerns if recipients expect personal communication (for example, your instructor may not be excited to receive an AI-crafted email after reaching out to you about a potential AI-policy violation). Meeting summaries could miss key details or misinterpret action items. You should review these outputs carefully and consider whether manual drafting or note-taking might be more effective in some cases.
For note-taking, Google’s NotebookLM is a powerful tool and includes AI features that let students upload many sources, like lecture notes, PDFs, or YouTube links, to create study aids such as summaries, quizzes, study guides, or even podcast-style audio overviews where AI hosts discuss your content. You can ask questions about your sources and NotebookLM provides answers with citations, useful for reviewing class material or preparing for exams.
Guide for Using AI Tools Ethically, Securely, and to Promote Learning
As you’ve seen, AI tools offer powerful ways to streamline your academic work. But with these tools come complex choices about ethics, privacy, and learning. Should you use AI to draft an essay, transcribe a lecture, or generate a presentation image? How do you balance efficiency with integrity? This concluding guide provides practical steps to help you navigate these decisions, focusing on academic integrity, privacy and security, and whether AI promotes or hinders your learning.
Step 1: Prioritize Academic Integrity
Academic integrity means doing your own work and giving credit where it’s due, even when using AI. Many colleges view AI-generated content, like an essay drafted by ChatGPT, as a potential violation of honor codes if not disclosed. Assistive AI, like grammar checkers or STT for note-taking, is often allowed without special acknowledgment, but generative AI requires transparency.
Ask Yourself: Am I using AI to replace my thinking or to support it? For example, using NotebookLM to summarize your notes for review may support learning, but copying a Gemini-generated essay without revision usually violates academic integrity.
Step 2: Protect Your Privacy and Security
AI tools often process your data, and this raises flags around privacy and security. Most tools send data to the cloud, where it could be stored or analyzed, while others work locally on your device, offering more control. Consider how much data you’re comfortable sharing and choose tools that match your comfort level. Note that working as an employee usually involves strict security policies that prohibit many of these tools.
You may want to opt for tools with local processing when possible, especially for sensitive tasks. For example, a local TTS tool, like Microsoft’s built-in Read Aloud in Word, keeps your textbook readings on your device, ideal if you’re concerned about privacy and want to reduce screen time. For cloud-based tools, like Google’s NotebookLM, review their privacy policies to understand data storage and sharing. You may want to avoid sharing personal details (e.g., names, health information) in prompts and use tools with strong security, like Microsoft 365, for institutional accounts.
Ask Yourself: Does this tool require me to share sensitive information, like a recorded class discussion or personal essay, with a remote server? Can I use a local or privacy-focused alternative?
Step 3: Evaluate Learning Impact
AI can save time and appear to increase productivity, but it’s not always the best tool for either. It really depends. Tools that automate tasks, like generating slides or summarizing articles, might reduce your engagement with the material, while those that support active learning, like STT for note review or TTS for aural study, can deepen understanding. Choose tools that help you learn, not just finish tasks faster.
Select tools that enhance your process without replacing it. For instance, you could use NotebookLM’s quiz feature to test your knowledge of history notes, promoting active recall, rather than relying on its summaries, which might oversimplify concepts. If you struggle with reading, a TTS tool like ElevenReader can make articles accessible, supporting learning while reducing screen fatigue. Avoid over-relying on GenAI, like Gemini or ChatGPT, for essay drafts, since that may weaken your writing skills over time, but consider using it for brainstorming complex problems or getting additional background on something covered in class.
Ask Yourself: Will this AI tool help me understand the material better or is it a shortcut that skips critical thinking? Does it align with my learning style?