Why The Future Of Generative AI Lies In A Company’s Own Data

By admin, October 17, 2023

Alex Ratner is the co-founder and CEO of Snorkel AI and an Affiliate Assistant Professor of Computer Science at the University of Washington.

The age of large language models (LLMs) and generative AI has sparked excitement among business leaders. But those who want to launch their own LLM face many hurdles between wanting a production generative AI tool and building one that delivers real business value and a sustained advantage.

Foundation models themselves have quickly become commoditized. Any developer can build on the Google Bard or OpenAI APIs. More mature organizations can deploy models like Llama 2 in their own walled gardens. But if their competitors also use Llama 2, what advantage do they have? Proprietary data—and the knowledge of how to develop and use it—provides the only sustainable enterprise AI moat.

As generative AI has reached the peak of the Gartner AI Hype Cycle, enterprises are learning that off-the-shelf LLMs can’t solve every problem—particularly not unique, high-value problems. Proprietary data can close the gap, but only when properly curated and developed.

Your Data, Your Moat

Off-the-shelf LLMs yield fun experiments and demos, but using them in a business setting will rarely achieve the accuracy needed to deliver business value. Businesses don’t need chatbots that can discuss poetry as competently as they explain computer code—they need highly accurate specialists.

Private data is a moat—a potential competitive advantage. By leveraging your proprietary data and subject matter expertise, you can build generative models that work better for your domain, your chosen tasks and your customers.

Enterprises can gain these advantages in three ways:

1. Retrieval augmentation.

2. Fine-tuning with prompts and responses.

3. Self-supervised pre-training.

Let’s briefly look at each.

Retrieval Augmentation

Retrieval augmented generation, better known as RAG, allows your generative AI pipeline to enrich prompts with query-specific knowledge from a company’s proprietary databases or document archives. This generally yields better, more accurate answers, even with a standard LLM.

But this is similar to giving an intern access to your intranet: Even with all the information before them, the intern may misunderstand or miscommunicate. To get better performance, your data team needs a customized model.
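To make the retrieval step concrete, here is a minimal sketch of how a pipeline might enrich a prompt with query-specific text. It is an illustration under stated assumptions, not a production design: the `DOCUMENTS` list, the prompt wording and the function names are all hypothetical, and simple word-count cosine similarity stands in for the learned embeddings and vector database a real system would likely use.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a company's document archive.
DOCUMENTS = [
    "Refund policy: customers may return hardware within 30 days of delivery.",
    "Support hours: the help desk is staffed weekdays from 9am to 5pm Pacific.",
    "Shipping: standard orders leave the warehouse within two business days.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question before calling an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The string returned by `build_prompt` would then be sent to whichever LLM the team uses. The model itself is unchanged, which is why retrieval alone cannot fix a model that misreads the context it is given.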

Fine-Tuning With Prompts And Responses

Data-driven organizations can fine-tune LLMs with curated prompts and responses. This sharpens and improves the model’s output on the organization’s most important tasks. To use a metaphor, a doctor needs access to a patient’s medical chart (retrieval augmentation) and specialty training (fine-tuning) to render an accurate diagnosis. Data scientists can carefully choose the prompts and responses used to fine-tune the LLM to improve performance on a wide variety of tasks or greatly boost performance on a very narrow set of them.
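As a rough illustration of what curating prompts and responses can look like in practice, the sketch below filters a set of candidate pairs and serializes the survivors as JSON Lines. The field names `prompt` and `response`, the length threshold and the deduplication rule are assumptions made for this example; the exact file format depends on the fine-tuning tool or provider.

```python
import json


def curate(pairs, min_response_chars=20):
    """Keep pairs with a non-empty, unique prompt and a substantive response."""
    seen_prompts = set()
    kept = []
    for pair in pairs:
        prompt = pair["prompt"].strip()
        response = pair["response"].strip()
        key = prompt.lower()
        # Skip empty prompts, very short responses, and repeated prompts.
        if not prompt or len(response) < min_response_chars or key in seen_prompts:
            continue
        seen_prompts.add(key)
        kept.append({"prompt": prompt, "response": response})
    return kept


def to_jsonl(pairs):
    """Serialize curated pairs as one JSON object per line."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)
```

In a real project, subject matter experts would review the surviving pairs for correctness; the mechanical filters here only remove the most obvious problems.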

Self-Supervised Pre-Training

Some organizations may want to take their LLM customization further and build a model from scratch. However, this can demand more effort than it’s worth. Firms with business vocabularies well-represented in the embedding spaces of off-the-shelf LLMs can often achieve the necessary performance gains through fine-tuning alone.

If an organization feels that it needs a model custom-built from the ground up, its data team first selects a model architecture and then trains it on unstructured text—initially on a large, generalized corpus, then on proprietary data. This teaches the model to understand the relationships between words in a way that’s specific to the company’s domain, history, positioning and products. The data team can then further train the model on prompts and responses to make it not just knowledgeable but task-oriented.
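The reason this stage is called self-supervised is that the training targets come from the text itself rather than from human labelers. A minimal sketch, assuming whitespace tokenization and a hypothetical `context_size` parameter (real models use subword tokenizers and far longer contexts), shows how raw text turns into next-word prediction examples:

```python
def next_token_examples(text, context_size=3):
    """Turn unlabeled text into (context, target) pairs for next-word prediction."""
    tokens = text.split()
    examples = []
    for i in range(1, len(tokens)):
        # The words before position i form the input; the word at i is the target.
        context = tokens[max(0, i - context_size):i]
        examples.append((context, tokens[i]))
    return examples


if __name__ == "__main__":
    for context, target in next_token_examples("the claim was approved today", 2):
        print(context, "->", target)
```

Every sentence in a company's archive yields examples like these at no labeling cost, which is why the choice of which documents go into the corpus matters so much.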

The Data Lift

The ideal deployment would incorporate all three of the above approaches, but that represents a heavy data labeling load. Studies from McKinsey and Appen show that a lack of high-quality labeled data blocks enterprise AI projects more often than any other factor—and, to be clear, all three of these approaches require labeled data.

Fine-tuning with prompts and responses requires data teams to identify and label prompts according to the task and then determine high-quality responses. Pre-training with self-supervised learning requires companies to carefully curate the unstructured data they feed the model. Training on lunch orders and payroll could degrade performance or cause sensitive data to leak internally.
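A first pass at corpus curation can be as simple as pattern-based screening. The sketch below is illustrative only: the patterns, the category names and the `curate_corpus` function are hypothetical, and a real pipeline would likely combine rules like these with classifiers and human review.

```python
import re

# Hypothetical screening rules; a real list would be far longer.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # numbers shaped like US SSNs
    re.compile(r"\b(salary|payroll)\b", re.IGNORECASE),
]
OFF_TOPIC_PATTERNS = [
    re.compile(r"\blunch order\b", re.IGNORECASE),
]


def curate_corpus(documents):
    """Split documents into those kept for training and those dropped, with a reason."""
    kept, dropped = [], []
    for doc in documents:
        if any(p.search(doc) for p in SENSITIVE_PATTERNS):
            dropped.append((doc, "sensitive"))
        elif any(p.search(doc) for p in OFF_TOPIC_PATTERNS):
            dropped.append((doc, "off_topic"))
        else:
            kept.append(doc)
    return kept, dropped
```

Recording the reason for each dropped document makes it easier to audit the rules and to spot patterns that are too aggressive.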

Even retrieval augmentation benefits from data labeling. Although vector databases efficiently handle relevance metrics, they won’t know if a retrieved document is accurate and up to date. No company wants its internal chatbot to return out-of-date prices or recommend discontinued products.
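One way to act on such labels is to filter retrieved documents before they reach the prompt. The sketch below assumes each document carries two hypothetical metadata labels, `status` and `last_reviewed`, and a one-year freshness window chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RetrievedDoc:
    text: str
    last_reviewed: date   # assumed label: when someone last verified the content
    status: str           # assumed label: "current" or "discontinued"


def filter_fresh(docs, today, max_age_days=365):
    """Keep only documents that are marked current and were reviewed recently."""
    return [
        doc
        for doc in docs
        if doc.status == "current"
        and (today - doc.last_reviewed).days <= max_age_days
    ]
```

The similarity search decides what is relevant; labels like these decide what is still true.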

Data Is Essential To Delivering Generative AI Value

Using your proprietary data to build your AI moat requires work, and that work rests heavily on data-centric approaches, including data labeling and curation. Firms can outsource some labeling to crowd workers, but much of it will be too complex, specialized or sensitive for gig workers to handle. Even where outsourcing is possible, it remains time-consuming and expensive, much like relying on internal experts for data labeling.

Data science teams can use active learning and semi-supervised techniques such as label spreading to amplify the impact of internal labelers. Programmatic labeling is another option. Our researchers used those tools to build a better LLM.
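To illustrate the idea behind programmatic labeling, the sketch below encodes three rules of thumb as small functions and combines their votes. Everything in it is hypothetical: the label names, the keyword rules and the simple majority vote are assumptions for this example, and production tools typically weigh labeling functions by estimated accuracy rather than counting votes equally.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply


def lf_mentions_invoice(text):
    return "billing" if "invoice" in text.lower() else ABSTAIN


def lf_mentions_refund(text):
    return "billing" if "refund" in text.lower() else ABSTAIN


def lf_mentions_error(text):
    return "technical" if "error" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_invoice, lf_mentions_refund, lf_mentions_error]


def programmatic_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]
```

An expert writes each rule once, and the rules then label thousands of examples. Examples where the functions abstain or disagree are natural candidates for manual review.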

Your data—properly prepared—is the most important thing your organization brings to AI and where your organization should spend the most time to extract the most value. Your data is your moat. Use it.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
