Standardization
Standardization is the core transformation process that converts raw data from various sources into consistent, unified objects. This enables seamless data synchronization across different systems regardless of their native data formats.
Transformation Philosophy
Rather than requiring complex custom mappings, Outrun standardizes data into four universal object types that reflect how businesses naturally think about their data.
The Standardization Process
Standardization transforms raw stream data in three stages:
1. Stream Processing
Raw data from [sourceId]_stream collections is processed using source-specific mapping rules:
- Field Mapping: Native fields mapped to standardized equivalents
- Data Type Conversion: Ensures consistent data types across sources
- Validation: Checks data quality and completeness
- Enrichment: Adds computed fields and metadata
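The four steps above can be sketched in TypeScript for a single HubSpot contact. This is a minimal illustration under stated assumptions, not Outrun's actual pipeline: the field table HUBSPOT_CONTACT_FIELDS, the processHubSpotContact function, the loose email pattern, email lowercasing, and the processedAt metadata field are all hypothetical.

```typescript
// Hypothetical types: field names mirror the HubSpot contact -> Person example in this document.
interface HubSpotContact {
  vid: number;
  properties: Record<string, string | undefined>;
}

type MappedField = "email" | "firstName" | "lastName" | "jobTitle";

interface PersonDraft {
  type: "Person";
  email: string;
  firstName?: string;
  lastName?: string;
  fullName?: string;
  jobTitle?: string;
  sourceId: string;
  sourceObjectId: string;
  sourceObjectType: string;
  processedAt: string;
}

// Field mapping rules (assumed): native HubSpot property -> standardized field.
const HUBSPOT_CONTACT_FIELDS: Record<string, MappedField> = {
  email: "email",
  firstname: "firstName",
  lastname: "lastName",
  jobtitle: "jobTitle",
};

// Deliberately loose email check; real validation rules may be stricter.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function processHubSpotContact(raw: HubSpotContact, sourceId: string): PersonDraft {
  // 1. Field mapping + 2. data type conversion (trim strings, drop empty values).
  const mapped: Partial<Record<MappedField, string>> = {};
  for (const [nativeName, standardName] of Object.entries(HUBSPOT_CONTACT_FIELDS)) {
    const value = raw.properties[nativeName]?.trim();
    if (value) mapped[standardName] = value;
  }

  // 3. Validation: a Person needs a usable email.
  const email = mapped.email?.toLowerCase();
  if (email === undefined || !EMAIL_PATTERN.test(email)) {
    throw new Error(`contact ${raw.vid}: missing or invalid email`);
  }

  // 4. Enrichment: computed fullName plus lineage metadata.
  const fullName = [mapped.firstName, mapped.lastName].filter(Boolean).join(" ");
  return {
    ...mapped,
    type: "Person",
    email,
    fullName: fullName || undefined,
    sourceId,
    sourceObjectId: String(raw.vid), // numeric vid becomes a string ID
    sourceObjectType: "contact",
    processedAt: new Date().toISOString(),
  };
}
```

Run against the HubSpot contact used in the mapping examples below (vid 12345), this sketch yields sourceObjectId "12345" and fullName "John Doe"; a contact without a valid email is rejected at the validation step.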
2. Consolidation
Processed data moves to [sourceId]_consolidate collections for merging and deduplication:
- Duplicate Detection: Identifies potential duplicate records
- Record Merging: Combines duplicate records into a single consolidated record
- Conflict Resolution: Handles conflicting data from multiple sources
- Quality Scoring: Assigns quality scores to consolidated records
3. Object Creation
Consolidated data transforms into standardized objects ready for destination sync:
- Object Classification: Determines which standardized object type to create
- Relationship Mapping: Establishes connections between objects
- Metadata Preservation: Maintains source lineage and processing history
- Validation: Final quality checks before destination sync
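Object classification and relationship mapping can be pictured as a lookup plus a matching rule. In this hypothetical TypeScript sketch, classify picks the standardized type from the source system and native object type (the entries mirror the HubSpot and Salesforce examples in this document), and linkPersonToOrganization creates a Relationships object when a person's company matches an organization's name. The table keys, the name-matching rule, and the "works_at" label are assumptions, not Outrun's documented behavior.

```typescript
type StandardType = "Person" | "Organization" | "Facts" | "Relationships";

// Hypothetical classification table, keyed by "<source system>:<native object type>".
const CLASSIFICATION: Record<string, StandardType> = {
  "hubspot:contact": "Person",
  "hubspot:company": "Organization",
  "salesforce:Lead": "Person",
  "salesforce:Account": "Organization",
};

function classify(sourceSystem: string, sourceObjectType: string): StandardType {
  const type: StandardType | undefined = CLASSIFICATION[`${sourceSystem}:${sourceObjectType}`];
  if (type === undefined) {
    throw new Error(`no standardized type for ${sourceSystem}:${sourceObjectType}`);
  }
  return type;
}

interface PersonRef { id: string; company?: string }
interface OrganizationRef { id: string; name: string }

interface RelationshipLink {
  type: "Relationships";
  fromType: "Person";
  fromId: string;
  toType: "Organization";
  toId: string;
  relationshipType: string;
}

// Relationship mapping (assumed rule): link a Person to the Organization whose
// name matches the person's company field, ignoring case and surrounding spaces.
function linkPersonToOrganization(person: PersonRef, orgs: OrganizationRef[]): RelationshipLink | undefined {
  const company = person.company?.trim().toLowerCase();
  if (!company) return undefined;
  const org = orgs.find((o) => o.name.trim().toLowerCase() === company);
  if (org === undefined) return undefined;
  return {
    type: "Relationships",
    fromType: "Person",
    fromId: person.id,
    toType: "Organization",
    toId: org.id,
    relationshipType: "works_at", // hypothetical label
  };
}
```

Exact name matching is the simplest possible rule; a production matcher would more likely compare normalized domains or use the fuzzy matching applied during consolidation.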
The Four Standardized Objects
Outrun standardizes all data into four universal object types that represent the fundamental building blocks of business data:
People
Individual humans in your business ecosystem
- Contacts, Leads, Users
- Customers, Prospects
- Team Members, Authors
- Anyone with personal identity
Organizations
Companies and business entities
- Companies, Accounts
- Customers, Prospects
- Partners, Vendors
- Any business entity
Facts
Measurable data points and metrics
- Analytics data, KPIs
- Performance metrics
- Search console data
- Quantifiable measurements
Relationships
Connections between other objects
- Person-to-Organization links
- Hierarchical structures
- Business relationships
- Association mappings
Object Mapping Examples
HubSpot → Standardized Objects
// HubSpot Contact → Person
{
"sourceData": {
"vid": 12345,
"properties": {
"email": "john@example.com",
"firstname": "John",
"lastname": "Doe",
"jobtitle": "Marketing Manager"
}
},
"standardizedObject": {
"type": "Person",
"email": "john@example.com",
"firstName": "John",
"lastName": "Doe",
"jobTitle": "Marketing Manager",
"sourceId": "hubspot_abc123",
"sourceObjectId": "12345",
"sourceObjectType": "contact"
}
}
// HubSpot Company → Organization
{
"sourceData": {
"companyId": 67890,
"properties": {
"name": "Acme Corp",
"domain": "acme.com",
"industry": "Technology"
}
},
"standardizedObject": {
"type": "Organization",
"name": "Acme Corp",
"domain": "acme.com",
"industry": "Technology",
"sourceId": "hubspot_abc123",
"sourceObjectId": "67890",
"sourceObjectType": "company"
}
}
Salesforce → Standardized Objects
// Salesforce Lead → Person
{
"sourceData": {
"Id": "00Q000000123456",
"Email": "jane@startup.com",
"FirstName": "Jane",
"LastName": "Smith",
"Company": "Startup Inc"
},
"standardizedObject": {
"type": "Person",
"email": "jane@startup.com",
"firstName": "Jane",
"lastName": "Smith",
"company": "Startup Inc",
"sourceId": "salesforce_def456",
"sourceObjectId": "00Q000000123456",
"sourceObjectType": "Lead"
}
}
// Salesforce Account → Organization
{
"sourceData": {
"Id": "001000000234567",
"Name": "Enterprise Solutions Ltd",
"Website": "enterprise.com",
"Industry": "Financial Services"
},
"standardizedObject": {
"type": "Organization",
"name": "Enterprise Solutions Ltd",
"website": "enterprise.com",
"industry": "Financial Services",
"sourceId": "salesforce_def456",
"sourceObjectId": "001000000234567",
"sourceObjectType": "Account"
}
}
Consolidation Process
The consolidation stage merges and deduplicates data from multiple sources:
Duplicate Detection
Outrun uses multiple strategies to identify potential duplicates:
- Email Matching: Primary identifier for People objects
- Domain Matching: Primary identifier for Organizations
- Name Similarity: Fuzzy matching for similar names
- Phone Numbers: Secondary matching criteria
- Custom Rules: Source-specific matching logic
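A minimal sketch of these strategies in TypeScript follows. It assumes simple normalization (lowercased emails; domains stripped of scheme, "www." and path), uses Levenshtein edit distance as one possible fuzzy name measure, and treats a matching phone number as confirmation for a name match. The function names and the 0.85 threshold are hypothetical; Outrun's actual matching logic and custom rules are not shown here.

```typescript
// Email key for People (assumed normalization: trim + lowercase).
function emailKey(email: string): string {
  return email.trim().toLowerCase();
}

// Domain key for Organizations: strip scheme, "www.", and any path/query/fragment.
function domainKey(websiteOrDomain: string): string {
  return websiteOrDomain
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\//, "")
    .replace(/^www\./, "")
    .replace(/[\/?#].*$/, "");
}

// Classic dynamic-programming edit distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Name similarity in 0-1: 1 minus edit distance over the longer name's length.
function nameSimilarity(a: string, b: string): number {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

const digitsOnly = (phone: string): string => phone.replace(/\D/g, "");

interface PersonCandidate {
  email?: string;
  fullName?: string;
  phone?: string;
}

// Primary rule: same normalized email. Fallback (assumed): similar names
// confirmed by a matching phone number; a name match alone is not enough.
function isLikelyDuplicatePerson(a: PersonCandidate, b: PersonCandidate, nameThreshold = 0.85): boolean {
  if (a.email && b.email && emailKey(a.email) === emailKey(b.email)) return true;
  const phoneMatch = Boolean(a.phone && b.phone && digitsOnly(a.phone) === digitsOnly(b.phone));
  const nameMatch = Boolean(a.fullName && b.fullName && nameSimilarity(a.fullName, b.fullName) >= nameThreshold);
  return phoneMatch && nameMatch;
}
```

Requiring a second signal before accepting a fuzzy name match keeps common names like "John Smith" from collapsing into one record.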
Record Merging
When duplicates are detected, Outrun merges them into one record, filling empty fields from other sources and keeping the more specific value where sources differ:
// Before Consolidation (Two Sources)
{
"hubspot_record": {
"email": "john@acme.com",
"firstName": "John",
"lastName": "Doe",
"phone": null,
"jobTitle": "Manager"
},
"salesforce_record": {
"email": "john@acme.com",
"firstName": "John",
"lastName": "Doe",
"phone": "+1-555-0123",
"jobTitle": "Marketing Manager"
}
}
// After Consolidation (Merged)
{
"consolidated_record": {
"email": "john@acme.com",
"firstName": "John",
"lastName": "Doe",
"phone": "+1-555-0123", // Filled from Salesforce
"jobTitle": "Marketing Manager", // More specific from Salesforce
"sources": ["hubspot_abc123", "salesforce_def456"],
"qualityScore": 0.95,
"lastUpdated": "2024-01-15T10:30:00Z"
}
}
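The merge in this example can be reproduced with a small per-field rule. The sketch below is hypothetical: it keeps any non-empty value over null and, when two sources disagree, treats the longer string as the more specific one. That heuristic is only a crude stand-in for the conflict resolution rules Outrun applies; mergeRecords and pickValue are illustrative names.

```typescript
type FieldValue = string | null;

interface SourceRecord {
  sourceId: string;
  fields: Record<string, FieldValue>;
}

interface MergedRecord {
  fields: Record<string, FieldValue>;
  sources: string[];
}

// Per-field rule (assumed): non-empty beats null/empty; between two values,
// the longer string is treated as "more specific". Ties keep the earlier source.
function pickValue(values: FieldValue[]): FieldValue {
  let best: FieldValue = null;
  for (const v of values) {
    if (v === null || v.trim() === "") continue;
    if (best === null || v.length > best.length) best = v;
  }
  return best;
}

function mergeRecords(records: SourceRecord[]): MergedRecord {
  // Union of field names across all source records.
  const fieldNames = new Set(records.flatMap((r) => Object.keys(r.fields)));
  const fields: Record<string, FieldValue> = {};
  for (const name of fieldNames) {
    fields[name] = pickValue(records.map((r) => r.fields[name] ?? null));
  }
  return { fields, sources: records.map((r) => r.sourceId) };
}
```

Fed the two records above, it returns phone "+1-555-0123", jobTitle "Marketing Manager", and sources ["hubspot_abc123", "salesforce_def456"]. Quality scoring and timestamps are left out.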
Conflict Resolution
When sources provide conflicting data, Outrun applies resolution rules:
- Most Recent Wins: Newer data takes precedence
- Most Complete Wins: Records with more fields preferred
- Source Priority: Configurable source ranking
- Field-Level Rules: Specific rules for individual fields
- Manual Review: Flag complex conflicts for human review
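A configurable resolver for three of these rules (most recent wins, source priority, manual review) plus field-level overrides might look like the following TypeScript sketch. The Strategy shapes, the FIELD_RULES table, and the choice to escalate when no ranked source is present are assumptions for illustration.

```typescript
interface Candidate {
  value: string;
  sourceId: string;
  updatedAt: string; // ISO-8601 timestamp from the source
}

type Strategy =
  | { kind: "mostRecent" }
  | { kind: "sourcePriority"; ranking: string[] }
  | { kind: "manualReview" };

type Resolution = { value: string } | { needsReview: true; candidates: Candidate[] };

function resolve(candidates: Candidate[], strategy: Strategy): Resolution {
  if (candidates.length === 0) throw new Error("no candidates to resolve");
  // No conflict if every source agrees.
  if (new Set(candidates.map((c) => c.value)).size === 1) return { value: candidates[0].value };

  switch (strategy.kind) {
    case "mostRecent": {
      const newest = candidates.reduce((a, b) => (Date.parse(b.updatedAt) > Date.parse(a.updatedAt) ? b : a));
      return { value: newest.value };
    }
    case "sourcePriority": {
      const ranking = strategy.ranking;
      // Lower index = higher priority; unranked sources sort last.
      const rank = (c: Candidate): number => {
        const i = ranking.indexOf(c.sourceId);
        return i === -1 ? Number.POSITIVE_INFINITY : i;
      };
      const best = candidates.reduce((a, b) => (rank(b) < rank(a) ? b : a));
      // No ranked source present: escalate instead of guessing.
      return rank(best) === Number.POSITIVE_INFINITY ? { needsReview: true, candidates } : { value: best.value };
    }
    case "manualReview":
      return { needsReview: true, candidates };
  }
}

// Field-level rules (hypothetical configuration); unlisted fields fall back to recency.
const FIELD_RULES: Record<string, Strategy> = {
  jobTitle: { kind: "mostRecent" },
  phone: { kind: "sourcePriority", ranking: ["salesforce_def456", "hubspot_abc123"] },
  email: { kind: "manualReview" },
};

function resolveField(field: string, candidates: Candidate[]): Resolution {
  return resolve(candidates, FIELD_RULES[field] ?? { kind: "mostRecent" });
}
```

With these rules, conflicting jobTitle values resolve to whichever source was updated last, while conflicting emails are flagged for review rather than overwritten.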
Standardized Object Schema
Person Object
{
"type": "Person",
"email": "string (primary key)",
"firstName": "string",
"lastName": "string",
"fullName": "string (computed)",
"phone": "string",
"jobTitle": "string",
"company": "string",
"department": "string",
"location": "string",
"linkedInUrl": "string",
"twitterHandle": "string",
"website": "string",
"tags": ["array of strings"],
"customFields": "object",
"sourceId": "string",
"sourceObjectId": "string",
"sourceObjectType": "string",
"qualityScore": "number (0-1)",
"createdAt": "datetime",
"updatedAt": "datetime"
}
Organization Object
{
"type": "Organization",
"name": "string (primary key)",
"domain": "string",
"website": "string",
"industry": "string",
"size": "string",
"revenue": "number",
"location": "string",
"address": "object",
"phone": "string",
"description": "string",
"foundedYear": "number",
"tags": ["array of strings"],
"customFields": "object",
"sourceId": "string",
"sourceObjectId": "string",
"sourceObjectType": "string",
"qualityScore": "number (0-1)",
"createdAt": "datetime",
"updatedAt": "datetime"
}
Facts Object
{
"type": "Facts",
"metric": "string",
"value": "number",
"unit": "string",
"dimension": "object",
"timestamp": "datetime",
"period": "string",
"source": "string",
"category": "string",
"tags": ["array of strings"],
"metadata": "object",
"sourceId": "string",
"sourceObjectId": "string",
"sourceObjectType": "string",
"createdAt": "datetime"
}
Relationships Object
{
"type": "Relationships",
"fromType": "string (Person|Organization)",
"fromId": "string",
"toType": "string (Person|Organization)",
"toId": "string",
"relationshipType": "string",
"strength": "number (0-1)",
"verified": "boolean",
"startDate": "datetime",
"endDate": "datetime",
"metadata": "object",
"sourceId": "string",
"sourceObjectId": "string",
"sourceObjectType": "string",
"createdAt": "datetime",
"updatedAt": "datetime"
}
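The four schemas share a type discriminator and the same lineage fields, which maps naturally onto a TypeScript discriminated union. The sketch below covers only a subset of each schema's fields and adds a hypothetical validate function for final pre-sync checks (scores within 0-1, finite metric values, parseable timestamps). It is illustrative, not Outrun's published type definitions.

```typescript
// Lineage fields shared by all four standardized objects.
interface Lineage {
  sourceId: string;
  sourceObjectId: string;
  sourceObjectType: string;
}

interface Person extends Lineage {
  type: "Person";
  email: string;
  firstName?: string;
  lastName?: string;
  qualityScore?: number; // 0-1
}

interface Organization extends Lineage {
  type: "Organization";
  name: string;
  domain?: string;
  qualityScore?: number; // 0-1
}

interface Facts extends Lineage {
  type: "Facts";
  metric: string;
  value: number;
  unit?: string;
  timestamp: string; // ISO-8601 datetime
}

interface Relationships extends Lineage {
  type: "Relationships";
  fromType: "Person" | "Organization";
  fromId: string;
  toType: "Person" | "Organization";
  toId: string;
  relationshipType: string;
  strength?: number; // 0-1
}

type StandardObject = Person | Organization | Facts | Relationships;

const inUnitRange = (n: number | undefined): boolean => n === undefined || (n >= 0 && n <= 1);

// Hypothetical pre-sync checks; returns a list of problems (empty = valid).
function validate(obj: StandardObject): string[] {
  const errors: string[] = [];
  switch (obj.type) {
    case "Person":
      if (!obj.email.includes("@")) errors.push("email must contain @");
      if (!inUnitRange(obj.qualityScore)) errors.push("qualityScore must be within 0-1");
      break;
    case "Organization":
      if (obj.name.trim() === "") errors.push("name is required");
      if (!inUnitRange(obj.qualityScore)) errors.push("qualityScore must be within 0-1");
      break;
    case "Facts":
      if (!Number.isFinite(obj.value)) errors.push("value must be a finite number");
      if (Number.isNaN(Date.parse(obj.timestamp))) errors.push("timestamp must be a valid datetime");
      break;
    case "Relationships":
      if (!inUnitRange(obj.strength)) errors.push("strength must be within 0-1");
      break;
  }
  return errors;
}
```

Because the switch narrows on type, the compiler only allows access to fields that exist on each object kind.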
Quality Scoring
Outrun assigns quality scores to help prioritize and validate data:
Scoring Factors
- Completeness: Percentage of fields populated
- Accuracy: Validation against known patterns
- Consistency: Agreement across multiple sources
- Freshness: How recently the data was updated
- Source Reliability: Historical accuracy of the source
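One way to combine these factors is a weighted average clamped to 0-1. In the TypeScript sketch below, completeness is the share of expected fields that are populated and freshness decays exponentially with a 90-day half-life; the weights, the half-life, and the function names are assumptions, since the actual formula is not documented here. Accuracy, consistency, and source reliability are taken as precomputed inputs.

```typescript
interface QualityFactors {
  completeness: number;
  accuracy: number;
  consistency: number;
  freshness: number;
  sourceReliability: number;
}

// Hypothetical weights; they sum to 1 so the score stays within 0-1.
const WEIGHTS: QualityFactors = {
  completeness: 0.3,
  accuracy: 0.25,
  consistency: 0.2,
  freshness: 0.15,
  sourceReliability: 0.1,
};

// Share of expected fields that hold a non-empty value.
function completeness(record: Record<string, unknown>, expectedFields: string[]): number {
  if (expectedFields.length === 0) return 1;
  const filled = expectedFields.filter((f) => {
    const v = record[f];
    return v !== null && v !== undefined && v !== "";
  }).length;
  return filled / expectedFields.length;
}

// Exponential decay: 1.0 when just updated, 0.5 after halfLifeDays.
function freshness(updatedAt: Date, now: Date, halfLifeDays = 90): number {
  const ageDays = Math.max(0, (now.getTime() - updatedAt.getTime()) / 86_400_000);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Weighted sum of clamped factors, rounded to two decimals.
function qualityScore(f: QualityFactors): number {
  const keys = Object.keys(WEIGHTS) as (keyof QualityFactors)[];
  const score = keys.reduce((sum, k) => sum + WEIGHTS[k] * Math.min(1, Math.max(0, f[k])), 0);
  return Math.round(score * 100) / 100;
}
```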
Quality Thresholds
- 0.9-1.0: Excellent quality, ready for immediate use
- 0.7-0.9: Good quality, minor issues possible
- 0.5-0.7: Fair quality, review recommended
- 0.0-0.5: Poor quality, manual review required
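The ranges above share their endpoints, so a score of exactly 0.9, 0.7, or 0.5 could be read as belonging to either adjacent band. The TypeScript sketch below assumes each range includes its lower bound; qualityBand and RECOMMENDED_ACTION are hypothetical names.

```typescript
type QualityBand = "excellent" | "good" | "fair" | "poor";

// Assumption: lower bounds are inclusive, so exactly 0.9 counts as "excellent".
function qualityBand(score: number): QualityBand {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`quality score out of range: ${score}`);
  }
  if (score >= 0.9) return "excellent";
  if (score >= 0.7) return "good";
  if (score >= 0.5) return "fair";
  return "poor";
}

// Recommended handling per band, taken from the thresholds listed above.
const RECOMMENDED_ACTION: Record<QualityBand, string> = {
  excellent: "ready for immediate use",
  good: "minor issues possible",
  fair: "review recommended",
  poor: "manual review required",
};
```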
Benefits of Standardization
Simplified Integration
- Universal Format: Same object structure regardless of source
- Consistent APIs: Standardized access patterns
- Reduced Complexity: No need to understand each source's schema
- Faster Development: Accelerated integration projects
Enhanced Data Quality
- Deduplication: Eliminates duplicate records across sources
- Enrichment: Combines data from multiple sources
- Validation: Consistent quality checks
- Monitoring: Unified data quality metrics
Business Intelligence
- Cross-Source Analytics: Analyze data across all systems
- Complete Customer View: 360-degree customer profiles
- Relationship Mapping: Understand connections between entities
- Trend Analysis: Track changes over time
Best Practices
Mapping Configuration
- Review Default Mappings: Understand how fields map to standardized objects
- Custom Field Handling: Plan for source-specific custom fields
- Data Type Consistency: Ensure compatible data types across sources
- Validation Rules: Set up appropriate validation for your data
Quality Management
- Monitor Quality Scores: Track data quality trends over time
- Review Low-Quality Records: Investigate and improve poor-quality data
- Source Data Hygiene: Maintain clean data in source systems
- Regular Audits: Periodically review standardization accuracy
Performance Optimization
- Batch Processing: Configure appropriate batch sizes
- Resource Allocation: Monitor processing resource usage
- Error Handling: Set up alerts for processing failures
- Capacity Planning: Plan for data volume growth
Next Steps
Learn About Delivery
Discover how standardized objects sync to your destinations.
Explore Destinations
See how standardized objects map to destination systems.
Standardization is the key to Outrun's power: it turns chaos into consistency and enables seamless data synchronization across any system.