Troubleshooting Provisioning Failures

Provisioning failures fall into a few predictable categories. This deep-dive covers the most common error patterns, how to diagnose them systematically, and how to recover from quarantine and escrow situations.

Diagnostic Approach

Before diving into specific error types, establish a consistent diagnostic workflow:

  1. Check the provisioning progress bar for quarantine status and overall health.
  2. Read the provisioning logs, filtered by status = Failure or Skipped.
  3. Inspect the provisioning steps in the log entry for the failing user. The step detail shows exactly where the process broke down.
  4. Test with on-demand provisioning to reproduce the failure in isolation and get detailed step-by-step output.
  5. Check the target application directly. Sometimes the root cause is on the app side (API down, rate limiting, schema changes).

The provisioning logs are the single most important diagnostic tool. Every operation, successful or failed, is recorded with full detail about what the engine attempted and what response it received.
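Once log entries have been exported, grouping failures by error code makes the dominant pattern easy to spot. The following is a minimal sketch, assuming each entry is a dictionary with a provisioningStatusInfo object containing status and errorInformation.errorCode; verify these field names against your own export. The error code in the sample data is hypothetical.

```python
from collections import Counter

def summarize_failures(entries):
    """Count failed or skipped log entries by (status, error code)."""
    counts = Counter()
    for entry in entries:
        # Field names are assumptions about the exported log shape.
        info = entry.get("provisioningStatusInfo") or {}
        status = (info.get("status") or "").lower()
        if status not in ("failure", "skipped"):
            continue
        error = info.get("errorInformation") or {}
        counts[(status, error.get("errorCode") or "unknown")] += 1
    # Most frequent pattern first.
    return counts.most_common()

# Hypothetical sample entries for illustration only.
sample = [
    {"provisioningStatusInfo": {"status": "success"}},
    {"provisioningStatusInfo": {"status": "failure",
                                "errorInformation": {"errorCode": "MissingRequiredAttribute"}}},
    {"provisioningStatusInfo": {"status": "failure",
                                "errorInformation": {"errorCode": "MissingRequiredAttribute"}}},
    {"provisioningStatusInfo": {"status": "skipped"}},
]

for (status, code), count in summarize_failures(sample):
    print(status, code, count)
```

The most frequent (status, error code) pair is usually the best place to start, since a single mapping or credential problem tends to produce many identical failures.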

Common Error Patterns

Missing Required Attributes

Symptom: Provisioning log shows a failure with a message like “Required attribute [attributeName] is missing” or the target API returns a 400 error about a missing field.

Cause: The source user does not have a value for an attribute that the target application requires, and no default value or expression is configured in the mapping.

Fix:

  • Check the failing user’s attributes in Entra ID. Is the required attribute populated?
  • If the attribute is commonly empty, add a default value in the attribute mapping. For example, set a default department of “Unassigned” for users without a department.
  • If the attribute should come from a different source attribute, update the mapping to use an expression with a fallback: IIF(IsNullOrEmpty([department]), "Unassigned", [department]).
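Before changing the mapping, it can help to know how many users are affected. The sketch below is one way to check an exported list of source users for empty required attributes. The function name, the user dictionaries, and the attribute list are all hypothetical; substitute whatever your target application actually requires.

```python
def find_missing(users, required):
    """Return {user id: [missing attribute names]} for users lacking required values."""
    report = {}
    for user in users:
        # Treat None, empty strings, and whitespace-only strings as missing.
        missing = [attr for attr in required
                   if not str(user.get(attr) or "").strip()]
        if missing:
            report[user["id"]] = missing
    return report

# Hypothetical exported users.
users = [
    {"id": "u1", "department": "Sales", "mail": "a@contoso.com"},
    {"id": "u2", "department": "", "mail": "b@contoso.com"},
    {"id": "u3", "mail": None},
]

print(find_missing(users, ["department", "mail"]))
```

If only a handful of users appear in the report, fixing the source data may be simpler than adding a default to the mapping.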

Unique Constraint Violations (Duplicate Values)

Symptom: The target application returns a 409 Conflict or a message about a duplicate value for an attribute like userName or email.

Cause: Another user in the target system already has the same value for a uniqueness-enforced attribute. This commonly happens when:

  • A user was previously provisioned manually and a duplicate entry exists.
  • Two source users map to the same target attribute value due to an expression issue.
  • The user was soft-deleted and re-created in Entra ID but the old entry persists in the target system.

Fix:

  • Check the target application for the conflicting entry. Remove or update it if it is stale.
  • Review the attribute mapping expression to ensure it produces unique values. For example, if Join(".", [givenName], [surname]) creates duplicates for people with the same name, add a disambiguator like Join(".", [givenName], [surname], Left([objectId], 4)).
  • Use on-demand provisioning to test the updated mapping before restarting the service.
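To see whether a mapping expression can produce duplicates, you can emulate it offline against an export of source users. This is a rough Python approximation of the Join expression above, not the provisioning engine itself. It assumes the target compares values case-insensitively, which may not hold for your application.

```python
from collections import defaultdict

def join_name(user, disambiguate=False):
    """Approximate Join(".", [givenName], [surname]) with an optional objectId prefix."""
    parts = [user["givenName"], user["surname"]]
    if disambiguate:
        parts.append(user["objectId"][:4])  # mirrors Left([objectId], 4)
    return ".".join(parts)

def find_collisions(users, disambiguate=False):
    """Return {generated value: [objectIds]} for values produced by more than one user."""
    seen = defaultdict(list)
    for user in users:
        # Lowercasing is an assumption about how the target enforces uniqueness.
        seen[join_name(user, disambiguate).lower()].append(user["objectId"])
    return {value: ids for value, ids in seen.items() if len(ids) > 1}

# Hypothetical users who share a name.
users = [
    {"givenName": "Jane", "surname": "Doe", "objectId": "a1b2c3d4-0000-0000-0000-000000000001"},
    {"givenName": "Jane", "surname": "Doe", "objectId": "e5f6a7b8-0000-0000-0000-000000000002"},
]

print(find_collisions(users))                     # collisions without the disambiguator
print(find_collisions(users, disambiguate=True))  # collisions with it
```

A four-character prefix reduces collisions but does not rule them out in a large directory, so rerun the check whenever the expression changes.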

Matching Failures

Symptom: The provisioning engine creates a new user in the target system even though the user already exists there, or it fails to match and reports a matching error.

Cause: The matching attribute values do not align between the source and target systems. For example, the mapping matches userPrincipalName to userName, but the existing user in the target app was created with a different username format.

Fix:

  • Review the matching attribute configuration in the attribute mappings. Ensure the matching attribute actually contains the same value in both systems.
  • Consider adding a second matching attribute with a lower precedence (e.g., match on email if UPN matching fails).
  • If users were pre-provisioned manually, you may need to update the target user records to include the correct matching attribute value before enabling automatic provisioning.
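Comparing the matching attribute across exports from both systems shows which existing accounts will fail to match. The sketch below is a hedged illustration. The function name and sample data are hypothetical, and the case-insensitive comparison is an assumption that may differ from how your target application compares values.

```python
def match_report(source_users, target_users,
                 source_attr="userPrincipalName", target_attr="userName"):
    """Split source users into those with and without a matching target value."""
    target_values = {str(t.get(target_attr) or "").strip().lower()
                     for t in target_users}
    target_values.discard("")  # an empty value never counts as a match

    matched, unmatched = [], []
    for user in source_users:
        value = str(user.get(source_attr) or "").strip().lower()
        bucket = matched if value and value in target_values else unmatched
        bucket.append(user.get(source_attr))
    return matched, unmatched

# Hypothetical exports from the source directory and the target application.
source = [{"userPrincipalName": "jane.doe@contoso.com"},
          {"userPrincipalName": "sam.lee@contoso.com"}]
target = [{"userName": "Jane.Doe@contoso.com"},
          {"userName": "slee"}]

matched, unmatched = match_report(source, target)
print("matched:", matched)
print("unmatched:", unmatched)
```

Users in the unmatched list are the ones at risk of being created as duplicates, so correct their target records before enabling automatic provisioning.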

SCIM Compliance Errors

Symptom: The target application returns unexpected HTTP status codes or malformed responses. The provisioning log shows “SCIM compliance issue” or unexpected response formats.

Cause: The target application’s SCIM endpoint does not fully comply with the SCIM 2.0 specification. Common violations include:

  • Returning 404 instead of an empty result set for user search queries.
  • Not supporting PATCH operations.
  • Returning the wrong content-type header.
  • Not handling filter parameters correctly.

Fix:

  • If this is a gallery app, check Microsoft’s known issues list for the application.
  • If this is a custom SCIM endpoint, validate it against Microsoft’s SCIM validator tool.
  • Contact the application vendor if their endpoint is non-compliant.
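For a custom endpoint, a quick sanity check on the response to a user search that matches nothing can catch several of the violations listed above. The following is a minimal sketch based on the SCIM 2.0 list response format (RFC 7644); the function name is hypothetical. The content-type check is deliberately strict, and some clients may tolerate application/json.

```python
LIST_RESPONSE = "urn:ietf:params:scim:api:messages:2.0:ListResponse"

def check_empty_search(status, headers, body):
    """Return a list of problems found in a response to a search that matches no users."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200 for an empty search, got {status}")

    # Header names are case-insensitive, so normalize before the lookup.
    content_type = {k.lower(): v for k, v in headers.items()}.get("content-type", "")
    if not content_type.lower().startswith("application/scim+json"):
        problems.append(f"unexpected content type: {content_type or 'missing'}")

    if LIST_RESPONSE not in (body or {}).get("schemas", []):
        problems.append("body is not a ListResponse")
    elif body.get("totalResults") != 0:
        problems.append("totalResults should be 0 for an empty search")
    return problems

good = check_empty_search(
    200,
    {"Content-Type": "application/scim+json; charset=utf-8"},
    {"schemas": [LIST_RESPONSE], "totalResults": 0, "Resources": []},
)
bad = check_empty_search(404, {"Content-Type": "text/html"}, {})
print(good)
print(bad)
```

This covers only one request type. It does not replace a full validation run against PATCH, filter handling, and the other operations.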

Invalid Admin Credentials

Symptom: All provisioning operations fail. The provisioning job enters quarantine with reason “EncounteredQuarantineException” or “Invalid credentials.”

Cause: The admin credentials (API token, OAuth token, or username/password) for the target application are expired, revoked, or incorrect.

Fix:

  1. Navigate to Provisioning > Admin Credentials.
  2. Re-enter valid credentials.
  3. Click Test Connection to verify.
  4. Restart provisioning.

For OAuth-based gallery apps, you may need to re-authorize by clicking the authorization button, which redirects to the app’s OAuth flow.

Reference Attribute Failures

Symptom: User provisioning succeeds, but manager assignments or group memberships fail with reference resolution errors.

Cause: The referenced object (the manager or the group) does not exist yet in the target system, or its ID cannot be resolved. This is common during initial cycles when a manager has not been provisioned before their reports.

Fix:

  • Reference failures are typically self-healing. The provisioning engine retries reference assignments in subsequent cycles. After the referenced user is created, the reference is resolved.
  • These failures do not count toward the quarantine escrow threshold.
  • If reference failures persist, ensure the referenced user is in scope for provisioning and has been successfully created.
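The self-healing behavior can be illustrated with a toy simulation. This is a simplified model written for this explanation, not the provisioning engine's actual algorithm. It assumes every user is created in the first cycle and that a manager reference resolves as soon as the manager exists in the target.

```python
def simulate(users, order, max_cycles=5):
    """Return {user: cycle number in which the manager reference resolved}.

    users maps each user to their manager (or None); order is the processing order.
    """
    created, resolved = set(), {}
    for cycle in range(1, max_cycles + 1):
        for user in order:
            created.add(user)  # re-adding an existing user is a no-op
            manager = users[user]
            if manager is not None and user not in resolved and manager in created:
                resolved[user] = cycle
        # Stop once every user with a manager has a resolved reference.
        if all(u in resolved for u, m in users.items() if m is not None):
            break
    return resolved

# Hypothetical hierarchy processed reports-first, the worst case for references.
users = {"alice": None, "bob": "alice", "carol": "bob"}
print(simulate(users, order=["carol", "bob", "alice"]))
```

In this model the references fail in the first cycle because the managers do not exist yet, and they resolve in the second cycle without any intervention.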

Quarantine Recovery

When a provisioning job enters quarantine, follow this recovery procedure:

Step 1: Identify the Quarantine Reason

Check the provisioning progress bar or query the Graph API:

GET /servicePrincipals/{id}/synchronization/jobs/{jobId}

The status.quarantine.reason field tells you why:

  • EncounteredQuarantineException: Credential or connectivity failure.
  • EncounteredEscrowProportionThreshold: Too many individual operation failures.
  • QuarantinedOnDemand: Manual quarantine by Microsoft support.
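If you poll the job from a script, the reason can be mapped to a suggested next action. The sketch below is illustrative and assumes the job JSON has already been fetched and parsed into a dictionary. The function name and the action text are hypothetical, and any reason it does not recognize falls through to a generic message.

```python
def next_action(job):
    """Suggest a next step based on status.quarantine.reason in a parsed job object."""
    quarantine = (job.get("status") or {}).get("quarantine")
    if not quarantine:
        return "Job is not in quarantine."

    reason = quarantine.get("reason")
    actions = {
        "EncounteredQuarantineException":
            "Check admin credentials and connectivity, then test the connection.",
        "EncounteredEscrowProportionThreshold":
            "Review the provisioning logs for the most common failure pattern.",
    }
    return actions.get(reason, f"Unrecognized reason {reason!r}: inspect the job status details.")

# Hypothetical, heavily trimmed job payload.
job = {"status": {"quarantine": {"reason": "EncounteredQuarantineException"}}}
print(next_action(job))
```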

Step 2: Fix the Root Cause

  • For credential issues: update credentials and test connection.
  • For escrow threshold: review the provisioning logs to identify the most common failure pattern. Fix the underlying issue (mapping errors, missing attributes, target API problems).

Step 3: Restart Provisioning

Click Restart provisioning in the portal, or use the Graph API:

POST /servicePrincipals/{id}/synchronization/jobs/{jobId}/restart
{
  "criteria": {
    "resetScope": "Full"
  }
}

Use "resetScope": "QuarantineState" to clear only the quarantine state without forcing a full initial cycle. Use "resetScope": "Full" when you want a clean re-evaluation of all users.
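When restarts are scripted, building the request in one place keeps the path and body consistent. The helper below is a hypothetical sketch that only assembles the method, path, and JSON body shown above. It sends nothing, and the reset scope is passed through unchanged, so authentication and the HTTP call are left to your own client.

```python
import json

def build_restart_request(service_principal_id, job_id, reset_scope):
    """Return (method, path, JSON body) for a synchronization job restart."""
    path = (f"/servicePrincipals/{service_principal_id}"
            f"/synchronization/jobs/{job_id}/restart")
    body = {"criteria": {"resetScope": reset_scope}}
    return "POST", path, json.dumps(body)

# Placeholder identifiers; replace with your own service principal and job IDs.
method, path, body = build_restart_request("sp-id", "job-id", "Full")
print(method, path)
print(body)
```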

Step 4: Monitor the Recovery

Watch the progress bar and provisioning logs as the initial or incremental cycle runs. Ensure the failure rate drops and the job does not re-enter quarantine.

Accidental Deletion Protection

The provisioning service includes built-in protection against accidental mass deletions. If a single cycle would delete more users than the configured threshold (default: 500), the service pauses and sends a notification.

This protects against scenarios like:

  • A scoping filter change that accidentally removes all users from scope.
  • A source system data issue that makes it appear as if all users were removed.

When deletion protection triggers:

  1. Review the pending deletions in the provisioning logs.
  2. If the deletions are intentional (e.g., you changed the scoping filter on purpose), approve them.
  3. If the deletions are unintentional, fix the scoping configuration and restart provisioning.

You can configure the deletion threshold via the Graph API:

PATCH /servicePrincipals/{id}/synchronization/jobs/{jobId}
{
  "schedule": {
    "accidentalDeletionThreshold": 100
  }
}

Debugging Attribute Mappings

When attribute values are not flowing correctly (the right user is created but with wrong attribute values), use this process:

  1. On-demand provision the user and examine the “Attribute Mapping” step. It shows the source value, the mapping expression, and the resulting target value for every mapped attribute.
  2. Check expression syntax in the expression builder. Paste your expression and test it against a specific user’s attributes.
  3. Watch for null handling. If a source attribute is null and no default value is configured, the target attribute may be skipped or set to empty. Use IIF(IsNullOrEmpty([attr]), "default", [attr]) patterns for critical attributes.
  4. Check mapping scope. Some mappings are configured to apply only during creation, not during updates. If an attribute is correct at creation but not updated later, check the “Apply this mapping” setting.
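The null-handling pattern in step 3 can be previewed offline with a small emulation. This is an approximation in Python of the IIF(IsNullOrEmpty([attr]), "default", [attr]) pattern, not the expression engine itself. The helper names, mapping table, and sample user are hypothetical.

```python
def is_null_or_empty(value):
    """Approximate IsNullOrEmpty: true for None or an empty string."""
    return value is None or value == ""

def iif(condition, when_true, when_false):
    """Approximate IIF: pick one of two values based on a condition."""
    return when_true if condition else when_false

def preview_mappings(user, mappings):
    """Return (target attribute, source value, resulting value) for each mapping.

    mappings maps a target attribute to a (source attribute, default) pair.
    """
    rows = []
    for target_attr, (source_attr, default) in mappings.items():
        source_value = user.get(source_attr)
        result = iif(is_null_or_empty(source_value), default, source_value)
        rows.append((target_attr, source_value, result))
    return rows

# Hypothetical user with a null department.
user = {"department": None, "displayName": "Jane Doe"}
mappings = {"dept": ("department", "Unassigned"),
            "name": ("displayName", "Unknown")}

for row in preview_mappings(user, mappings):
    print(row)
```

Listing the source value next to the result mirrors what the on-demand provisioning output shows, which makes it easier to compare the two when a value looks wrong.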

Troubleshooting Checklist

Symptom                                   First thing to check
No users provisioned                      Scoping (assigned users? scoping filters?)
Users skipped in logs                     Scoping filter evaluation or “not effectively entitled” status
Users created but with wrong attributes   Attribute mapping expressions
Duplicate users in target                 Matching attribute configuration
All operations failing                    Admin credentials (test connection)
Job in quarantine                         Quarantine reason in progress bar or Graph API
Manager assignments failing               Referenced user not yet provisioned (self-healing)
Mass deletions pending                    Accidental deletion protection threshold

Next Steps