Data Validation in Juno: Best Practices and Security Considerations
Why Data Validation Matters in Decentralized Apps
Data validation is always important. However, web3 comes with its own set of challenges, which make validation an even more important part of building trustworthy apps:
- No Central Administrator: Unlike traditional systems, decentralized apps have no admin backdoor to fix data issues
- Limited Data Access: Developers often can't directly access or examine user data due to encryption and/or privacy
- Data Immutability: Once written to the blockchain, data can be difficult or impossible to modify
- Client-Side Vulnerability: Front-end validation can be bypassed by determined users (like in web2)
- Security Risks: Invalid or malicious data can compromise application integrity and user trust
Getting validation right from the start is not just a best practice—it's essential for the secure and reliable operation of your application.
Available Approaches
Juno offers three main approaches for data validation:
- Hooks (on_set_doc)
- Custom Endpoints
- Assertion Hooks (assert_set_doc), the recommended approach
Let's explore each approach with simple examples:
on_set_doc Hooks
on_set_doc is a Hook that is triggered after a document has been written to the database. It offers a way to execute custom logic whenever data is added to or updated in a collection using the setDoc function executed on the client side.
This allows for many use-cases, even for certain types of validation, but this hook runs after the data has already been written.
// Example of validation and cleanup in on_set_doc
#[on_set_doc(collections = ["users"])]
fn on_set_doc(context: OnSetDocContext) -> Result<(), String> {
    // Step 1: Get all context data we'll need upfront
    let collection = context.data.collection;
    let key = context.data.key;
    let doc = &context.data.data.after; // Reference to the full document after update
    let user_data: UserData = decode_doc_data(&doc.data)?; // Decoded custom data from the document

    // Step 2: Validate the data
    if user_data.username.len() < 3 {
        // Step 3: If validation fails, delete the document using low-level store function
        delete_doc_store(
            ic_cdk::id(), // Use Satellite's Principal ID since this is a system operation
            collection,
            key,
            DelDoc {
                version: doc.version, // Pass the stored version through (it is already optional)
            },
        )?;

        // Log the error instead of returning it to avoid trapping
        ic_cdk::print("Username must be at least 3 characters");
    }

    Ok(())
}
Issues:
- The on_set_doc hook only executes AFTER data is already written to the database, so it cannot prevent an invalid write.
- Running after the write can cause unwanted side effects. For example, suppose each new document should also be added to some list. By the time the hook can reject an invalid document, the write has already happened, so the cleanup and every dependent update have to be handled inside the same on_set_doc function, which adds unwanted complexity to your code.
- Overhead: invalid data is written (a costly operation), then has to be rejected and deleted (another costly operation).
- Can't return success/error messages to the frontend.
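The overhead described in the list above can be illustrated with a small in-memory model. This is only a sketch: `MockStore`, `set_then_validate`, and `validate_then_set` are hypothetical stand-ins written for this illustration, not Juno APIs, and the mutation counter is a rough proxy for costly write operations.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a datastore; it only counts mutations.
#[derive(Default)]
struct MockStore {
    docs: HashMap<String, String>,
    mutations: u32,
}

impl MockStore {
    fn write(&mut self, key: &str, username: &str) {
        self.docs.insert(key.to_string(), username.to_string());
        self.mutations += 1;
    }

    fn delete(&mut self, key: &str) {
        if self.docs.remove(key).is_some() {
            self.mutations += 1;
        }
    }
}

fn is_valid(username: &str) -> bool {
    username.chars().count() >= 3
}

/// Validation after the write, the only option for an on_set_doc-style hook.
fn set_then_validate(store: &mut MockStore, key: &str, username: &str) {
    store.write(key, username); // the invalid document exists from here...
    if !is_valid(username) {
        store.delete(key); // ...until this cleanup runs
    }
}

/// Validation before the write, which is what an assertion-style hook allows.
fn validate_then_set(store: &mut MockStore, key: &str, username: &str) -> Result<(), String> {
    if !is_valid(username) {
        return Err("Username must be at least 3 characters".to_string());
    }
    store.write(key, username);
    Ok(())
}
```

In this model, rejecting an invalid username costs two mutations with `set_then_validate` and none with `validate_then_set`, and only the second variant has an error value it can hand back to the caller.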
There are also other Juno hooks, but in general, they provide a way to execute custom logic whenever data is added, modified, or deleted from a Juno datastore collection.
Custom Endpoints using Serverless Functions
Custom Endpoints are Juno serverless functions that expose new API endpoints through Candid (the Internet Computer's interface description language). They provide a validation layer through custom API routes before data reaches Juno's datastore, allowing for complex multi-step operations with custom validation logic.
This example is provided as-is and is intended for demonstration purposes only. It does not include comprehensive security validations.
use junobuild_satellite::{set_doc_store, SetDoc}; // SetDoc is the struct type for document creation/updates
use junobuild_utils::encode_doc_data;
use ic_cdk::caller;
use candid::{CandidType, Deserialize};
use serde::Serialize; // encode_doc_data needs the data to be serializable

// Simple user data structure
#[derive(CandidType, Serialize, Deserialize)]
struct UserData {
    username: String,
}

// Custom endpoint for user creation with basic validation
#[ic_cdk_macros::update]
fn create_user(key: String, user_data: UserData) -> Result<(), String> {
    // Step 1: Validate username (only alphanumeric characters)
    if !user_data.username.chars().all(|c| c.is_alphanumeric()) {
        return Err("Username must contain only letters and numbers".to_string());
    }

    // Step 2: Create and store document
    // First encode our data into a blob that Juno can store into the 'data' field
    let encoded_data = encode_doc_data(&user_data)
        .map_err(|e| format!("Failed to encode user data: {}", e))?;

    // Create a SetDoc instance - this is the required format for setting documents in Juno
    // SetDoc contains only what we want to store - Juno handles all metadata:
    // - created_at/updated_at timestamps
    // - owner (based on caller's Principal)
    // - version management
    let doc = SetDoc {
        data: encoded_data, // The actual data we want to store (as encoded blob)
        description: None,  // Optional field for filtering/searching
        version: None,      // None for new docs, Some(version) for updates
    };

    // Use set_doc_store to save the document
    // This is Juno's low-level storage function that:
    // 1. Assigns ownership of the document to the caller's Principal
    // 2. Adds timestamps (created_at, updated_at)
    // 3. Handles versioning
    // 4. Stores the document in the specified collection
    // It is called synchronously here, like delete_doc_store in the hook example above
    set_doc_store(
        caller(),              // Who is creating this document
        String::from("users"), // Which collection to store in
        key,                   // The document's unique key
        doc,                   // The SetDoc we prepared above
    )
    .map(|_| ()) // Discard the returned value; the endpoint only reports success or failure
}
While custom endpoints offer great flexibility for building specialized workflows, they introduce important security considerations. A key issue is that the original setDoc endpoint remains accessible, meaning users can, to some extent, still bypass your custom validation logic by calling the standard Juno SDK methods directly from the frontend. As a result, even if you've added strict validation in your custom endpoints, the underlying collection can still be modified unless you take additional steps to restrict access.
The common workaround is to restrict the datastore collection to "controller" access so the public can't write to it directly, forcing users to interact only through your custom functions. However, this approach creates its own problems:
- All documents will now be "owned" by the controller, not individual users
- You lose Juno's built-in permission system for user-specific data access
- You'll need to build an entirely new permission system from scratch
- This creates a complex, error-prone "hacky workaround" instead of using Juno as designed
Key Limitations:
- The original setDoc endpoint remains accessible: users can bypass the custom endpoint entirely by calling Juno's default endpoints directly (setDoc, setDocs, etc.)
- Restricting collections to controller access breaks Juno's permission model
- Requires building a custom permission system from scratch
- Splits validation logic from data storage
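The bypass described above can be sketched with a tiny in-memory model. Everything here is hypothetical and written only for illustration: `MockSatellite` stands in for a Satellite, its `set_doc` method stands in for the default write path that the SDK reaches, and `create_user` stands in for a custom endpoint.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct StoredDoc {
    owner: String,
    username: String,
}

/// Hypothetical stand-in for a Satellite exposing two write paths.
#[derive(Default)]
struct MockSatellite {
    users: HashMap<String, StoredDoc>,
}

impl MockSatellite {
    /// Default write path: stores whatever it receives, no custom validation.
    fn set_doc(&mut self, caller: &str, key: &str, username: &str) -> Result<(), String> {
        let doc = StoredDoc {
            owner: caller.to_string(),
            username: username.to_string(),
        };
        self.users.insert(key.to_string(), doc);
        Ok(())
    }

    /// Custom endpoint: validates first, then delegates to the default path.
    fn create_user(&mut self, caller: &str, key: &str, username: &str) -> Result<(), String> {
        if !username.chars().all(|c| c.is_alphanumeric()) {
            return Err("Username must contain only letters and numbers".to_string());
        }
        self.set_doc(caller, key, username)
    }
}
```

Calling `create_user` with "bad name!" is rejected, but calling `set_doc` with the same value succeeds, because the validation lives in only one of the two entry points.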
assert_set_doc Hooks (Recommended)
The assert_set_doc hook runs BEFORE any data is written to the database, allowing you to validate and reject invalid submissions immediately. This is the most secure validation method in Juno as it integrates directly with the core data storage mechanism.
When a user calls setDoc through the Juno SDK, the assert_set_doc hook is automatically triggered before any data is written to the blockchain. If your validation logic returns an error, the entire operation is cancelled and any changes are rolled back, and the error is returned to the frontend. This ensures invalid data never reaches your datastore in the first place, saving computational resources and maintaining data integrity.
Unlike other approaches, assert_set_doc hooks:
- Cannot be bypassed by end users
- Integrate seamlessly with Juno's permission model
- Allow users to continue using the standard Juno SDK
- Keep validation logic directly in your data model
- Conserve blockchain resources by validating before storage
- Can reject invalid data with descriptive error messages that flow back to the frontend (unlike on_set_doc which runs after storage and can't return validation errors to users)
// Simple assert_set_doc example
#[assert_set_doc(collections = ["users"])]
fn assert_set_doc(context: AssertSetDocContext) -> Result<(), String> {
    match context.data.collection.as_str() {
        "users" => {
            // Decode the proposed document; its data is stored as an encoded blob.
            // UserData is the struct from the custom endpoint example above.
            let user_data: UserData = decode_doc_data(&context.data.data.proposed.data)
                .map_err(|e| format!("Invalid data format: {}", e))?;

            // Validate username
            if user_data.username.len() < 3 {
                return Err("Username must be at least 3 characters".to_string());
            }

            Ok(())
        }
        _ => Ok(()),
    }
}
Key Advantages:
- Always runs BEFORE data is written, so invalid data never reaches the Datastore
- Low overhead: validation happens in memory before expensive on-chain write operations
- Cannot be bypassed or circumvented by end users
- Integrates directly with Juno's built-in permission model
- Keeps validation (assert_set_doc) separate from business logic triggers (on_set_doc)
- Allows users to use setDoc as intended in Juno
- Can return custom error messages to the frontend
Hook Execution Flow
Here's the sequence of events during a document write operation:
- User calls setDoc
- assert_set_doc hook runs (pre-validation)
  - If validation passes → continue
  - If validation fails → operation cancelled entirely
- Data is written to Datastore
- on_set_doc hook runs (post-processing)
- Operation completes
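The sequence above can be modelled in a few lines. This is a simplified sketch, not how Juno is implemented: `MockDatastore`, the two hook type aliases, and the sample hooks `require_min_length` and `record_write` are hypothetical names chosen for the illustration.

```rust
use std::collections::HashMap;

/// Pre-validation hook: may reject the proposed data.
type AssertHook = fn(&str) -> Result<(), String>;
/// Post-processing hook: runs only after a successful write.
type OnSetHook = fn(&str, &mut Vec<String>);

/// Hypothetical datastore that wires both hooks around a write.
struct MockDatastore {
    docs: HashMap<String, String>,
    log: Vec<String>,
    assert_hook: AssertHook,
    on_set_hook: OnSetHook,
}

impl MockDatastore {
    fn new(assert_hook: AssertHook, on_set_hook: OnSetHook) -> Self {
        MockDatastore {
            docs: HashMap::new(),
            log: Vec::new(),
            assert_hook,
            on_set_hook,
        }
    }

    fn set_doc(&mut self, key: &str, data: &str) -> Result<(), String> {
        // 1. Pre-validation: an error stops here, before anything is written.
        (self.assert_hook)(data)?;
        // 2. The write itself.
        self.docs.insert(key.to_string(), data.to_string());
        // 3. Post-processing on data that is already stored.
        let on_set = self.on_set_hook;
        on_set(key, &mut self.log);
        Ok(())
    }
}

fn require_min_length(data: &str) -> Result<(), String> {
    if data.chars().count() < 3 {
        return Err("Username must be at least 3 characters".to_string());
    }
    Ok(())
}

fn record_write(key: &str, log: &mut Vec<String>) {
    log.push(format!("stored {}", key));
}
```

With these sample hooks, a rejected write leaves both `docs` and `log` untouched, while an accepted write produces exactly one stored document and one log entry.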
When and How to Use Each Approach
Use assert_set_doc For:
- Essential data validation
- Structure and format verification
- Required field checking
- Value range constraints
- Uniqueness validation
- Relationship verification
Use on_set_doc For:
- Post-processing operations
- Notifications and logging
- Derived data calculation
- Asynchronous side effects
- Cascading updates
- Analytics and metrics
Use Custom Endpoints For:
- Complex multi-step workflows
- Specialized flows with custom logic
- Batch processing
Best Practices Summary
- Use assert_set_doc for Validation: Always validate data before storage
- Keep Validation Close to Data: Build validation directly into your data model
- Layer Your Security: Combine multiple approaches for defense in depth
- Set Appropriate Permissions: Configure collection access rights correctly
- Use Version Control: Prevent race conditions with proper versioning
- Implement Error Handling: Provide clear feedback for validation failures
- Maintain Audit Trails: Log validation events for security analysis
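To make the versioning item more concrete, here is a minimal sketch of an optimistic version check on an in-memory map. It follows the convention from the SetDoc example (no version for new documents, the current version for updates), but `VersionedDoc`, `set_with_version`, the starting version of 1, and the error messages are assumptions made for this illustration, not Juno's actual implementation.

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct VersionedDoc {
    data: String,
    version: u64,
}

/// Writes `data` only if `expected_version` matches what is stored.
/// Returns the new version on success.
fn set_with_version(
    store: &mut HashMap<String, VersionedDoc>,
    key: &str,
    data: &str,
    expected_version: Option<u64>,
) -> Result<u64, String> {
    let next_version = match (store.get(key), expected_version) {
        // New document: no version expected, none stored.
        (None, None) => 1,
        // Update: the caller must have seen the latest version.
        (Some(current), Some(v)) if current.version == v => v + 1,
        // Stale or missing version on an existing document.
        (Some(current), _) => {
            return Err(format!(
                "Version mismatch: stored version is {}",
                current.version
            ));
        }
        // A version was supplied for a document that does not exist.
        (None, Some(_)) => {
            return Err("Cannot update a document that does not exist".to_string());
        }
    };

    store.insert(
        key.to_string(),
        VersionedDoc {
            data: data.to_string(),
            version: next_version,
        },
    );
    Ok(next_version)
}
```

If two clients both read version 1 and both try to update, the first write moves the document to version 2 and the second is rejected instead of silently overwriting it.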
Production Use-Case Examples
Below is a more detailed, production-style example of the recommended assert_set_doc approach:
assert_set_doc Example
use junobuild_satellite::{
    set_doc, list_docs, decode_doc_data, encode_doc_data,
    Document, ListParams, ListMatcher
};
use ic_cdk::api::time;
use std::collections::HashMap;

// UserData, VoteData, TagData, TimePeriod, is_valid_username and is_valid_tag_name
// are not shown in this example and are assumed to be defined elsewhere.

#[assert_set_doc(collections = ["users", "votes", "tags"])]
fn assert_set_doc(context: AssertSetDocContext) -> Result<(), String> {
    match context.data.collection.as_str() {
        "users" => validate_user_document(&context),
        "votes" => validate_vote_document(&context),
        "tags" => validate_tag_document(&context),
        _ => Err(format!("Unknown collection: {}", context.data.collection)),
    }
}
fn validate_user_document(context: &AssertSetDocContext) -> Result<(), String> {
    // Decode and validate the user data structure
    let user_data: UserData = decode_doc_data(&context.data.data.proposed.data)
        .map_err(|e| format!("Invalid user data format: {}", e))?;

    // Validate username format (3-20 chars: letters, numbers, underscores)
    if !is_valid_username(&user_data.username) {
        return Err("Username must be 3-20 characters and contain only letters, numbers, and underscores".to_string());
    }

    // Check username uniqueness by searching existing documents
    // (relies on each user document carrying "username=<name>;" in its description)
    let search_pattern = format!("username={};", user_data.username.to_lowercase());
    let existing_users = list_docs(
        String::from("users"),
        ListParams {
            matcher: Some(ListMatcher {
                description: Some(search_pattern),
                ..Default::default()
            }),
            ..Default::default()
        },
    );

    // Any match is a conflict; if this is an update operation, exclude the current document
    let is_update = context.data.data.before.is_some();
    for (doc_key, _) in existing_users.items {
        if is_update && doc_key == context.data.key {
            continue;
        }
        return Err(format!("Username '{}' is already taken", user_data.username));
    }

    Ok(())
}
fn validate_vote_document(context: &AssertSetDocContext) -> Result<(), String> {
    // Decode vote data
    let vote_data: VoteData = decode_doc_data(&context.data.data.proposed.data)
        .map_err(|e| format!("Invalid vote data format: {}", e))?;

    // Validate vote value constraints: only the three discrete values are accepted,
    // so that the check matches the error message
    if ![-1.0, 0.0, 1.0].contains(&vote_data.value) {
        return Err(format!("Vote value must be -1, 0, or 1 (got: {})", vote_data.value));
    }

    // Validate vote weight constraints
    if vote_data.weight < 0.0 || vote_data.weight > 1.0 {
        return Err(format!("Vote weight must be between 0.0 and 1.0 (got: {})", vote_data.weight));
    }

    // Validate tag exists
    let tag_params = ListParams {
        matcher: Some(ListMatcher {
            key: Some(vote_data.tag_key.clone()),
            ..Default::default()
        }),
        ..Default::default()
    };
    let existing_tags = list_docs(String::from("tags"), tag_params);
    if existing_tags.items.is_empty() {
        return Err(format!("Tag not found: {}", vote_data.tag_key));
    }

    // Prevent self-voting
    if vote_data.author_key == vote_data.target_key {
        return Err("Users cannot vote on themselves".to_string());
    }

    Ok(())
}
fn validate_tag_document(context: &AssertSetDocContext) -> Result<(), String> {
    // Decode tag data
    let tag_data: TagData = decode_doc_data(&context.data.data.proposed.data)
        .map_err(|e| format!("Invalid tag data format: {}", e))?;

    // Validate tag name format
    if !is_valid_tag_name(&tag_data.name) {
        return Err("Tag name must be 3-50 characters and contain only letters, numbers, and underscores".to_string());
    }

    // Check tag name uniqueness
    let search_pattern = format!("name={};", tag_data.name.to_lowercase());
    let existing_tags = list_docs(
        String::from("tags"),
        ListParams {
            matcher: Some(ListMatcher {
                description: Some(search_pattern),
                ..Default::default()
            }),
            ..Default::default()
        },
    );
    let is_update = context.data.data.before.is_some();
    for (doc_key, _) in existing_tags.items {
        if is_update && doc_key == context.data.key {
            continue;
        }
        return Err(format!("Tag name '{}' is already taken", tag_data.name));
    }

    // Validate description length
    if tag_data.description.len() > 1024 {
        return Err(format!(
            "Tag description cannot exceed 1024 characters (current length: {})",
            tag_data.description.len()
        ));
    }

    // Validate time periods
    validate_time_periods(&tag_data.time_periods)?;

    // Validate vote reward
    if tag_data.vote_reward < 0.0 || tag_data.vote_reward > 1.0 {
        return Err(format!(
            "Vote reward must be between 0.0 and 1.0 (got: {})",
            tag_data.vote_reward
        ));
    }

    Ok(())
}
fn validate_time_periods(periods: &[TimePeriod]) -> Result<(), String> {
    if periods.is_empty() {
        return Err("Tag must have at least 1 time period".to_string());
    }
    if periods.len() > 10 {
        return Err(format!(
            "Tag cannot have more than 10 time periods (got: {})",
            periods.len()
        ));
    }

    // Last period must be "infinity" (999 months)
    // (unwrap is safe here: the slice was checked to be non-empty above)
    let last_period = periods.last().unwrap();
    if last_period.months != 999 {
        return Err(format!(
            "Last period must be 999 months (got: {})",
            last_period.months
        ));
    }

    // Validate each period's configuration
    for (i, period) in periods.iter().enumerate() {
        // Validate multiplier range (0.05 to 10.0)
        if period.multiplier < 0.05 || period.multiplier > 10.0 {
            return Err(format!(
                "Multiplier for period {} must be between 0.05 and 10.0 (got: {})",
                i + 1, period.multiplier
            ));
        }

        // Validate multiplier step increments (0.05), tolerating floating-point error.
        // Comparing against the nearest whole step also rejects values such as 0.051.
        let steps = period.multiplier / 0.05;
        if (steps - steps.round()).abs() > 0.000001 {
            return Err(format!(
                "Multiplier for period {} must use 0.05 step increments (got: {})",
                i + 1, period.multiplier
            ));
        }

        // Validate month duration
        if period.months == 0 {
            return Err(format!(
                "Months for period {} must be greater than 0 (got: {})",
                i + 1, period.months
            ));
        }
    }

    Ok(())
}
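The example above calls is_valid_username and is_valid_tag_name without showing them, and its uniqueness checks depend on a "field=value;" description string. The following is one possible sketch of such helpers, based only on the rules stated in the error messages. The function bodies, the ASCII-only restriction, and the names `has_valid_format`, `uniqueness_description`, and `is_multiple_of_step` are assumptions for illustration, not the author's actual code.

```rust
/// Hypothetical helper: 3-20 characters, ASCII letters, digits, and underscores.
fn is_valid_username(username: &str) -> bool {
    has_valid_format(username, 3, 20)
}

/// Hypothetical helper: 3-50 characters, same character set as usernames.
fn is_valid_tag_name(name: &str) -> bool {
    has_valid_format(name, 3, 50)
}

fn has_valid_format(value: &str, min_len: usize, max_len: usize) -> bool {
    let len = value.chars().count();
    (min_len..=max_len).contains(&len)
        && value.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Builds the description string that the uniqueness search matches against,
/// e.g. "username=alice;". A hook could compare this with the proposed
/// document's description so that clients cannot skip the convention.
fn uniqueness_description(field: &str, value: &str) -> String {
    format!("{}={};", field, value.to_lowercase())
}

/// Returns true when `value` is a whole multiple of `step`, allowing for
/// floating-point error. Rounding to whole hundredths before taking a
/// remainder would accept a value such as 0.051, so this compares against
/// the nearest whole step instead.
fn is_multiple_of_step(value: f64, step: f64) -> bool {
    let steps = value / step;
    (steps - steps.round()).abs() < 1e-6
}
```

Keeping these checks as small pure functions makes them easy to unit test outside of a Satellite, independently of the hook that calls them.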
Remember: Security is about preventing unauthorized or invalid operations, not just making them difficult. assert_set_doc hooks provide the only guaranteed way to validate all data operations in Juno's Datastore.
References
- Deep Dive into Serverless Functions
- Available Hooks
- List of Assertions
- Examples of Writing Functions in Rust
✍️ This blog post was contributed by Fairtale, creators of Solutio.
Solutio is a new kind of platform where users crowdfund the software they need, and developers earn by building it. Instead of waiting for maintainers or hiring devs alone, communities can come together to fund bug fixes, new features, or even entire tools — paying only when the result meets their expectations.