hckrnws
Show HN: Vekos – a Rust OS with Built-In Cryptographic Verification
by jgiraldo29
VEKOS(Verified Experimental Kernel OS) is a Rust-based experimental kernel that focuses on runtime verification and security. Every filesystem operation and memory allocation generates cryptographic proofs that can be verified in real-time, ensuring system integrity. Think of it like a blockchain for OS operations rather than just storage verification.
Key features: - Merkle tree-backed filesystem with operation verification - Memory manager with CoW and proof generation - Buddy allocator with zone-based memory management - Shell with VT100 support - Verified boot sequence with stage attestation
The kernel is still in alpha, but it demonstrates a novel approach to OS security by making verification a first-class citizen. All critical operations (memory allocations, filesystem writes, process creation) generate proofs that are stored in an append-only chain, similar to a blockchain but for system operations.
GitHub: https://github.com/JGiraldo29/vekos
I would be excited to get feedback on this project, especially on the verification approach and potential use cases. If you have any question the innerworkings of the development, just ask and I will gladly answer all questions. The code is under the Apache 2.0 license and welcomes contributors.
Curious what the threat model for the cryptographic verification is. It looks like verify_signature_software[1] doesn't actually verify ed25519 signatures, but rather computes a truncated sha512 hash of the data and compares that with the supplied hash.
fn verify_signature_software(&self, data: &[u8], signature: &[u8; ED25519_SIGNATURE_LENGTH]) -> bool {
let mut h = [0u8; 64];
let data_hash = self.compute_sha512(data);
for i in 0..32 {
h[i] = data_hash[i];
}
self.verify_ed25519_reduced(h, signature)
}
This calls [1] which merely performs a byte equality check on the first 32 bytes of the hash: fn verify_ed25519_reduced(&self, h: [u8; 64], signature: &[u8; ED25519_SIGNATURE_LENGTH]) -> bool {
// ...
let mut matches = true;
for i in 0..32 {
if signature[i] != h[i] {
matches = false;
break;
}
}
// ...
matches && key_valid
}
With this design, an adversary who knows data can simply calculate their own hash of the input data and supply it as a "signature", no?It is difficult to comment on the verification approach when there are no secrets and only hash verification occurs. Do you have documentation on the approach and future plans? At best, this "signature verification" looks like placeholders for future verification.
[0]: https://github.com/JGiraldo29/vekos/blob/d34e6454f3f7290e4b5...
[1]: https://github.com/JGiraldo29/vekos/blob/d34e6454f3f7290e4b5...
Thank your question. The OS does have currently some limitations in the signature area, as this is one of those things that is still a work in progress due to trying to focus in many common threats first before expanding on the verification. It does perform hash comparisons, but then again, the proper ED25519 signature verification is still a work in progress.
The threat model is still evolving, but the core goal is to provide verifiable attestation of system operations with these key properties:
1. Non-repudiation of operations
2. Tamper-evidence for operation chains
3. Verifiable boot sequence attestation
You are absolutely right that the cryptographic aspects need significant hardening. I even have some key improvements planned for future versions:
1. Proper ED25519 signature implementation using the ring or ed25519-dalek crates
2. Secure key management for signing operation proofs
3. TPM integration for hardware-backed key storage and verification
4. Formal verification of the proof generation and verification logic
The core verification chain itself (in merkle_tree.rs and hash_chain.rs) provides tamper detection, but it does require significant hardening in the cryptographic area. Now being sincere, I really wanted to emphasize on the VKFS(verified kernel file system based on the Linux Ext2) implementation first as that was a very tough one to make.
Anyways, I really appreciate you diving into the code and highlighting this. ( * ´ ω ` * )
Interesting. I think it would be good if you added a section to your README explaining who the target user for this is. Is it for systems that need extensive auditing? Also, what kind of performance hit do you get by doing cryptography for every allocation?
Thank you for your questions.
The primary target users would be systems where integrity verification and auditing are critical requirements, such as:
- Financial/banking systems that need cryptographic proof of all operations
- Medical devices where operation verification is essential for safety
- High-security environments requiring complete system attestation
- Research systems that need reproducible and verifiable execution paths
Though, my end goal with this to be sincere is making it general purpose in the future. I am a firm believer in the privacy of the user, and that's really what I wanted to achieve with this. An OS that can be run anywhere, including hardware that can't be trusted, having the confidence that there won't be a third actor watching every action the user takes.
Regarding the performance impact: The verification system is actually quite efficient since it uses hardware-accelerated operations where available (SHA-256 instructions on modern CPUs) and an optimized FNV-1a fallback implementation. The proof generation adds roughly 3-5% overhead for memory operations and 7-9% for filesystem operations in the benchmarks.
This is achieved by:
- Batching proof generation for small allocations
- Using a zone-based memory allocator that pre-verifies large memory regions
- Implementing an efficient proof storage system with automatic pruning
- Taking advantage of modern CPU crypto acceleration
You're absolutely right about adding this to the README - I'll create a dedicated section covering use cases and performance characteristics.
> ensuring system integrity
The threat model here is defective hardware and not an attacker? Presumably an attacker with RCE could just make "more" activity via the verification path in order to accomplish their goals. Although I suppose this activity would be present in the tree, so you could conceivably audit it (somehow) and discover the attack.
Good observation about the threat model. The verification system actually serves multiple purposes:
1. Hardware integrity: Yes, it helps detect hardware faults, but that's not the primary focus.
2. Attack detection and auditing: The system creates an un-forgeable chain of evidence for all operations. An attacker with RCE could indeed create "legitimate" operations, but:
- Each operation is cryptographically signed and chained
- Operations must maintain valid state transitions
- The Merkle tree structure makes it impossible to modify past operations
- Anomaly detection can flag suspicious operation patterns
3. Runtime verification: The system enforces invariants at runtime. For example:
- Memory operations must maintain zone integrity
- File operations must preserve filesystem consistency
- Process transitions must follow valid state changes
It's technically true that the attacked could theoretically use the verification subsystem itself, but this creates what we call "high-fidelity forensics" - every action leaves cryptographic evidence. Think of it like a tamper-evident seal - you can't break in without leaving proof.
The code in verification.rs and operation_proofs.rs demonstrates these security mechanisms if you're interested in the implementation details of the verification as a whole.
How quickly does the append-only chain grow? What are the storage needs for it?
I've put a lot of thought into managing the storage growth, the chain grows proportionally to system activity, but I've implemented several optimizations to keep it manageable:
1. Efficient proof encoding: Each proof is typically 128 bytes (64-byte operation hash + 64-byte signature). For context, a 1GB system performing ~1000 operations/second would generate roughly 10MB of proof data per minute before optimizations.
2. Smart pruning strategies:
- Automatic pruning of validated proof chains after state transitions
- Configurable retention windows (default: 1 hour) for non-critical proofs
- Merkle tree summarization of older proofs (keeping root hashes only)
- Proof batching for high-frequency operations
3. Storage management: - In-memory proof cache (default 10,000 proofs)
- Efficient disk serialization format
- Automatic archive rotation
In practice, a typical desktop workload generates about 100-200MB of proof data per day after optimizations. High-security environments can keep full chains (roughly 1-2GB/day), while standard deployments can use pruned chains (~100MB/day).
I'm also working on implementing selective proof generation where you can choose which operations require verification, allowing even finer control over storage growth.
The code in proof_storage.rs shows the implementation details if you're interested in the specifics.
very nice to see this, one of the more interesting little OS projects i've come across. had a similar idea for my OS (waaay not far enough in development - and i lack the skills!) and it's really cool to see someone managed to actually do this bit!
what i thought, is that perhaps utilizing a hypervisor, a log (maybe a blockchain but i didn't get that far :P) can be kept out of reach of the OS, which logs every task that's executed along with its results (in some way or form). this way, there's - from the OS point of view, no tampering on this log possible, and from the hypervisor point of view, the ability to deny certain interactions. - upon this log, things like machine learning could be implemented to do anomaly detection. - maybe you have the skills to do something like that XD... i am forever lost in the earlier code of an OS :D....
There's tons of nice features in the architecture which can be combined with ML / crypto along with techniques like taint tracking which could make operating systems and the programs running within them much more secure.
aaaanyhow... really cool project, can't wait to see how it would develop in the future :). good job!
Thank you so much. I really like the idea of the hypervisor-based verification approach as it would provide stronger isolation for the verification chain than the current in-kernel approach. It also aligns really well with some of VEKOS's core components. For instance, the current verification chain (in verification.rs) already maintains an append-only log of system operations:
pub struct VerificationRegistry { proofs: Vec<OperationProof>, current_state: AtomicU64, }
The current proof system could be extended (operation_proofs.rs) to communicate with a hypervisor-level verification layer.
About the ML, I actually had a previous scrapped component that would have allowed an ML model to run natively in the kernel by dividing the memory zones into 4 different components. Now for issues related to the memory, and for security concerns, I decided to not follow with it. ML are really good at detecting specific components, but I am afraid of the false alarms, as these could cause the system to have for example, spontaneous slow downs in the verifications.
Super interesting insights, thanks! I see you are much further already in this stuff, that's amazing.
you are right the ML might be tricky, i have a very specific design in which it might simplify what it's trying to analyse, but it does raise the question if its really going to be useful to add ML into security domain. one of the first things i learned about ML is that it shouldn't be implemented within systems that are not going to handle probabilistic errors well. (if you think about billions of operations, 0.001% false positives can already kill you etc.).
in the design i am going for, each subsystem of the os has its own little 'task language', which i'd hope simplifies what the ML is operating on (separate learning per subsystem to have only a relatively small domain to operate in - memory, disk, cpu, network, etc.). the tasks would be bytecodes interpreted a bit like java. (want to randomise mapping of bytecodes -> functions each time a module starts etc)
obviously this design is kind of leaning into being slow, and performing badly, to experiment with security ideas. i tested a bunch of it in user-mode code but getting the OS infra far enough to build it on baremetal has been an infinite struggle :D
will watch your progress with keen anticipation - i think i can learn tons of it, thanks!
Comment was deleted :(
no mention of SMP or lack of SMP support. I am curious if this has SMP support
Thanks for your question. Currently, VEKOS runs in single-core mode - SMP support is planned but not yet implemented.
The challenge with SMP in VEKOS isn't just synchronization, but maintaining verifiable operation chains across cores. The current verification system in operation_proofs.rs and verification.rs assumes sequential operations for proof generation and validation. Key considerations for SMP implementation include:
1. The VERIFICATION_REGISTRY already uses atomic operations and Mutex protection:
pub struct VerificationRegistry { proofs: Vec<OperationProof>, current_state: AtomicU64, }
2. Critical data structures like BlockCache and BufferManager are wrapped in Mutex:
pub struct Superblock { pub block_cache: Mutex<BlockCache>, pub buffer_manager: Mutex<BufferManager>, ... }
3. The scheduler (in scheduler.rs) is designed with multi-core in mind:
lazy_static! { pub static ref SCHEDULER: Mutex<Scheduler> = Mutex::new(Scheduler::new()); }
The main work needed for SMP is: - Per-core scheduling queues - Distributed verification chain generation - Cross-core memory barriers for proof validation - CPU-local operation proof caches
What are the tradeoffs to something like this? What attack vectors are still present and what attack vectors does this prevent? Thanks, cool project!
Thank you so much. Now let me break this questions down:
Key Security Benefits:
1. Cryptographic Verification:
- Prevents silent corruption of system state
- Makes tampering with system logs cryptographically difficult
- Provides verifiable audit trails of all system operations
- Enables detection of hardware memory faults
2. Runtime Integrity:
- Prevents invalid memory access patterns
- Ensures filesystem operations maintain consistency
- Verifies process state transitions
- Guards against buffer overflows in key subsystems
Main Tradeoffs:
1. Performance Impact: - 3-5% overhead for memory operations
- 7-9% overhead for filesystem operations
- Additional storage needed for proof chains
- Increased memory usage for verification structures
2. Complexity: - More complex memory management
- Additional failure modes to handle
- Higher system initialization overhead
- More complex recovery procedures
Attack Vectors Still Present:
- Physical hardware attacks (DMA, cold boot)
- Side-channel attacks
- Race conditions (though reduced by verification)
- Attacks that operate within valid operation boundaries
- Core CPU/firmware-level vulnerabilities
Attack Vectors Prevented/Mitigated:
- Memory corruption exploits
- Filesystem integrity attacks
- Unauthorized state transitions
- Historical state tampering
- Many types of privilege escalation
Im actively working on making the other attack vectors disappear as a whole. It's quite extensive as it is, so it's got a lot of things packed on it. ( * ´ ω ` * )
Crafted by Rajat
Source Code