Title: Compartment, Crash, and Continue: Toward Resilient Monolithic OS Kernels
Authors: Shih-Wei Li; Shih-Hung Tang; Yi-Lin Hsu
Type: conference paper
Dates: 2025-11-20; 2025-11-20; 2025-10-13
DOI: 10.1145/3765889.3767044
Scopus EID: 2-s2.0-105020382490
Scopus record: https://www.scopus.com/record/display.uri?eid=2-s2.0-105020382490&origin=resultslist
Repository record: https://scholars.lib.ntu.edu.tw/handle/123456789/733847
Keywords: Availability; Compartmentalization; Kernel Safety; Operating Systems; Reliability

Abstract: Ensuring the availability of monolithic operating system (OS) kernels, like Linux, remains a significant challenge, as an internal fault often brings down the entire system. We introduce a novel approach called kCOMALIVE to enhance kernel resilience. kCOMALIVE builds on state-of-the-art kernel compartments to contain faults and extends them with crash recovery capabilities. kCOMALIVE employs a checkpoint and restore mechanism to enable fine-grained recovery of failed kernel compartments. kCOMALIVE incorporates compile-time instrumentation to simplify compartment construction and facilitate deployment. We prototyped kCOMALIVE by extending the HAKC framework and showed its effectiveness at recovering a failed Linux driver.