Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling

A shift schedule can be perfectly valid and still be a terrible policy manual.

Consider a care-facility manager facing an unpleasant Wednesday: several employees have requested leave, available staffing barely covers demand, and somebody must work a day shift immediately after completing a night shift. The manager makes the assignment because residents still require care. The completed roster records what happened. It does not necessarily record what the facility considers acceptable under normal conditions.

A scheduling system trained to treat every historical assignment as an approved practice will learn the emergency along with the rule. Given enough automation, yesterday’s reluctant compromise becomes tomorrow’s standard operating procedure.

That is the problem addressed by Koki Suenaga, Tomohiro Furuta, and Satoshi Ono in “A Study on Constraint Extraction and Exception Exclusion in Care Worker Scheduling.”¹ Their method mines facility-specific scheduling constraints from historical rosters, filters patterns associated with staffing pressure and unusually restrictive leave requests, and then relaxes learned constraints gradually when they make a future schedule infeasible.

The paper’s most useful contribution is therefore not another solver. It is a mechanism for deciding which parts of operational history deserve authority.

A roster records policy, preference, and damage control

Automating care-worker scheduling is difficult because the formal requirements are only the beginning.

A facility must assign enough qualified staff, respect working-hour limits, cover night shifts, and process leave requests. It may also have less visible preferences: a worker should rest after a night shift, certain sequences should be avoided, night duties should be distributed fairly, and weekend staffing should reflect different service demand.

Many of these conditions differ by facility. A generic scheduler therefore requires managers to explain local rules before it can produce useful rosters. According to the paper, collecting those constraints through interviews creates a substantial implementation burden.

Historical schedules appear to offer an easier route. They already contain information about who works which shifts, which sequences occur, and how staffing changes across the week.

The tempting assumption is that recurring assignments reveal the facility’s preferred rules.

They reveal more than that.

Historical rosters also contain:

emergency assignments made during understaffing;
unusual sequences caused by concentrated leave requests;
compromises that managers accepted because no better schedule was feasible;
patterns that occurred only once and should not become general permissions.

Constraint mining without exception handling cannot distinguish a tolerated assignment from a preferred assignment. It sees only that the assignment occurred.

The paper’s mechanism addresses this ambiguity in three stages:

Historical rosters and leave requests → template-based constraint extraction → exception filtering → schedule generation with gradual relaxation

Each stage corrects a different failure mode. Templates make local rules extractable. Exception filters reduce the chance of learning crisis behaviour. Gradual relaxation prevents the learned rules from making future schedules impossible.

Four templates turn rosters into solver-ready rules

The proposed system does not ask a machine-learning model to infer an opaque scheduling policy. Instead, it uses four reusable constraint templates that define what kinds of patterns should be extracted.

Template	What it extracts	Operational information it can reveal
T1	Consecutive-day shift patterns for each individual worker	Which personal shift sequences appear acceptable, including rest patterns and limits on consecutive day shifts
T2	Consecutive-day shift patterns across all workers	Facility-wide sequences, including the structure of night assignments
T3	Monthly shift counts for each worker	Available shift types, expected working days, working-hour ranges, and night-shift distribution
T4	Required staff for each shift by weekday	Recurring demand differences between weekdays and weekends

The distinction between T1 and T2 matters.

T1 captures worker-specific patterns. One employee may be available only for day shifts, while another regularly covers nights. These extracted patterns become soft preferences in the scheduling experiment.

T2 searches for patterns that apply across the workforce. Because these patterns can represent structural rules—such as how night work spanning two calendar days must be recorded—the system initially treats them as hard constraints.
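
To make the T1 and T2 distinction concrete, here is a minimal sketch that counts consecutive-day shift sequences per worker and pooled across the workforce. It is an illustration under stated assumptions, not the paper's implementation: the function names, the roster layout (one shift code per day per worker), and the codes D, N, and O are hypothetical.

```python
from collections import Counter


def windows(shifts, length):
    """Yield every run of `length` consecutive days from one shift list."""
    for start in range(len(shifts) - length + 1):
        yield tuple(shifts[start:start + length])


def extract_sequences(roster, min_len=2, max_len=7):
    """Count per-worker (T1-style) and pooled (T2-style) shift sequences."""
    per_worker = {}            # worker -> Counter of that worker's sequences
    facility_wide = Counter()  # sequences pooled over all workers
    for worker, shifts in roster.items():
        counts = Counter()
        for length in range(min_len, max_len + 1):
            counts.update(windows(shifts, length))
        per_worker[worker] = counts
        facility_wide.update(counts)
    return per_worker, facility_wide


# Hypothetical one-week roster: D = day, N = night, O = off.
roster = {
    "worker_a": ["D", "D", "O", "D", "D", "O", "O"],
    "worker_b": ["N", "O", "O", "N", "O", "D", "D"],
}
per_worker, facility_wide = extract_sequences(roster, min_len=2, max_len=3)
print(per_worker["worker_b"][("N", "O")])  # night followed by a day off, worker_b only
print(facility_wide[("D", "D")])           # day followed by day, all workers
```

The same counting routine serves both templates; only the scope of aggregation changes. A pattern that never appears in a counter, such as a day shift directly after a night shift in this toy roster, is simply absent.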

T3 and T4 extract counts rather than sequences. T3 can identify that an employee works part-time or should receive fewer night shifts. T4 can detect that the facility needs fewer day-shift workers on weekends while maintaining fixed night coverage.
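
The count-based templates can be sketched in the same spirit. The example below assumes a hypothetical roster layout and uses Python's weekday numbering (0 = Monday); the function names are invented for illustration, and the paper's own extraction may aggregate the counts differently.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def monthly_shift_counts(roster):
    """T3-style: how many times each worker received each shift code."""
    return {worker: Counter(shifts) for worker, shifts in roster.items()}


def weekday_staffing(roster, first_day, off_code="O"):
    """T4-style: headcounts observed for each (weekday, shift) pair."""
    n_days = len(next(iter(roster.values())))
    observed = defaultdict(list)
    for offset in range(n_days):
        weekday = (first_day + timedelta(days=offset)).weekday()  # 0 = Monday
        headcount = Counter(shifts[offset] for shifts in roster.values())
        for shift, n in headcount.items():
            if shift != off_code:
                observed[(weekday, shift)].append(n)
    return dict(observed)


# Hypothetical week starting on Monday, 1 January 2024.
roster = {
    "worker_a": ["D", "D", "D", "D", "D", "O", "O"],
    "worker_b": ["D", "D", "O", "D", "D", "D", "O"],
    "worker_c": ["N", "O", "N", "O", "N", "O", "D"],
}
counts = monthly_shift_counts(roster)
staffing = weekday_staffing(roster, first_day=date(2024, 1, 1))
print(counts["worker_c"]["N"])                  # night shifts given to worker_c
print(staffing[(0, "D")], staffing[(5, "D")])   # Monday vs Saturday day staffing
```

Over a full month each `(weekday, shift)` key would collect several observations, from which a reviewer could read off a typical or minimum staffing level.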

Together, the templates cover much of the information the researchers obtained through manager interviews. They do not eliminate all interviews. Leave-request handling was added as a default constraint, easily clarified rules were still entered manually, and constraints for newly hired workers were created by mirroring similar employees.

That distinction is important for business adoption. The method changes interviews from an open-ended rule-discovery exercise into a review and completion exercise. The paper argues that this should reduce implementation effort, although it does not measure the time or cost saved.

The exception filter asks whether the roster was written under pressure

Templates solve the extraction problem. They do not solve the interpretation problem.

Suppose a day shift immediately after a night shift appears in a historical roster. A naïve extractor treats that sequence as evidence that the assignment is allowed. The proposed method first asks whether the sequence occurred when the facility had enough alternatives.

The paper uses three filters.

Staffing margin excludes low-choice days

For each day $d$, the method calculates staffing margin as:

$$ u_d = \frac{a_d}{r_d} $$

where $a_d$ is the number of available workers and $r_d$ is the number required.

When the staffing margin falls below a threshold, patterns from that day are excluded from constraint extraction. For a multi-day sequence, the method uses the minimum staffing margin across the entire window.

This is a practical proxy for managerial choice. When ten workers are available for eight required positions, the resulting roster probably contains more discretion than a roster produced with exactly eight available workers.

It remains a proxy. Adequate headcount does not guarantee that every employee has the required skills, and a low-margin day may still contain legitimate rules. The filter does not identify the manager’s intent; it estimates whether the manager had room to express it.
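
A short sketch of the staffing-margin filter, using the ratio defined above and the minimum-over-window rule for multi-day sequences. The function names and the sample headcounts are hypothetical; the default threshold of 1.25 is the value the article reports for the experiment.

```python
def staffing_margins(available, required):
    """Return the daily margin a_d / r_d for each day."""
    return [a / r for a, r in zip(available, required)]


def window_is_usable(margins, start, length, threshold=1.25):
    """Keep a window only if its minimum daily margin is not below the threshold."""
    return min(margins[start:start + length]) >= threshold


# Hypothetical four-day stretch with eight positions to fill each day.
available = [10, 8, 10, 12]
required = [8, 8, 8, 8]
margins = staffing_margins(available, required)
print(margins)
print(window_is_usable(margins, start=0, length=2))  # includes the tight second day
print(window_is_usable(margins, start=2, length=2))  # both days have slack
```

A single tight day is enough to disqualify every sequence that passes through it, which is why longer windows are excluded more often than short ones.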

Flexibility excludes unusually constrained employee-months

The method also calculates a worker’s flexibility from requested leave relative to assigned working days:

$$ u_f = 1 - \frac{N_d^{(r)}}{N_d^{(a)}} $$

where $N_d^{(r)}$ is the number of requested leave days and $N_d^{(a)}$ is the number of assigned days.

When flexibility falls below a threshold, that employee’s patterns for the month are excluded. The reasoning is straightforward: if a worker requests leave on many days, the remaining sequence may reflect calendar compression rather than a desirable working pattern.
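
As a rough illustration of the flexibility filter, the sketch below applies the formula to two hypothetical employee-months. The record layout and function names are assumptions made for this example; the 0.5 default is the threshold the article reports for the experiment.

```python
def flexibility(requested_leave_days, assigned_days):
    """Return 1 - requested / assigned, the flexibility score for one employee-month."""
    if assigned_days <= 0:
        raise ValueError("assigned_days must be positive")
    return 1 - requested_leave_days / assigned_days


def usable_employee_months(records, threshold=0.5):
    """Keep employee-months whose flexibility is not below the threshold.

    `records` maps (worker, month) to (requested leave days, assigned days).
    """
    return {
        key
        for key, (requested, assigned) in records.items()
        if flexibility(requested, assigned) >= threshold
    }


# Hypothetical records for one month.
records = {
    ("worker_a", "2024-01"): (4, 20),   # few requests, flexibility near 0.8
    ("worker_b", "2024-01"): (12, 20),  # many requests, flexibility near 0.4
}
kept = usable_employee_months(records)
print(sorted(kept))
```

Here the second worker's month would be dropped from pattern extraction entirely, even if most of the sequences in it were ordinary.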

Frequency removes patterns with weak historical support

The third filter removes patterns whose occurrence frequency falls below a threshold.

A rare pattern may be valid. It may also be an accident, an emergency, or a data anomaly. The method declines to elevate it into a reusable constraint without sufficient recurrence.

For the experiment, the authors set the occurrence-frequency threshold to 0.15, the staffing-margin threshold to 1.25, and the flexibility threshold to 0.5. These values were determined heuristically from the facility’s data and observed exceptional assignments.

That makes the filtering mechanism operationally understandable, but also site-dependent. A threshold suitable for one facility should not quietly become a universal definition of an exception.
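
The frequency filter can be sketched as follows. One important assumption: the article does not spell out how occurrence frequency is normalised, so this example treats it as a pattern's share among all observed patterns of the same length. That definition, the function name, and the sample counts are hypothetical.

```python
from collections import Counter


def frequent_patterns(counts, threshold=0.15):
    """Keep patterns whose share among same-length patterns is not below the threshold."""
    totals = Counter()
    for pattern, n in counts.items():
        totals[len(pattern)] += n
    return {
        pattern: n
        for pattern, n in counts.items()
        if n / totals[len(pattern)] >= threshold
    }


# Hypothetical two-day pattern counts for one worker (20 observations in total).
counts = {
    ("D", "D"): 12,
    ("D", "O"): 6,
    ("N", "D"): 1,   # day shift straight after a night shift, seen once
    ("O", "D"): 1,
}
kept = frequent_patterns(counts)
print(sorted(kept))
```

Under this assumed definition the once-only night-then-day pattern is dropped, but so is the equally rare off-then-day pattern, which shows how a frequency cutoff can discard harmless sequences along with suspicious ones.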

Filtering history does not make history complete

Removing suspicious patterns creates a cleaner rule set. It can also make that rule set too restrictive.

A historical schedule contains only assignments that happened. A valid sequence that never occurred will be absent. If the solver interprets absence as prohibition, a newly required but reasonable assignment may become impossible.

The paper addresses this through gradual constraint relaxation.

Facility-wide sequence constraints extracted with T2 initially operate as hard constraints. If the solver cannot find a feasible schedule, it converts the longest T2 constraints into soft constraints and tries again. It continues with progressively shorter constraints until a solution becomes feasible.

Only after relaxing those learned sequence constraints does the method begin removing employee shift or leave requests.

This ordering creates a useful hierarchy:

1. Preserve essential staffing and working constraints.
2. Prefer historically supported sequences.
3. Relax the most detailed learned sequences when they block feasibility.
4. Sacrifice employee requests only if the schedule remains impossible.

Longer sequences are sensible candidates for earlier relaxation because they encode more specific historical combinations and are more likely to exclude unseen but legitimate alternatives. The paper uses this ordering rather than experimentally comparing alternative relaxation strategies, so it should be understood as a practical design choice rather than a proven optimum.
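
The relaxation order described above can be sketched as a simple loop. This is a minimal illustration, not the paper's solver: `solve` is a hypothetical callable that returns a schedule or `None`, patterns are represented as tuples of shift codes, and the toy solver exists only to exercise the loop.

```python
def schedule_with_relaxation(solve, t2_patterns, max_len=7, min_len=2):
    """Demote T2 patterns from hard to soft, longest first, until `solve` succeeds.

    `solve(hard, soft)` is assumed to return a schedule, or None if infeasible.
    """
    hard = list(t2_patterns)
    soft = []
    relaxed_lengths = []
    schedule = solve(hard, soft)
    length = max_len
    while schedule is None and length >= min_len:
        soft.extend(p for p in hard if len(p) == length)
        hard = [p for p in hard if len(p) != length]
        relaxed_lengths.append(length)
        schedule = solve(hard, soft)
        length -= 1
    return schedule, relaxed_lengths


def toy_solve(hard, soft):
    """Stand-in solver: feasible only once no hard pattern is longer than five days."""
    return "schedule" if all(len(p) <= 5 for p in hard) else None


patterns = [
    ("D", "D", "D", "O", "O", "D", "D"),  # seven-day pattern
    ("D", "D", "O", "D", "D", "O"),       # six-day pattern
    ("N", "O", "O"),                      # three-day pattern
]
schedule, relaxed = schedule_with_relaxation(toy_solve, patterns)
print(schedule, relaxed)
```

If the loop ends with no schedule, the caller would move on to the next step in the hierarchy, removing employee requests. Returning `relaxed_lengths` alongside the schedule keeps every override visible rather than silent.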

Gradual relaxation also produces a valuable operational signal. If a facility repeatedly needs to relax the same learned rule, the issue may be more than solver inconvenience. The historical policy may have changed, the threshold may be too strict, or staffing conditions may have deteriorated.

The experiment separates extraction quality from scheduling feasibility

The researchers evaluated the method using 48 months of data from a long-term care facility in Kagoshima City with approximately 20 staff members.

The first 36 months were used to extract constraints. The remaining 12 months were used to generate schedules.

The facility originally used 16 detailed shift types. Because there was insufficient data to learn reliable constraints at that level of detail, the experiment collapsed the day shifts into an abstract day-shift category while retaining separate night-shift categories for the facility’s two units. The method extracted personal and general sequence patterns spanning two to seven days.

This design makes the central mechanism testable, but it also means the experiment evaluates abstract monthly rostering rather than fully detailed shift construction.

The evaluation combines two experiments and two comparisons, each with a different purpose.

Test	Likely purpose	What it supports	What it does not prove
Constraint extraction from 36 historical months	Main evidence that templates can recover facility-specific patterns and that filtering changes the extracted rule set	The method extracts sequence, count, and weekday-demand constraints; exception filtering selectively removes personal sequence patterns	The remaining constraints are universally correct or complete
Scheduling across 12 held-out months	Main evidence that extracted constraints can support feasible future schedules	The solver produced schedules satisfying hard constraints, with gradual relaxation where required	The system is optimal or ready for every care facility
Comparison with the same method without exception exclusion	Ablation-style comparison isolating the filter’s effect	Exception filtering materially improves the rule limiting excessive consecutive day shifts	Every soft preference improves after filtering
Comparison with manager schedules	Operational reference point	The generated schedules satisfied all employee requests in the tested setup, while manager schedules averaged two unmet requests per month	Exception filtering caused that request-satisfaction improvement

This separation matters because several attractive findings arise from different parts of the system. Better handling of excessive consecutive day shifts supports the exception-filter mechanism. Full satisfaction of leave requests primarily reflects how the solver prioritized those requests.

Combining the two into a single claim of general superiority would be convenient. It would also be inaccurate.

Exception filtering removed 914 personal sequence constraints

The clearest extraction result appears in the number of T1 constraints.

Template	Without exception exclusion	With exception exclusion	Change
T1: individual consecutive-day patterns	5,867	4,953	914 fewer
T2: general consecutive-day patterns	1,845	1,845	No change
T3: monthly worker-level counts	96	96	No change
T4: weekday staffing counts	7	7	No change

Exception exclusion reduced the personal sequence library by approximately 15.6%. It did not indiscriminately prune every category.
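
The percentage follows directly from the T1 row of the table:

$$ \frac{5{,}867 - 4{,}953}{5{,}867} = \frac{914}{5{,}867} \approx 0.156 $$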

The removed examples included a day shift immediately following a night shift and an isolated day shift surrounded by days off. Within the facility’s operating context, these were treated as undesirable patterns associated with exceptional conditions.

This result supports the mechanism at a useful level of specificity. The filters affected precisely the category most exposed to individual staffing pressure: worker-specific multi-day sequences.

It does not establish that all 914 removed patterns were mistakes. The paper has no independently labelled catalogue of “true exceptions” against which the filters can be scored. Their value is judged indirectly through the schedules generated afterward.

Seven of twelve months required the system to loosen its learned rules

The gradual-relaxation mechanism was not an ornamental fallback. It was needed in most evaluation months.

Strongest relaxation required	Number of months
No relaxation; strict constraints were feasible	5
Relax seven-day sequence constraints	3
Relax through six-day sequence constraints	1
Relax through five-day sequence constraints	2
Relax through four-day sequence constraints	1

All 12 months produced feasible schedules without relaxing beyond four-day patterns.

This distribution reveals a central tension in learning operational rules from history.

The extracted constraints were useful enough to guide scheduling, yet incomplete enough that strict enforcement failed in seven months. The solver needed permission to distinguish “historically unseen” from “operationally impossible.”

For businesses, this is the broader lesson behind gradual relaxation. A learned policy should rarely be deployed as an indivisible block. Constraints need explicit strengths, an ordered relaxation procedure, and a record of which rules were overridden.

Otherwise, the system will either fail unnecessarily or relax conditions invisibly. Neither outcome is particularly impressive, although both can be described as automation in a sales presentation.
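
One way to keep such a record is a small log of which sequence lengths were relaxed in each month. The sketch below is hypothetical in its class name, method names, and month labels; the relaxation depths themselves reproduce the distribution in the table above, though the order in which they are assigned to months is arbitrary.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RelaxationLog:
    """Record which learned sequence lengths were relaxed in each month."""
    events: dict = field(default_factory=dict)

    def record(self, month, relaxed_lengths):
        self.events[month] = list(relaxed_lengths)

    def months_relaxed(self):
        """Number of months in which at least one learned rule was relaxed."""
        return sum(1 for lengths in self.events.values() if lengths)

    def counts_by_length(self):
        """How many months relaxed each sequence length."""
        return Counter(n for lengths in self.events.values() for n in lengths)


# Depths from the table: 5 months strict, 3 at seven days, 1 through six,
# 2 through five, 1 through four. The month order here is arbitrary.
depths = [[]] * 5 + [[7]] * 3 + [[7, 6]] + [[7, 6, 5]] * 2 + [[7, 6, 5, 4]]
log = RelaxationLog()
for index, lengths in enumerate(depths, start=1):
    log.record(f"month_{index:02d}", lengths)

print(log.months_relaxed())
print(log.counts_by_length())
```

A log like this makes it easy to see that seven-day patterns gave way in every month that needed any relaxation, while four-day patterns were touched only once.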

Figure 7 improves one crucial rule, not every soft preference

Both the exception-aware method and the variant without exception exclusion generated schedules with no hard-constraint violations.

The difference appears among the soft constraints.

The strongest improvement from exception exclusion concerns S5, the soft constraint against excessive consecutive day shifts. The method without exception filtering produces substantially more S5 violations because emergency-driven long sequences remain in the extracted set of allowable patterns. Once those sequences are filtered out, the solver no longer treats them as acceptable and reproduces them less often.

This is the paper’s most persuasive evidence because the result follows the proposed mechanism directly:

Understaffing creates undesirable long sequences → naïve extraction treats them as allowed → exception filtering removes them → future schedules violate the consecutive-day preference less often.
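
To show what an S5-style check might count, here is a minimal sketch that finds runs of consecutive day shifts exceeding a limit. The limit of five days is a hypothetical value chosen for illustration, since the article does not state the facility's actual limit, and the function name and shift codes are likewise assumptions.

```python
from itertools import groupby


def long_day_runs(shifts, max_run, day_code="D"):
    """Return the lengths of consecutive day-shift runs longer than `max_run`."""
    runs = []
    for code, group in groupby(shifts):
        length = len(list(group))
        if code == day_code and length > max_run:
            runs.append(length)
    return runs


# Hypothetical schedule: runs of six, three, and seven day shifts.
shifts = ["D"] * 6 + ["O"] + ["D"] * 3 + ["O"] + ["D"] * 7
print(long_day_runs(shifts, max_run=5))
```

Counting violations per generated schedule in this way is one plausible route to a filtered-versus-unfiltered comparison of the kind the paper reports.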

The result chart does not show an across-the-board reduction in every soft-constraint category. In particular, violations of S3—the preference concerning a night-off-night sequence—are higher for the exception-aware method than for the variant without filtering.

That trade-off deserves attention. The filtered rule set improves the preference most directly related to the removed emergency patterns, but changing the allowable sequence space can shift violations elsewhere. Constraint systems allocate scarcity; they do not abolish it.

The proposed schedules also satisfied all leave and shift requests during the test period, compared with an average of two unmet vacation requests per month in the manager schedules. This is operationally appealing, but it should not be credited to exception filtering. Both proposed variants initially treated requests as hard constraints, making this result primarily evidence about the solver’s prioritization and the feasibility of the tested months.

The business value is cheaper rule discovery with a visible escape route

The paper is narrowly evaluated in care scheduling, but the mechanism maps onto a broader class of operational systems.

Many organizations possess years of decision records but incomplete documentation of the rules behind them. These records mix normal policy with crisis responses, local workarounds, and one-off exceptions. A system that mines them without context will automate all four.

The practical deployment model suggested by this paper is not “replace the manager with historical data.” It is a governed workflow for converting history into reviewable rules.

Paper mechanism	Business implementation	Potential value	Principal risk
Constraint templates	Define reusable families of sequences, counts, capacities, and eligibility rules	Reduces the need to discover every local rule from scratch	Templates may omit important rule categories
Staffing-margin and flexibility filters	Attach operational context to historical decisions before learning from them	Prevents some emergency compromises from becoming standard policy	Proxy thresholds may misclassify valid or harmful patterns
Human assignment of hard and soft strengths	Require managers to approve the authority of extracted rules	Keeps policy ownership visible	Review burden remains and may be inconsistently performed
Gradual relaxation	Specify which learned rules yield first when no feasible plan exists	Preserves feasibility without discarding the entire policy set	Poor relaxation order may sacrifice important preferences
Relaxation logs	Track frequently overridden constraints and investigate causes	Creates feedback for policy updates and operational diagnosis	Overrides may become routine without governance

The likely return on investment begins with lower configuration and maintenance effort. Instead of asking managers to enumerate every rule, a system can present extracted candidates, evidence of recurrence, and the conditions under which each candidate appeared.

The paper does not measure interview hours, implementation cost, schedule preparation time, staff satisfaction, or retention. Those remain business hypotheses requiring prospective measurement.

A sensible deployment should therefore track at least four operational metrics:

manager time spent reviewing and correcting generated schedules;
frequency and depth of constraint relaxation;
employee-request satisfaction and undesirable sequence rates;
changes in extracted constraints as staffing conditions evolve.

Repeated relaxation is particularly informative. It can indicate policy drift, worsening staffing shortages, a bad threshold, or a constraint that was never as universal as historical data suggested.

The method still requires managers, thresholds, and missing knowledge

The paper demonstrates a coherent mechanism, but its evidence establishes feasibility more clearly than generality.

First, the evaluation covers one Japanese long-term care facility with approximately 20 employees. Facilities with different qualifications, union rules, shift structures, or staffing volatility may produce different exception patterns.

Second, the exception thresholds were chosen heuristically. Staffing margin, leave-request flexibility, and frequency are understandable signals, but their appropriate cutoffs remain facility-specific. The study does not include a sensitivity analysis showing how results change under different thresholds.

Third, the original 16 shift types were simplified because the available data could not support reliable extraction at full detail. The authors argue that detailed day shifts can subsequently be assigned through relatively straightforward subdivision, but the experiment does not validate the complete end-to-end roster at that granularity.

Fourth, managers still determine important parts of the system. Constraint strengths were assigned manually. Easily clarified constraints were manually added. New employees inherited rules from similar workers. Leave and shift requests were included separately rather than learned from historical schedules.

Finally, the baseline is the same proposed method without exception exclusion. This isolates the filter’s effect, which is useful, but it does not compare the system with a mature commercial scheduler, alternative exception-detection methods, or a fully prospective deployment.

These boundaries do not weaken the paper’s core mechanism. They define its appropriate use: a promising constraint-discovery and scheduling framework that still requires local validation and operational governance.

Good automation learns which history had authority

Historical records are attractive because they appear to contain decisions without requiring anyone to explain them.

They also contain the consequences of shortages, deadlines, absent employees, and managers making the least bad choice available. Treating every recorded decision as policy is an efficient way to preserve institutional mistakes.

This paper proposes a more disciplined sequence.

Extract recurring rules through reusable templates. Distrust patterns created when operational choice was severely constrained. Then relax learned rules visibly when future reality no longer fits the past.

The result is not a scheduler that has discovered the facility’s complete hidden constitution. The result is a system that can turn historical rosters into a workable first draft of local policy, while retaining a structured way to question what it learned.

For operational automation, that is a considerably more credible ambition.

Cognaptus: Automate the Present, Incubate the Future.

¹ Koki Suenaga, Tomohiro Furuta, and Satoshi Ono, “A Study on Constraint Extraction and Exception Exclusion in Care Worker Scheduling,” arXiv:2512.24853, 2025.
