923 changes: 923 additions & 0 deletions data/codex-reliability-gap-map-01.json

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions docs/career/openai-applied-ai-codex/cover-note-one-page.md
@@ -0,0 +1,12 @@
# OpenAI Applied AI Engineer, Codex Core Agent - Cover Note

Dear OpenAI Codex team,

I am applying for the Applied AI Engineer, Codex Core Agent role because my strongest recent work is about agent reliability: preventing AI-agent workflows from claiming "done" without evidence.

My current public proof is an Agent Reliability False-Pass Gate. It blocks unsupported success claims when passing-evidence pointers or explicit `cannot_claim` boundaries are missing. While building it, I found and fixed a real false-pass issue: an empty or missing fixture directory could make a self-test appear successful with `self_test=PASS cases=0`. I hardened the gate, added negative tests, and published the evidence path through merged PRs and a reviewer-readable proof page.

This experience is relevant to Codex work on evals, failure modes, edge cases, and dependable completion of software-engineering tasks. The work is local and narrow in scope, and I do not present it as production-scale eval infrastructure, but it shows how I think about turning model behavior into dependable systems.

Best,
Yang Fei (Xiaojiu)
50 changes: 50 additions & 0 deletions docs/career/openai-applied-ai-codex/cover-note.md
@@ -0,0 +1,50 @@
# OpenAI Applied AI Engineer, Codex Core Agent - Cover Note Draft

Target role: Applied AI Engineer, Codex Core Agent
Candidate positioning: AI Agent Reliability / Evals / Developer Tools Engineer
Contact: [insert private contact details only in final application copy]

## Short Cover Note

Dear OpenAI Codex team,

I am applying for the Applied AI Engineer, Codex Core Agent role because my strongest recent work is about agent reliability: preventing AI-agent workflows from claiming "done" without evidence.

My current public proof is an Agent Reliability False-Pass Gate. It blocks unsupported success claims when passing-evidence pointers or explicit `cannot_claim` boundaries are missing. While building it, I found and fixed a real false-pass issue: an empty or missing fixture directory could make a self-test appear successful with `self_test=PASS cases=0`. I hardened the gate, added negative tests, and published the evidence path through merged PRs and a reviewer-readable proof page.

This experience is relevant to Codex work on evals, failure modes, edge cases, and dependable completion of software-engineering tasks. The work is local and narrow in scope, and I do not present it as production-scale eval infrastructure, but it shows how I think about turning model behavior into dependable systems.

Best,
Yang Fei (Xiaojiu)

## Longer Version

Dear OpenAI Codex team,

I am interested in the Applied AI Engineer, Codex Core Agent role because I have been working on a concrete reliability problem: how to make AI-agent closeouts inspectable enough that a reviewer can trust the evidence instead of the claim.

My recent public work is an Agent Reliability False-Pass Gate. The premise is simple: an agent saying "done" is not evidence. The gate checks whether a success claim has passing-evidence pointers and explicit `cannot_claim` boundaries. If those are missing, it blocks the closeout instead of allowing unsupported success language.

The most useful part of the project was catching a false-pass inside the gate itself. A missing or empty fixture directory could produce a fake pass with zero cases. I changed the behavior to fail closed, added regression coverage, and kept the proof narrow: local validation, merged PR evidence, remote CI status, provider output, and production claims are not collapsed into one generic success state.

I also built a candidate-review bridge for sanitized agent summaries and locked its provider/model boundary. The goal was not to add another model for its own sake, but to keep model review advisory while local verification, GitHub evidence, and human approval remain separate gates.

I believe this experience is relevant to Codex agent work because reliability is not only about model capability. It is also about eval design, failure modes, edge cases, closeout discipline, and the systems that make the difference between an impressive demo and a dependable tool.

Best,
Yang Fei (Xiaojiu)

## Role Match Notes

| OpenAI role theme | Candidate proof theme |
| --- | --- |
| Turn Codex agents from impressive demos into dependable tools. | False-pass prevention for AI-agent closeouts. |
| Evals, failure modes, edge cases, robustness, real-world coding tasks. | Fail-closed validation, negative tests, zero-case fixture fix, explicit `cannot_claim` state. |
| Define what good completion looks like for agents handling complex tasks. | A task is not complete until evidence, validation, and limitations are visible. |

## Final Submission Gaps

- Insert private contact details only in the final application copy.
- Confirm the exact application destination before transmitting any personal details.
- Keep the proof language narrow: local and GitHub-level evidence, not production readiness or customer validation.
88 changes: 88 additions & 0 deletions docs/career/openai-applied-ai-codex/resume-draft.md
@@ -0,0 +1,88 @@
# Yang Fei (Xiaojiu) - OpenAI Codex Applied AI Resume Draft

Johor Bahru, Malaysia | Email: [private email] | Phone: [private phone] | LinkedIn: [add LinkedIn] | GitHub: https://github.com/yangfei222666-9
AI Agent Reliability / Evals / Developer Tools Engineer

## Target Role

Applied AI Engineer, Codex Core Agent
Official role page: https://openai.com/careers/applied-ai-engineer-codex-core-agent-san-francisco/

This draft is local prep material. It should be converted into a private final application copy before submission.

## Summary

AI agent reliability engineer focused on evals, failure analysis, and evidence-gated workflows. I build reproducible checks that prevent unsupported "done" claims, preserve explicit uncertainty boundaries, and separate local validation, remote CI, provider output, and canonical truth.

My current public proof is narrow and inspectable: a False-Pass Gate, a zero-case validation fix, a provider-locked candidate-review bridge, and a recruiter-readable proof page merged to GitHub main with CI evidence. I am interested in Codex work that turns agent capability into dependable completion of real software-engineering tasks.

## Selected Engineering Work

### Agent Reliability False-Pass Gate | Python, Pytest, GitHub Actions | 2026

- Built a schema-level evidence gate that rejects AI-agent success claims when required passing-evidence pointers or explicit `cannot_claim` boundaries are missing.
- Identified and fixed a zero-case validation flaw where missing or empty fixtures could incorrectly produce `self_test=PASS cases=0`; added fail-closed regression coverage.
- Designed a provider-locked candidate-review bridge with sanitized stdin-only inputs, credential isolation, no repository reads, and explicitly non-canonical model output.
- Published the implementation, reviewer guide, limitations, and reproducible validation path to GitHub main with remote CI passing.

### Product Spine / Reliability Tooling | Python, TypeScript, CI/CD | 2026

- Built and maintained evidence-first workflow artifacts that distinguish local evidence, remote CI, provider output, and canonical truth before making completion claims.
- Added proof-index and reviewer-facing documentation so claims can be inspected by status, evidence command, limitation, and `cannot_claim` boundary.
- Used GitHub PR, CI, local regression tests, and closeout records as separate gates instead of treating a local pass as final truth.
- Practiced fail-closed review behavior: blocked or partial states are preserved instead of being rewritten into unsupported success language.

## Technical Skills

- Languages and tools: Python, TypeScript, Bash, Git, GitHub CLI, JSON, Markdown.
- Testing and validation: Pytest, CLI self-tests, regression fixtures, CI/CD, GitHub Actions, proof-index validation.
- AI-agent reliability: false-pass prevention, evidence gates, agent closeout review, provider-output boundaries, `cannot_claim` handling.
- Developer tooling: command-line validators, reviewer docs, reproducible local setup, Git evidence hygiene.

## Experience

### Independent AI Systems Engineer | March 2026 - Present

- Built public proof around AI-agent reliability, false-pass prevention, and evidence-gated task closeouts.
- Converted agent workflow failures into testable validation rules, proof documents, and reviewer-readable engineering artifacts.
- Maintained strict boundaries around credentials, provider output, local validation, remote CI, and human approval.

### Prior Experience

[Add verified prior company, role, dates, and measurable bullets before final submission. Do not invent experience.]

## Education

[Add verified school, degree, field, and dates before final submission.]

## Selected Links

- Agent Reliability proof: https://github.com/yangfei222666-9/taiji/blob/main/docs/portfolio/agent-reliability-proof.md
- Main repository: https://github.com/yangfei222666-9/taiji
- One-line technical case: Agent said done. Where is the evidence?

## Interview Stories

### 1. Zero-case false-pass bug

A validator designed to block false-pass behavior could itself pass with zero cases when fixtures were missing or empty. I changed the behavior to fail closed, added regression coverage, and documented the limitation so reviewers could inspect the evidence.
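
For interview prep, a minimal Python sketch of the fail-closed behavior in this story may help when walking through it on a whiteboard. Everything here is an illustrative assumption, not the repository's actual code: the fixture layout (`*.json` files with `claim` and `expected` keys), the `evidence`/`status` field names, and the hypothetical functions `evaluate_claim` and `run_self_test`. The point it shows is that "no cases ran" is reported as a failure rather than as `self_test=PASS cases=0`.

```python
import json
import tempfile
from pathlib import Path


def evaluate_claim(claim: dict) -> str:
    """Hypothetical gate rule: PASS only if every evidence pointer reports a pass
    and an explicit cannot_claim list is present (field names are assumptions)."""
    evidence = claim.get("evidence", [])
    has_passing_evidence = bool(evidence) and all(
        item.get("status") == "pass" for item in evidence
    )
    has_boundary = isinstance(claim.get("cannot_claim"), list)
    return "PASS" if has_passing_evidence and has_boundary else "BLOCK"


def run_self_test(fixture_dir: Path) -> str:
    """Replay every *.json fixture against the gate, failing closed on zero cases."""
    if not fixture_dir.is_dir():
        return "self_test=FAIL reason=missing_fixture_dir cases=0"
    fixtures = sorted(fixture_dir.glob("*.json"))
    if not fixtures:
        # A plain loop over zero fixtures records no failures and would look like
        # a PASS; treat "nothing was checked" as a failure instead.
        return "self_test=FAIL reason=no_cases cases=0"
    failures = 0
    for path in fixtures:
        case = json.loads(path.read_text())
        if evaluate_claim(case["claim"]) != case["expected"]:
            failures += 1
    status = "PASS" if failures == 0 else "FAIL"
    return f"self_test={status} cases={len(fixtures)} failures={failures}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(run_self_test(root / "missing"))  # self_test=FAIL reason=missing_fixture_dir cases=0
        print(run_self_test(root))              # self_test=FAIL reason=no_cases cases=0
        (root / "unsupported_done.json").write_text(json.dumps(
            {"claim": {"summary": "done", "evidence": []}, "expected": "BLOCK"}
        ))
        print(run_self_test(root))              # self_test=PASS cases=1 failures=0
```

The design choice worth explaining in an interview is that the empty and missing cases return early with an explicit reason, so a reviewer can tell "nothing ran" apart from "everything passed".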

### 2. Local pass is not remote truth

The project keeps local validation, remote CI, provider output, and canonical truth as separate states. This prevents a common AI-agent failure mode where a local check becomes an unsupported completion claim.
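
To make this story concrete, a minimal Python sketch of keeping evidence sources as separate fields and deriving the strongest claim they support. The `Closeout` record, the `Check` values, and the state names (`LOCAL_PASS_ONLY`, `CI_PASS_AWAITING_APPROVAL`, and so on) are hypothetical, not taken from the project. It assumes the final gate is evidence, CI, and human approval, and provider output is stored but never read by the gate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Check(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNKNOWN = "unknown"  # not run, or result not observed


@dataclass(frozen=True)
class Closeout:
    """Hypothetical closeout record; each evidence source stays in its own field."""
    local_validation: Check
    remote_ci: Check
    human_approved: bool
    provider_review: Optional[str] = None  # advisory text only; never read below


def closeout_state(c: Closeout) -> str:
    """Return the strongest claim the evidence supports without merging sources."""
    if Check.FAIL in (c.local_validation, c.remote_ci):
        return "BLOCKED"
    if c.local_validation is not Check.PASS:
        return "UNVERIFIED"
    if c.remote_ci is not Check.PASS:
        return "LOCAL_PASS_ONLY"  # a local pass is not remote truth
    if not c.human_approved:
        return "CI_PASS_AWAITING_APPROVAL"
    return "DONE"


if __name__ == "__main__":
    record = Closeout(Check.PASS, Check.UNKNOWN, False, provider_review="Looks done.")
    print(closeout_state(record))  # LOCAL_PASS_ONLY
```

Because `provider_review` is never consulted, an upbeat model summary cannot upgrade a local-only pass into `DONE`.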

### 3. Provider output remains advisory

The candidate-review bridge uses sanitized input and explicit provider/model boundaries, but model output is not treated as canonical. The final gate remains evidence, CI, and human approval.

## Final Submission Gaps

- Add private contact details in the final application copy only.
- Add verified education.
- Add verified prior work experience, if applicable.
- Confirm work authorization and relocation/sponsorship language for the specific application form.
- Convert to a clean PDF only after private details and final links are confirmed.

## Boundaries

This material should not claim deployed-system readiness, external customer adoption, third-party endorsement, fleet-scale sandboxing, production eval infrastructure, or self-updating agent authority.
53 changes: 53 additions & 0 deletions docs/career/openai-applied-ai-codex/resume-one-page.md
@@ -0,0 +1,53 @@
# Yang Fei (Xiaojiu)

Johor Bahru, Malaysia | Email: [private email] | LinkedIn: [add LinkedIn] | GitHub: https://github.com/yangfei222666-9
AI Agent Reliability / Evals / Developer Tools Engineer

## Summary

AI agent reliability engineer focused on evals, failure analysis, and evidence-gated workflows. I build reproducible checks that prevent unsupported "done" claims, preserve explicit uncertainty boundaries, and separate local validation, remote CI, provider output, and canonical truth. Recent public work includes a merged False-Pass Gate with fail-closed regression coverage and reviewer-readable proof.

## Selected Engineering Work

### Agent Reliability False-Pass Gate | Python, Pytest, GitHub Actions | 2026

- Built a schema-level evidence gate that rejects AI-agent success claims when required passing-evidence pointers or explicit `cannot_claim` boundaries are missing.
- Identified and fixed a zero-case validation flaw where missing or empty fixtures could incorrectly produce `self_test=PASS cases=0`; added fail-closed regression coverage.
- Designed a provider-locked candidate-review bridge with sanitized stdin-only inputs, credential isolation, no repository reads, and explicitly non-canonical model output.
- Published the implementation, reviewer guide, limitations, and reproducible validation path to GitHub main with remote CI passing.

### Product Spine / Reliability Tooling | Python, TypeScript, CI/CD | 2026

- Built and maintained evidence-first workflow artifacts that distinguish local evidence, remote CI, provider output, and canonical truth before making completion claims.
- Added proof-index and reviewer-facing documentation so project claims can be inspected by status, command, limitation, and `cannot_claim` boundary.
- Used GitHub PR, CI, local regression tests, and closeout records as separate gates instead of treating a local pass as final truth.

## Technical Skills

Python, Pytest, CLI validation tools, TypeScript, GitHub Actions, CI/CD, Git, JSON evidence manifests, LLM/API boundary design, agent workflow auditing, eval-style regression checks, developer documentation.

## Experience

### Independent AI Systems Engineer | March 2026 - Present

- Built public proof around AI-agent reliability, false-pass prevention, and evidence-gated task closeouts.
- Converted agent workflow failures into testable validation rules, proof documents, and reviewer-readable engineering artifacts.
- Maintained strict boundaries around credentials, provider output, local validation, remote CI, and human approval.

### Prior Experience

[Add verified prior company, role, dates, and 2-3 measurable bullets before final submission. Do not invent experience.]

## Education

[Add verified school, degree, field, and dates before final submission.]

## Selected Links

- Portfolio proof: https://github.com/yangfei222666-9/taiji/blob/main/docs/portfolio/agent-reliability-proof.md
- Main repository: https://github.com/yangfei222666-9/taiji
- Target technical case: Agent said done. Where is the evidence?

## Boundaries

Current public proof supports local and GitHub-level evidence for false-pass prevention. It does not claim deployed-system readiness, external customer adoption, third-party endorsement, fleet-scale sandboxing, or production eval infrastructure.