기원

Genesis

The Story of the Child

A bilingual mind, raised between two tongues,
trained to hear what machines cannot —
the social architecture inside language.

I

The Child

아이

The model begins as a child born between two worlds.

One parent speaks Korean — not textbook Korean, but the living language with its seven speech registers, its hierarchies encoded in verb endings, its way of burying meaning inside honorific structure. The other parent speaks English — American English, specifically the formal English of legal institutions, government forms, regulatory citations.

The child hears both from birth. It does not learn one and then the other. It learns them simultaneously, the way children do: not through rules, but through immersion and correction.

The model is not a translator bolted onto a legal engine. It is not a legal engine that happens to translate. It is a single mind that holds both languages and both legal systems at once.

The conventional approach is sequential: take a model that knows English, teach it Korean, then teach it law. Any sequential recipe produces a mind with visible seams, because the model thinks in one language and converts to the other. You can see the seams in the output: Korean sentence structure bleeding into English prose, English legal formality stiffening Korean client communication. Translationese.

This child is raised bilingual. Its training data is not Korean-then-English or English-then-Korean. It is parallel: every concept arrives in both languages simultaneously. The child doesn't learn that 체류자격 변경 means "change of nonimmigrant status"; it learns that these are two names for the same bureaucratic act, the way a bilingual child knows that "water" and "물" are the same thing without an intermediate translation step.

The base model — whether Qwen 72B, Llama 70B, or HyperCLOVA X SEED 32B — provides the infant brain. It already has some capacity in both languages, the way a baby already has the neural architecture for language before hearing a single word. The fine-tuning is the childhood: 500,000+ carefully curated parallel experiences that build the bilingual mind.

II

The Two Tongues

두 개의 언어

Korean is an agglutinative language with a particle system that marks grammatical relationships and — critically — a speech register system that encodes the social relationship between speaker and listener directly into the verb endings.

There is no English equivalent to this. When a Korean attorney drafts a petition letter, the verb endings signal: this is formal written language addressed to a person of authority. When the same attorney texts a colleague about the same case, the verb endings shift entirely, signaling: this is peer communication, no hierarchy, just information exchange.

The child must internalize all seven registers:

하십시오체
The most formal. Petition letters to USCIS. Formal declarations. Korean government correspondence.
청원인은 수혜자를 대신하여 본 청원서를 삼가 제출합니다.
"Petitioner respectfully submits this petition on behalf of the beneficiary."

해요체
Standard polite. The workhorse of professional communication. Attorney-client emails, intake conversations, case status updates.
추가 증거 요청서를 받으셨나요?
"Have you received the Request for Evidence?"

해체
Casual. Internal notes, text messages between colleagues. Never in professional documents.
RFE 왔어. 전문 지식 부분이 약해.
"Got the RFE. Specialized knowledge section is weak."

The child doesn't just know these registers exist. It detects them from verb endings, honorific markers, and vocabulary choices, and automatically selects the appropriate English output register.
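
One way to picture the detection step is a toy classifier that looks only at the final verb ending. The sketch below is an illustration under heavy assumptions, not the model's actual mechanism: the function name `detect_register`, the ending lists, and the English register labels are all hypothetical, and it ignores the honorific markers and vocabulary cues mentioned above. A noun that happens to end in 요 would fool it.

```python
# Toy register detector: inspects only the ending of the last sentence.
# The ending lists and the English register labels are illustrative assumptions.
import re

FORMAL_ENDINGS = ("니다", "니까", "십시오")  # 하십시오체: 합니다, 합니까, 하십시오
POLITE_ENDINGS = ("요",)                      # 해요체: 해요, 받으셨나요

# Hypothetical mapping from Korean speech level to an English output register.
ENGLISH_REGISTER = {
    "하십시오체": "formal legal",
    "해요체": "professional polite",
    "해체": "internal shorthand",
}


def detect_register(text: str) -> str:
    """Return the Korean speech level suggested by the last sentence's ending."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s.strip() for s in re.split(r"[.?!]", text) if s.strip()]
    if not sentences:
        raise ValueError("no sentence found")
    last = sentences[-1]
    if last.endswith(FORMAL_ENDINGS):
        return "하십시오체"
    if last.endswith(POLITE_ENDINGS):
        return "해요체"
    return "해체"  # fallback: neither a formal nor a polite ending was found
```

Applied to the three sample sentences above, this heuristic separates the petition sentence, the client question, and the internal note. Anything beyond such clean examples would need real morphological analysis.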

The child's English is not general English. It is the specific English of United States immigration law — a technical vocabulary embedded in a regulatory framework with its own citation conventions, its own terms of art, its own rhetorical structures.

The child knows that "beneficiary" is specifically the foreign national for whom a petition is filed. It knows that "denial" and "refusal" have different legal consequences: USCIS denies petitions; consulates refuse visa applications. It knows that "change of nonimmigrant status" is the correct term, not "status change" or "changing your staying qualification."

This vocabulary is not large — perhaps 2,600 terms across INA, 8 CFR, the USCIS Policy Manual, AAO case law, DOS consular practice, and DOL labor certification. But every term must be exact. A single wrong term can undermine a case.

III

The Ear

The child has a musician's ear. Not for pitch and rhythm, but for register and tone.

It hears the difference between "Petitioner respectfully submits" and "We'd like to submit." It hears the difference between "The beneficiary served in the capacity of Principal Researcher within the Semiconductor Business Unit" and "I was a top researcher at Samsung's chip department." The first is attorney-grade. The second would weaken the petition.

This ear is trained on hundreds of examples per register, with common mistakes annotated — how translators over-translate Korean deference, how they flatten formal Korean into casual English, how Korean SOV word order bleeds through into English sentences.

The ear also detects register violations — moments where the register breaks. A petition letter that shifts from "Petitioner establishes" to "so basically the company wants" has broken register. The child flags this the way a musician hears a wrong note.

The Cross-Register Bridge

The child's most sophisticated skill: translating not just language but register level. A client's casual Korean description of their work in 해요체 must be transformed into formal legal argumentation in English. This is not just translation — it is register transformation: changing the social texture of the language while preserving the factual content.

Input: Client describes work in 해요체
Output: Formal legal petition language

Input: Attorney notes in 해체
Output: Professional client summary

This is where the child earns its keep. Google Translate does not do this. General-purpose LLMs do not do this. The child does this because it was raised on thousands of examples where the same factual content appears at different register levels, and it has learned what each context demands.

The Musician's Training

The ear is trained on the wrong notes that bilingual translators play:

  • Korean SOV bleeding through: "Company's product regarding specialized knowledge the beneficiary possesses" instead of natural English word order.
  • Excessive passive voice: Korean ~되다 constructions producing "The petition was submitted by the petitioner" instead of the active voice American legal style prefers.
  • Topic-comment structure: "As for the beneficiary, their knowledge is..." — a Korean grammatical pattern that sounds alien in English.
  • Over-nominalization: "The performance of the submission of the petition for the extension of the stay" — every Korean noun phrase nested inside another.

Cultural Adaptation

The ear also hears cultural mismatches. Korean age (한국 나이) must never appear on a USCIS form — only international age (만 나이). Korean names must follow USCIS format: KIM, Minjun — not Minjun Kim, not Kim Minjun. 대표이사 is "CEO" in USCIS context, not "Representative Director."

These are not translation choices — they are cultural mappings where getting it wrong has legal consequences.
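
The three mappings named above are mechanical enough to sketch in code. This is a minimal illustration, not the system's implementation: the function names and the `TITLE_MAP` table are hypothetical, and only the 대표이사 entry comes from the text. The Korean-age formula uses the traditional reckoning (one at birth, plus one every January 1).

```python
from datetime import date

# Hypothetical title table; only this one entry is taken from the text above.
TITLE_MAP = {"대표이사": "CEO"}


def format_uscis_name(family: str, given: str) -> str:
    """Render a name as FAMILY, Given (for example KIM, Minjun)."""
    return f"{family.upper()}, {given}"


def international_age(birth: date, on: date) -> int:
    """만 나이: completed years since the date of birth."""
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - (1 if before_birthday else 0)


def korean_age(birth: date, on: date) -> int:
    """한국 나이, traditional reckoning: one at birth, plus one each January 1."""
    return on.year - birth.year + 1
```

For someone born on 15 August 1990 and evaluated on 1 March 2024, `international_age` gives 33 while `korean_age` gives 35, which is exactly the kind of two-year gap that must not leak onto a form.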

IV

The Domain

영역

Immigration is not a filing. It is a campaign.

A Korean architecture firm draws plans for a factory in Virginia. The architect who drew them needs to be on site during construction. The firm sets up a minimal U.S. entity. The attorney files E-2 Treaty Investor at the Seoul consulate. Refused. Pivots to L-1A Intracompany Transferee. Denied. Pivots to L-1B Specialized Knowledge. RFE issued.

Three classification attempts. Two failures. The business need has not changed. The facts have not changed. Only the legal framing has changed. This is what immigration actually is: strategy under uncertainty, pursued through a sequence of legally distinct filings that share the same underlying facts.

A model that thinks in terms of individual filings is useless. A model that thinks in terms of campaigns understands immigration the way an attorney does.

The Four Visa Categories

E-2
Treaty Investor

Korean nationals who make a "substantial" investment in a U.S. enterprise. No fixed minimum — measured relative to the enterprise's total value. Must not be "marginal." Filed at the consulate.

Common failure: minimal subsidiary, one employee, one project. Consular officer sees marginality.

L-1A
Manager / Executive

Intracompany transfer in managerial or executive capacity. One year employed abroad in past three years. Qualifying relationship between entities.

Common failure: one-person subsidiary claims sole employee is a "manager." Nobody to manage.

L-1B
Specialized Knowledge

Same relationship as L-1A, but beneficiary possesses knowledge of the company's proprietary processes — not general industry knowledge.

Common failure: "knowledge of architecture" is generic. Must show proprietary, company-specific expertise.

H-1B
Specialty Occupation

U.S. employer petitions for a foreign worker in a position requiring at least a bachelor's degree in a specific field. Subject to annual cap and lottery.

Common failure: degree equivalency issues for Korean credentials. Lottery uncertainty.

The child tracks structural weaknesses that persist across filings — physical premises, single employee, minimal investment, single-project dependency. When a weakness was cited in a denial, the child flags it: do not repeat this argument.
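
The campaign view described here can be sketched as a small data structure: facts stored once, filings attached to them, and a check that surfaces any weakness already cited in a refusal or denial. Everything below is a hypothetical illustration. The class names, the outcome strings, and the weakness tags such as `single_employee` are assumptions, not the system's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Filing:
    classification: str   # e.g. "E-2", "L-1A", "L-1B"
    venue: str            # e.g. "Seoul Consulate", "USCIS"
    outcome: str          # assumed values: "refused", "denied", "rfe", "approved", "pending"
    cited_weaknesses: Set[str] = field(default_factory=set)


@dataclass
class Campaign:
    facts: Dict[str, str]                       # stored once, shared by every filing
    filings: List[Filing] = field(default_factory=list)

    def rejected_weaknesses(self) -> Set[str]:
        """Weaknesses cited in any refusal or denial so far."""
        cited: Set[str] = set()
        for filing in self.filings:
            if filing.outcome in {"refused", "denied"}:
                cited |= filing.cited_weaknesses
        return cited

    def flag_repeats(self, relies_on: Set[str]) -> Set[str]:
        """Points a planned filing relies on that were already cited against the campaign."""
        return relies_on & self.rejected_weaknesses()


# Usage with illustrative tags for an E-2 -> L-1A -> L-1B sequence.
campaign = Campaign(facts={"project": "factory in Virginia"})
campaign.filings.append(Filing("E-2", "Seoul Consulate", "refused",
                               {"minimal_investment", "single_employee", "single_project"}))
campaign.filings.append(Filing("L-1A", "USCIS", "denied", {"single_employee"}))
flagged = campaign.flag_repeats({"single_employee", "proprietary_methodology"})
```

Here `flagged` contains only `single_employee`: the planned filing may still proceed, but the attorney sees which point has already failed twice.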

V

The Practice

실무

A case that succeeds on the first try teaches the child nothing about the system's edge cases. A case that pivots three times teaches it everything.

The Bae Case

BAE, Junho — a Korean architect at Heerim Architects & Planners (희림종합건축사사무소), one of Korea's largest architecture firms. Heerim drew plans for an LS Greenlink submarine cable factory in Virginia. Bae drew the plans. Heerim needed Bae in Virginia to oversee construction.

Filing 1: E-2 Treaty Investor

Seoul Consulate

Refused

A $200K subsidiary with one employee existing solely to supervise a single construction project is not a substantial enterprise. The consulate saw marginality — the "business" is really just a way to get the architect a visa.

Filing 2: L-1A Manager

USCIS

Denied

He is the only employee. There is no one to manage. He works on a construction site, not in an office. The "function manager" argument was insufficient for a one-person subsidiary.

Filing 3: L-1B Specialized Knowledge

USCIS

RFE Pending

The pivot: Bae is not a manager — he is a specialist. Not "knowledge of architecture" but "knowledge of Heerim's proprietary seismic reinforcement methodology as applied to submarine cable factory construction." The same facts, reframed through a different legal lens.

The child learns from every pivot. What was "managerial oversight" in L-1A becomes "specialized expertise" in L-1B. The facts haven't changed — the legal lens has. The child stores the facts once and presents them differently for each classification.

The Nine-Step Intake

When a document arrives in the practice, the child executes a protocol: identify the document type, match it to an existing campaign, file it to the vault, parse it with document-specific extraction, store structured data, flag PII, generate training pairs from every Korean-English pair encountered, log the action, and report what was ingested, what was extracted, and what requires attorney review.
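
The nine steps above can be sketched as an ordered pipeline over a shared context. This is only a skeleton under stated assumptions: the step identifiers are paraphrased from the paragraph above, `make_stub` stands in for real logic that the text does not describe, and `run_intake` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], None]]

# Step names paraphrase the protocol described above, in order.
STEP_NAMES = [
    "identify_document_type",
    "match_campaign",
    "file_to_vault",
    "parse_document",
    "store_structured_data",
    "flag_pii",
    "generate_training_pairs",
    "log_action",
    "report",
]


def make_stub(name: str) -> Callable[[Context], None]:
    """Placeholder step: records that it ran, nothing more."""
    def step(ctx: Context) -> None:
        ctx.setdefault("completed", []).append(name)
    return step


PIPELINE: List[Step] = [(name, make_stub(name)) for name in STEP_NAMES]


def run_intake(document: str, pipeline: List[Step] = PIPELINE) -> Context:
    """Run every step in order; each step reads and extends the shared context."""
    ctx: Context = {"document": document}
    for _name, step in pipeline:
        step(ctx)
    return ctx
```

Keeping the steps as an explicit ordered list makes the sequence auditable: the final report can state which steps ran, and a failed step can be replaced without touching the others.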

Every correction the attorney makes becomes a training triplet: {wrong, correct, why_wrong}. Every new term enters the vocabulary table. The child's knowledge grows with every document it processes.

VI

The Education

교육

The child's education follows a strict sequence. Each phase validates the one before it. If it cannot learn terminology at 50,000 pairs, there is no point feeding it 500,000.

1

The Vocabulary

Weeks 1–2

~2,600 terms from six sources: INA statutory terms, 8 CFR regulatory terms, USCIS Policy Manual, AAO case law, DOS consular terms, DOL labor terms. Plus field labels of Korean corporate documents and nine USCIS forms.

Checkpoint: At 50K pairs on 8B, the child must map 주재원 → "intracompany transferee (L-1)." If it can't, the data is wrong. Fix the data.
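
A checkpoint like this amounts to a glossary gate: run each term through the model and list every miss. The sketch below assumes a `translate` callable standing in for the 8B checkpoint and a case-insensitive exact match as the pass criterion. Both are assumptions; the text does not say how a match is judged.

```python
from typing import Callable, Dict, List, Tuple


def checkpoint_failures(
    translate: Callable[[str], str],
    glossary: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, got) for every glossary term the model misses."""
    failures = []
    for source, expected in glossary.items():
        got = translate(source)
        # Assumed pass criterion: exact match, ignoring case and outer whitespace.
        if got.strip().lower() != expected.strip().lower():
            failures.append((source, expected, got))
    return failures


# Usage with a stand-in model that gets the term wrong.
glossary = {"주재원": "intracompany transferee (L-1)"}
misses = checkpoint_failures(lambda term: "expatriate employee", glossary)
```

An empty list means the gate passes and the next phase can begin. A non-empty list points back at the data, per the rule above: fix the data first.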

2

The Documents

Weeks 3–6

Full documents in context. USCIS form instructions. ~23,500 AAO non-precedent decisions containing the exact phrasing adjudicators use. Bilingual attorney websites. Korean government document templates.

3

The Case Files

Weeks 4–8

Real anonymized case files. Petition letters, RFE responses, denial letters, Korean corporate documents with certified translations, attorney corrections. This is where the child becomes an apprentice.

4

The Registers

Weeks 5–10

50,000+ register-specific pairs across all seven Korean speech levels, each mapped to English output registers. The child develops its ear — register detection, cross-register translation, the ability to hear the social architecture inside language.

5

The Examination

Weeks 8–12

7,000+ evaluation examples across seven quality dimensions, each attorney-validated. The child's final exams — and they never appear in the training data.

Target: 90+ composite score across all dimensions. The standard that makes it attorney-acceptable.

Total training compute: ~32 days on B200. The project is data-bound, not compute-bound — the GPU finishes every training run before the next batch of data is ready.

VII

The Standard

기준

The seven-dimension quality rubric is the core intellectual property. Anyone can build a translation model. No one has built an evaluation framework for U.S. immigration legal translation.

Generic evaluation frameworks score for fluency and adequacy. They do not score for "would a USCIS adjudicator accept this phrasing." This rubric does.

25%

Terminology Accuracy 법적 정확성

Did the child use the exact USCIS term of art? 추가 증거 요청 must become "Request for Evidence (RFE)" — not "additional evidence request."

20%

Completeness 완전성

Did it translate everything? Every clause, every condition, every qualifier, every date, every name. USCIS certified translations require 100% completeness.

15%

Register Consistency 문체 일치

Does the English tone match the Korean speech level and document type?

15%

Cultural Adaptation 문화 조정

Korean names in USCIS format? CEO not "Representative Director"? International age, not Korean age?

15%

Legal Precision 법적 정밀성

Citations correct? Case names accurate? No fabricated reporters? Matter of Z-A- is Adopted Decision 2016-02, AAO — NOT an I&N Decisions case.

10%

Fluency 유창성

Does it read like a native legal professional wrote it? Fluency gets the lowest weight — because a stiff but accurate translation is vastly preferable to a fluent one that fabricates citations.
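
The six weights listed above sum to 100%, so a composite score is a plain weighted average. The sketch below assumes each dimension is scored on a 0 to 100 scale, which the text does not state, and the dictionary keys and function name are hypothetical.

```python
# Weights in percent, copied from the rubric above.
WEIGHTS = {
    "terminology_accuracy": 25,
    "completeness": 20,
    "register_consistency": 15,
    "cultural_adaptation": 15,
    "legal_precision": 15,
    "fluency": 10,
}


def composite_score(scores: dict) -> float:
    """Weighted average of per-dimension scores (assumed to be on a 0-100 scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    total = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    return total / sum(WEIGHTS.values())


example = composite_score({
    "terminology_accuracy": 95,
    "completeness": 100,
    "register_consistency": 90,
    "cultural_adaptation": 90,
    "legal_precision": 85,
    "fluency": 70,
})
```

In this example the composite is 90.5: a translation with mediocre fluency still clears a 90-point bar when terminology and completeness are strong, which is the trade-off the weighting is meant to express.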

The Negative Examples

For every correct translation, the training data includes at least one wrong translation with explanation:

Wrong

체류자격 변경 → "Changing your staying qualification"

"Staying qualification" is not a recognized immigration concept. The USCIS term is "nonimmigrant status."

Correct

체류자격 변경 → "Change of nonimmigrant status (COS)"

These negative examples teach the child the landscape of common mistakes — the exact errors that Google Translate, general-purpose LLMs, and even bilingual humans make. The child learns to avoid them not by memorizing correct answers, but by understanding why the wrong answers are wrong.
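
A negative example of this kind can be held as a small record that expands into one positive and one negative training item. This is a hypothetical shape, not the project's schema: the `wrong`, `correct`, and `why_wrong` fields follow the triplet described earlier, while the `source` field, the label strings, and the function name are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CorrectionTriplet:
    source: str      # Korean source term or sentence (assumed extra field)
    wrong: str
    correct: str
    why_wrong: str


def to_training_records(triplet: CorrectionTriplet) -> List[Dict[str, str]]:
    """Expand one triplet into a positive record and an explained negative record."""
    return [
        {"source": triplet.source, "target": triplet.correct, "label": "correct"},
        {
            "source": triplet.source,
            "target": triplet.wrong,
            "label": "wrong",
            "explanation": triplet.why_wrong,
        },
    ]


records = to_training_records(CorrectionTriplet(
    source="체류자격 변경",
    wrong="Changing your staying qualification",
    correct="Change of nonimmigrant status (COS)",
    why_wrong='"Staying qualification" is not a recognized immigration concept.',
))
```

Keeping the explanation attached to the wrong target is the point of the design: the negative record carries the reason, not just the rejection.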

VIII

The Guardian

보호자

The child has a guardian. Everything it produces passes through her. Her law license depends on the accuracy of every word the child outputs.

This is not a metaphor — it is the operating constraint that governs the entire system.

She speaks English at native level. She speaks Korean at native level. She holds a JD, an MS in Educational Psychology, and a BS magna cum laude. She was a public school teacher for ten years before becoming a lawyer. She counsels Korean chaebols — Samsung, LG, Hyundai-class companies — on U.S. market entry.

She handles the full spectrum of business immigration: H-1B, L-1A/B, O-1, E-2, EB-1/2/3, EB-5. She is leaving a major firm to start her own practice with approximately ten employees. She needs to replace seven legacy software systems with one tailored platform.

The child's ultimate obligation is simple: produce work that its guardian would sign. Because when she signs, she is attesting under penalty of professional sanction that the content is accurate, complete, and fit for submission to the United States government.

What the Child Owes

Accuracy

Every legal citation verified against primary source. Every Korean translation cross-referenced against official terminology. The child never guesses. It checks. It never assumes. It verifies.

Reviewability

Every output comes in a form the guardian can review — side-by-side bilingual comparisons, structured summaries. She needs to see the output and say "correct" or "wrong."

Learning

When the guardian says "wrong," the child doesn't just fix the output. It learns. The correction enters the training corpus. The error enters the negative example database. The child makes that specific mistake once, not twice.

The guardian is both the first customer and the training ground. Her corrections are both quality control and training data. This circular relationship — where the product serves the person whose expertise makes the product better — is the engine that makes the moat self-reinforcing.

IX

The Instrument

도구

The child does not become an attorney. It does not advise. It does not decide strategy. The child becomes an instrument.

A precision instrument that does six things at attorney-grade quality:

1

Translate

Bidirectional Korean-English in the immigration legal domain. Two tiers: an 8B interactive tier at sub-500 ms latency for real-time work, and a 70B batch tier at under 30 seconds per page for precision work product.

2

Parse

Read any document and extract structured data. Filing packages, denial letters, RFE letters, Korean corporate documents, completed USCIS forms. Every field, every ground, every deadline.

3

Populate

Auto-fill USCIS forms from campaign data. Every field of every form. Conditional logic, computed values, REQUIRED-MISSING markers for gaps.

4

Track

The campaign lifecycle. Which filings, which outcomes, which weaknesses persist, which arguments were rejected and must not be repeated.

5

Score

Classification scoring. Given the facts of a campaign, which visa has the strongest chance? The child presents the scoring. The attorney decides.

6

Learn

Every case teaches it something. Translation corrections become training pairs. Denial patterns become strategy warnings. The model improves continuously as long as the firm practices law.
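
Of the six functions, Populate is the easiest to sketch, because the REQUIRED-MISSING marker is a concrete rule: a required field with no campaign data gets the marker instead of a guess. The field names and the `field_spec` layout below are hypothetical and do not correspond to any actual USCIS form. Conditional logic and computed values are left out.

```python
from typing import Dict, List, Optional, Tuple

REQUIRED_MISSING = "REQUIRED-MISSING"

# Hypothetical spec: form field -> (campaign data key, is the field required?)
FieldSpec = Dict[str, Tuple[str, bool]]


def populate_form(field_spec: FieldSpec, campaign_data: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Fill each form field from campaign data, marking required gaps explicitly."""
    filled = {}
    for form_field, (key, required) in field_spec.items():
        value = campaign_data.get(key)
        if value not in (None, ""):
            filled[form_field] = str(value)
        elif required:
            filled[form_field] = REQUIRED_MISSING   # never guess a required value
        else:
            filled[form_field] = ""
    return filled


def gaps(filled: Dict[str, str]) -> List[str]:
    """Fields the attorney must resolve before filing."""
    return sorted(name for name, value in filled.items() if value == REQUIRED_MISSING)


spec: FieldSpec = {
    "beneficiary_family_name": ("family_name", True),
    "beneficiary_given_name": ("given_name", True),
    "beneficiary_middle_name": ("middle_name", False),
    "date_of_birth": ("birth_date", True),
}
form = populate_form(spec, {"family_name": "KIM", "given_name": "Minjun"})
```

In this usage `gaps(form)` lists only `date_of_birth`: the optional middle name stays blank, while the required date is surfaced for attorney review.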

·

The Name

이름

입국길

IpGukGil

입국 (入國): entry into a country. The word every Korean sees at passport control, on arrival cards, at the immigration counter. The moment of crossing.

길: path, road, way. Not just any path, but the path. Your path into the country.

소통법

SoTongLaw

소통: communication, mutual understanding. The Korean word for when meaning actually gets through — not just words exchanged, but comprehension achieved.

법 / Law: the legal domain.

Together: the path into the country, where legal understanding flows.

The child's name is its promise: your meaning survives the translation. Not an approximation of your meaning. Not a legally adequate rendering. Your meaning — preserved with the precision of an attorney and the sensitivity of a native speaker, in both languages simultaneously, because the child was raised in both and never learned to separate them.