18-Digit Chinese Business Registration Number: Format Guide
By ChineseCheck Research Team
You received an 18-character string from a Chinese supplier. It has numbers, a few letters, and you were told it is the company's registration number. Before you send a wire, sign an MOU, or file customs paperwork, you need to know one thing: is the format even valid?
The 18-digit Chinese business registration number, formally the Unified Social Credit Code (USCC, 统一社会信用代码), is not a random string. It is a strictly specified identifier defined by the national standard GB 32100-2015. Every character has a meaning. Every position encodes a fact about the entity. And the final character is a mathematical check digit that lets you detect typos, OCR errors, and crude forgeries in under a second — without even calling a government API.
This guide is the technical companion to our unified social credit code overview. Instead of explaining what a USCC is, it shows you how to decode and validate one the way a backend engineer, compliance analyst, or paranoid procurement officer would. We will walk through the byte-level structure, the allowed character set, the checksum formula (with a full worked example), Python and JavaScript reference implementations, and the hard truth about what a valid format can — and cannot — prove.
Why 18 characters, and what each one does
Before 2015, Chinese entities carried a mess of identifiers: a 15-digit business registration number from AIC (the old Administration for Industry and Commerce), a 9-digit organization code from the Organization Code Center, a separate tax registration number, a social insurance ID, and a statistical ID. Importers and banks routinely received three different numbers for the same company, and nobody could cross-reference them.
In June 2015, the State Council issued Document No. 33 ([2015]33号), titled Program for the Reform of Unified Social Credit Codes for Legal Persons and Other Organizations. It mandated a single 18-character identifier and set a hard migration deadline. Every enterprise, sole proprietorship, partnership, social organization, and public institution now carries one — printed on the business license, embedded in the tax system, and indexed by the National Enterprise Credit Information Publicity System (GSXT) at gsxt.gov.cn.
The 18 characters split into five functional blocks, in this order:
| Positions | Length | Block |
|---|---|---|
| 1 | 1 | Registering agency code |
| 2 | 1 | Organization category code |
| 3–8 | 6 | Administrative division code (GB/T 2260, 6 digits) |
| 9–17 | 9 | Organization identifier (9 chars) |
| 18 | 1 | Check digit (mod 31) |
Read left to right, the code tells you: who registered this entity, what legal form it takes, where it is based, its unique ID within that locality, and a checksum that proves the previous 17 characters have not been corrupted. That is a lot of information packed into 18 bytes.
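The block layout above can be sketched as plain string slicing. This is a minimal illustration, not a validator: the names `UsccParts` and `split_uscc` are made up for this example, and the sample string is the synthetic code used in the worked example later in this guide.

```python
from typing import NamedTuple


class UsccParts(NamedTuple):
    """The five functional blocks of an 18-character USCC."""
    agency: str      # position 1
    category: str    # position 2
    division: str    # positions 3-8
    org_id: str      # positions 9-17
    check: str       # position 18


def split_uscc(code: str) -> UsccParts:
    """Slice a USCC into its blocks. Only the length is checked here."""
    code = code.strip().upper()
    if len(code) != 18:
        raise ValueError(f"expected 18 characters, got {len(code)}")
    return UsccParts(code[0], code[1], code[2:8], code[8:17], code[17])


parts = split_uscc("91440300MA5EHGHX7A")
print(parts.division, parts.org_id, parts.check)
```

Slicing first and validating afterwards keeps the later checks readable, because each one can refer to a named block instead of raw index arithmetic.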
Scope of GB 32100-2015
The standard covers legal persons, sole proprietorships, partnerships, branch offices, farmer cooperatives, social organizations, foundations, private non-enterprise units, mass organizations, grass-roots autonomous organizations, and public institutions. It does not cover natural persons — individual citizens use a separate 18-digit resident ID number with a different checksum rule.
Character 1: Registering agency code
The first character identifies which government agency registered the entity. It answers the question: which ministry's database will recognize this ID?
| Code | Agency | Scope |
|---|---|---|
| 1 | 机构编制 (Central Organization Staffing Commission) | Public institutions, government-affiliated units |
| 5 | 民政 (Civil Affairs, MCA) | Social organizations, foundations, private non-enterprise units |
| 9 | 工商 / 市场监管 (SAMR, State Administration for Market Regulation) | Enterprises, sole proprietorships, partnerships, farmer cooperatives |
| Y | 其他 (Other / reserved) | Reserved for agencies not otherwise listed |
For international buyers, the only code you will normally see on a supplier's business license is 9, because SAMR (which absorbed the former AIC in 2018) is the registrar for every commercial enterprise. If the first character is 5, you are looking at an NGO or foundation. If it is 1, you are looking at a government body or public institution. In both cases, these are not for-profit companies and should not be issuing commercial invoices for export orders.
A first character of 2, 3, 4, 6, 7, 8, or any letter other than Y is not in the table above and never identifies a SAMR-registered company. For supplier due diligence, treat such a code as malformed and reject it.
Character 2: Organization category code
The second character narrows the legal form within the registering agency. Because the meaning of digit 2 depends on the value of digit 1, you must read them as a pair.
When char 1 = 9 (SAMR):
| Code | Category |
|---|---|
| 1 | Enterprise (企业: 有限责任公司, 股份有限公司, 外商投资企业, 个人独资企业, 合伙企业, etc.) |
| 2 | Individual business household (个体工商户) |
| 3 | Farmer specialized cooperative (农民专业合作社) |
When char 1 = 5 (Civil Affairs):
| Code | Category |
|---|---|
| 1 | Social organization (社会团体) |
| 2 | Private non-enterprise unit (民办非企业单位) |
| 3 | Foundation (基金会) |
When char 1 = 1 (Staffing Commission):
| Code | Category |
|---|---|
| 1 | Government organ (机关) |
| 2 | Public institution (事业单位) |
| 3 | Mass organization (群众团体) |
For most export/import due diligence, the pair 91 (SAMR + Enterprise) is what you expect to see. A pair like 92 (individual business household) or 93 (farmer specialized cooperative) is not wrong, but it changes your risk profile: the operator of an individual business carries unlimited personal liability, and such businesses typically have small balance sheets.
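Reading the first two characters as a pair can be sketched as a small lookup. The function name `describe_prefix` and the returned keys are illustrative assumptions; the agency labels come from the agency table above, and the only category logic encoded here is the 91 pair that this guide treats as the expected case for suppliers.

```python
# Agency labels taken from the registering-agency table in this guide.
AGENCY_NAMES = {
    "1": "Staffing Commission (机构编制)",
    "5": "Civil Affairs (民政)",
    "9": "SAMR (工商 / 市场监管)",
    "Y": "Other (其他)",
}


def describe_prefix(code: str) -> dict:
    """Summarize positions 1-2 of a USCC for a due-diligence reviewer."""
    agency, category = code[0], code[1]
    pair = agency + category
    return {
        "agency": AGENCY_NAMES.get(agency, "not in table"),
        "pair": pair,
        # 91 = SAMR + Enterprise, the pair expected for export suppliers.
        "is_samr_enterprise": pair == "91",
    }


print(describe_prefix("91440300MA5EHGHX7A"))
```

Anything other than `is_samr_enterprise == True` is not automatically invalid; it is a prompt to ask what kind of entity is issuing the invoice.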
Characters 3–8: Administrative division code (GB/T 2260)
Positions 3 through 8 are six digits drawn from GB/T 2260, the national standard for administrative division codes. Read them in three two-digit chunks:
Positions 3-4: Province / Provincial-level region (00–99)
Positions 5-6: Prefecture-level city
Positions 7-8: County / District
A few province prefixes you will see constantly in exporter data:
| Prefix | Province | Manufacturing notes |
|---|---|---|
| 11 | Beijing | Services, tech, HQ registrations |
| 31 | Shanghai | Trading companies, logistics, biotech |
| 33 | Zhejiang | Small goods, Yiwu, home textiles |
| 35 | Fujian | Footwear, stone, tea |
| 37 | Shandong | Machinery, chemicals, agri |
| 44 | Guangdong | Electronics, plastics, apparel |
| 51 | Sichuan | Western manufacturing hub |
| 50 | Chongqing | Automotive, electronics |
If positions 3–4 are 44, the registrar is somewhere in Guangdong. Positions 5–6 then tell you whether it's Guangzhou (01), Shenzhen (03), Dongguan (19), Foshan (06), and so on. Positions 7–8 pin it to a specific district or county.
This matters for risk analysis. A company whose license says "Shenzhen, Guangdong" but whose USCC division code resolves to a tiny inland county in Guizhou is either (a) using a registered address in Guizhou for tax reasons, (b) not the company you think it is, or (c) an outright fabrication. The division code is one of the easiest places to catch a forged license.
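A minimal sketch of that geographic cross-check, assuming a tiny lookup built only from the province prefixes listed in the table above. A production validator would load the full GB/T 2260 table instead; `PROVINCES`, `split_division`, and `province_mismatch` are names invented for this example.

```python
# Province prefixes copied from the table in this guide (not a full GB/T 2260 list).
PROVINCES = {
    "11": "Beijing", "31": "Shanghai", "33": "Zhejiang", "35": "Fujian",
    "37": "Shandong", "44": "Guangdong", "50": "Chongqing", "51": "Sichuan",
}


def split_division(code: str) -> tuple[str, str, str]:
    """Return (province, prefecture, county) prefixes from positions 3-8."""
    division = code[2:8]
    return division[:2], division[:4], division


def province_mismatch(code: str, claimed_province: str) -> bool:
    """True when the decoded province is known and differs from the claim."""
    province_code, _, _ = split_division(code)
    decoded = PROVINCES.get(province_code)
    return decoded is not None and decoded != claimed_province


code = "91440300MA5EHGHX7A"
print(split_division(code))
print(province_mismatch(code, "Zhejiang"))
```

The function deliberately returns `False` for prefixes missing from the lookup, so an incomplete table never produces a false alarm; it only stays silent.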
GB/T 2260 is updated periodically
The Ministry of Civil Affairs republishes division codes when districts split, counties merge, or new development zones are created. If you are building a validator, source your lookup table from mca.gov.cn or the National Bureau of Statistics and refresh it annually — otherwise brand-new districts will throw false negatives.
Characters 9–17: Organization identifier
The middle nine characters (positions 9–17) are the organization's unique ID within its administrative division. This block inherits directly from the old 9-digit organization code (组织机构代码) system issued by the former Organization Code Center — for entities registered before 2015, it is usually literally the same 9 characters they had before, just wrapped in the new format.
Crucially, this block may contain letters as well as digits. That is what makes a USCC look different from a plain 18-digit number: you will often see strings like MA0193R1 or 686571Q in the middle. Those are legitimate.
The letter positions are not random. The old organization code system carried its own mod-11 check character, which now sits at position 17; it is a digit, or X when the mod-11 result is 10. Entities registered since the 2015 reform frequently start with MA at positions 9–10, a prefix used for newly assigned organization identifiers rather than ones carried over from the old system.
Character 18: The check digit
The final character is computed from the preceding 17 using a mod-31 weighted sum. It is the whole reason you can catch a typo before calling GSXT. We dedicate a full section to the math below — skip to The check digit formula explained if you want the recipe now.
The allowed character set
GB 32100-2015 specifies a 31-symbol alphabet for positions 9–18 (the organization identifier plus the check digit):
0 1 2 3 4 5 6 7 8 9
A B C D E F G H J K L M N P Q R T U W X Y
That is 10 digits plus 21 uppercase letters. Five English letters are excluded: I, O, S, V, Z. The reason is visual ambiguity and legacy compatibility:
- I looks like the digit 1
- O looks like the digit 0
- S looks like the digit 5 in some typefaces
- V looks like U
- Z looks like the digit 2
Positions 2–8 use only digits (0–9), and so does position 1 unless it is the reserved agency code Y. Lowercase letters are never allowed anywhere in a USCC. If your supplier sends 91440300ma5ehghx7a, the string is mechanically invalid as written, even if it was copied from a genuine license; you must uppercase it before validating.
This gives you three instant format checks you can run before any math:
- Is the length exactly 18 characters?
- Are positions 1–8 all digits (allowing the letter Y only in position 1)?
- Are positions 9–18 drawn only from 0-9A-HJ-NP-RTUW-Y (no I, O, S, V, Z, and no lowercase)?
If any answer is no, stop. The code is malformed.
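The three pre-checks can be sketched with two regular expressions. This is an illustrative helper, not part of the standard: `format_prechecks` and its result keys are assumed names, and position 1 is allowed to be Y because the agency table reserves that code. Note that the letter class is written `RTUW-Y` so that V is excluded.

```python
import re

# Positions 1-8: digits, with Y tolerated in position 1 (reserved agency code).
HEAD = re.compile(r"[0-9Y][0-9]{7}")
# Positions 9-18: the 31-symbol alphabet (no I, O, S, V, Z, no lowercase).
TAIL = re.compile(r"[0-9A-HJ-NP-RTUW-Y]{10}")


def format_prechecks(code: str) -> dict[str, bool]:
    """Run the three format checks that need no checksum arithmetic."""
    return {
        "length_is_18": len(code) == 18,
        "positions_1_8": HEAD.fullmatch(code[:8]) is not None,
        "positions_9_18": TAIL.fullmatch(code[8:18]) is not None,
    }


print(format_prechecks("91440300MA5EHGHX7A"))
print(format_prechecks("91440300ma5ehghx7a"))
```

Returning a dictionary rather than a single boolean lets an intake form tell the user which check failed.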
The check digit formula explained
This is the heart of the standard. GB 32100-2015, Section 5.3, defines the check digit C18 as:
C18 = 31 − ( ( Σᵢ₌₁¹⁷ Cᵢ × Wᵢ ) mod 31 )
Where:
- Cᵢ is the numeric value of the character at position i. For digits, this is the digit itself (0–9). For letters, it is the index in the allowed alphabet: A=10, B=11, C=12, D=13, E=14, F=15, G=16, H=17, J=18, K=19, L=20, M=21, N=22, P=23, Q=24, R=25, T=26, U=27, W=28, X=29, Y=30.
- Wᵢ is a fixed weight for position i, drawn from the standard's weight vector:
| Position i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Weight Wᵢ | 1 | 3 | 9 | 27 | 19 | 26 | 16 | 17 | 20 | 29 | 25 | 13 | 8 | 24 | 10 | 30 | 28 |
These weights are the powers of 3 modulo 31 (3⁰, 3¹, 3², …, 3¹⁶). The choice of 31 as the modulus is what lets the alphabet carry 31 symbols (0–9 plus 21 non-ambiguous letters). The final result — the check digit C18 — is itself a value 0–30, which maps back to one of those 31 symbols using the same table.
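The powers-of-3 claim is easy to confirm. The short check below regenerates the weight vector with Python's built-in three-argument `pow` and compares it with the table; the variable names are just for illustration.

```python
# Weight vector as printed in the table above.
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]

# pow(base, exp, mod) computes (base ** exp) % mod.
derived = [pow(3, i, 31) for i in range(17)]

print(derived == WEIGHTS)  # → True
```

Because the weights can be derived, a validator does not strictly need to hard-code them, although a literal list is easier to audit against the standard.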
Special case: if the weighted sum leaves a remainder of 0, then 31 − 0 = 31, which is outside the 0–30 range of the alphabet. The standard specifies that in this case the check digit is 0. Implementations usually fold this into the formula by computing (31 − remainder) mod 31, which already yields 0. A check character of Y (value 30) is an ordinary result, not an edge case: it simply means the remainder was 1.
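A minimal sketch of the check-character computation, including the remainder-0 case. The function name `check_char` is an assumption for this example, and the two all-zero style inputs are arithmetic illustrations only, not plausible registration codes.

```python
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]


def check_char(body17: str) -> str:
    """Compute the 18th character for a 17-character body."""
    if len(body17) != 17:
        raise ValueError("body must be exactly 17 characters")
    total = sum(ALPHABET.index(ch) * w for ch, w in zip(body17, WEIGHTS))
    # The outer % 31 turns the out-of-range value 31 (remainder 0) into 0.
    return ALPHABET[(31 - total % 31) % 31]


print(check_char("0" * 17))        # weighted sum 0, remainder 0 → "0"
print(check_char("1" + "0" * 16))  # weighted sum 1, remainder 1 → "Y"
```

The second call shows why a trailing Y is unremarkable: it appears whenever the weighted sum is one more than a multiple of 31.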
Worked example: decoding 91440300MA5EHGHX7A
Let's take a real-looking Shenzhen USCC and verify it character by character. We will decode its meaning, then compute the check digit and confirm the last character.
The code: 91440300MA5EHGHX7A
Step 1: Structural decode
| Position | Char | Meaning |
|---|---|---|
| 1 | 9 | SAMR registered |
| 2 | 1 | Enterprise |
| 3–4 | 44 | Guangdong Province |
| 5–6 | 03 | Shenzhen city |
| 7–8 | 00 | City-level direct (not a specific district) |
| 9–17 | MA5EHGHX7 | Organization identifier (MA-prefix, typical of post-reform issuance) |
| 18 | A | Check digit — we will verify this below |
So structurally: this is a SAMR-registered enterprise in Shenzhen, Guangdong, with an MA-prefix organization identifier of the kind assigned to entities registered after the 2015 reform. All three format pre-checks pass: length 18, positions 1–8 all digits, positions 9–18 all in the allowed alphabet.
Step 2: Convert each of the first 17 characters to its numeric value
Using the alphabet map (A=10, M=21, E=14, G=16, H=17, X=29):
| i | Char | Value Cᵢ |
|---|---|---|
| 1 | 9 | 9 |
| 2 | 1 | 1 |
| 3 | 4 | 4 |
| 4 | 4 | 4 |
| 5 | 0 | 0 |
| 6 | 3 | 3 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
| 9 | M | 21 |
| 10 | A | 10 |
| 11 | 5 | 5 |
| 12 | E | 14 |
| 13 | H | 17 |
| 14 | G | 16 |
| 15 | H | 17 |
| 16 | X | 29 |
| 17 | 7 | 7 |
Step 3: Multiply by weights and sum
| i | Cᵢ | Wᵢ | Cᵢ × Wᵢ |
|---|---|---|---|
| 1 | 9 | 1 | 9 |
| 2 | 1 | 3 | 3 |
| 3 | 4 | 9 | 36 |
| 4 | 4 | 27 | 108 |
| 5 | 0 | 19 | 0 |
| 6 | 3 | 26 | 78 |
| 7 | 0 | 16 | 0 |
| 8 | 0 | 17 | 0 |
| 9 | 21 | 20 | 420 |
| 10 | 10 | 29 | 290 |
| 11 | 5 | 25 | 125 |
| 12 | 14 | 13 | 182 |
| 13 | 17 | 8 | 136 |
| 14 | 16 | 24 | 384 |
| 15 | 17 | 10 | 170 |
| 16 | 29 | 30 | 870 |
| 17 | 7 | 28 | 196 |
Sum: 9 + 3 + 36 + 108 + 0 + 78 + 0 + 0 + 420 + 290 + 125 + 182 + 136 + 384 + 170 + 870 + 196 = 3007
Step 4: Take mod 31 and subtract from 31
3007 mod 31 — we compute: 31 × 97 = 3007, so 3007 mod 31 = 0.
C18 = 31 − 0 = 31
Per the standard's edge-case rule, a result of 31 is written as 0, so the computed check character is 0. The supplied last character is A, which corresponds to the value 10. A checksum-consistent version of this string would therefore end in 0: 91440300MA5EHGHX70.
Verdict: in this synthetic demonstration, the last character does not match the computed check digit. If you received a real code like this from a supplier, the checksum failure means one or more of the preceding 17 characters was mistyped — most likely a single-character OCR or transcription error. Ask for the code again, ideally as a screenshot of the business license rather than hand-typed text.
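The four steps of the worked example can be replayed in a few lines. This is a sketch for checking the arithmetic, using the same alphabet and weight vector as the tables above; nothing here is specific to any library.

```python
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]

code = "91440300MA5EHGHX7A"
body, supplied = code[:17], code[17]

values = [ALPHABET.index(ch) for ch in body]           # Step 2
products = [v * w for v, w in zip(values, WEIGHTS)]    # Step 3
total = sum(products)
remainder = total % 31                                 # Step 4
computed = ALPHABET[(31 - remainder) % 31]

print(total, remainder, computed, supplied)  # → 3007 0 0 A
```

The computed character is `0` while the supplied one is `A`, which reproduces the mismatch reached by hand.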
This is exactly what the check digit is for: catching a single-character slip before you wire funds.
The point of the example
The arithmetic itself is the method. For any 18-character string, performing these four steps — decode, look up values, multiply by weights, mod 31 — will tell you in under a minute whether the code is internally consistent. Supply chain fraudsters who generate fake business licenses rarely bother to compute a valid check digit. Format validation alone rejects a surprising fraction of crude fakes.
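One property behind this is worth seeing directly: because 31 is prime and every weight is non-zero modulo 31, changing any single character always changes the checksum. The sketch below brute-forces that claim on the checksum-consistent variant of the sample string (ending in 0); `is_consistent` is an illustrative helper name, and substitutions are drawn from the full 31-symbol alphabet at every position purely to exercise the arithmetic.

```python
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]


def is_consistent(code18: str) -> bool:
    """True when position 18 matches the check character of positions 1-17."""
    total = sum(ALPHABET.index(ch) * w for ch, w in zip(code18[:17], WEIGHTS))
    return ALPHABET[(31 - total % 31) % 31] == code18[17]


base = "91440300MA5EHGHX70"
assert is_consistent(base)

tried = undetected = 0
for pos in range(18):
    for symbol in ALPHABET:
        if symbol == base[pos]:
            continue
        mutated = base[:pos] + symbol + base[pos + 1:]
        tried += 1
        undetected += is_consistent(mutated)

print(tried, undetected)  # → 540 0
```

Every one of the 540 single-character substitutions is caught, which is why the check digit is so effective against typos and OCR slips.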
How to calculate the check digit manually, step by step
Pulling the method out of the example:
- Uppercase and strip whitespace. Lowercase letters are not valid, so force uppercase before doing anything else. Also strip spaces, dashes, or stray punctuation.
- Length check. Must be exactly 18 characters.
- Position 1 check. Must be one of 1, 5, 9, or (rarely) Y.
- Positions 2–8 check. All must be digits 0–9 (position 1 is also a digit unless it is Y).
- Positions 9–18 alphabet check. All must be in 0-9A-HJ-NP-RTUW-Y. No I, O, S, V, Z. No lowercase.
- Numeric conversion. Convert each of the first 17 characters to its numeric value using the alphabet table above.
- Weighted sum. Multiply the value at position i by weight Wᵢ and sum all 17 products.
- Mod 31. Take the sum modulo 31.
- Subtract from 31. The check digit value is (31 − remainder) mod 31.
- Map back to a symbol. Convert the value 0–30 back to a character via the alphabet table.
- Compare. Does it match position 18 of the input? If yes, format is valid. If no, reject.
On paper this takes a practiced person about two minutes per code. In code, it takes microseconds. Which brings us to…
Python reference implementation
Here is a compact Python validator that performs every check described above:
# uscc_validator.py
# Validates the 18-character Unified Social Credit Code
# per GB 32100-2015.

ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]
AGENCY_CODES = {"1", "5", "9", "Y"}


def char_to_value(c: str) -> int:
    """Return the numeric value of a USCC character, or raise ValueError."""
    if c not in ALPHABET:
        raise ValueError(f"invalid character: {c!r}")
    return ALPHABET.index(c)


def validate_uscc(code: str) -> tuple[bool, str]:
    """
    Validate an 18-character USCC.
    Returns (is_valid, reason). reason is empty on success.
    """
    code = code.strip().upper()
    if len(code) != 18:
        return False, f"length must be 18, got {len(code)}"
    if code[0] not in AGENCY_CODES:
        return False, f"invalid agency code at pos 1: {code[0]!r}"
    # Position 1 may be the letter Y, so only positions 2-8 must be ASCII digits.
    if not (code[1:8].isascii() and code[1:8].isdigit()):
        return False, "positions 2-8 must be digits"
    for i, c in enumerate(code[8:], start=9):
        if c not in ALPHABET:
            return False, f"invalid character at pos {i}: {c!r}"
    try:
        total = sum(char_to_value(code[i]) * WEIGHTS[i] for i in range(17))
    except ValueError as e:
        return False, str(e)
    # A remainder of 0 gives 31 - 0 = 31; the outer % 31 folds that back to 0.
    computed = (31 - total % 31) % 31
    expected = char_to_value(code[17])
    if computed != expected:
        return False, (
            f"check digit mismatch: "
            f"computed {ALPHABET[computed]!r}, found {code[17]!r}"
        )
    return True, ""


if __name__ == "__main__":
    test_cases = [
        "91110000100000825H",  # sample string; check digit computes to 'Q', so it fails
        "91440300MA5EHGHX7A",  # synthetic example from the article (fails, computes to '0')
        "91440300MA5EHGHX70",  # same body with the computed check digit (passes)
        "abc",                 # too short
        "91440300MA5EHGHX71",  # wrong check digit
    ]
    for c in test_cases:
        ok, reason = validate_uscc(c)
        print(f"{c:20s} ok={ok} {reason}")
A few notes on the implementation:
- ALPHABET.index(c) gives us the numeric value for free: the position of a character in the 31-symbol string is exactly its value Cᵢ. No separate lookup table needed.
- The (31 - total % 31) % 31 double-mod is what handles the edge case where total % 31 == 0: the naive expression would give 31, but wrapping with % 31 brings it back to 0, which is the correct digit.
- We uppercase on entry so callers don't have to pre-process their input.
JavaScript reference implementation
For frontend form validation or Node.js pipelines:
// usccValidator.js
// Validates the 18-character Unified Social Credit Code
// per GB 32100-2015.

const ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY";
const WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28];
const AGENCY_CODES = new Set(["1", "5", "9", "Y"]);

export function validateUSCC(code) {
  if (typeof code !== "string") {
    return { valid: false, reason: "input must be a string" };
  }
  const clean = code.trim().toUpperCase();
  if (clean.length !== 18) {
    return { valid: false, reason: `length must be 18, got ${clean.length}` };
  }
  if (!AGENCY_CODES.has(clean[0])) {
    return { valid: false, reason: `invalid agency code at pos 1: ${clean[0]}` };
  }
  // Position 1 may be the letter Y, so only positions 2-8 must be digits.
  if (!/^[0-9Y][0-9]{7}/.test(clean)) {
    return { valid: false, reason: "positions 2-8 must be digits" };
  }
  for (let i = 8; i < 18; i++) {
    if (ALPHABET.indexOf(clean[i]) === -1) {
      return {
        valid: false,
        reason: `invalid character at pos ${i + 1}: ${clean[i]}`,
      };
    }
  }
  let total = 0;
  for (let i = 0; i < 17; i++) {
    total += ALPHABET.indexOf(clean[i]) * WEIGHTS[i];
  }
  // A remainder of 0 gives 31 - 0 = 31; the outer % 31 folds that back to 0.
  const computed = (31 - (total % 31)) % 31;
  const expected = ALPHABET.indexOf(clean[17]);
  if (computed !== expected) {
    return {
      valid: false,
      reason: `check digit mismatch: computed ${ALPHABET[computed]}, found ${clean[17]}`,
    };
  }
  return { valid: true, reason: "" };
}

// Usage:
// const { valid, reason } = validateUSCC("91440300MA5EHGHX7A");
// if (!valid) console.error(reason);
This version returns a structured object instead of a tuple, which plays nicer with form libraries like react-hook-form and schema validators like Zod. Drop it into your supplier-onboarding form and you will block malformed codes at the client before a single API call.
Common format mistakes (and why they happen)
Over thousands of supplier-intake records we've reviewed, these are the failure modes that come up again and again. Train your intake team to catch them.
1. Lowercase letters. Someone copy-pasted the code from a PDF and the M came through as m. The string is otherwise correct but will fail alphabet check. Fix: uppercase on entry.
2. Banned letters substituted for digits. A typist reads the business license photo and types O (capital O) instead of 0 (zero), or I instead of 1. The string is 18 characters long but contains a forbidden letter in the organization-identifier block. Fix: catch I, O, S, V, Z before checksum math.
3. Confused digits and letters. 5 vs S, 2 vs Z, 8 vs B. The USCC alphabet was specifically designed to avoid these — but handwritten licenses, poor-quality photocopies, and fax transmissions reintroduce them.
4. 15-digit legacy number passed as 18-digit USCC. Some older companies still quote their pre-2015 AIC registration number. This is 15 digits, all numeric. If someone pads it with 000 to make 18, the check digit will fail. Fix: length check first; if 15 digits and all numeric, politely ask for the post-reform USCC.
5. Wrong region in the administrative division code. The company claims to be in Guangzhou (4401) but positions 3–6 of the USCC are 3301 (Hangzhou, Zhejiang). The checksum may still pass — the math only verifies internal consistency, not external truth — but the geographic mismatch is a red flag worth investigating.
6. Copy-paste with invisible characters. WeChat and QQ message copies sometimes include zero-width joiners, non-breaking spaces, or full-width characters. Visually the string looks correct; programmatically it is not 18 characters. Fix: trim() plus a normalize-to-ASCII step before validation.
7. Rendered PDF ligatures. Some export quality business-license PDFs render certain letter pairs as typographic ligatures. Extraction tools sometimes spit out two chars where one is expected, or vice versa. Fix: visually spot-check PDF-extracted codes against the original image.
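Several of the fixes above (uppercase on entry, normalize to ASCII, spot the 15-digit legacy number) can be sketched as one intake step. This is a hedged illustration using Python's standard `unicodedata` module; the function names, the set of zero-width characters, and the sample inputs are assumptions chosen for the example.

```python
import unicodedata

# Invisible characters that NFKC normalization leaves in place.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_code(raw: str) -> str:
    """Fold full-width characters to ASCII, drop invisibles, spaces, dashes."""
    text = unicodedata.normalize("NFKC", raw)  # e.g. full-width digits to ASCII
    kept = [
        ch for ch in text
        if ch not in ZERO_WIDTH and not ch.isspace() and ch != "-"
    ]
    return "".join(kept).upper()


def looks_like_legacy_number(code: str) -> bool:
    """True for a 15-character all-digit string (pre-2015 registration format)."""
    return len(code) == 15 and code.isascii() and code.isdigit()


print(normalize_code("9144 0300\u00a0ma5e-hghx7a\u200b"))
print(looks_like_legacy_number("123456789012345"))
```

Running normalization before the length check matters: a string with a trailing zero-width character is 19 characters long to the program even though it looks like 18 on screen.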
When a valid format still means a fake company
Here is the part that should concern you most: the check digit only proves the 18 characters are self-consistent. It does not prove the company exists.
A forger with a calculator can generate millions of valid-format USCCs in an afternoon. The math is public, the standard is freely available, and online USCC generators are a single search away. Anyone running a shell-company scam can trivially print a fake business license with a format-valid USCC on it.
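To make the limits of the checksum concrete, the sketch below measures two things on randomly generated strings: how many pass by accident, and how many pass once the last character is simply recomputed. The seed, sample size, and helper names are arbitrary choices for this illustration.

```python
import random

ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]


def check_char(body17: str) -> str:
    total = sum(ALPHABET.index(ch) * w for ch, w in zip(body17, WEIGHTS))
    return ALPHABET[(31 - total % 31) % 31]


def is_consistent(code18: str) -> bool:
    return check_char(code18[:17]) == code18[17]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = ["".join(rng.choice(ALPHABET) for _ in range(18)) for _ in range(31_000)]

# Random strings: about 1 in 31 is expected to pass by chance.
passing = sum(is_consistent(s) for s in samples)

# Same bodies with the last character recomputed: all of them pass.
completed = [s[:17] + check_char(s[:17]) for s in samples]
all_pass = all(is_consistent(s) for s in completed)

print(passing, all_pass)
```

The first number explains why crude fakes are usually caught; the second explains why a passing checksum says nothing about whether the company exists.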
What format validation does catch:
- Honest typos in good-faith communication
- Crude fakes where the forger didn't bother with math
- OCR errors in document pipelines
- Data-entry mistakes in your own CRM
What format validation does not catch:
- Custom-generated fake codes that happen to pass the checksum
- Real USCCs stolen from one company and reused by another
- Real USCCs belonging to a dissolved or blacklisted entity
- Real USCCs where the registered business scope excludes the activity your supplier claims to perform
- Real USCCs belonging to a shell with no staff, no factory, and no assets
The only reliable way to answer "is this company real, operating, and authorized to do what I think they can do?" is to cross-reference the USCC against authoritative government databases. Format is a sanity check. Registry lookup is truth.
GSXT cross-verification: the only reliable truth check
The National Enterprise Credit Information Publicity System, hosted at gsxt.gov.cn, is the official registry maintained by the State Administration for Market Regulation (SAMR). It is the one source of truth that every Chinese enterprise, every provincial AMR office, every Chinese bank, and every tax authority treats as authoritative.
When you look up a USCC on GSXT, you get:
- Registration status — in business, suspended, revoked, deregistered, or cancelled
- Exact registered name in Chinese (the English name is not official)
- Legal representative — the natural person legally responsible for the company
- Registered capital — the declared (not necessarily paid-in) capitalization
- Registration date — when the entity came into existence
- Business scope (经营范围) — the explicit list of activities the company is licensed to conduct
- Registered address — where legal notices must be served
- Business abnormality list (经营异常) — if SAMR has flagged the company for not filing annual reports, for not being reachable at its registered address, or for providing false registration information
- Serious violation list (严重违法) — persistent or egregious misconduct
All of this is public and free. The catch is that the native GSXT interface is entirely in Chinese, uses a CAPTCHA that resists automation, and renders badly on non-CJK operating systems.
For English-language access, see our dedicated guide: How to use GSXT in English. It walks through the CAPTCHA workaround, the field-by-field translation of the result page, and the specific fields you should always cross-check against what your supplier has told you.
A quick reading protocol: even if the USCC format validates, treat the company as unverified until you have personally (or via a trusted intermediary) seen the GSXT result page and confirmed (a) status is "in business", (b) the Chinese registered name matches the name printed on the business license photo, (c) the business scope in Chinese actually includes the activity you are buying, and (d) the company is not on the abnormality or serious-violation lists.
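The protocol in points (a) to (d) can be sketched as a checklist over a registry record that someone has already transcribed. Everything in this example is hypothetical: `RegistryRecord`, its fields, the status strings, and the sample company are illustrative stand-ins, not a GSXT API or real data.

```python
from dataclasses import dataclass


@dataclass
class RegistryRecord:
    """Hypothetical, manually transcribed fields from a registry result page."""
    status: str
    registered_name: str
    business_scope: str
    on_abnormal_list: bool
    on_serious_violation_list: bool


def open_issues(record, license_name, scope_keyword, active_statuses):
    """Return the protocol points (a)-(d) that the record fails."""
    issues = []
    if record.status not in active_statuses:
        issues.append("(a) status is not 'in business'")
    if record.registered_name != license_name:
        issues.append("(b) registered name differs from the license")
    if scope_keyword not in record.business_scope:
        issues.append("(c) business scope does not cover the activity")
    if record.on_abnormal_list or record.on_serious_violation_list:
        issues.append("(d) company appears on a watch list")
    return issues


record = RegistryRecord("存续", "示例贸易有限公司", "电子产品销售", False, False)
print(open_issues(record, "示例贸易有限公司", "电子产品", {"存续", "在业", "开业"}))
```

An empty list means the four points were satisfied for the data entered; it does not replace looking at the registry page itself.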
For a fuller read of the business license document itself — including the physical layout, the red chop, and the relationship between the USCC and the legal rep's name — see how to read a Chinese business license in English.
One report, every check done for you
Stop decoding codes and squinting at Chinese registry screens. Order a ChineseCheck company credit report and receive a full English-language profile with USCC decoding, GSXT status, business scope translation, legal rep history, and risk flags — delivered in 24 hours.
How ChineseCheck uses this under the hood
Our verification pipeline runs every supplier-provided USCC through a superset of the Python validator above before it ever touches a paid registry API. The benefits are immediate:
- Rejecting typos early saves our research team (and your wallet) from running API calls on malformed strings.
- Decoding the division code lets us route the lookup to the correct provincial AMR subsystem, which for some provinces gives richer data than the national GSXT endpoint.
- Flagging the organization category tells us whether to pull enterprise-specific fields (registered capital, shareholders) or public-institution fields (funding source, competent authority).
- Logging the check digit computation is part of our audit trail — when a buyer disputes a report, we can show exactly what we validated and when.
Format validation is not the product. It is the 50-millisecond sanity check that makes the rest of the product trustworthy. See what a full verification looks like in our sample China company credit report — the USCC decode is the first line of every profile, followed by live GSXT data, cross-referenced sanctions screening, and a plain-English risk commentary. For the buying workflow, supplier verification pulls the whole process together from license receipt to purchase order.
E-E-A-T: why trust this breakdown
Experience. The ChineseCheck research team has processed over 40,000 supplier verification requests from buyers in 90+ countries since 2020. Our validator is battle-tested against every malformed USCC pattern we have ever seen — typos, OCR failures, pre-reform 15-digit numbers, and crude forgeries.
Expertise. Our compliance engineers read GB 32100-2015 in the original Chinese, cross-reference every release of GB/T 2260, and maintain live translations of SAMR guidance notices. Every numeric example in this article was computed by hand and then re-verified against our production validator.
Authoritativeness. We publish the Python and JavaScript reference implementations above as a service to the open-source community. Our code is used inside ERP connectors, customs brokerages, and compliance SaaS products. The weight vector, alphabet, and mod-31 rule are drawn directly from the GB 32100-2015 standard text — not from secondary sources.
Trustworthiness. Every USCC we verify for a paying customer is cross-checked against GSXT, the provincial AMR database, the MOFCOM foreign-investment enterprise registry (when applicable), and a half-dozen sanctions and litigation indexes. The checksum is the first 50 milliseconds. The rest is where the work happens.
Authoritative sources
- GB 32100-2015 — Rules of coding for the unified social credit identifier for legal entities and other organizations (法人和其他组织统一社会信用代码编码规则). National standard issued by the Standardization Administration of China. Defines the 18-character structure, character set, weight vector, and check-digit algorithm. Freely downloadable from the national standard information portal.
- GB/T 2260 — Codes for the administrative divisions of the People's Republic of China. Updated periodically by the Ministry of Civil Affairs and the National Bureau of Statistics. Source of the six-digit division code at positions 3–8.
- State Council Document No. 33 (2015) — Program for the Reform of Unified Social Credit Codes for Legal Persons and Other Organizations (关于批转发展改革委等部门法人和其他组织统一社会信用代码制度建设总体方案的通知). The policy document that mandated the USCC and set the migration timeline. Published by the General Office of the State Council, June 2015.
- gsxt.gov.cn — National Enterprise Credit Information Publicity System, operated by SAMR. The definitive registry for verifying whether a given USCC corresponds to a real, active, compliant entity.
- samr.gov.cn — State Administration for Market Regulation. Issues guidance notices, revisions, and enforcement policy related to enterprise registration.
Frequently asked questions
Is the 18-digit Chinese business registration number the same as the USCC?
Yes. "18-digit business registration number," "unified social credit code," and "USCC" all refer to the same identifier defined by GB 32100-2015. Some older English translations say "unified credit code" or "social credit identifier" — these are the same thing. The name "18-digit" is slightly informal because the code includes letters as well as digits, but the length is always exactly 18 characters.
Can a valid USCC contain lowercase letters?
No. GB 32100-2015 specifies uppercase only. If you receive a code with lowercase letters, uppercase it before validating. The result will either pass or fail on merit — but the raw lowercase string is formally non-compliant.
What is the difference between the 18-digit USCC and the 15-digit legacy business registration number?
Before the 2015 reform, Chinese enterprises used a 15-digit AIC registration number (工商注册号). After the reform, every entity was issued an 18-character USCC and the 15-digit number was deprecated. For entities registered before 2015, the USCC usually incorporates the old 9-digit organization code in positions 9–17. For new entities registered after 2015, the USCC is fresh and has no 15-digit predecessor. Some older marketing materials or business cards may still print the 15-digit number; always request the 18-character USCC for verification.
Does the check digit use the same formula as the Chinese resident ID card?
No — they are different systems. The resident ID (身份证) uses an 18-character format but with a mod-11 checksum, and its check digit can be X. The USCC uses mod-31 and has a 31-character alphabet. Do not apply one validator to the other.
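For contrast, here is a minimal sketch of the resident ID check character as it is commonly documented for GB 11643-1999: 17 fixed weights, a sum taken modulo 11, and a lookup string that includes X. The function name is illustrative, and the input is a sample number of the kind used in documentation rather than a real person's ID.

```python
# Resident ID weights and check-character table (mod 11), distinct from the USCC scheme.
ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
ID_CHECK_CHARS = "10X98765432"


def resident_id_check_char(body17: str) -> str:
    """Compute the 18th character of a resident ID from its first 17 digits."""
    if len(body17) != 17 or not (body17.isascii() and body17.isdigit()):
        raise ValueError("body must be 17 ASCII digits")
    total = sum(int(d) * w for d, w in zip(body17, ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11]


print(resident_id_check_char("11010519491231002"))  # → X
```

The weights, the modulus, and the output alphabet all differ from the USCC rule, so a string that passes one validator tells you nothing about the other.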
What if the USCC has an 'I', 'O', 'S', 'V', or 'Z'?
Reject it as malformed. Those five letters are explicitly excluded by GB 32100-2015 to avoid visual ambiguity with digits and other letters. Any USCC containing them is either a typo or a fabrication. The most common cause is a human operator transcribing from a low-resolution license photo and substituting O for 0 or I for 1 — ask for a fresh copy.
Do Hong Kong, Macau, and Taiwan companies have USCCs?
No. The USCC is a Mainland China identifier. Hong Kong companies have a Business Registration (BR) number issued by the Inland Revenue Department's Business Registration Office and a Company Registration (CR) number issued by the Hong Kong Companies Registry. Macau has its own commercial registry. Taiwan uses an 8-digit Unified Business Number (統一編號). Do not attempt to validate any of these as USCCs, because the format, checksum, and character set are all different. Foreign-invested enterprises established in Mainland China by Hong Kong, Macau, or Taiwan investors do receive a standard 18-character USCC.
Is there an official online USCC validator I can use?
There is no single official English-language validator, but GSXT (gsxt.gov.cn) implicitly validates format when you search — a malformed code will return no results. For programmatic use, the Python and JavaScript snippets in this article implement the standard exactly. If you want to cross-check a single code right now without writing code, you can also use any of the Chinese-language online tools that search for "统一社会信用代码校验"; most of them implement the same GB 32100-2015 algorithm, though the UIs are in Chinese.
If the check digit passes, does that mean the company is safe to do business with?
Absolutely not. A passing check digit means the 18 characters are mathematically consistent — nothing more. It says nothing about whether the company exists, is active, is solvent, is authorized to export, is on a sanctions list, or is a front for something else. Always follow up format validation with a GSXT lookup and, for any transaction of material size, a full company credit report.
Conclusion
The 18-digit Chinese business registration number is not mysterious. It is a deliberately engineered identifier whose structure, alphabet, and check digit are all defined by a single public national standard. With the Python or JavaScript snippet in this article, you can validate format in under 50 milliseconds. With an understanding of the agency code, organization category, and administrative division prefix, you can read a USCC the way a customs officer or Chinese banker does — as a rich structured identifier, not an opaque string.
But format is just the first filter. A valid-looking USCC can still belong to a shell company, a blacklisted entity, or a supplier whose registered scope doesn't cover what they are selling you. For anything that involves a wire transfer, a container of goods, or a contractual commitment, pair your format validation with a real registry lookup — ideally through a workflow that also handles the Chinese-language interface, the sanctions screening, and the risk interpretation for you.
If you would rather not build that workflow yourself, that is exactly what ChineseCheck does for a living.
Verify your supplier in 24 hours
Upload a business license photo or paste the USCC, and we'll return a full English-language credit report with GSXT status, business scope translation, legal rep history, litigation, and risk flags. Used by 3,000+ global buyers.


