Commit 0d99193

Add BigQuery datasets section in sourcify-database page (#43)
* Add BigQuery datasets section in sourcify-database page
* Add sourcify production dataset link
1 parent ce62a4c commit 0d99193

1 file changed

Lines changed: 25 additions & 8 deletions

docs/4. repository/1. sourcify-database.mdx

@@ -2,8 +2,9 @@
22

33
Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database that follows the [Verifier Alliance Schema](https://github.com/verifier-alliance/database-specs) as its base, with a few modifications.
44

5-
On a high level, these modifications are:
6-
- Sourcify DB does accept contracts without the deployment details such as `block_number`, `transaction_hash` as well as without an onchain creation bytecode (`contracts.creation_code_hash`).
5+
On a high level, these modifications are:
6+
7+
- Sourcify DB does accept contracts without the deployment details such as `block_number`, `transaction_hash` as well as without an onchain creation bytecode (`contracts.creation_code_hash`).
78
- Stores the [Solidity metadata](/docs/metadata) separately in the `sourcify_matches` table.
89
- Introduces tables for other purposes.
910

@@ -13,22 +14,29 @@ You can follow the [`services/database/migrations`](https://github.com/argotorg/
1314

1415
You can access the live schema of the database [here](https://dbdiagram.io/d/Sourcify-DB-67fcf5ee9cea640381a217d2) or in the embedded frame below.
1516

16-
<iframe src='https://dbdiagram.io/e/67fcf5ee9cea640381a217d2/67fcf5fc9cea640381a21a00' style={{width: "100%", height: "500px"}}> </iframe>
17+
<iframe
18+
src="https://dbdiagram.io/e/67fcf5ee9cea640381a217d2/67fcf5fc9cea640381a21a00"
19+
style={{ width: "100%", height: "500px" }}
20+
>
21+
{" "}
22+
</iframe>
1723

1824
In short:
25+
1926
- Every verified contract is a coupling between a deployed contract (`contract_deployments`) and a compilation (`compiled_contracts`)
20-
- ["Transformations"](https://verifieralliance.org/docs/transformations) are applied to reach the final matching onchain bytecode from a bytecode from a compilation.
27+
- ["Transformations"](https://verifieralliance.org/docs/transformations) are applied to reach the final matching onchain bytecode from a bytecode from a compilation.
2128
- Bytecodes and sources are deduplicated. The bytecode and the sources of a popular contract like `ERC20.sol` will only be stored once in `sources` and `code` respectively.
2229

2330
:::warning
24-
If the contract has ["unlinked libraries"](https://docs.soliditylang.org/en/v0.8.30/using-the-compiler.html#library-linking), the placeholder strings like `__$53ae...a537$__` in bytecodes will be normalized to `0000...0000`s. This is required since the `code` column is a `bytea` type in the DB.
31+
If the contract has ["unlinked libraries"](https://docs.soliditylang.org/en/v0.8.30/using-the-compiler.html#library-linking), the placeholder strings like `__$53ae...a537$__` in bytecodes will be normalized to `0000...0000`s. This is required since the `code` column is a `bytea` type in the DB.
2532

2633
Therefore, the bytecode string from the DB **will not be identical** to the output of the compilation. You can "de-normalize" these fields by looking at the library transformations and filling the placeholders with the library identifier.
2734
:::
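
The "de-normalize" step described in the warning can be sketched as follows. This is a minimal illustration, not Sourcify's own code. It assumes each library transformation is an object with `reason: "library"`, a byte `offset`, and an `id` holding the 40-character placeholder string, and that the bytecode is a hex string without a `0x` prefix. The function name `denormalize_libraries` is hypothetical; check the Verifier Alliance JSON schemas for the exact field names before relying on it.

```python
def denormalize_libraries(code_hex, transformations):
    """Put library placeholders back into normalized bytecode.

    code_hex: bytecode as a hex string without "0x" (assumed format).
    transformations: list of dicts; only entries with reason == "library"
    are used, and each is assumed to carry a byte "offset" and an "id"
    such as "__$53ae...a537$__" (assumed field names).
    """
    for t in transformations:
        if t.get("reason") != "library":
            continue  # other transformations are left untouched
        placeholder = t["id"]
        start = t["offset"] * 2  # byte offset -> hex character offset
        end = start + len(placeholder)
        region = code_hex[start:end]
        # The normalized bytecode should contain only zeros at this spot.
        if len(region) != len(placeholder) or set(region) != {"0"}:
            raise ValueError(f"no zeroed placeholder at byte offset {t['offset']}")
        code_hex = code_hex[:start] + placeholder + code_hex[end:]
    return code_hex
```

The zero check is a deliberate guard: if the offset does not point at a normalized region, the function fails loudly instead of silently corrupting the bytecode.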
2835

2936
For more information about the schemas of the JSON fields below, check the [Verifier Alliance repo](https://github.com/verifier-alliance/database-specs/tree/master/json-schemas).
3037

3138
JSON fields of `verified_contracts` table:
39+
3240
- `creation_values`
3341
- `creation_transformations`
3442
- `runtime_values`
@@ -37,6 +45,7 @@ JSON fields of `verified_contracts` table:
3745
The [transformations](https://verifieralliance.org/docs/transformations) and values are the operations done on a bytecode from a compilation to reach the final matching onchain bytecode.
3846

3947
JSON fields of `compiled_contracts` table:
48+
4049
- `sources`: Source code files of a contract
4150
- `compiler_settings`
4251
- `compilation_artifacts`: Fields from the compilation output JSON. Fields: `abi`, `userdoc`, `devdoc`, `sources` (AST identifiers), `storageLayout`
@@ -45,7 +54,7 @@ JSON fields of `compiled_contracts` table:
4554

4655
### Notes on the data
4756

48-
For the issues on the data we are aware of and plan to fix, see this issue: https://github.com/argotorg/sourcify/issues/2276
57+
For the issues on the data we are aware of and plan to fix, see this issue: https://github.com/argotorg/sourcify/issues/2276
4958

5059
Other known inconsistencies in the data (not planned to be fixed) are documented below:
5160

@@ -58,7 +67,6 @@ Other known inconsistencies in the data below (not planned to fix) are documente
5867

5968
We dump the whole database daily in [Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) format and upload it to Cloudflare R2 storage. You can access the manifest file at https://export.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify). The script that does the dump is at [sourcifyeth/parquet-export](https://github.com/sourcifyeth/parquet-export).
6069

61-
6270
[export.sourcify.dev](https://export.sourcify.dev) will redirect to a `manifest.json` file:
6371

6472
<details>
@@ -103,11 +111,13 @@ We dump the whole database daily in [Parquet](https://en.wikipedia.org/wiki/Apac
103111
}
104112
}
105113
```
114+
106115
</details>
107116

108117
You can download all the files and use a parquet client to query, inspect, or process the data.
109118

110119
1. Download the manifest file (`-L` to follow redirects):
120+
111121
```bash
112122
curl -L -O https://export.sourcify.dev/manifest.json
113123
```
@@ -125,4 +135,11 @@ brew install parquet-cli
125135
parquet meta compiled_contracts_0_5000.parquet
126136
```
127137

128-
Alternatively, use your favorite data processing tool or import this data into a database.
138+
Alternatively, use your favorite data processing tool or import this data into a database.
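
As a starting point for scripting the download, the sketch below lists every Parquet file named in a downloaded `manifest.json`. It does not assume any particular manifest layout: it walks the JSON and collects every string ending in `.parquet`. The function names are hypothetical, and how each collected path maps to a download URL is left to the reader, since that depends on the manifest contents.

```python
import json


def parquet_paths(node):
    """Recursively collect every string ending in ".parquet" from parsed JSON."""
    if isinstance(node, str):
        return [node] if node.endswith(".parquet") else []
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return []  # numbers, booleans, null
    found = []
    for child in children:
        found.extend(parquet_paths(child))
    return found


def load_parquet_paths(manifest_file="manifest.json"):
    """Read a manifest downloaded with curl and return its Parquet paths."""
    with open(manifest_file, encoding="utf-8") as f:
        return parquet_paths(json.load(f))
```

Calling `load_parquet_paths()` in the directory where the manifest was downloaded returns a flat list of paths that can be passed to a downloader or a Parquet client.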
139+
140+
## BigQuery Datasets
141+
142+
We also provide public BigQuery datasets for convenient querying and exploration:
143+
144+
- [Sourcify production dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssourcify-project!2ssourcify)
145+
- [Sourcify staging dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssourcify-project!2ssourcify_staging)
