1. Ingest & parse on-chain and off-chain NFT transactions from the top N marketplaces
2. Filter out obscenely low-priced sales (a proxy for wash/accidental trades)
For Solana, we fetch, filter and parse transaction data from a cluster of on-chain RPC nodes for 9+ top NFT marketplaces.
We tag each transaction with its corresponding type (e.g., listing, de-listing, sale) and ingest it in a normalized format for querying later.
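As a rough illustration of what a "normalized format" could look like, here is a minimal sketch. The record layout, the field names, and the shape of the raw input dictionary (`source`, `kind`, `lamports`, `block_time`) are all hypothetical and chosen only for this example; they are not our actual schema. The only external fact used is that 1 SOL equals 1,000,000,000 lamports.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

LAMPORTS_PER_SOL = 1_000_000_000  # 1 SOL = 10^9 lamports


class TxType(Enum):
    LISTING = "listing"
    DELISTING = "de-listing"
    SALE = "sale"


@dataclass(frozen=True)
class NftTransaction:
    """Hypothetical normalized record shared by all marketplaces."""
    marketplace: str
    collection: str
    mint: str
    tx_type: TxType
    price_sol: Optional[float]  # None when the event carries no price
    timestamp: int              # Unix seconds


def normalize(raw: dict) -> NftTransaction:
    """Map one parsed raw event (hypothetical layout) to the shared record."""
    lamports = raw.get("lamports")
    return NftTransaction(
        marketplace=raw["source"],
        collection=raw["collection"],
        mint=raw["mint"],
        tx_type=TxType(raw["kind"]),  # raises ValueError on an unknown type
        price_sol=None if lamports is None else lamports / LAMPORTS_PER_SOL,
        timestamp=raw["block_time"],
    )
```

Tagging with an enum rather than a free-form string means an unrecognized transaction type fails loudly at ingestion time instead of silently polluting later queries.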
We also fetch the on-chain NFT mint accounts and their metadata (e.g., traits) for rarity scoring.
If we do not filter out extremely low-priced sales, we expose ourselves to wash trades, which are cheap to manufacture and therefore make the statistic cheap to manipulate. For example, it is not uncommon for the typical "floor" price to dip to obscenely low values, and we can easily observe this in the floor plot for a collection like DeGods:
Sales price statistics for the Solana "DeGods" collection (NB: y-axis is in log-scale).
Observe that the median is relatively stable: in fact, it is the most robust quantile statistic, with the highest breakdown point (50%). We can use it as a reference point: that is, we can filter out all sales priced below 0.2× the rolling median. We observe that 0.2× the median leaves sufficient slack based on historical data from the "blue-chip" collections we track across both Solana and Ethereum.
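The filter can be sketched in a few lines. This is a simplified illustration, not our production code: the function name, the window length of 5 sales, and the choice to take the median over the most recent raw prices (including the current one) are assumptions made for the example. Only the 0.2 ratio comes from the text above.

```python
from statistics import median


def filter_low_sales(prices, window=5, ratio=0.2):
    """Drop sales priced below `ratio` times the rolling median.

    `prices` is a chronological list of sale prices. The rolling median is
    taken over the last `window` raw prices up to and including the current
    sale (an assumption for this sketch).
    """
    kept = []
    for i, price in enumerate(prices):
        recent = prices[max(0, i - window + 1): i + 1]
        if price >= ratio * median(recent):
            kept.append(price)
    return kept


sales = [100, 105, 0.01, 98, 110, 15, 102]
print(filter_low_sales(sales))  # the 0.01 and 15 sales are dropped
```

Because the median barely moves when a few extreme prices enter the window, the threshold stays near 20 in this example even while the suspicious sales are still inside the window.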
After filtering out low-priced sales, we can compute quantile statistics on the remaining sales transactions.
As we described with canonical floor prices, taking the minimum of a sample of transaction prices is usually a bad idea. The minimum breaks down with a single observation: all it takes is one incorrect sale (read: one bad actor) to move the statistic. To introduce some robustness, or buffer room, we compute higher-order quantiles (e.g., the 5th percentile) to establish the smart floor price.
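A small numerical example makes the difference concrete. The helper below uses the nearest-rank definition of a percentile, which is one of several common conventions and is chosen here only for simplicity; the sample of 40 sales and the prices are made up for illustration.

```python
import math


def nearest_rank_percentile(values, p):
    """Nearest-rank p-th percentile, with p an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


clean = [100.0] * 40               # 40 sales, all at 100
tampered = clean[:-1] + [25.0]     # one sale replaced by a low outlier

print(min(clean), min(tampered))   # the minimum follows the outlier
print(nearest_rank_percentile(clean, 5),
      nearest_rank_percentile(tampered, 5))  # the 5th percentile does not
```

With 40 samples the 5th percentile is the second-smallest price, so a single low sale cannot move it. Note that the outlier here (25) is above 0.2× the median, so it would survive the low-price filter; the quantile is what absorbs it.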
In addition to using higher-order quantiles, we also look at a rolling window of historical data. This gives us a larger sample size, which produces a higher-confidence statistic and ensures that blips in sales data do not have a disproportionate impact.
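Putting the quantile and the rolling window together might look like the sketch below. The 3-day window, the minimum of 30 samples, and the function name are placeholder assumptions for the example and not the values we actually use. The 5th percentile is computed with Python's standard `statistics.quantiles`, which interpolates between neighboring sales.

```python
from statistics import quantiles


def rolling_smart_floor(daily_sales, window_days=3, min_samples=30):
    """Return one floor value per day, or None when the sample is too small.

    `daily_sales` is a chronological list in which each element is the list
    of (already filtered) sale prices for one day.
    """
    floors = []
    for day in range(len(daily_sales)):
        start = max(0, day - window_days + 1)
        # Pool every sale in the trailing window to enlarge the sample.
        pooled = [p for sales in daily_sales[start:day + 1] for p in sales]
        if len(pooled) < min_samples:
            floors.append(None)
        else:
            # n=20 yields 19 cut points; the first one is the 5th percentile.
            floors.append(quantiles(pooled, n=20)[0])
    return floors


daily_sales = [[100] * 20, [100] * 19 + [30], [100] * 20]
print(rolling_smart_floor(daily_sales))
```

On the first day the window holds only 20 sales, so no floor is reported. On the second day the single sale at 30 is pooled with 39 others and does not move the result.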
Choosing the correct quantiles, window sizes, and minimum sample size is our secret sauce. We've backtested our hyperparameter choices on all available historical data to check that they hold up robustly throughout history and, we hope, into the future.
The final result is a much smoother and more robust time series for a "fair price":
The historical daily smart floor price series for the Solana "DeGods" collection.