AbstractPhil posted an update about 12 hours ago
geolip-ryan-spearman is the first dedicated protein observation structure, meant to expand the tooling of the observer modeling system and introduce additional introspective analysis to the problem of genetic mutation and abnormality.

AbstractPhil/geolip-esm2_t33_650M_UR50D

This model is based on ESM-2 (esm2_t33_650M_UR50D) from Facebook, assessed with specific benchmarks to be around 50% accurate. I'll be improving those numbers with self-distillation across the full spectrum. The models never see the validation data while unfrozen. The full spectrum of training tools is visible.
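
For anyone unfamiliar with the setup: a self-distillation loop keeps a frozen copy of the model as teacher while the unfrozen student trains on its soft targets, and the validation set never enters that loop. A minimal, generic sketch (my own, not the repo's actual trainer):

```python
import copy
import torch
import torch.nn.functional as F

def self_distill_step(student, teacher, batch, opt, T: float = 2.0):
    """One generic self-distillation step: the frozen teacher's soft
    targets supervise the student. Validation data never enters here."""
    with torch.no_grad():
        soft = F.softmax(teacher(batch) / T, dim=-1)   # teacher targets
    loss = F.kl_div(F.log_softmax(student(batch) / T, dim=-1),
                    soft, reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Typical setup: teacher = copy.deepcopy(student).eval(), then
# teacher.requires_grad_(False) so only the student ever updates.
```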

This is the first self-distillation observer prototype, and it works. Not as rapidly as I had hoped, but it most definitely works. The SVD was the missing piece of geometric solidity required to preserve full rotational behavioral control. The kernel made rapid iteration possible, and the first results are coming in.
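
To show the kind of SVD step I mean (shapes and names here are mine, purely illustrative): a polar-style decomposition splits a learned weight into a pure rotation and a stretch, so the rotational behavior can be preserved exactly.

```python
import torch

def nearest_orthogonal(W: torch.Tensor) -> torch.Tensor:
    """Project a square weight onto the nearest orthogonal matrix via
    SVD: W = U S V^T  ->  R = U V^T. R carries only the rotational
    part, so applying it preserves norms exactly."""
    U, _, Vh = torch.linalg.svd(W)
    return U @ Vh

W = torch.randn(64, 64)
R = nearest_orthogonal(W)
x = torch.randn(8, 64)
# Rotational behavior is locked in: norms are unchanged by R.
assert torch.allclose(x.norm(dim=-1), (x @ R.T).norm(dim=-1), atol=1e-4)
```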

This inherits much of the functionality from the CLIP_L and CLIP_G memory banks, while benefiting from the advanced research I performed while extracting CaptionBert's 5x BERT-pooled captions for target points.

The primary driver here is the sheer size of the data, and what that size contributes to a full construct of geometrically aligned data. There is a massive amount of very specific information, all curated, perfectly labeled, and organized in a way that can be... well, not so easily accessed, but I did find a few ways in.

This data is highly accurate, forged by life over billions of years. This is what is there, this is what is expected, and I have the tooling, stage by stage, not only to develop a solution to the problem but to contribute a fully improved version with minimal hardware requirements for training.

This is a real expectation, and the results are pouring in hourly: this can improve models beyond a reasonable baseline while preserving the baseline's correctness.

I expect the sheer geometric alignment alone to yield a new form of Adam tuning specific to introspective analytical alignment, and with it a new class of optimizer dedicated to geometric preservation in conjunction with informational data accumulation. I also expect a new methodology for larger-buffer data movement at the kernel level, a structural boundary for SVD limitations within the full spectrum, a measured substructure collapse state of the SVD when projected, and multiple other models that will have their hiccups and growing pains.
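
I can only gesture at what that optimizer might look like, but here is a toy of the core idea (entirely illustrative, not the actual implementation): orthogonalize each Adam-style matrix update via SVD before applying it, so the step acts like a scaled rotation rather than an arbitrary shear.

```python
import torch

@torch.no_grad()
def geometric_adam_step(p, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Toy Adam variant: 2-D updates are orthogonalized via SVD so each
    step is a scaled rotation, biasing training toward geometric
    preservation. Illustrative only."""
    m.mul_(b1).add_(grad, alpha=1 - b1)            # first moment
    v.mul_(b2).addcmul_(grad, grad, value=1 - b2)  # second moment
    update = m / (v.sqrt() + eps)
    if update.ndim == 2:
        U, S, Vh = torch.linalg.svd(update, full_matrices=False)
        update = (U @ Vh) * S.mean()               # keep only average scale
    p.add_(update, alpha=-lr)

p = torch.randn(32, 32)
m, v = torch.zeros_like(p), torch.zeros_like(p)
geometric_adam_step(p, torch.randn_like(p), m, v)
```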

These tools are all building toward the end-state format, which will express everything simultaneously in order to combine the necessary data from many, many forms of models together, without requiring direct tooling for each model simultaneously.

Such finalized tools will include a reusable pretrained geometric patchwork that exhibits all the necessary traits of a geometric structure in its frozen state, capable of being finetuned quickly into any other state, or simply utilized as a lookup beacon with the correct geometric transformer alignment.

The geometric transformer is a revamped format for the transformer, intentionally designed with preservation of the overarching structure in mind, rather than falling directly into the naturalistic entropy of immediate solutions over larger-scale contribution. This system will not replace RoPE; it will contribute to the concept of long-concept preservation and work hand in hand with systems like RoPE, attention, and original transformers simultaneously. RoPE-based models will benefit most from this structure, as they are already trained with alignment and rotation intrinsically at their cores.

The geometric transformer by design takes n inputs as variant states, and those are transformed internally. Using it in its default state will yield results by design, but it will require tuning and curation for specific use cases, no matter the case. This is conceptually familiar to those who use transformers, and simultaneously intimidating to those who understand what I'm describing, I'd think. I myself am a little intimidated that I'm this close as it is.
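
The actual design isn't published here, so take this as my own minimal sketch of the n-variant-inputs idea (every name hypothetical): the variant states share one orthogonally parametrized, norm-preserving transform, then get mixed into a single state.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class GeometricBlockSketch(nn.Module):
    """Hypothetical sketch: n variant states pass through one shared
    orthogonal (structure-preserving) map, then mix via learned weights."""
    def __init__(self, dim: int):
        super().__init__()
        self.rotate = orthogonal(nn.Linear(dim, dim, bias=False))
        self.mix = nn.Linear(dim, 1)

    def forward(self, variants: torch.Tensor) -> torch.Tensor:
        # variants: (batch, n_variants, dim); n can vary per call
        rotated = self.rotate(variants)             # norm-preserving map
        weights = self.mix(rotated).softmax(dim=1)  # per-variant weights
        return (weights * rotated).sum(dim=1)       # fused single state

block = GeometricBlockSketch(64)
out = block(torch.randn(2, 5, 64))  # 5 variant states in, one state out
```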

There are multiple other prototypes at work, all leading to the geometric transformer, which will be both an empirically superior utility to any of the utilities I currently use and the embodiment of the very geometric structure I'm currently working with, as a fully trainable data mutation operation, meant to directly attenuate the structure of the observation to the expectation of the autograd and its gradients.

Getting pretty close to a few pieces, but not there yet.

I've taken the model's benchmarks from 50% to 86-93% Spearman using a quaternion-oriented attention head.
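
"Quaternion-oriented" isn't spelled out in this post, but the standard building block in quaternion networks is the Hamilton product, which ties four feature channels together as a single rotation-carrying unit. A rough sketch (mine, not the actual head) of how such a head can score attention:

```python
import torch

def hamilton_product(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Hamilton product over the last dim = 4, laid out as (r, i, j, k).
    All four components couple, so similarity carries orientation, not
    just magnitude."""
    r1, i1, j1, k1 = q.unbind(-1)
    r2, i2, j2, k2 = k.unbind(-1)
    return torch.stack((
        r1 * r2 - i1 * i2 - j1 * j2 - k1 * k2,
        r1 * i2 + i1 * r2 + j1 * k2 - k1 * j2,
        r1 * j2 - i1 * k2 + j1 * r2 + k1 * i2,
        r1 * k2 + i1 * j2 - j1 * i2 + k1 * r2,
    ), dim=-1)

q = torch.randn(8, 16, 4)                 # (batch, positions, quaternion)
k = torch.randn(8, 16, 4)
scores = hamilton_product(q, k)[..., 0]   # real part as attention logit
attn = scores.softmax(dim=-1)
```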

This is getting dangerously close to 99.9% mutation detection accuracy, from a model deemed 50% accurate, all by extracting geometric features from the constellation and training the ensemble head with the correct rules.

These are Spearman-scored result logits. They are in fact detecting the results.

This is the power of what I'm doing: from 50% to 90% in 48 hours on a single GPU.

Training your own alignment only requires a piece of the dataset you wish to run and about 8 hours. Run it, fall asleep, check on it in the morning; it'll be ready. Extract features, train your head in minutes. The Spearman will be nearly perfect.
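
Concretely, that loop looks something like this. The base model ID is real; everything else (sequences, labels, the linear head) is a placeholder you'd swap for your own data, and mask-aware pooling is omitted for brevity:

```python
import torch
from scipy.stats import spearmanr
from transformers import AutoTokenizer, EsmModel

tok = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
esm = EsmModel.from_pretrained("facebook/esm2_t33_650M_UR50D").eval()

@torch.no_grad()
def extract(seqs):
    """Frozen-feature step: mean-pooled ESM-2 embeddings."""
    batch = tok(seqs, return_tensors="pt", padding=True)
    return esm(**batch).last_hidden_state.mean(dim=1)

feats = extract(["MKTAYIAKQR", "MKTAYIAKQG"])   # placeholder sequences
labels = torch.tensor([0.8, 0.1])               # placeholder effect scores

head = torch.nn.Linear(feats.shape[-1], 1)      # trains in minutes
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(200):
    loss = torch.nn.functional.mse_loss(head(feats).squeeze(-1), labels)
    opt.zero_grad(); loss.backward(); opt.step()

rho, _ = spearmanr(head(feats).squeeze(-1).detach().numpy(), labels.numpy())
```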

I'm currently preparing what I consider to be the final head that will need to be created: the quaternion head, which will be specifically predictive, based on an ensemble of four divergent-methodology heads, each specifically tasked with solving the SVD in conjunction with the features. This system should extract any little bit of differentiation that exists. The imaginary head is the most crucial. Explaining it requires an entire paper of its own.
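
To make the ensemble shape concrete (my own sketch; the real heads' methodologies differ): four divergent heads each emit a scalar, and the four scalars are read as the (r, i, j, k) components of a single quaternion prediction, so even the weakest imaginary component shapes the fused value.

```python
import torch
import torch.nn as nn

class QuaternionEnsembleSketch(nn.Module):
    """Four divergent heads -> one quaternion (r, i, j, k) -> fused score.
    Divergent methodologies are stood in for by differently sized MLPs."""
    def __init__(self, dim: int):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, h), nn.GELU(), nn.Linear(h, 1))
            for h in (256, 128, 64, 32)   # smallest = the weak k-head
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        q = torch.cat([h(feats) for h in self.heads], dim=-1)  # (batch, 4)
        # Fuse as quaternion magnitude, signed by the real part: every
        # component, including the weakest, shapes the final prediction.
        return q.norm(dim=-1) * q[:, 0].sign()
```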

I call this imaginary head the "Cletus" head, for its inherently lower accuracy relative to the others. Without it, however, the combination does not coalesce correctly; without the Cletus, the model does not reach full cohesion. This head is the most crucial because it has the hardest job. It's the one that returned from the battlefield with the blueprint describing everything it saw.
