AI-powered semantic search using pgvector and embeddings


In the age of information, the ability to accurately and quickly retrieve data relevant to a user's query is paramount. Traditional search methodologies, which rely on keyword matching, often fall short when it comes to understanding the context and nuances of user queries. Semantic search, which seeks to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms, has emerged as a solution to these limitations. However, implementing semantic search can be complex, involving advanced algorithms and an understanding of natural language processing (NLP).

Existing solutions such as Elasticsearch and Solr have been at the forefront of tackling these challenges, providing platforms that support more nuanced search capabilities. These tools use a combination of inverted indices and text analysis techniques to improve search outcomes. Yet, the advent of machine learning and vector search technologies opens up new avenues for enhancing semantic search, with solutions like OpenAI's Embeddings API and the pgvector extension for PostgreSQL leading the charge.
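
The core idea behind vector search can be sketched in a few lines of plain Python: texts are turned into vectors (embeddings), and documents are ranked by how close their vectors are to the query vector. The three-dimensional vectors and document names below are made-up toy values standing in for real embeddings, and the helper names are hypothetical. In a pgvector setup a comparable ranking would be computed inside PostgreSQL (pgvector provides a cosine distance operator, `<=>`), which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, docs):
    """Return document names ordered from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in docs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy "embeddings" (assumed values, not produced by any real model).
DOCS = {
    "postgres indexing guide": [0.9, 0.1, 0.0],
    "banana bread recipe": [0.0, 0.2, 0.9],
    "database tuning tips": [0.8, 0.3, 0.1],
}
QUERY = [1.0, 0.2, 0.0]  # stands in for the embedding of a database question

ranking = rank(QUERY, DOCS)
print(ranking)
```

With these toy values the two database documents rank above the recipe, even though no keyword matching takes place.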


Sergey Pshenichnikov

Sign sequences (for example, verbal and musical texts) can be turned into mathematical objects. Words and numbers become a single entity: a representation of a matrix unit, which is a matrix generalization of the integers and a hypercomplex number. A matrix unit is a matrix in which one element is equal to one and the rest are zeros.

If the words of the text are represented by such matrices, then concatenation (combination while maintaining order) of words and texts becomes an operation of adding matrices.
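
A minimal sketch of this idea, under a loudly stated assumption: the text above does not say which indices the matrix units carry, so here the row index is taken to be the position of a word in the text and the column index its number in a dictionary. The function names and the toy vocabulary are hypothetical and serve only as illustration.

```python
def matrix_unit(row, col, size):
    """Square matrix with a single 1 at (row, col) and zeros elsewhere."""
    m = [[0] * size for _ in range(size)]
    m[row][col] = 1
    return m


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(words, vocabulary, size):
    """Represent a text as the sum of one matrix unit per word.

    Assumption: row = position in the text, column = dictionary index.
    """
    total = [[0] * size for _ in range(size)]
    for position, word in enumerate(words):
        total = add(total, matrix_unit(position, vocabulary[word], size))
    return total


def decode(matrix, vocabulary):
    """Read the words back, row by row, from the summed matrix."""
    by_index = {index: word for word, index in vocabulary.items()}
    return [by_index[row.index(1)] for row in matrix if 1 in row]


VOCAB = {"to": 0, "be": 1, "or": 2, "not": 3}
TEXT = ["to", "be", "or", "not", "to", "be"]
SIZE = 6  # large enough for both the text length and the vocabulary

encoded = encode(TEXT, VOCAB, SIZE)
decoded = decode(encoded, VOCAB)
print(decoded)
```

Because each word occupies its own row, the sum loses no information: the word order can be read back from the row indices.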

Texts can be transformed with algebraic operations, for example by dividing one text by another with a remainder. The sense of a text can be recognized mathematically, and the context of words can be calculated. Here algebra helps to interpret all the intermediate stages of the calculations.

A person sees and hears only what he understands (J. W. Goethe), and he understands what he attaches sense to as significant for him. Sense is subjective and depends on the interests, motivations, and feelings of different people.

L. S. Vygotsky distinguished between the concepts of «sense» and «meaning»: «if the «meaning» of a word is an objective reflection of a system of connections and relationships, then «sense» is the introduction of subjective aspects of meaning according to a given moment and situation».

According to G. Frege, «meaning» is the properties and relationships of objects, while «sense» is only a part of these properties. Both «meaning» and «sense» are attached to a single «sign», for example a word. Two people can choose two non-overlapping fragments (two senses) from the list of meanings of one word to interpret it.


Sergey Pshenichnikov, Tatiana Sotnikova



Trio Sapiens

Musical text can be represented using matrix units, in the same way as verbal texts and other symbolic sequences. In the future, mathematical recognition and creation of musical sense, with substantive justification for the intermediate calculations (as opposed to AI), may become possible.

Sound has four properties: pitch, duration, volume, and timbre. Timbre is not considered yet. The dictionary of the algebra of musical texts is built on the basis of musical notation for the piano.

The duration here, for the sake of brevity of the first presentation, is considered as «absolute». «Relative» is not considered, although intervals are very well studied, and their features will be needed to categorize composers.

The complexity of the musical text for the application of mathematics is explained by the desire to simplify the reading of musical notes by musicians and to minimize the use of lower and upper additional lines.

To apply text algebra to musical symbolic sequences there is no need to use a five-line staff. What is useful and familiar to musicians is «unbearably harmful» for the use of algebra. It seems advisable to use a one-line staff. In this case, the musical text becomes like the verbal text.

To solve the problem, you need to find a transformation of the canonical musical text into a «thread». As always with a new application of algebra, correct coordination of the subject area is necessary. In this case, each note and symbol of modern musical notation that is used must be assigned its own serial number (a natural number).

Instead of a sign, you can use the name of each note symbol; the result is a verbal notation of musical texts written in a one-line «thread».

Since the musical scale is completely represented by piano keys, the first (pitch) section of the dictionary of musical texts consists of 88 numbered white and black keys (of which 52 are white). This eliminates the need for an octave division of the scale, octave transfer signs, clefs, the five alteration signs (key-signature and accidental), and diatonic and chromatic semitones.
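
The numbering can be illustrated with a short sketch. It assumes the keys are numbered from 1 (the lowest key) to 88 (the highest), left to right; the note names in scientific pitch notation are used here only to display the result and are not part of the algebraic notation described above. The function names are hypothetical.

```python
# Names of the 12 keys in one cycle, starting from the lowest piano key (A).
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def key_name(number):
    """Conventional name of piano key `number` (1 = lowest, 88 = highest)."""
    if not 1 <= number <= 88:
        raise ValueError("piano keys are numbered 1..88")
    name = NOTE_NAMES[(number - 1) % 12]
    octave = (number + 8) // 12  # the octave number changes at each C
    return f"{name}{octave}"


def is_white(number):
    """White keys are exactly those whose name carries no sharp sign."""
    return "#" not in key_name(number)


WHITE_KEYS = [n for n in range(1, 89) if is_white(n)]

print(key_name(1), key_name(40), key_name(88))
print(len(WHITE_KEYS), "white keys,", 88 - len(WHITE_KEYS), "black keys")
```

A single integer per key is enough to recover the octave and the alteration, which is why the one-line notation can drop both.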

All notes of the scale become fundamental in algebraic musical notation. There are an order of magnitude more of them than the basic scale degrees of Guido of Arezzo, but the alteration signs and octave names, whose use made musical texts algebraically incompatible with verbal texts, disappear. The numbers from 1 to 88 in algebraic notation constitute the pitch fragment of the dictionary for the one-line «thread» staff.

The numbering (coordination) of notes is needed so that the numbers can later serve as indices of mathematical objects (matrix units), which will replace the note signs or their names. These matrix units are binary generalizations of integers (hyperbinary numbers). The operation of division with remainder is defined for them, as for integers. The operation will allow you to divide musical texts and their f

Why I need RSS 3.0


In the past 5 years, I have moved across 3 countries and 2 continents. These were not short tourist trips or vacations, but a full immigrant experience, with a stay of at least a year each time. I had to adapt to new cultures, new languages, new people, new food, new weather, new everything. One of the pains was adopting new online services and information sources.

The problems I faced were non-obvious and interesting at the same time. I tried to analyze what was missing and what was needed to make life easier.

Why would a software engineer attend an FPGA hardware meetup at Hacker Dojo?


For the last 30 years, digital chip design has no longer been a matter of schematic entry: hardware engineers write code just like software engineers.

The difference is that the code a software engineer writes becomes a chain of CPU instructions stored in memory, while the code in a hardware description language (HDL) becomes the CPU itself, its transistors and metal connections. And not only a CPU: the same technique is used to design processor-less ("fixed function") blocks in a GPU that shuffle triangles and pixels, as well as network router chips that edit packet headers 100 times faster than a CPU.

There are ways to experience this workflow without paying a million dollars to a silicon fab. One way is simulation, and another way is to use a matrix of reconfigurable logic cells, a Field Programmable Gate Array (FPGA). You can come on January 14 to Hacker Dojo in Mountain View, California. We have a bunch of computers and FPGA boards, and we will show you how to use them not only to blink LEDs but also to output graphics and recognize music.

This will change your perspective on what code is.

Prepare for a ride:
Building Refal in Prolog. Magic in seven lines

If a recognizing machine responds to a drawing of an elephant with the signal «rubbish», to an image of a camel also with «rubbish», and to a portrait of a prominent scientist, again, with «rubbish», this does not necessarily mean that it is faulty. It may simply be philosophically inclined.
Vladimir Savchenko, «Self-Discovery»

1. Fall in love with Refal. Immediately!

Everyone knows that there is a programming language called Refal. Refal was developed in 1966 by our compatriot Valentin Turchin. Refal has had a difficult fate, but the language is still alive and evolving. For those interested, here are a few links:

To exaggerate greatly, one could say that Refal is a mixture of Lisp and Prolog. The language's syntax has one interesting feature: pattern matching by so-called «direct inference».
Neuro-employees based on ChatGPT. You create a worker and sell it on the labor exchange


What if I told you that, with practically no programming knowledge, you can create a full-fledged neuro-employee, write a résumé for it, publish it on a labor exchange, get it hired by a real company, and receive 100% of its salary?

This is already real, and absolutely any one of you can do it.

The Zig programming language


The first comment on the excellent article «A subjective vision of the ideal programming language» turned out to be a link to the Zig programming language. Naturally, I became curious about what kind of language claims the niche of C++, D, and Rust. I took a look, and the language seemed pleasant and in some ways interesting: a nice C-like syntax, an original approach to error handling, built-in coroutines. This article is a brief overview of the official documentation, interspersed with my own thoughts and impressions from running the code examples.
How to make a robot? What is first


I develop robots, and I'm often asked, "How to make a robot?" and "Where do you find information and what resources do you use?"

If you don't know where to start and want to create your own robot, this article is for you. In it, I will try to explain the process and also share the first steps you should take.

lsFusion: Open-Source Rapid Application Development Platform


The lsFusion platform is designed for rapid development of business applications. It is distributed under the terms of the GNU Lesser General Public License (LGPLv3). The source code of the platform is available on GitHub.

lsFusion is best suited for creating complex systems with large numbers of entities and forms, where users need to input and process large amounts of data. However, the platform can also be used to quickly create simple applications instead of spreadsheets when Excel’s functionality is not enough.

At the same time, the platform will not give a great advantage when developing applications that are aimed at interaction with a large number of “external” users or that need no complex calculations. You should also take into account that the web interface is a single-page application using JavaScript. Therefore, the lsFusion platform is not well suited for creating websites, for example.

Can a biologist fix a radio? 20 years later


In 2002, the journal Cancer Cell published a rather sarcastic article by Yuri Lazebnik, «Can a biologist fix a radio? Or, what I learned while studying apoptosis».

A lot has changed in 20 years. Biologists have created the graphical language SBGN (Systems Biology Graphical Notation) for representing the structure of biochemical pathways and the XML format SBML (Systems Biology Markup Language) for representing mathematical models.

Besides the standards themselves, software that supports them is needed. Since 2001 our team has been developing the BioUML software suite for modeling complex biological systems and analyzing biomedical data. The UML in its name is a reference to the UML standard (Unified Modeling Language), a graphical description language for object modeling in software development. Using BioUML, our group has built complex computer models of biological systems (as far as I know, some of them are the most complex in the world for the corresponding systems).

Thus, the modern SBGN and SBML standards and the BioUML suite allow biologists to create diagrams and models of biological systems that are quite comparable in their level of formalization to engineering schematics.

Erlang is no longer in fashion. berry-lang: a new statically typed language for BEAM


Hi! Today I want to share an idea for a new language for the BEAM platform: Habr readers should learn everything first-hand! The plan is for it to be translated source-to-source into Erlang and to be as semantically compatible with Erlang as possible.

berry-lang supports static typing, but types are not the main thing in it. The main thing is pleasant syntax, as its berry name suggests. Speaking of the name: besides sounding like the word Erlang, it has another backstory.

The thing is, berry-lang steals all of its syntax from the Cyber language (heard of it?). So we get a cyber theme. And what do people do in a cyber city? Grow berries, of course! The screenshot is from the latest Matrix film; in it General Niobe treats Neo to a strawberry and says, not without pride, "Zion could have never made something like this!".

The article contains many short code examples! It will interest all fans of the Erlang language and the BEAM platform. And if you have never heard of the Cyber language, it is a must-see.

Langton's ant: a mystery cellular automaton


The life of Langton's Ant seems sad and lonely, but, as we'll soon discover, he is not ready to put up with such an outrageous situation and is trying his best to escape. American scientist Christopher Langton invented his ant back in 1986. Since then, no one has been able to explain the strange behavior of this mysterious model...
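
The teaser does not spell out the rules, so the sketch below uses the standard formulation of Langton's ant: on a white cell the ant turns 90° clockwise, on a black cell 90° counter-clockwise; in both cases it flips the color of the cell and moves one step forward. The starting position, the initial heading, and the function name are arbitrary choices made for this illustration.

```python
def run_ant(steps):
    """Simulate Langton's ant on an unbounded, initially all-white grid.

    Returns the set of black cells and the ant's final position.
    """
    black = set()      # coordinates of cells currently painted black
    x, y = 0, 0        # the ant starts at the origin
    dx, dy = 0, 1      # heading "up" (y grows upward)
    for _ in range(steps):
        if (x, y) in black:
            dx, dy = -dy, dx      # black cell: turn counter-clockwise
            black.remove((x, y))  # flip the cell back to white
        else:
            dx, dy = dy, -dx      # white cell: turn clockwise
            black.add((x, y))     # flip the cell to black
        x, y = x + dx, y + dy     # move one cell forward
    return black, (x, y)


cells, position = run_ant(1000)
print(len(cells), "black cells after 1000 steps; ant at", position)
```

Two rules and one bit of state per cell are the whole model, which is what makes its long-run behavior so hard to explain.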

Database selection cheat sheet: SQL or NoSQL?


This is a series of articles dedicated to the optimal choice between different systems on a real project or an architectural interview.

This topic seemed relevant to me because such tasks come up both at work and in a System Design Interview, where you have to choose between these two types of DBMS. I dug into the issue and will tell you what I found. At the end of the article I will show, with several examples, what works better in each case, what the advantages and disadvantages of these systems are, and which one to choose.


A brief introduction to SIM cards

When I answered the question «what do you do for a living?» with «I develop software for SIM cards», even technically savvy people were often surprised. Many think that a SIM card is «something like a flash drive».

In this article I will try to briefly explain what a SIM card is (and smart cards in general), why it is needed, and what is inside it.

In fact, a SIM card is a special case of a contact smart card with a microprocessor. In essence, it is a fairly well-protected microcomputer with a CPU, ROM (optional), RAM, and NVRAM (which plays the role of the hard drive in a PC), with hardware random number generators and hardware implementations of cryptographic algorithms.

To a first approximation, the architecture of a microprocessor smart card can be represented as follows:
[Figure: smart card architecture]

Global corporations. Is there a light in the end of the tunnel?


Global corporations became part of our everyday life long ago, and their products often leave users no alternative. Or is there one? This article touches on the issue of big companies dominating certain areas, and also contains a number of useful tips. A spoiler for one of them: if you're an Android user, the tips will help you extend the time between charges and improve your privacy.

Main Challenges and Mistakes in Creating Your Design System


Design system creation and integration is a challenging and rather tedious task. It can simplify the development process or make it even harder. Anton Polyakov, Project Management Director for Innotech’s Mobile Development Department, shares his team’s experience to demonstrate the unforeseen challenges they encountered.

The On-Line Encyclopedia of Integer Sequences today


You can encounter integer sequences all around combinatorics, number theory, and recreational mathematics. And if there is a multitude of objects of a similar form, then one can create an index for these objects. The On-Line Encyclopedia of Integer Sequences, OEIS, is such an index.

This is a translation of my article The On-Line Encyclopedia of Integer Sequences in 2021, published in Mat. Pros. Ser. 3 28, 199–212 (2021).

This article covers the On-Line Encyclopedia inclusion criteria, its editorial process, its role in mathematics, and its future.

I trained a neural network on my drawings and give the model for free (and teach you to create your own)


Great for seamless patterns, abstract drawings, and watercolor-styled images. How do you use it, and how do you train a neural network on your own pictures?

Download the model here: https://huggingface.co/netsvetaev/netsvetaev-free

