Pull to refresh
52.13

Big Data *

Everything about big data

Show first
Rating limit
Level of difficulty

You are standing at a red light at an empty intersection. How to make traffic lights smarter?

Reading time 14 min
Views 2K

Types of smart traffic lights: adaptive and neural networks

Adaptive works at relatively simple intersections, where the rules and possibilities for switching phases are quite obvious. Adaptive management is only applicable where there is no constant loading in all directions, otherwise it simply has nothing to adapt to – there are no free time windows. The first adaptive control intersections appeared in the United States in the early 70s of the last century. Unfortunately, they have reached Russia only now, their number according to some estimates does not exceed 3,000 in the country.

Neural networks – a higher level of traffic regulation. They take into account a lot of factors at once, which are not even always obvious. Their result is based on self-learning: the computer receives live data on the bandwidth and selects the maximum value by all possible algorithms, so that in total, as many vehicles as possible pass from all sides in a comfortable mode per unit of time. How this is done, usually programmers answer – we do not know, the neural network is a black box, but we will reveal the basic principles to you…

Adaptive traffic lights use, at least, leading companies in Russia, rather outdated technology for counting vehicles at intersections: physical sensors or video background detector. A capacitive sensor or an induction loop only sees the vehicle at the installation site-for a few meters, unless of course you spend millions on laying them along the entire length of the roadway. The video background detector shows only the filling of the roadway with vehicles relative to this roadway. The camera should clearly see this area, which is quite difficult at a long distance due to the perspective and is highly susceptible to atmospheric interference: even a light snowstorm will be diagnosed as the presence of traffic – the background video detector does not distinguish the type of detection.

Read more
Total votes 3: ↑3 and ↓0 +3
Comments 0

Data Science Digest — 21.04.21

Reading time 3 min
Views 982

Hi All,

I’m pleased to invite you all to enroll in the Lviv Data Science Summer School, to delve into advanced methods and tools of Data Science and Machine Learning, including such domains as CV, NLP, Healthcare, Social Network Analysis, and Urban Data Science. The courses are practice-oriented and are geared towards undergraduates, Ph.D. students, and young professionals (intermediate level). The studies begin July 19–30 and will be hosted online. Make sure to apply — Spots are running fast!

If you’re more used to getting updates every day, follow us on social media:

Telegram
Twitter
LinkedIn
Facebook

Regards,
Dmitry Spodarets.

Read more
Total votes 3: ↑2 and ↓1 +1
Comments 0

Data Science Digest — We Are Back

Reading time 5 min
Views 1.1K

Hi All,

I have some good news for you…

Data Science Digest is back! We’ve been “offline” for a while, but no worries — You’ll receive regular digest updates with top news and resources on AI/ML/DS every Wednesday, starting today.

If you’re more used to getting updates every day, follow us on social media:

Telegram - https://t.me/DataScienceDigest
Twitter - https://twitter.com/Data_Digest
LinkedIn - https://www.linkedin.com/company/data-science-digest/
Facebook - https://www.facebook.com/DataScienceDigest/

And finally, your feedback is very much appreciated. Feel free to share any ideas with me and the team, and we’ll do our best to make Data Science Digest a better place for all.

Regards,
Dmitry Spodarets.

Read more
Rating 0
Comments 0

Coins classifier Neural Network: Head or Tail?

Reading time 14 min
Views 1.3K

Home of this article: https://robotics.snowcron.com/coins/02_head_or_tail.htm

The global objective of these articles is to build a coin classifier, capable of scanning your pocket change and find rare / valuable coins. This is a second article in a series, so let me remind you what happened earlier (https://habr.com/ru/post/538958/).

During previous step we got a rather large dataset composed of pairs of images, loaded from an online coins site meshok.ru. Those images were uploaded to the Internet by people we do not know, and though they are supposed to contain coin's head in one image and tail in the other, we can not rule out a situation when we have two heads and no tail and vice versa. Also at the moment we have no idea which image contains head and which contains tail: this might be important when we feed data to our final classifier.

So let's write a program to distinguish heads from tails. It is a rather simple task, involving a convolutional neural network that is using transfer learning.

Same way as before, we are going to use Google Colab environment, taking the advantage of a free video card they grant us an access to. We will store data on a Google Drive, so first thing we need is to allow Colab to access the Drive:

Читать далее
Rating 0
Comments 0

Coins Classification using Neural Networks

Reading time 19 min
Views 2.8K

See more at robotics.snowcron.comThis is the first article in a serie dedicated to coins classification.Having countless "dogs vs cats" or "find a pedestrian on the street" classifiers all over the Internet, coins classification doesn't look like a difficult task. At first. Unfortunately, it is degree of magnitude harder - a formidable challenge indeed. You can easily tell heads of tails? Great. Can you figure out if the number is 1 mm shifted to the left? See, from classifier's view it is still the same head... while it can make a difference between a common coin priced according to the number on it and a rare one, 1000 times more expensive.Of course, we can do what we usually do in image classification: provide 10,000 sample images... No, wait, we can not. Some types of coins are rare indeed - you need to sort through a BASKET (10 liters) of coins to find one. Easy arithmetics suggests that to get 10000 images of DIFFERENT coins you will need 10,000 baskets of coins to start with. Well, and unlimited time.So it is not that easy.Anyway, we are going to begin with getting large number of images and work from there. We will use Russian coins as an example, as Russia had money reform in 1994 and so the number of coins one can expect to find in the pocket is limited. Unlike USA with its 200 years of monetary history. And yes, we are ONLY going to focus on current coins: the ultimate goal of our work is to write a program for smartphone to classify coins you have received in a grocery store as a change.Which makes things even worse, as we can not count on good lighting and quality cameras anymore. But we'll still try.In addition to "only Russian coins, beginning from 1994", we are going to add an extra limitation: no special occasion coins. Those coins look distinctive, so anyone can figure that this coin is special. We focus on REGULAR coins. Which limits their number severely.Don't take me wrong: if we need to apply the same approach to a full list of coins... it will work. But I got 15 GB of images for that limited set, can you imagine how large the complete set will be?!To get images, I am going to scan one of the largest Russian coins site "meshok.ru".This site allows buyers and sellers to find each other; sellers can upload images... just what we need. Unfortunately, a business-oriented seller can easily upload his 1 rouble image to 1, 2, 5, 10 roubles topics, just to increase the exposure.

So we can not count on the topic name, we have to determine what coin is on the photo ourselves.To scan the site, a simple scanner was written, based on the Python's Beautiful Soup library. In just few hours I got over 50,000 photos. Not a lot by Machine Learning standards, but definitely a start.After we got the images, we have to - unfortunately - revisit them by hand, looking for images we do not want in our training set, or for images that should be edited somehow. For example, someone could have uploaded a photo of his cat. We don't need a cat in our dataset.First, we delete all images, that can not be split to head/hail.

Читать далее
Rating 0
Comments 0

Big Data Tools EAP 12 Is Out: Experimental Python Support and Search Function in Zeppelin Notebooks

Reading time 3 min
Views 1.1K

Update 12 of the Big Data Tools plugin for IntelliJ IDEA Ultimate, PyCharm Professional Edition, and DataGrip has been released. You can install it from the JetBrains Plugin Repository or from inside your IDE. The plugin allows you to edit Zeppelin notebooks, upload files to cloud filesystems, and monitor Hadoop and Spark clusters.


In this release, we've added experimental Python support and global search inside Zeppelin notebooks. We’ve also addressed a variety of bugs. Let's talk about the details.


Read more →
Rating 0
Comments 1

Big / Bug Data: Analyzing the Apache Flink Source Code

Reading time 11 min
Views 842
image1.png

Applications used in the field of Big Data process huge amounts of information, and this often happens in real time. Naturally, such applications must be highly reliable so that no error in the code can interfere with data processing. To achieve high reliability, one needs to keep a wary eye on the code quality of projects developed for this area. The PVS-Studio static analyzer is one of the solutions to this problem. Today, the Apache Flink project developed by the Apache Software Foundation, one of the leaders in the Big Data software market, was chosen as a test subject for the analyzer.
Read more →
Total votes 1: ↑0 and ↓1 -1
Comments 0

Playing with Nvidia's New Ampere GPUs and Trying MIG

Reading time 11 min
Views 3.8K


Every time when the essential question arises, whether to upgrade the cards in the server room or not, I look through similar articles and watch such videos.


Channel with the aforementioned video is very underestimated, but the author does not deal with ML. In general, when analyzing comparisons of accelerators for ML, several things usually catch your eye:


  • The authors usually take into account only the "adequacy" for the market of new cards in the United States;
  • The ratings are far from the people and are made on very standard networks (which is probably good overall) without details;
  • The popular mantra to train more and more gigantic models makes its own adjustments to the comparison;

The answer to the question "which card is better?" is not rocket science: Cards of the 20* series didn't get much popularity, while the 1080 Ti from Avito (Russian craigslist) still are very attractive (and, oddly enough, don't get cheaper, probably for this reason).


All this is fine and dandy and the standard benchmarks are unlikely to lie too much, but recently I learned about the existence of Multi-Instance-GPU technology for A100 video cards and native support for TF32 for Ampere devices and I got the idea to share my experience of the real testing cards on the Ampere architecture (3090 and A100). In this short note, I will try to answer the questions:


  • Is the upgrade to Ampere worth it? (spoiler for the impatient — yes);
  • Are the A100 worth the money (spoiler — in general — no);
  • Are there any cases when the A100 is still interesting (spoiler — yes);
  • Is MIG technology useful (spoiler — yes, but for inference and for very specific cases for training);
Read more →
Total votes 5: ↑5 and ↓0 +5
Comments 0

Distributed File Systems

Reading time 9 min
Views 8.5K

The Big Data Tools plugin seamlessly integrates HDFS into your IDE and provides access to different cloud storage systems (AWS S3, Minio, Linode, Digital Open Space, GS, Azure). But is this the end? Have we implemented everything and now progress has stopped? Of course not.


In this short digest, we'll take a look at 15 popular distributed file systems available on the market and try to get a sense of their individual advantages.


Almost all of these systems are free or open-source, and you can find the sources on GitHub. The sites of these projects, their documentation, and online reviews provide most of the information we’ll consider here. Other than HDFS, none of these technologies have been implemented yet in Big Data Tools. But who knows? Perhaps someday we'll see them in our plugin.


Read more →
Total votes 8: ↑8 and ↓0 +8
Comments 1

Big Data Tools Update 11 Is Out

Reading time 3 min
Views 1.6K

EAP 11 of the Big Data Tools plugin for IntelliJ IDEA Ultimate, PyCharm, and DataGrip is available starting today. You can install it from the JetBrains Plugin Repository or inside your IDE.


Big Data Tools is a new JetBrains plugin that allows you to connect to Hadoop and Spark clusters and monitor nodes, applications, and jobs. It also brings support for editing and running Zeppelin notebooks inside IntelliJ IDEA and DataGrip, so you can create, edit, and run Zeppelin notebooks without ever having to leave your favorite IDE. The plugin offers smart navigation, code completion, inspections, quick-fixes, and refactoring inside notebooks.


Read more →
Total votes 7: ↑7 and ↓0 +7
Comments 0

ZTools for Apache Zeppelin

Reading time 8 min
Views 1.3K



Zeppelin is a web-based notebook for data engineers that enables data-driven, interactive data analytics with Spark, Scala, and more.


The project recently reached version 0.9.0-preview2 and is being actively developed, but there are still many things to be implemented.


One such thing is an API for getting comprehensive information about what's going on inside the notebook. There is already an API that completely solves the problems of high-level notebook management, but it doesn’t help if you want to do anything more complex.

Read more →
Total votes 3: ↑3 and ↓0 +3
Comments 0

Crime, Race and Lethal Force in the USA — Part 3

Reading time 24 min
Views 1.7K
image
This is the concluding part of my article devoted to a statistical analysis of police shootings and criminality among the white and the black population of the United States. In the first part, we talked about the research background, goals, assumptions, and source data; in the second part, we investigated the national use-of-force and crime data and tracked their connection with race.
Read more →
Total votes 3: ↑3 and ↓0 +3
Comments 0

Modern Google-level STT Models Released

Reading time 2 min
Views 5K


We are proud to announce that we have built from ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text Models for the following languages:


  • English;
  • German;
  • Spanish;

You can find all of our models in our repository together with examples, quality and performance benchmarks. Also we invested some time into making our models as accessible as possible — you can try our examples as well as PyTorch, ONNX, TensorFlow checkpoints. You can also load our model via TorchHub.


PyTorch ONNX TensorFlow Quality Colab
English (en_v1) link Open In Colab
German (de_v1) link Open In Colab
Spanish (es_v1) link Open In Colab
Read more →
Total votes 9: ↑9 and ↓0 +9
Comments 0

Neural networks in reality

Reading time 2 min
Views 861
The mass of news and articles about artificial intelligence creates the illusion that we are living in a fantastic time. But when you start asking everyone what exactly is useful in real life from these high technologies, the answers come down to some Google features, mobile games and a story about Chinese videos. By the way, oh, these Chinese videos — for some reason, they are constantly shown by the central mass media when they demonstrate Moscow's intellectual technologies.

In words, it seems, all the «intellects» are installed already everywhere, the whole country has long been transferred to neural networks, but only in some kind of demonstration pictures, in diagrams, on fingers. There is a mental dissonance — why not take a video camera and shoot at least a fragment of how Russia's super mega technologies work?

As Nikita Sergeevich said, «science ceases to be self-indulgence when its fruits are applied in the national economy.» And today's artificial intelligence is familiar to us only from games. Many people really want to see something useful in reality. Therefore, we were not too lazy and recorded our video of the operation of neural networks from real objects.

Total votes 1: ↑0 and ↓1 -1
Comments 0

The Project «Fabula»: How to find the desired video-fragment or person in a pile of video files?

Reading time 2 min
Views 1.4K
If a person is far over 20, then he has already accumulated a huge film library of his life, as well as videos from friends, relatives, and from his place of work… It is no longer possible to find someone or something specific there. Recently, I was preparing a video compilation for my daughter's anniversary – I spent a week. The media is all the more overloaded with video archives. And every day, millions of terabytes of video content appear in the world. And this is in the era of BIG DATA.

image
Read more →
Total votes 3: ↑3 and ↓0 +3
Comments 4

Machine Learning & Big Data: Let’s Find The Relationship Between Them

Reading time 4 min
Views 3.1K
image

Machine learning is indeed a famous word among technologies. Today we will relate it with another famous term that is Big data. Both these have become Buzz words these days. Let’s here find out their meaning individually.

Big data is known as the process in which we collect and analyze the large volume of data sets (called Big Data) which helps in discovering useful hidden patterns and other information such as customer choices, market trends which is really beneficial for the organizations to remain informed and customer-oriented business decisions.
Read more →
Rating 0
Comments 1

The QC House of Cards

Reading time 4 min
Views 697
There’s Gold in Them Thar Hills

Gold rushes can make people crazy. 1848 was enough of an indicator of that. When Sam Brannan announced to the world: ‘Gold! Gold! Gold from the American River!’, half the world’s population (or so it seemed to the tiny California population which lived there at the time) descended on the soon to be the newest state of the union.

San Francisco, before a small hamlet with a few hundred pioneers living there, became a centre of vice, murder and debauchery overnight.

image

Two hundred years before tulip mania hit Europe, and like in California with its argonauts or 49ers, it impoverished more than it made rich. In the early 2000s, too, the Dot.Com bubble created a speculative tendency in people when irrationality took over all reason.
Read more →
Total votes 1: ↑0 and ↓1 -1
Comments 0

Four Ways Quantum Computing Will Change Artificial Intelligence Forever

Reading time 4 min
Views 1.8K
If science were a dating app, quantum physics and machine learning probably wouldn’t be a match. They’re from completely different fields and often require completely different backgrounds and skills. But, throw in a little quantum computing and, suddenly, that science-matchmaking app becomes Tinder and the attraction between the two is palpable.

image

(Credit: cmo.adobe.com/articles/2017/5/how-will-artificial-intelligence-impact-business-tlp-ptr.html#gs.5zlifl)

Even though the extent of change that quantum computing will unleash on AI is up for debate, many experts now more than suspect that quantum computing will definitely alter AI at some level. Analysts from bank holding company BBVA, for example, point toward the natural synergy between quantum computing and AI as reasons why quantum machine learning will eventually best classical machine learning.

“Quantum machine learning can be more efficient than classic machine learning, at least for certain models that are intrinsically hard to learn using conventional computers,” says Samuel Fernández Lorenzo, a quantum algorithm researcher who collaborates with BBVA’s New Digital Businesses area. “We still have to find out to what extent do these models appear in practical applications.”
Read more →
Rating 0
Comments 2

Reach Out Top Hadoop Consulting Companies To Leverage Big Data In 2020

Reading time 7 min
Views 1.1K
image

Hadoop is divided into different modules, each of which delivers a distinct task crucial for a computer system and is uniquely designed for big data analytics. Apache Software Foundation developed this incredible platform. It is extensively utilized by worldwide developers to build big data Hadoop solutions amazingly and easily.

Big data offers several perks, some of them are; examining root causes of failures, recognizing the potential of data-driven marketing, improving and enhancing customer engagement, and much more. By offering multiple solutions in a single stream it helps in lowering the cost of the organization.

In various industries such as Retail, Manufacturing, Financial insurance, Education, Transportation, Agriculture, Healthcare, Energy, etc big data is utilized and that’s why it’s demand is expanding day by day. The Global Hadoop Market is envisioned to grow to $84.6 billion by 2021, with an expected CAGR of 63.4%.
Read more →
Total votes 3: ↑3 and ↓0 +3
Comments 2

Authors' contribution