Abstract

Background

One night at 2:30 AM in the heart of Seoul, I was awakened by a woman’s scream. I’m usually a heavy sleeper, and at first I thought the screaming and shrieking were stray cats in the alley. Was it another nightmare? I tried to go back to sleep, but then I heard someone crying out for help. It definitely wasn’t a dream. When I opened the window, all I could see were villas and dark alleyways filling the neighborhood. It was a dark, maze-like place, yet right next to it were a police station and a large eight-lane road where people frequently walked.

Listening carefully at the window, I could clearly hear a woman’s screams and groaning sounds as if she were being choked. I was so horrified that I was instantly wide awake, and I immediately called the police. This was clearly domestic violence or some other critical situation. The officer asked me for the location, but having been asleep just 10 minutes earlier, I couldn’t tell where in the villa complex the sounds were coming from.

So, still half asleep, I hurriedly put on my shoes and went outside. Even if I couldn’t help directly, I could at least tell the police where the sound was coming from. Surprisingly, the source of the screams was incredibly easy to find. The sounds were unmistakably coming from a man and a woman on the third floor of a motel right next to the main road. The closer I got, the more terrifying the sounds from the motel window became: someone being hit with something, things breaking, the unmistakable sound of someone being choked and unable to breathe, someone trying to escape, and more. Throughout all of this, I clearly heard cries for help at least twenty times.

Fortunately, three police cars and one fire truck arrived within four minutes of my call. I told them which floor of the motel I thought it was, and five firefighters used equipment to break open the motel door while four police officers and two detectives immediately arrested the man.

The video above is footage I recorded at the time. My hands and feet were frozen with shock, so I couldn’t film properly.

The perpetrator, a foreign national, was believed to be a foreign worker, though he identified himself as a university student. He was immediately arrested, put in a police car, and taken away. Because I was the first to report the incident and was at the scene, the police asked me several questions. Fortunately or unfortunately, I never learned what condition the victim was in or what harm she had suffered, because a female police officer spoke with her privately in the ambulance. However, a detective nearby told me the following details of the incident:

According to the victim, her boyfriend, a foreign man she had been dating, entered the motel around 2 AM and subjected her to indiscriminate violence. She tried to get to her phone to call for help, but it was out of reach, so she couldn’t make the call herself. She had been screaming for about 40 minutes before I called the police, yet not a single passerby or nearby resident had reported it. It occurred to me that if the suspect had come with premeditated intent to kill, no report would have been made at all.


Insights

I don’t know what happened to the suspect and the victim after that. However, I was shocked that in the heart of Seoul, in a place with heavy foot traffic right in front of a police station, no one had reported this while it was happening. Moreover, because the victim was a foreigner, she likely knew far less than locals about which places in Korea are dangerous or safe. If I had fallen asleep at some roadside motel in Ohio and been attacked, local residents would know that the place wasn’t safe and would be better equipped to respond, but an outsider like me wouldn’t easily know this.

This incident made me think for the first time about how difficult it can be for someone in a crime or dangerous situation to ask others for help. Nevertheless, there are times when we desperately need someone’s help. So what should we do?

The ideal way to ask for help is to request it directly, in words, from “someone trustworthy who can provide immediate and appropriate assistance.” That person must not betray you when you ask for help, must be able to respond immediately, and must respond professionally and appropriately. Public authorities, such as firefighters and the police, generally meet all three criteria. However, as the incident above shows, there are times when we cannot request help from them directly.

Let’s go back to the beginning. The biggest factor that enabled me to report this incident was the “screaming,” the most fundamental and primal reaction. Everyone screams in dangerous situations. We scream when someone startles us, and we scream when watching horror movies. Screaming is so natural that someone who stays silent in a truly terrifying situation is usually suppressing it consciously. If we could ask for help simply by screaming, it would be an excellent method.


Previous Efforts in India

Was there really no app that offered this kind of help? A Google search showed that a developer in India had created an Android app called “Chilla” in 2015. According to the developer’s description, it had a simple structure: it detected dangerous situations when the user pressed a button or screamed, and sent SMS messages to the police or to people nearby. However, I couldn’t find out how well the app actually detected screams and sent reports in practice, and I couldn’t download it because it was too old to be available anymore.

https://www.huffpost.com/archive/in/entry/this-women-safety-app-detects-screams-and-sends-an-sos_n_10316170


So I found the developer of the app, Kishlay Raj, on LinkedIn and reached out to him. I asked (1) whether he had used deep learning models to develop the app, and (2) why he no longer runs it. Thankfully, Mr. Raj graciously responded.

His answers were: (1) At the time, deep learning models weren’t widely used, so he detected screams from frequency and volume, and used a sliding window to reduce Type 1 and Type 2 errors (false alarms and missed detections). (2) As Android evolved, it became extremely difficult to keep an app running continuously in the background. In addition, some Chinese-made devices modified the OS kernel to stop background apps from running in order to save battery. Most importantly, once he took a full-time job, it became difficult to continue the project. His response was incredibly useful.
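Mr. Raj did not share code, so the following is only a minimal sketch of the general idea he described (volume plus frequency, smoothed by a sliding window), not his implementation. Everything here is an assumption: the thresholds `MIN_RMS` and `FREQ_BAND` are hypothetical placeholders rather than values from any paper, and a zero-crossing rate stands in for a proper spectral (FFT) pitch estimate to keep the example dependency-free. The sliding window is a k-of-n vote: a single loud spike cannot raise an alarm on its own (fewer Type 1 errors), while a sustained scream still triggers even if a few frames are missed (fewer Type 2 errors).

```python
import math
from collections import deque

SAMPLE_RATE = 16_000          # assumed mono sample rate (Hz)
FRAME_SIZE = 1_600            # 100 ms per frame
MIN_RMS = 0.3                 # hypothetical loudness threshold (samples in [-1, 1])
FREQ_BAND = (800.0, 3_000.0)  # hypothetical "scream-like" band (Hz), not from the literature
WINDOW = 10                   # sliding window of the last 10 frames (about 1 s)
MIN_HITS = 7                  # scream-like frames required inside the window


def rms(frame):
    """Root-mean-square amplitude: a simple measure of volume."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_freq(frame, sample_rate=SAMPLE_RATE):
    """Crude frequency estimate: a periodic signal crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0 <= b) or (b < 0 <= a)
    )
    return crossings * sample_rate / (2 * len(frame))


class ScreamDetector:
    """Reports a scream only when most recent frames look scream-like."""

    def __init__(self, window=WINDOW, min_hits=MIN_HITS):
        self.recent = deque(maxlen=window)  # one True/False verdict per frame
        self.min_hits = min_hits

    def is_scream_like(self, frame):
        low, high = FREQ_BAND
        return rms(frame) >= MIN_RMS and low <= zero_crossing_freq(frame) <= high

    def push(self, frame):
        """Feed one frame; return True once enough frames in the window are hits."""
        self.recent.append(self.is_scream_like(frame))
        return sum(self.recent) >= self.min_hits


def tone(freq, amplitude, n=FRAME_SIZE, sample_rate=SAMPLE_RATE):
    """Synthetic sine frame, used here only to exercise the detector."""
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    detector = ScreamDetector()
    verdicts = [detector.push(tone(1_500, 0.8)) for _ in range(7)]
    print(verdicts)  # [False, False, False, False, False, False, True]
```

Real screams are far noisier than a pure tone, which is exactly why choosing the band and thresholds is the hard part; the window logic itself stays the same whatever features feed it.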


Do similar apps exist in Korea? All of the similar apps available domestically require you to press a specific button on the screen or the device (the power or volume button) several times before a notification goes to the police or a designated contact. This is probably because nothing signals intent more clearly than a user pressing a button. Technically, sending an SMS or notification to a designated person when a specific button is pressed is a very simple structure. However, as the case above shows, pressing a button may not be easy for someone under attack. Most importantly, in a crisis where a person can press buttons at all, there are simpler options, such as calling 112, 119, or 911. Therefore, I didn’t think the existing solutions were particularly useful.
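To show how simple the button-based structure is, here is a minimal sketch of a “press N times within T seconds” trigger. The class name `PressTrigger` and the defaults (5 presses within 3 seconds) are hypothetical and not taken from any of the Korean apps; the actual SMS or notification step is left as a placeholder `print`.

```python
import time
from collections import deque


class PressTrigger:
    """Fires once when `required` presses land within `window_s` seconds.

    The defaults (5 presses in 3 s) are hypothetical, not taken from any real app.
    """

    def __init__(self, required=5, window_s=3.0, clock=time.monotonic):
        self.required = required
        self.window_s = window_s
        self.clock = clock
        self.presses = deque()  # timestamps of recent presses

    def press(self, now=None):
        """Record a press; return True when the alert should be sent."""
        now = self.clock() if now is None else now
        self.presses.append(now)
        # Forget presses older than the time window (the newest press always stays).
        while now - self.presses[0] > self.window_s:
            self.presses.popleft()
        if len(self.presses) >= self.required:
            self.presses.clear()  # reset so one burst of presses sends one alert
            return True
        return False


if __name__ == "__main__":
    trigger = PressTrigger()
    for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
        if trigger.press(now=t):
            # Placeholder: a real app would send an SMS or push notification here.
            print(f"alert at t={t}s")
```

The whole mechanism fits in a few lines, which supports the point above: the technical difficulty is not in sending the alert but in whether the person can press the buttons at all.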

So I decided to devise a simpler, more intuitive solution: a way to ask for help using only the sound of a scream. How, then, should we detect screams? First, how do we define a scream? By frequency? By pitch range? When I looked through the related papers, I couldn’t find any that defined the characteristic pitch range and frequency of human screams.

Should I take a very crude approach: record sound, split it into chunks, and send the files in real time to the ChatGPT API for detection? This method obviously has several problems: the recordings would leave the device, and, most importantly, data and API usage costs would be high. It’s clearly an absurd solution. Still, even such a crude method has lessons to offer. Why is it not useful?

(1) Even though the app listens to ambient sound in the background, that audio should never be stored as a file. There should be no external connections such as APIs; every decision must be made and completed on the device. (2) Maintenance should be simple, and the way of asking for help should be very intuitive.
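Requirement (1) can be illustrated with a minimal sketch: keep only the last few seconds of audio in memory, overwrite older samples automatically, and let a local classifier return nothing but a yes/no verdict. The names `InMemoryAudioBuffer`, `listen_step`, and `toy_loudness_classifier` are hypothetical, and Python is used only for illustration; a real Android app would do this with the platform’s own audio APIs.

```python
from collections import deque


class InMemoryAudioBuffer:
    """Holds only the most recent audio in RAM; nothing is ever written to disk.

    A deque with maxlen discards the oldest samples automatically, so the
    buffer never grows beyond `seconds` of audio.
    """

    def __init__(self, seconds=2.0, sample_rate=16_000):
        self.samples = deque(maxlen=int(seconds * sample_rate))

    def extend(self, chunk):
        self.samples.extend(chunk)

    def snapshot(self):
        return list(self.samples)


def listen_step(buffer, chunk, classify):
    """Append a chunk, run an on-device classifier, and expose only its verdict.

    `classify` is a hypothetical local function (for example a small model
    bundled with the app); raw audio is never returned, saved, or sent anywhere.
    """
    buffer.extend(chunk)
    return bool(classify(buffer.snapshot()))


def toy_loudness_classifier(samples):
    """Toy stand-in for a real on-device model: flags any sample above 0.5."""
    return max(abs(s) for s in samples) > 0.5


if __name__ == "__main__":
    buf = InMemoryAudioBuffer(seconds=1.0, sample_rate=4)  # tiny buffer for demonstration
    print(listen_step(buf, [0.1, 0.2, 0.1], toy_loudness_classifier))  # False
    print(listen_step(buf, [0.9, 0.1, 0.1], toy_loudness_classifier))  # True
    print(buf.snapshot())  # only the last 4 samples survive: [0.1, 0.9, 0.1, 0.1]
```

The design choice is that the only thing leaving `listen_step` is a boolean, so the alerting code never has access to the recording itself.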

Another problem remains. How do we train the system to tell whether a scream is really a call for help or just playing around? Is there really a clear difference between the two? This is the most fundamental question. The system is built on the premise that “someone screams → help is needed,” yet there are many cases where someone screams but doesn’t need help. Screaming at the sight of your favorite singer at a rock concert is just one of countless counterexamples. Therefore, screams must be classifiable and have clear distinguishing characteristics; otherwise, the notifications cannot be trusted. We saw this during COVID: the first emergency alerts threw people into a panic, but as the alerts became too frequent, many grew numb to them and ignored them or turned them off. As I said before, whoever receives a request for help must be (1) trustworthy, (2) able to provide appropriate help, and, most importantly, (3) able to provide it immediately. Too-frequent notifications make people doubt the usefulness of alerts and can numb them to real requests for help.


The conclusion I reached was that there is clearly a significant difference, so the two can be learned and distinguished. However, as the papers also showed, even though the two types of screams differ enough to be distinguished, humans cannot tell them apart. In other words, we cannot tell whether a scream is one of joy or one of terror. This meant that even if I collected scream samples for deep learning and labeled them, it would be very difficult to determine whether each recording was a scream of joy or a scream of terror. But I couldn’t give up. To satisfy my curiosity, I studied the methodology of the related papers and emailed the professors who had conducted those studies.
