Video to Text: Amazon Transcribe with S3

How lazy can you possibly get? Well, here’s my story of transcribing text from boring videos and scanning it for keywords before even watching the video or listening to the audio.

To start with, the first cornerstone was to actually get the video. Most streaming players use HLS, which relies heavily on the m3u8 format (those who made playlists in WinAmp might remember it): a playlist that sets the base URL for all the video segments to be streamed.

If you hit “Play” on the media player while having your DevTools Network tab open, you’d see something like this:
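For context, an HLS playlist is just a text file listing segment URLs. A minimal m3u8 (with illustrative segment names; real playlists are longer) looks roughly like this:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:9.009,
segment0.ts
#EXTINF:9.009,
segment1.ts
#EXT-X-ENDLIST
```

The segment paths are resolved relative to the playlist URL, which is why grabbing the m3u8 is enough to fetch the whole stream.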

After some time googling around Python/PHP HTTP libraries to fetch the content, the most practical solution turned out to be ffmpeg:

ffmpeg -i http://example.org/playlist.m3u8 -c copy -bsf:a aac_adtstoasc output.mp4

Once done, you can check the video for consistency (either with ffmpeg -i, or by simply scrolling through the video).

To save time on the whole procedure, we convert the mp4 into an mp3 audio stream with ffmpeg once again:

ffmpeg -i video.mp4 -b:a 192K -vn music.mp3

Now that the mp3 is ready for processing, Amazon Transcribe kicks in, but you need to store the mp3 somewhere first. The easiest way is to get yourself an S3 bucket and point Transcribe at the S3 URL of the file.
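A rough sketch of those two steps with the AWS CLI. The bucket, file and job names here are made up, and the AWS calls need configured credentials, so they are shown commented out:

```shell
# Hypothetical bucket and file names -- substitute your own.
BUCKET=my-transcribe-bucket
FILE=music.mp3
MEDIA_URI="s3://$BUCKET/$FILE"

# Upload the mp3 to S3 (requires configured AWS credentials):
#   aws s3 cp "$FILE" "$MEDIA_URI"

# Point Transcribe at the S3 URI and start the job; speaker
# identification is switched on via --settings:
#   aws transcribe start-transcription-job \
#     --transcription-job-name my-video-transcription \
#     --language-code en-US \
#     --media-format mp3 \
#     --media MediaFileUri="$MEDIA_URI" \
#     --settings ShowSpeakerLabels=true,MaxSpeakerLabels=2

echo "$MEDIA_URI"
```

The same can of course be clicked through in the Transcribe console, which is what I did.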

Transcribe Admin Panel.

The overall result: the same 1.5-hour video converted into transcribed text, with speaker identification enabled or disabled. It took approximately 25-30 minutes to get a 1.5 MB JSON file of text, with separate spk_1|spk_2 labels and time codes.
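Once the job finishes, the JSON can be pulled apart with a few lines of shell. A minimal sketch against a heavily trimmed stand-in for the real payload, assuming the output follows Transcribe’s results.transcripts structure:

```shell
# A tiny stand-in for the real Transcribe output (heavily trimmed):
cat > transcript.json <<'EOF'
{"results": {"transcripts": [{"transcript": "hello from the video"}]}}
EOF

# Pull the plain text out of the JSON (python3 used to avoid extra dependencies):
python3 -c "
import json
with open('transcript.json') as f:
    data = json.load(f)
print(data['results']['transcripts'][0]['transcript'])
"
```

From there, checking for keywords is a plain grep over the extracted text.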

#delete campaigns: social solidarity vs privacy

Uber: Travel ban

In January 2017, we witnessed the #deleteuber social media campaign. The movement erupted after Trump’s travel ban on several Muslim-majority countries: while NYC taxi drivers went on strike at JFK airport in protest, Uber announced it was turning off surge pricing there, a move many read as undermining the strike.

By February 2017, an estimated 200,000 accounts had been deleted as an act of solidarity against the US President’s decision.

Cambridge Analytica: Elections

In March 2018, Christopher Wylie, a whistleblower from Cambridge Analytica (no relation to the famous university), gave an interview to the Guardian on how the company had been collecting Facebook user profiles and allegedly helped target the Republican Party’s campaign to win the elections.

Social media just went nuts over the subject. The chance of your profile being harvested for micro-targeting, to shape your opinion on any sociopolitical matter, launched yet another delete campaign: #deletefacebook. It has since been reported that up to 87 million profiles may have been leaked to Cambridge Analytica, which is said to be part of SCL Group.

Some details on who these folks are:

SCL’s involvement in the political world has been primarily in the developing world where it has been used by the military and politicians to study and manipulate public opinion and political will. It uses what have been called “psy ops” to provide insight into the thinking of the target audience. According to its website, SCL has influenced elections in Italy, Latvia, Ukraine, Albania, Romania, South Africa, Nigeria, Kenya, Mauritius, India, Indonesia, The Philippines…(c) Wikipedia

What’s quite interesting about this whole story is that at first it emphasised the privacy leak. A week later it had twisted into yet another Trump fault, and all hell broke loose on social networks.

Frankly speaking, this for/against-Trump campaigning is not my thing. I’m not a US citizen, I didn’t vote, and thus I don’t care: American elections are solely a matter for the US people.

Technically speaking, as a person who reads and does some IT things, the story breaks down to the subject of privacy and the media we use in our day-to-day routines.

If you’re not paying for the product, you are the product

Whenever you use any social medium, you share your private information: those crazy useless quizzes, requests for your location, ad rotation, bounce rates. It was just a matter of time before some company appeared on the horizon and started crunching your data for its own purposes. Marketing tools combined with psychology and IT can give you a proper railgun in social science and opinion forming.

It’s your decision to support or ignore the #deletefacebook movement. Edward Snowden gave an interview on the matter with some insights into data privacy and state control. He might be right that it’s us, our generation, that must impose control over our personal data, before it’s too late.

What’s after senior developer by Christian Heilmann

It is painful to see how clumsy companies are in trying to keep their techies happy. We do team building exercises, we offer share options. We pay free lunches and try to do everything to keep people in the office. We print team T-shirts and stickers and pretend that the company is a big, happy family. We pay our technical staff a lot and wonder why people are grumpy and leave.

What gets us going is a feeling of recognition and respect. And only peers who’ve been in the same place can give that. There is no way to give a sincere compliment when you can’t even understand what the person does.

A great article from Christian Heilmann about career ladders and what comes after senior developer.

CakePHP CsvMigrations: prototype in 3 2 .. Done!

Every programmer is tired of coding yet another login form, yet another CRUD view module. So today I’ll show you an example of how our laziness can get you building prototype systems without a single line of extra code.

One of the tasks we had to face when developing Qobrix CRM was fast prototyping of the system. Ultimate laziness plus the DRY concept resulted in the cakephp-csv-migrations plugin, which we use pretty much everywhere while delivering systems.

Your application is not unique

Whatever you request for your prototype is mostly based on the same functionality:

  • CRUD views
  • Basic CRUD actions
  • Event/Trigger system that allows you mutating the data

If we dive deeper into these points, whatever you work with is form-based: your input fields are the minimum atomic unit of interaction, namely strings, longtexts, datetimes. Look familiar? Exactly: database data types.

In certain cases, you store dates in strings, names in varchars, or even longtexts. So we come to the point of needing a mechanism that binds your application logic to your storage engine. For simplicity, I’ll base this example on an RDBMS like MariaDB or MySQL.

Changing those bindings might be difficult for the user. As the supplier of the system, you don’t know who is going to deal with it: some companies don’t have IT departments but need to modify things rapidly, and developers’ levels of expertise with the system vary. We wanted to make it as simple as possible.

Preparing the App

Note: If you already have a CakePHP application running, just composer require qobo/cakephp-csv-migrations, and you can skip this part.

Theory is boring without examples, so I’ll show you the basics of expanding the system with extra modules. For simplicity, I’ll base it on the project-template-cakephp template that we frequently use. It already ships with some dependencies, including the cakephp-csv-migrations plugin as part of cakephp-utils.

composer create-project qobo/project-template-cakephp baking_app
cd baking_app
./bin/build app:install DB_NAME="baking_app",CHOWN_USER=$USER,CHGRP_GROUP=$USER,PROJECT_NAME="My Baking App"
./bin/phpserv

That’s enough to check that your app is up and running. For basic credentials and other settings, check the .env file generated by the Robo build scripts.

Baking Recipes Module

Here comes the baking part. We’re going to make a simple recipes module to store our favourite recipes.

./bin/cake bake csv_module Recipes
./bin/cake bake csv_migration Recipes

The first command creates dummy MVC instances for CakePHP: Model/Entity, Controller and ApiController files. Based on those, the second command bakes a migration script (built on Phinx migrations).

If you look into the migration file created in config/Migrations/<timestamp>_Recipes<timestamp>.php, you’ll see something like this:

<?php
use CsvMigrations\CsvMigration;

class Recipes20180126154309 extends CsvMigration
{
    public function change()
    {
        $table = $this->table('recipes');
        $table = $this->csv($table);

        if (!$this->hasTable('recipes')) {
            $table->create();
        } else {
            $table->update();
        }

        $joinedTables = $this->joins('recipes');
        if (!empty($joinedTables)) {
            foreach ($joinedTables as $joinedTable) {
                $joinedTable->create();
            }
        }
    }
}

Where are the fields? That’s the point where all the magic happens.

The CsvMigrations plugin provides a vast number of field types that bind to the basic data types of your database. They’re defined in the config/Modules/Recipes/db/migration.csv file. We’ll expand it a bit:

FIELD NAME,FIELD TYPE,REQUIRED,NOT SEARCHABLE,UNIQUE
id,uuid
name,string
meal_type,list(meal_types)
recipe,text
created,datetime
modified,datetime
created_by,related(Users)
modified_by,related(Users)

I’ve added the name, meal_type and recipe fields, which map to varchar and longtext data types in the database. Let’s cook it:

./bin/cake migrations migrate

And we are done! You may have noticed the list(<list_name>) type used within migration.csv. This FieldHandler type defines option lists for rendering select boxes in your forms. The lists are stored in config/Modules/Common/lists/meal_types.csv:

VALUE,LABEL,INACTIVE
breakfast,Breakfast,
dinner,Dinner,
supper,Supper,
Where are my views?

If you start up the application and navigate to http://localhost:8000/recipes/add, you’ll see something like this:

add recipes form

Now we need to add fields to the CRUD forms, and CsvMigrations can help you with that too. All your form layouts are located in config/Modules/Recipes/views:

PANEL NAME,FIRST COLUMN FIELD NAME,SECOND COLUMN FIELD NAME
Details,name,meal_type
Recipe,recipe

The example above goes into the add.csv and edit.csv files. Reload the add page:

Complete add form
Conclusion

Now you’re ready to work with basic CRUD. All the common CRUD logic already lives in the cakephp-csv-migrations plugin, which also handles the API requests that feed the DataTables grid on the index page. If you want to change its behavior, you can always override the action methods in `RecipesController`.

 

WordPress Gutenberg: it’s not about Text Editor

I wasn’t paying much attention when the WordPress folks announced the Gutenberg project.

I never suffered any dramatic impact from the CKEditor embedded in the WordPress admin panel; I still think it’s one of the best examples of text-editor UI/UX on the Web. The development process caught my attention because of the React licensing issue that had the Internet buzzing for a couple of months, until Facebook changed the license.

And then I watched this video on the future of WYSIWYG editing and Gutenberg’s impact on the WordPress ecosystem.

This is huge! The whole ecosystem will change its standards for writing plugins and themes: viewports expanding beyond classical monitor resolutions to wearables and other portable devices, a block-based architecture. Enough with the spoilers; just watch the video.

Puppeteer: NodeJS browser automation

Puppeteer is a Node library which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.

The demo is worth a thousand words. The API documentation covers all the aspects of browser emulation and handling I could think of.

Whether it will replace NightwatchJS or become a sub-component in our current end-to-end stack, fun times of trial and error will tell.

NoSQL: MongoDB vs DynamoDB for Big Data storage

A quick comparison of DynamoDB and MongoDB, quoted from Amazon:

DynamoDB:

  • Data model: key-value with JSON support
  • Record size: up to 400 KB
  • Limited data type support
  • Querying: key-value queries

MongoDB:

  • Data model: JSON-like document
  • Record size: up to 16 MB
  • Querying: query and analyze data in multiple ways, by single keys, ranges, faceted search, graph traversals, and geospatial queries, through to complex aggregations

Both are perfectly suited for big data storage, and both offer cloud-based (AWS and MongoDB Atlas) as well as local deployments.

Positive feedback on DynamoDB and its pricing policy (pricing calculator):

Amazon lets you buy operations per second capability rather than CPU hours or storage space. This removes a whole lot of complexity for developers who would otherwise need to tune the database configuration, monitor performance levels, ramp up hardware resources when needed. This provides users a fast and reliable storage space for their needs with costs that scale in direct proportion to the demand.

For now, DynamoDB looks more favorable. The fact that it’s an AWS-only engine doesn’t set any limits: the overall stack of tools in the AWS infrastructure looks promising.

HubSpot: Gatekeepers and Gardeners

HubSpot’s tech blog published a great article on balancing delegation and the tech lead’s paradox of gatekeepers vs gardeners.

You might be a gatekeeper if:

  • your team regularly waits for you to review their PRs
  • your team waits to do the next thing assigned to them instead of taking initiative to find projects for themselves
  • you hesitate to go on vacation because you’re concerned your team will struggle in your absence

On the opposite side:

A gardener might:

  • forego reviewing work, or let other members of the team take on the responsibility
  • let the team handle their own task management, trusting they understand the needs of the customer, business, and team
  • encourage members to build relationships on and off the team
  • let the team experience failure, trusting in their accountability to fix their problems and learn from their mistakes
  • have their team take on grungy work along with the “fun” work, because they understand the value of it

A great overview of the two roles people occasionally take on once they become managers or tech leads.