Data Craze Weekly #5

You can get this message delivered straight to your inbox by signing up for the Data Craze Weekly newsletter.

Week in Review

PostgreSQL Optimization

The slides in the link were created in 2017; I stumbled upon them by accident last week.

They are so solid (and still relevant) that I had to share them with you.

The author focuses on showing how to optimize SQL queries that are quite common and may seem trivial, but that unfortunately cost us (at least in PostgreSQL) a lot of computing power and time.

Not using PostgreSQL? No problem. Go through the queries and check whether they also apply to your database engine.

The slides from 38 onward, on DISTINCT, blew me away.

And a short quote from the author:

– Efficient execution of some popular queries requires the implementation of the alternative procedural algorithm
– Implementation of custom algorithms is usually easier when using PL/PgSQL
– The same algorithm implemented on SQL runs faster
Process:
– Implement and debug algorithm on PL/PgSQL
– Convert to SQL
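
One well-known example of a procedural algorithm rewritten in SQL for DISTINCT is emulating a loose index scan with a recursive CTE: instead of reading every row, the query hops from one distinct value to the next using the index. The sketch below is my assumption about the kind of rewrite the slides discuss, not a copy of them. It uses Python's built-in sqlite3 module only so that it runs anywhere, and the `events` table, its `category` column, and the `distinct_categories` helper are all hypothetical.

```python
import sqlite3


def distinct_categories(conn):
    """Return distinct categories by hopping between index entries.

    Each recursive step looks up the smallest value greater than the
    previous one, so the work depends on the number of distinct values
    rather than on the number of rows.
    """
    query = """
        WITH RECURSIVE t(category) AS (
            SELECT MIN(category) FROM events
            UNION ALL
            SELECT (SELECT MIN(category) FROM events
                    WHERE category > t.category)
            FROM t
            WHERE t.category IS NOT NULL
        )
        SELECT category FROM t
        WHERE category IS NOT NULL
        ORDER BY category
    """
    return [row[0] for row in conn.execute(query)]


# Hypothetical sample data: 3000 rows, only 3 distinct categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, category TEXT)")
conn.execute("CREATE INDEX events_category_idx ON events (category)")
names = ("click", "view", "purchase")
conn.executemany(
    "INSERT INTO events (id, category) VALUES (?, ?)",
    [(i, names[i % 3]) for i in range(1, 3001)],
)

loose = distinct_categories(conn)
plain = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT category FROM events ORDER BY category"
    )
]
print(loose == plain)
```

The approach tends to pay off when a column has few distinct values relative to the row count; with many distinct values a plain DISTINCT may well be the better choice, so it is worth measuring on your own data.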

Link: https://www.slideshare.net/pgdayasia/how-to-teach-an-elephant-to-rocknroll

What Kafka Is and Whether You Need It

A great article if you are deciding whether to use Apache Kafka.

It covers what this tool (technology) is, when it is worth using, and when it is better to hold off.

Below is a short quote from the Conclusion section, but the whole article is really worth reading.

Kafka is a highly scalable and durable message processing platform with great real-time data processing features. It will be a good fit in use cases like IoT, Click Stream Analytics, Real-Time Data Integration, Event Sourcing, Log Aggregation, etc. But it is not a solution that can be used in any data processing requirement. Kafka should not be used as an ETL tool or as a database even though its feature set may seem similar.

Link: https://memphis.dev/blog/apache-kafka-use-cases-when-to-use-it-when-not-to/

Tableau Data Trends 2022

A PDF divided into thematic parts. Each part ends with recommendations.

Worth a look, if only to see the directions in which large companies are heading.

Two quotes below:

Companies that are developing AI will increasingly spin up their own Ethics as a Service (EaaS) offerings within their professional service organizations. We will see a race to hire AI ethicists to become compliant with the new regulations, making AI ethicists in even greater demand than AI developers.
— KATHY BAXTER, PRINCIPAL ARCHITECT, SALESFORCE ETHICAL AI PRACTICE

Data quality and data-driven decision-making go hand in hand. An organization-wide commitment to data governance mitigates risk and drives future success for everyone in the business.
—SCOTT TEAL, PRODUCT MARKETING MANAGER, SNOWFLAKE

Link: https://www.tableau.com/sites/default/files/2022-02/Data_Trends_2022.pdf

Tools

TablePlus - “a native application which helps you easily edit database contents and structure in a clean, fluent manner.”

Do you use an IDE for working with databases, such as DBeaver? Maybe it is worth trying something else?

If so, TablePlus comes to the rescue. It is a "pretty" (a matter of taste) and, in theory, native tool (one that natively supports specific databases).

In theory you can use it for free (at least that is what the repository claims); in practice the free version limits the tool heavily:

The free trial is limited to 2 opened tabs, 2 opened windows, 2 advanced filters (filters are not available on the free TablePlus Windows) at a time. We can change the limitations without any notifications in the future releases.

Worth considering as an alternative to, for example, DataGrip.

Link: https://github.com/TablePlus/TablePlus

Test Your Knowledge

#SQL

No exercise today, but please open the slides from the first link.

Here it is again (as a reminder): https://www.slideshare.net/pgdayasia/how-to-teach-an-elephant-to-rocknroll

Take a look at example 02, "IOS for large data offsets".

The most common form of pagination on websites is OFFSET + LIMIT.

Read about the consequences it can have with a large offset. Check whether you have such cases in your own systems.
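
With OFFSET + LIMIT the database generally still has to walk past all the skipped rows before it can return the page, so deep pages tend to get slower and slower. A commonly used alternative is keyset (seek) pagination, which remembers the last key seen and continues from there. The sketch below shows that alternative; it is not the index-only-scan technique from the slides. The `articles` table and both function names are hypothetical, and SQLite (through Python's sqlite3 module) stands in for your real engine.

```python
import sqlite3

PAGE_SIZE = 10


def page_by_offset(conn, page_number):
    """Classic pagination: skip page_number * PAGE_SIZE rows, then read."""
    offset = page_number * PAGE_SIZE
    return conn.execute(
        "SELECT id, title FROM articles ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(conn, last_seen_id):
    """Keyset pagination: seek to the last id seen and read onward."""
    return conn.execute(
        "SELECT id, title FROM articles WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


# Hypothetical sample data: 1000 articles with consecutive ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(i, f"Article {i}") for i in range(1, 1001)],
)

offset_page = page_by_offset(conn, 90)  # skips 900 rows first
keyset_page = page_after(conn, 900)     # seeks directly past id 900
print(offset_page == keyset_page)
```

Keyset pagination needs a stable, unique sort key and cannot jump straight to an arbitrary page number, so it suits "next page" or infinite-scroll interfaces better than numbered page links.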

You can find more SQL questions at: SQL - Q&A

Jobs

Remote Data Engineer, 10 Senses – Fully Remote – PLN 36,000 – PLN 40,000 (net/month, B2B)

Skills sought: SQL, Python, Spark