Jiří Eischmann

2 months ago

We have a smart water meter which reports water consumption several times a day. I thought I'd just integrate #HomeAssistant with the service's API. However, there is no API, and the web page with the values is buried behind at least four interactions. Yesterday, I started writing a fairly complex Selenium-based scraper. In the process, I discovered that I was being redirected to the page with the values via a URL containing an authentication token. I could just use that link and skip all the previous steps.

However, the page with the values is still generated by JavaScript, so I still have to use a web engine for scraping. It's too heavy for Home Assistant Green, so I have to run the service elsewhere and integrate it with HA.
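A minimal sketch of the "run the service elsewhere" part, assuming the scraper itself (not shown) runs on another machine and calls `READING.update()` whenever it gets a new value. It publishes the latest reading as JSON using only Python's standard library; the class names, the path `/water`, port 8080, the cubic-metre unit and the field names `value`/`timestamp` are all hypothetical choices for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LatestReading:
    """Thread-safe holder for the most recent meter value (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None       # assumed to be cubic metres, as scraped
        self._timestamp = None   # ISO 8601 string of when the value was read

    def update(self, value_m3, timestamp_iso):
        # Called by the scraper (not shown) whenever it obtains a new value.
        with self._lock:
            self._value = value_m3
            self._timestamp = timestamp_iso

    def as_json(self):
        with self._lock:
            return json.dumps({"value": self._value, "timestamp": self._timestamp})


READING = LatestReading()


class ReadingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the cached value only; no scraping happens per request.
        if self.path != "/water":
            self.send_error(404)
            return
        body = READING.as_json().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Listen on all interfaces so Home Assistant on another host can reach it.
    ThreadingHTTPServer(("0.0.0.0", 8080), ReadingHandler).serve_forever()
```

Home Assistant's RESTful sensor (the `rest` integration) can poll a URL like this and extract the number with a `value_template` such as `{{ value_json.value }}`, so the heavy browser engine never has to run on the Home Assistant Green itself.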

All that just to read ONE goddamn value! What a beautiful rabbit hole for the weekend!

#homeassistant

Táta Geek

in reply to Jiří Eischmann 2 months ago

I once read about someone's project where they intercepted the transmitted packets and had decoded the protocol the meter uses. But I couldn't find it, so I'm not much help. I wish you lots of luck and fun.

Jiří Eischmann

in reply to Táta Geek 2 months ago

@tatageek @bycx already told me that the transmissions can be intercepted, but that would require extra hardware, which I have no experience with at all, so I figured scraping would be easier (but that was many hours ago, and I've been sitting over it ever since 🙂). I also thought about installing a device that reads the dial, as you would with a dumb water meter, but the meter is in an outdoor shaft where it's damp, there's no power supply, and it's two meters underground, so it's questionable how it could communicate over the air.
It's really sad that we have a smart water meter, but BVK and Suez are really trying hard to make sure people can't access it programmatically.

Věčný Stopař Kosmírem

in reply to Jiří Eischmann 2 months ago

@tatageek @bycx Yeah, Selenium and data scraping. That reminds me of when I was writing a scraper for Strava for #bezimlitr.

#bezimlitr

dan

in reply to Jiří Eischmann 2 months ago

@tatageek @bycx I have that kind of water meter too, and I have the hardware to read it, but I'm stuck because the water company refuses to give me the AES key to decrypt the readings. The key is unique to each unit, so giving it to me wouldn't harm them in any way. The readings are encrypted so that thieves can't guess which houses are empty from changes in the meter readings.

Steve Hill 🏴󠁧󠁢󠁷󠁬󠁳󠁿🇪🇺

in reply to Jiří Eischmann 2 months ago

Surely the values themselves are available *somewhere* for the JS to take them and generate the page. So you don't need to execute the JS; you just need to figure out where the JS is getting the data.

Radomír Žemlička

in reply to Steve Hill 🏴󠁧󠁢󠁷󠁬󠁳󠁿🇪🇺 2 months ago

@steve Lol, you beat me to it. 😄

Jiří Eischmann

in reply to Radomír Žemlička 2 months ago

@Razemix @steve thanks for the advice, I'll look into it. Not having to have a service in between would be a major simplification.

Jiří Eischmann

in reply to Jiří Eischmann 2 months ago

@Razemix @steve At first glance, it looks like something that can be scraped without a web engine. The website is generated by ASP, and a simple "curl -L" fails the authentication, but even if it can't be done with curl in the end, Python requests + BeautifulSoup should handle it. I have a new branch of the rabbit hole to dig, great! 😅
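If the site turns out to be ASP.NET WebForms (an assumption; the post only says "ASP"), one common reason a bare `curl -L` fails is that the login form carries hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, which a browser posts back together with the credentials. This sketch collects those hidden inputs with the standard library's `html.parser` in place of BeautifulSoup, so it runs without extra packages; `HiddenFieldCollector` and `hidden_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # Valueless attributes arrive as None; post them back as "".
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html_text):
    """Return all hidden form fields found in an HTML page."""
    parser = HiddenFieldCollector()
    parser.feed(html_text)
    parser.close()
    return parser.fields
```

The resulting dictionary would then be merged with the username and password fields (their names are site-specific and not known here), URL-encoded with `urllib.parse.urlencode`, and POSTed back while keeping session cookies, for example with `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor`.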

Jiří Eischmann

in reply to Jiří Eischmann 2 months ago

@Razemix @steve At second glance, no. First, they try really hard to make sure they're talking to a browser, and second, they refresh the token every day (I was naive to think it could be a long-term token). So I really have to log in and scrape the bvk.cz website to get the refreshed token. 🙄
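Since the token rotates daily, one way to avoid logging in on every poll is to cache it and repeat the login-and-scrape steps only when it is old. A minimal sketch, where `fetch` is a hypothetical callable standing in for those steps and the 24-hour lifetime is an assumption based on "they refresh the token every day":

```python
import time


class DailyTokenCache:
    """Caches an access token and re-fetches it once it is older than max_age_s."""

    def __init__(self, fetch, max_age_s=24 * 3600, clock=time.monotonic):
        self._fetch = fetch          # hypothetical: logs in and returns the token
        self._max_age_s = max_age_s
        self._clock = clock          # injectable so the logic can be tested
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._token is None or now - self._fetched_at >= self._max_age_s
        if expired:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def invalidate(self):
        # Call this when the portal rejects the token earlier than expected.
        self._token = None
```

If the token actually rotates at a fixed time of day rather than 24 hours after it was issued, comparing calendar dates would be more accurate than comparing ages; `invalidate()` covers the case where the portal rejects a token early.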

Radomír Žemlička

in reply to Jiří Eischmann 2 months ago

@steve Does it send the data to bvk.cz? Couldn't you just spoof the DNS record and redirect it to a local server?

Jiří Eischmann

in reply to Radomír Žemlička 2 months ago

@Razemix @steve There is no integration. You have to log in at bvk.cz and go through 3 pages to find a link (with a token attached) to the Suez portal; then you're redirected there and can read the values. I've already got the scraping working. It doesn't take too long (20 s) and works reliably, but it loads Chromium, so it's too heavy for Home Assistant Green.
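If the last of those bvk.cz pages contains the token-bearing link directly in its HTML (an assumption; it may just as well be built by script), the link can be picked out without a browser engine. A sketch using the standard library, where `portal.example.com` is a placeholder for the Suez portal host and `find_portal_links` is a hypothetical helper:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PortalLinkFinder(HTMLParser):
    """Collects absolute hrefs of <a> elements that point at a given host."""

    def __init__(self, base_url, host):
        super().__init__()
        self.base_url = base_url
        self.host = host.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before checking the host.
        absolute = urljoin(self.base_url, href)
        if (urlsplit(absolute).hostname or "") == self.host:
            self.links.append(absolute)


def find_portal_links(html_text, base_url, host):
    """Return every link on the page that leads to `host`."""
    finder = PortalLinkFinder(base_url, host)
    finder.feed(html_text)
    finder.close()
    return finder.links
```

`html.parser` unescapes attribute values, so an `&amp;` in the page's markup comes back as a plain `&` in the returned URL, ready to be requested directly.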

Radomír Žemlička

in reply to Jiří Eischmann 2 months ago

If it's generated by JavaScript, it needs to get those values somehow. Maybe they are already in the page (as JSON in some variable), or it fetches them from another endpoint. It should still be easier to access those directly.
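The first possibility, data embedded as JSON in a script variable, can be checked with a few lines of Python. This sketch assumes the variable name is known from reading the page source (`meterData` in the test is made up) and that the assigned value is a valid JSON literal rather than arbitrary JavaScript:

```python
import json
import re


def json_from_script(html_text, var_name):
    """Return the JSON value assigned to `var_name` somewhere in the page."""
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html_text)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found")
    # raw_decode parses one JSON value from the start of the string and
    # reports where it stopped, so the trailing ";" and the rest are ignored.
    value, _end = json.JSONDecoder().raw_decode(html_text[match.end():])
    return value
```

If the numbers aren't in the page at all, the Network tab of the browser's developer tools (filtered to Fetch/XHR) shows which endpoint the script calls, and that request can often be replayed directly.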

Agnieszka R. Turczyńska

in reply to Jiří Eischmann 2 months ago

I'm confused. If the page is generated by JavaScript, isn't there some kind of API you can use directly instead of scraping?

Jiří Eischmann

in reply to Agnieszka R. Turczyńska 2 months ago

@agturcz It's generated by JS and ASP. But most importantly, as I found out, the token is refreshed daily, so I still have to scrape the water company's website for the token every day.

Ilkka Tengvall

in reply to Jiří Eischmann 2 months ago

We also have a water meter that sends its readings over radio. Unfortunately, it's encrypted and consumers aren't given the key :( It sucks.

Jiří Eischmann

in reply to Jiří Eischmann 2 months ago

A small man's victory over the malice of the water company. \o/

#HomeAssistant

Václav Pašek

in reply to Jiří Eischmann 2 months ago

Great! I'm jealous. We still have old analog water meters here, where only the little wheels turn and a worker comes once a year to take the reading. 🫣 I've tried counting the revolutions of one of the pointers with an infrared diode, but since it's also covered in water, or whatever that is, I can't get it to work. 😁
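For the infrared approach, counting passes of the pointer with two thresholds (hysteresis) instead of one keeps a noisy signal from being counted several times per revolution. A sketch, assuming a reflective IR sensor whose reading rises when the pointer passes under it; `count_rotations` is a hypothetical helper and the threshold values would have to be found experimentally:

```python
def count_rotations(samples, low, high):
    """Count pointer passes in a sequence of sensor readings.

    A pass is counted when the signal rises to `high` or above; the detector
    re-arms only after the signal falls back to `low` or below, so jitter
    between the two thresholds is not counted twice.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    count = 0
    armed = True
    for value in samples:
        if armed and value >= high:
            count += 1
            armed = False
        elif not armed and value <= low:
            armed = True
    return count
```

Hysteresis won't rescue a signal that is too weak to separate from the background, but it does stop jitter around a single threshold from inflating the count. If the reading drops rather than rises when the pointer passes, negate the samples (and thresholds) first.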

Jiří Eischmann

in reply to Václav Pašek 2 months ago

@electricCZ I'm not sure there's anything to be jealous of, because this is digitalization done in the dumbest possible way. The water company didn't even try to integrate it into their website. I have to log in there and click through four screens to get to a link, with a token attached, into the system of Suez (the water meter supplier). And that's a chapter of its own. They show the reading on a crazy JS widget that simulates the water meter dial. The meter apparently sends readings 4 times a day, because some graphs update 4 times a day, but the meter reading itself updates only once a day. On top of that, they say the reading was taken at three in the morning, but at nine in the morning it still showed the previous day's value.
Basically, at some point I was thinking I'd rather climb into the shaft every day and read the meter by hand than work with this. 😅

Miroslav Buček 🌳

in reply to Jiří Eischmann 2 months ago

@electricCZ An API in the 21st century? Forget it.

Václav Pašek

in reply to Jiří Eischmann 2 months ago

Damn.
