tallrain's blog: 2020

2020年11月24日

Rebuilding SQL Server Indexes

Highlights:

- Check index fragmentation: average percentage, page count

- ALTER INDEX REBUILD

- Manually set Timeout parameter

- Set counter to control session frozen time

- Sleep to release suspended sessions

- EXEC sp_who2 to check real time deadlocks

1. Background

A lot of systems have part of logic running in Database, and as time passed, data and indexes increased, and then Database performance will decrease.

One of key factor of Database performance is Index.

When data and indexes increase, the fragmentation level increase as well.

According to Microsoft article “Reorganizing and Rebuilding Indexes(https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2005/ms189858(v=sql.90))”, we should Reorganize indexes when average fragmentation is between 5% and 30%, and we should Rebuild indexes when average fragmentation is greater than 30%.

We can use below SQL to check average fragmentation of indexes which are greater than 30%:

-------------------------------------------------------------------------------------

SELECT dbschemas.[name] AS 'Schema'

,dbtables.[name] AS 'Table'

,dbindexes.[name] AS 'Index'

,indexstats.avg_fragmentation_in_percent

,indexstats.page_count

FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS indexstats

INNER JOIN sys.tables dbtables ON dbtables.[object_id] = indexstats.[object_id]

INNER JOIN sys.schemas dbschemas ON dbtables.[schema_id] = dbschemas.[schema_id]

INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]

AND indexstats.index_id = dbindexes.index_id

WHERE indexstats.database_id = DB_ID()

AND indexstats.avg_fragmentation_in_percent > 30

--AND dbschemas.[name] = 'dbo'

--AND dbtables.[name] = 'RPV_MAGPLANBEL'

--ORDER BY indexstats.page_count DESC;

ORDER BY indexstats.avg_fragmentation_in_percent DESC;

-------------------------------------------------------------------------------------

We also need to pay attention to page count, each page stores about 8KB data, so a big page count means the rebuild time will be quite long.

2. Considerations

1) Use ALTER INDEX index_name or ALTER INDEX ALL

Use ALTER INDEX index_name ON Table REBUILD can only rebuild 1 index with a command.

ALTER INDEX ALL ON Table REBUILD can rebuild all Indexes of a Table with a command, but it require longer shut down time, because the new SQL might be suspended during the rebuild process.

2) Manually set remote timeout to avoid timeout disconnect:

EXEC sp_configure 'remote query timeout', 600 ;

RECONFIGURE ;

Since rebuilding process might take quite long time, which could be longer than default remote timeout time, so it will be better to manually set it.

3) Set counter to control session frozen time

Even if we’re using PRINT() in SQL Server Management Studio(SSMS), the print out messages will only be showed after the whole SQL block has completed execution.

During the SQL execution, the Messages Tab is blank, so during that time we have no idea what’s going on or going wrong.

So it will be better to spilt the whole session into many small sessions, and we can use Counter to control it.

Now we change the SQL of checking index fragmentation, to order by page_count desc:

-------------------------------------------------------------------------------------

SELECT --dbschemas.[name] AS 'Schema',

dbtables.[name] AS 'Table',

dbindexes.[name] AS 'Index',

indexstats.avg_fragmentation_in_percent,

indexstats.page_count

FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS indexstats

INNER JOIN sys.tables dbtables ON dbtables.[object_id] = indexstats.[object_id]

INNER JOIN sys.schemas dbschemas ON dbtables.[schema_id] = dbschemas.[schema_id]

INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]

AND indexstats.index_id = dbindexes.index_id

WHERE indexstats.database_id = DB_ID()

AND indexstats.avg_fragmentation_in_percent > 30

AND dbschemas.[name] = 'dbo'

--AND dbtables.[name] = 'RPV_MAGPLANBEL'

AND dbindexes.[name] IS NOT NULL

ORDER BY indexstats.page_count DESC;

--ORDER BY indexstats.avg_fragmentation_in_percent DESC;

-------------------------------------------------------------------------------------

So for these indexes with such huge page count, we should rebuild 1 index each time.

And when page count is getting smaller, we can set counter into a bigger number.

4) Set sleep to release suspended sessions.

If we’re running multiple REBUILD operations in same session, we should sleep at least for 5 seconds in between, so the suspended SQLs will be run and released.

By using this:

WAITFOR DELAY '00:00:05';

5) Check real time deadlocks before and during REBUILD.

By running:

EXEC sp_who2

We can see deadlocked sessions(Status=SUSPENDED, or BlkBy IS NOT NULL), the BlkBy points to the session who caused the deadlock.

We should run REBUILD when the moment no deadlocks are found.

Another SQL to list deadlocks:

SELECT *

FROM sys.dm_exec_requests

WHERE DB_NAME(database_id) = 'KBS_FDM'

AND blocking_session_id <> 0;

3. Optimized SQL:

-------------------------------------------------------------------------------------

Declare @getTables CURSOR; -- Cursor to list out all indexes have frag % > 30

Declare @TableName varchar(255);

Declare @IndexName varchar(255);

Declare @command nvarchar(4000); -- Detail ALTER INDEX command

DECLARE @DT NVARCHAR(50); -- Datetime to print out

DECLARE @i INT; -- Counter

-- set remote time out

EXEC sp_configure 'remote query timeout', 600 ;

RECONFIGURE ;

SELECT @DT = CONVERT(NVARCHAR, GETDATE(), 121); --YYYY-MM-DD HH:MI:SS

PRINT(@DT);

PRINT('Rebuilding Index started.');

PRINT('');

--SET @getTables = CURSOR for SELECT name FROM sys.objects WHERE type = (N'U');

--EXEC sp_who2 -- to check existing deadlock(BlkBy)

SET @i = 0;

SET @getTables = CURSOR for

SELECT --dbschemas.[name] AS 'Schema',

dbtables.[name] AS 'TableName',

dbindexes.[name] AS 'IndexName'

--indexstats.avg_fragmentation_in_percent,

--indexstats.page_count

FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS indexstats

INNER JOIN sys.tables dbtables ON dbtables.[object_id] = indexstats.[object_id]

INNER JOIN sys.schemas dbschemas ON dbtables.[schema_id] = dbschemas.[schema_id]

INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]

AND indexstats.index_id = dbindexes.index_id

WHERE indexstats.database_id = DB_ID()

AND indexstats.avg_fragmentation_in_percent > 30

AND dbschemas.[name] = 'dbo'

--AND dbtables.[name] = 'RPV_MAGPLANBEL'

AND dbindexes.[name] IS NOT NULL

ORDER BY indexstats.page_count DESC;

--ORDER BY indexstats.avg_fragmentation_in_percent DESC;

OPEN @getTables;

FETCH NEXT FROM @getTables into @TableName, @IndexName;

--WHILE @@FETCH_STATUS = 0

WHILE @i < 1 AND @@FETCH_STATUS = 0 -- Set @i to bigger number when page count is getting smaller

BEGIN;

SELECT @DT = CONVERT(NVARCHAR, GETDATE(), 121); --YYYY-MM-DD HH:MI:SS

PRINT(@DT);

--set @command = N'ALTER INDEX ALL ON ' + @TableName + N' REBUILD;'; -- WITH (ONLINE=ON);'; for enterprise version only

set @command = N'ALTER INDEX ' + @IndexName + N' ON ' + @TableName + N' REBUILD;';

PRINT(@command);

--PRINT('');

--WAITFOR DELAY '00:00:01';

BEGIN TRY

EXEC (@command);

SELECT @DT = CONVERT(NVARCHAR, GETDATE(), 121); --YYYY-MM-DD HH:MI:SS

PRINT(@DT);

PRINT('Rebuilding Index completed.');

PRINT('');

END TRY

BEGIN CATCH

SELECT @DT = CONVERT(NVARCHAR, GETDATE(), 121); --YYYY-MM-DD HH:MI:SS

PRINT(@DT);

PRINT 'CATCHED!';

END CATCH;

WAITFOR DELAY '00:00:05'; -- wait 5s to release suspended sessions

FETCH NEXT FROM @getTables into @TableName, @IndexName;

SET @i = @i + 1;

END;

CLOSE @getTables;

DEALLOCATE @getTables;

SELECT @DT = CONVERT(NVARCHAR, GETDATE(), 121); --YYYY-MM-DD HH:MI:SS

PRINT(@DT);

PRINT('All indexes rebuilt.');

-------------------------------------------------------------------------------------

4. PS: WITH (ONLINE=ON) condition.

While running ALTER INDEX (ALL) REBUILD, there’s an optional parameter called ONLINE.

If we set ONLINE=ON, that makes targeted index still available during most time of rebuild process, but that’s only available for SQL Server Enterprise version.

2020年9月15日

Thingworx心得之Events Router与Expression的限制

1. Events Router的限制

Thingworx尽管参考了面向对象的设计，但是作为一个工业物联网平台，它简化了很多功能。

比如，我们不能直接利用某个Service或Expression直接给Property赋值，而只能通过Data Binging的方式进行绑定，并且一个Property只能绑定一个数据源。

但是有时候我们有多个数据源，这时就需要自己写Service或者使用Events Router组件。

如下图所示，我们希望点击“Output A”，则输出A；点击“Output B”，则输出列表选中的数据。

其中，列表选中项先绑定到Mashup参数Input2，以模拟Contained Mashup传递的参数。

实际测试：

1) 点击Output A，输出A，正确。

2) 列表选中Area，点击Output B，输出Area，正确。

3) 点击Output A，输出A，正确。

4) 列表选中Area，点击Output B，输出A，错误！

这是因为，Input 1对应的数据来自于Expression输出，每次执行以后，系统都会认为其值已变化（Data Change Type=Always，尽管数值可能不变）。而Input2对应的数据来自于Mashup参数，只有其数值变化时系统才会认为其值已变化。

解决办法是强制让Input2发生变化，如增加一个Events Router，并将Events Router 1的Output作为Input1，这样的效果相当于Output A执行时，将Input 2强制复位：

2. Expression的限制

Expression的作用和Service类似，主要用于客户端的简单处理，其限制为：

1) 不能定义临时变量。

2) 不能输出InfoTable。

3) 只有Changed事件，没有ServiceInvokeCompleted事件，因此无法严格区分多个Expression的执行顺序。

2020年8月21日

美剧式的笑傲江湖 - 游戏《巫师3》赏析

去年买了台二手XBOX ONE，玩了很多畅销大作，如《使命召唤》、《战地》、《绝地求生》、《刺客信条》、《FIFA》等，也利用XGP帐号玩了很多免费游戏如《荒野大飙客2》，但是我最爱的还是《巫师3》。

我在第一遍快速通关《巫师3》后，也找了一些广受好评的RPG游戏玩，如《上古卷轴5》、《刺客信条之奥德赛》、《荒野大飙客2》，但是总感觉没有《巫师3》的那个味。

作为一个中年人，我玩游戏最主要的是看故事，进入一个陌生世界，看看那里的人间百态。

相比较而言，《上古卷轴5》有太多的抉择，而我只想做观众，不想做主角；《荒野大飙客2》和《GTA5》都是黑帮小人物的故事，而这样的背景对我无感；《刺客信条之奥德赛》的故事太乱颇为无趣。

于是我又打开《巫师3》，开始二周目，这次我把精力放在所有的支线任务、狩猎和寻宝上，我耐心地看所有过场动画、对话、文字，任由自己沉浸其中。

如果用一句话总结我对《巫师3》的感受，那就是：美剧式的笑傲江湖。

游戏《巫师》系列继承了奇幻小说《巫师》系列的世界观和人物设定，总体框架宏大，可类比《冰与火之歌》。

在场景设计方面，有低地的沼泽，有中原的都市，有海边的群岛，有高山的宫殿。

在故事发展方面，也是有多条故事线同时发展。如血腥男爵的故事，一条线是他的家庭矛盾，另一条线是沼泽女巫对当地人的控制。在诺维格瑞，更是有3条线同时发展：四大黑帮的斗争，两大王国的战争，永恒之火与女术士集会所两大宗教团体的斗争。但是对于一个活了一百多年的巫师(猎魔人)来说，这些都算不上什么事，因为他只是一个从容的过客、一个无心的看客。

为了推动剧情，游戏制作了大量的过场CG动画，其中大量动画采用动作捕捉制作，因此很多动作和表情都非常令人信服。

巫师(猎魔人)经过身体改造，已经不能算是完全的人类，因此他和人类社会既存在联系，又有一定的距离。这一点和武侠世界中的侠客很象。

主角杰洛特对权力和金钱都没有太多的欲念，主线有两条，一条是寻找女儿希里，另一条是和女术士椰奈法、特丽丝的感情发展，这两条线的进展决定了最终的结局。

但是《巫师3》最令人赞叹的并不是宏大的整体叙事，而是无数小人物的支线故事，而主角在面对他们时往往会面临艰难的抉择：如血腥男爵家暴妻子，但不知道妻子出轨，要不要告诉他真相；如猎人会变成狼人，这一点他自己和妻子都不知道，而爱他的小姨子知道，然后她诱骗猎人变身杀死自己的姐姐；如村民雇佣巫师除魔，成功后却想杀了他来逃避酬金。有太多令人感慨的故事，有太多令人无从下手的抉择。

相对于平民，巫师能够做出一定程度的审判，但任何时候他都能转身离开。

主角在由人类转变成巫师的过程中，因为魔药的作用，从此没有了人类的情感，因此他已是郎心似铁。因此在任何时候，他既可以行动参与其中，也能够视若不见。

《巫师3》的游戏类型为ACTION RPG，有其丰富的动作系统：钢剑砍人兽，银剑斩妖魔，另外还有法印、剑油、炸弹、魔药等，既有物理攻击，也有魔法攻击，手残的玩家也可以通过跳跃闪避苟活。虽然不是最出色的，但是可玩性还是颇高的，特别是注魔后的剑舞很好看。

游戏的整体节奏感非常好，音乐也配合得非常到位，特别在衬托氛围和调动情绪方面，是一个很大的加分点。

最后补充几个游戏的小技巧：

1) 角色技能里激活“美食家”，吃食物时能够自动回血。

2) 冥想能够满血，并补充煎药和炸弹。

3) 常备煎药：猫眼夜视、金莺防毒、拉法达回血、鲸心屏气。

4) 可以通过寻宝任务获得猫派或狮鹫派的装备蓝图，然后找工匠打造全套装备。

5) 激活《石之心》DLC后，执行支线任务“注魔”，完成后可以为剑注魔，比如“断裂”、“护身”。注入“断裂”后的剑舞又秀又能打。

6) 增加金币的最简单方法是消灭流氓和盗贼，然后捡他们的兵器换钱。

7) 与商人和工匠换钱时，如果对方金币不够，可用冥想跳时间，一般跳个五六天对方金币又增加了。

游戏版本建议购买年度版，包括本体的《石之心》、《血与酒》两个出色的DLC。整个游戏所有的支线任务能够玩数月至半年的时间，也就不到200块钱，非常划算。

2020年8月5日

从TikTok看数据主权

近十年来，互联网经济飞速发展，我认为在很大程序上得益于以下几个关键因素：

1. 企业在数据的获取上几乎免费，在数据的使用上几乎不受限制，在数据的保护上几乎没有责任。

2. 当信息作为一种观点时，企业作为技术中立方，免受舆论的指责。

3. 当信息作为一种商品时，企业利用免费策略向新兴领域进行倾销，却没有受到惩罚。

现在情况即将发生巨大的变化。

首先我们来看这3个因素的不合理之处：

1. 数据的拥有者是用户，但是用户很少实质上拥有数据的归属权。企业在获取数据时，没有向用户付费。很多企业偷偷地采集数据时，却没有告知用户。当用户数据丢失时，企业也没有给用户赔偿。

2. 很多平台自动从网上采集信息，然后通过算法自动向用户推送，事实上起到了媒体的作用。和传统媒体的区别是用算法代替了编辑，此外用户评论作为即时反馈会放大文章的舆论力量。

3. 即便我们承认全球化和自由贸易，但是一定程度的贸易保护是不可或缺的。而信息服务是一种特殊的商品，网络贸易保护的正面例子是百度战胜谷歌，负面例子是欧洲在社交应用领域全面溃败。

现在随着逆全球化的发展，很多前置条件将被推翻。

GITHUB的例子告诉我们，每一条信息都有国籍，每一个比特都有国籍。

从前是政府要求大公司删除数据，但是将来公司只是数据的托管方，政府对数据有管理权。

当信息作为一种观点时，政府会出于公共安全的考虑，对数据进行监管。在可见的将来，提供观点的社区或平台会被要求实名制，当A国的网民发出文章时，他的文章会被A国监管；当B国的网民对他的文章进行评论时，此文章和评论将同时被A国B国监管。也就是说，数据的拥有域带有主权的属性。

当信息作为一种商品时，政府会出于贸易保护的考虑，对数据进行拦截和过滤。防火墙是一种方便的工具。此外，免费销售和发行有可能受到反倾销调查。

因此，信息作为观点和商品，政府会出于社会安全和贸易保护的考虑，对其进行监管。在政府的眼里，数据有价、有风险、有立场，因此必须被纳入主权管理。

2020年6月1日

《巫师3》DLC《石之心》故事赏析

《石之心》是《巫师3》DLC，其剧情是由一个完整的故事串起来的。

贵族欧吉尔德与爱丽丝相爱、订婚，然而因为家族败落，被女方家庭悔婚。于是欧吉尔德和魔鬼(游戏中的镜子大师)订下契约，魔鬼帮他实现心愿，他就把灵魂卖给对方。魔鬼要契约生效，还得牺牲一个至亲之人。欧吉尔德说不能是爱丽丝，魔鬼说没问题，于是他的弟弟殒命了。魔鬼给猎魔人杰洛特下了个套，让他帮助欧吉尔德实现他的三个愿望。

这就是故事的基本框架，我们可以看出这是一个类似《浮士德》的故事，实际上在波兰民间有一个高度类似《浮士德》的传说，而《石之心》就是受它启发编写的。

第一个愿望：带欧吉尔德的弟弟弗洛迪米来阳间一日游。

欧吉尔德愧疚于弟弟的牺牲，于是让猎魔人唤醒他，来个人间一日游。弗洛迪米附身于猎魔人，和好友夏妮一起参加她闺蜜的婚礼。

夏妮闺蜜是个平民，她的婚礼质朴而又热闹。

弗洛迪米是个游荡子，但是内心善良且本质纯真，他本来看不起底层平民，但是婚礼让他认识到平凡生活中的幸福时刻。

他的感性活泼，和猎魔人的冷漠理性，两者相反相成，带给夏妮难忘的一日一夜。

弗洛迪米这个角色的设计是非常有意思的，因为他一直羡慕敬仰哥哥，却不知他的生活却也让平民羡慕，而平民的婚礼让他知足。

一个人的不快乐，往往因为他的不满足。

第二个愿望：取得波索迪拍卖行的屋子，获得财富。

这主要是一个抢劫金库的动作戏，但是里面又有兄弟相争的剧情。

波索迪的弟弟艾瓦德因为被家族抛弃，而对哥哥心生仇恨，从而策划了抢劫，生动地诠释了“嫉妒使我面目全非”。

第三个愿望：取回当初给爱丽丝的玫瑰。

欧吉尔德为了重新获得财富和地位，一直潜心于研究黑魔法，从而与爱丽丝越来越疏远，他们的爱情也渐渐消散。后来欧吉尔德离家出走游荡，爱丽丝在孤寂中香消玉殒。

找到玫瑰，就是找到最初的爱。

为了找到玫瑰，猎魔人唤醒爱丽丝的魂，修复她破碎的记忆。

猎魔人进入爱丽丝的画中世界，这个世界就是她与欧吉尔德的家，她记忆中的一切。而她破碎的记忆正是她生活中的片羽鳞爪，这些瞬间虽然平凡，但是于她而言却又弥足珍贵，正是因为它们的破碎让她的灵魂不得安歇。

最后猎魔人修复了这些记忆，而爱丽丝的身影也渐渐消失，最后留下了一朵玫瑰。

从这三个故事我们可以看到，猎魔人打的不是怪，而是人的心魔。

完成三个愿望后，魔鬼要夺走欧吉尔德的灵魂，此时猎魔人可以选择打败魔鬼，从而救下欧吉尔德。

魔鬼与猎魔人打了一个赌：只要他能在指定时间内，找到一个谜语的答案，那么欧吉尔德就可以活下来。

猎魔人在完成任务的过程中，除了要打怪，还要抵抗他的贪念：一个山洞里有大量的金币，另一个地方有怪物看守的宝剑，如果他去寻宝就会浪费宝贵的时间。

如果他想战胜魔鬼，那么首先他要战胜自己的心魔。

《石之心》真是带给人太多的惊喜：婚礼的热闹与欢乐让人留恋，月光下的湖面水乳交融，画中世界的哥特式景观带给人心悸，而这几个精心编织的故事更是让人长久地回味。

2020年5月19日

Thingworx Analytics Introduction 3/3: Time Series Data TTF Prediction

This is the 3^rd article of series: Thingworx Analytics introduction.

1^st article introduces Analytics Function and Modules.

2^nd article introduces Non Time Series TTF prediction.

Environment: Thingworx Platform 8.5, Analytics Server 8.5, Analytics Extension 8.5.

In TTF prediction, Time Series Data model is better than Non Time Series Data model.

What’s Time Series Data model?

As per Machine Learning concept, the process of training model is to find the best matching parameters, and to use them to combine with features to calculate the result values.

For Non Time Series Data models, it’s not relevant between current values and previous values.

But for Time Series Data, during each time of calculation, it will check not only current values, but also previous values, and the previous values are time relevant.

There’re 2 important terms here: look back size, and data sampling frequency.

Look back size = numbers of data samples need to look back.

Data sampling frequency = the frequency of taking a new data sample.

Look back size * Data sampling frequency = the total numbers will be feed to run calculation, for which I call it Look back area, the data is queried from Value Stream, with maxItems = look back size, and startDate = current date – look back area.

For example, if look back size = 4, data sampling frequency = 1 minute, then look back area = 4 minutes.

Thingworx used Value Stream to log Time Series Data, we can run service of QueryProperyHistory to get historic data:

Some other notes for TTF prediction models:

• Always set useRedundancyFilter = true

• Always set useGoalHistory = false

• Always set lookahead = 1

• Training time is much longer than Non Time Series data

After the model is published, test of model requires to input not only current values of all features, but also previous values as defined in look back size, than click Add Row to add the new record.

When creating Analysis Events, please be noted that the Thing Properties of Inputs Mapping should be Logged, because only Logged Properties can be queried with historic values from Value Stream.

And for Results Mapping, if we bind the result to Thing Property, even if that Property is set Logged, somehow the update by Analytics will not be logged into Value Stream. We can create another Property, and bind it with Results Mapping, and sync it with the final Property you’re monitoring by Service or Subscription, and Log it into Value Stream. After that, we can track all of the historic changes with TimeSeriesChart.

Below charts show the comparison between Time Series Data models predicted TTFs and actual TTFs.

We can see that, compare to Non Time Series models, the prediction is much more accurate, and is faster to match the actual TTF curve.

Below table lists out the settings of all models:

Another note, to make sure to write all features into Value Stream in same moment in same record, we should use UpdatePropertyValues instead of SetPropertyValues.

Some codes for reference:

----------------------------------------------------------------------

// Use UpdatePropertyValues to make sure to update all attribute in the same moment

var params = {

infoTableName : "InfoTable",

dataShapeName : "NamedVTQ"

};

// CreateInfoTableFromDataShape(infoTableName:STRING("InfoTable"), dataShapeName:STRING):INFOTABLE(VSTestDataShape)

var tempDable = Resources["InfoTableFunctions"].CreateInfoTableFromDataShape(params);

var time = new Date();

tempDable.AddRow({

time: time,

quality: "GOOD",

value: 71

});

tempDable.AddRow({

time: time,

quality: "GOOD",

value: 72

});

tempDable.AddRow({

time: time,

quality: "GOOD",

value: 73

});

tempDable.AddRow({

time: time,

quality: "GOOD",

value: 74

});

tempDable.AddRow({

time: time,

quality: "GOOD",

value: 75

});

me.UpdatePropertyValues({

values: tempDable /* INFOTABLE */

});

var result = tempDable;

----------------------------------------------------------------------

2020年5月18日

Thingworx Analytics Introduction 2/3: Non Time Series TTF Prediction

This is the 2nd article of series: Thingworx Analytics introduction.

1^st article introduces Analytics Function and Modules.

3^rd article introduces Time Series TTF prediction.

This article is for TTF(Time to Failure) prediction, it is quite useful in IoT applications.

Environment: Thingworx Platform 8.5, Analytics Server 8.5, Analytics Extension 8.5.

Step 1, configure Analytics setting.

Click Analytics icon >> Analytics Manager >> Analysis Providers, create a new Analysis Provider, Connector = TW.AnalysisServices.AnalyticsServer.AnalyticsServerConnector.

Click Analytics Builder >> Setting, select and set Analytics Server.

Step 2, create Analysis Data.

We need to prepare a CSV format data file and a JSON format data type file.

For CSV file, its 1^st line is its header, should be equivalent with definition in JSON file.

JSON file structure can refer to:

---------------------------------------------------

[

{

"fieldName": "s2",

"values": null,

"range": null,

"dataType": "DOUBLE",

"opType": "CONTINUOUS",

"timeSamplingInterval": null,

"isStatic": false

{

"fieldName": "s3",

"values": null,

"range": null,

"dataType": "DOUBLE",

"opType": "CONTINUOUS",

"timeSamplingInterval": null,

"isStatic": false

}

]

---------------------------------------------------

CSV file includes below information:

• goalField, for result checking.

• key parameters fields.

• Data filter field, e.g. we can use filed record_purpose to filter training data out of scoring data, while training data is for model training, and scoring data is for validation or test.

• If the dataset type is time series data, we need to create 2 additional fields to identify AssetID(sequence of failure) and Cycle(cycle number inside AssetID).

Here I used a public NASA dataset for training, download URL:

https://c3.nasa.gov/dashlink/resources/139/

When data is ready, click Analytics Builder >> Data >> New..

Select CSV file and JSON file, check option of “Review uploaded metadata”.

For most parameter fields, the data type should be Continuous.

And for goal filed, data type should be Continuous or Boolean, in this example we use Continuous.

Click “Create Dataset”.

When it’s done, the newly created Data will be showed in Datasets list.

Select the data, and click Dataset >> View >> Filters >> New, to create filter.

Step 3, create Machine Learning model.

Click Analytics Builder >> Models >> New.

Select dataset, and select goal field and filter, and select excluded fields:

Click Advanced Model Configuration:

In Machine Learning, normally we split data into training set, validation set, and test set, with rate of 60%, 20%, 20%.

In this step, we can use Filter to filter out test data, and use Validation Holdout % to define percentage of validation, we can use default setting of 20%.

Learning Techniques includes possible Machine Learning algorithms, we can manually add algorithms and modify parameters.

For each new model, I’d suggest to training it with different algorithms, and test them to get the most accurate one.

For same algorithm, we can train it with different parameters.

Ensemble Technique is the mix method of multiple algorithms, we can try and test with different settings.

My training methods:

Click Submit to create model.

For small dataset, it requires little training time, but for time series data, it requires much longer training time.

For any dataset, the data size is bigger, the training time is longer.

When it’s done, we can find the new model in Analytics Builder >> Models.

Select the model and click View, then we can see the model information.

Select the model and click Publish, then we can publish it to Analytics Manager, we can see it in Analytics Manager >> Analytics Models, and a Test Page will pop up.

Step 4, test model.

The test page will pop up when model is published, and we can also access it by: Click Analytics Manager >> Analysis Models, select model, click View >> Test:

For causalTechnique, normally we set it as FULL_RANGE.

For goalField, input name of goal field.

Then input values of all features/sensors, and click Add Row. In case of time series data, we need to add multiple rows for it.

Select 1^st row, click Set Parent Row, then click Submit Job.

System will calculate and get result based on model and input values.

It might take a few seconds for calculation.

Under below Results Data Shape, select AnalyticsServerConnector.xxx, click Refresh Job, then you can see the result value.

For more detail information, you can check from Analysis Jobs.

Step 5, set automatic calculation.

After model creation, we might monitor it for some time, to check the prediction with actual values.

Take the NASA data for example, we could use some of its data for comparison, you may refer to below steps.

Firstly, build a Data Table, load CSV data into it.

Then build a Thing, create attributes mapping with model features.

Then create a Timer, to read data from Data Table and write value into Thing timely, and use these data for test.

Click Analytics Manager >> Analysis Models, enable the newly created model.

Click Analysis Events >> New, set Source Type = Thing, set Source as newly created Thing, set Event = DataChange, set Property as trigger attribute.

Save and create new Event under Analysis Events, click Map Data >> Inputs Mapping, set Source Type = Thing, and bind model features with Thing Property.

Tips: model feature names are started with _, and that causalTechnique & goalField can use static values. So if we defined such features in Thing, we can use Map All in this step, which will map all features automatically.

Then click Results Mapping, bind the test result with Thing Property. Please be noted, system generated result will be updated, but will NOT be logged into Value Stream, so we need to build another Property, and use Service to sync the data and log to Value Stream eventually.

When Event is setup, system will monitor and trigger automatically, and output value by calling internal Analytics API.

For both manual test and Event calculation, we can see the details in Analysis Jobs.

My TTF Demo Mashup for reference:

Comparison of different models:

Some notes:

• If you didn’t’ install Analytics Platform, then Analytics will run job in Synchronous mode. That means if many jobs are submitted in same time, then only 1 job will be running in that moment, and the other jobs will be pending(state=waiting). To avoid waiting jobs, we can manually create multiple trigger properties, and use PAUSE in JavaScript to delay the time in purpose.

• If raw data has many features, we can build a smaller model with 2 features, and use it to train and test, to find issues in modeling, and to optimize logic in Data Table/Thing/Timer, then extend to all features.

• Be cautious of using Timer, because wrong operations might cause mass data in Value Stream, or generate lots of Waiting Jobs. Too many Waiting Jobs will block analysis and prediction. We can use Resource of TW.AnalysisServices.JobManagementServicesAPI.DeleteJobs to delete jobs by force, and we can check Job’s total number by running Data Table’s Service of TW.AnalysisServices.AnalysisDataTable.GetDataTableEntryCounts.

Thingworx Analytics Introduction 1/3：Analytics Function and Modules

This is the first article of series: Thingworx Analytics introduction.

2^nd article introduces Non Time Series TTF(Time to Failure) prediction.

3^rd article introduces Time Series TTF prediction.

In year 2015, PTC acquired Machine Learning company ColdLight, and integrate its products into Thingworx, and renamed to Thingworx Analytics.

PTC Thingworx Analytics Help Site lists its functions:

• Explanatory Analytics: such as to identify a signal’s value

• Descriptive Analytics: such as calculation of average value, median value, and standard deviation value.

• Model Generation: model creation of prediction, includes some popular Machine Learning algorithm.

• Predictive Scoring: to predict result value based on trained model and parameters.

• Prescriptive Scoring: to change result values by modifying model parameters.

• Confidence Models: convert prediction result into rates of value ranges.

• Anomaly Detection: to filter out abnormal signals by comparing high and low limits.

• Time Series Predictions: some signal are time series related, so for each time of prediction, besides checking current values, it will also check the values of previous cycles.

• Learners and Ensemble Techniques: Machine Learning algorithm and mixing method.

Analytics implemented these Machine Learning algorithm:

• Linear Regression

• Logistic Regression

• Decision Tree

• Neural Network

• Random Forest

• Gradient Boost

And let’s see Analytics modules.

Analytics has 3 installation modules: Analytics Server, Analytics Platform, Analytics Client(extension).

When Analytics Server is installed, it will add these services into Windows OS:

• twas-analytics-worker-1

• twas-analytics-worker-2

• twas-analytics-worker-3

• twas-async-ms

• twas-sync-ms

• twas-twx-adapter

• twas-zookeeper

And create these entities into Thingworx platform:

• StatisticalCalculationMicroserver, with some SPC calculation services.

• StatisticalMonitoringMicroserver, with some SPC distribution services.

• Some Thing entities with AnalyticsServer in names, can be found in Monitoring >> Remote Things, it can also be accessed by Thingworx Server API.

• Data Tables with prefix of TW.AnalysisService, they’re Database of Analytics.

• Resources with prefix of TW.AnalysisService, mapping with Analytics Builder functions.

Analytics Platform is an independent module, it can improve performance, and we can run basic prediction with it.

Analytics Client can be installed as an extension, Thingworx will add an Analytics icon after installation in left action bar.

Analytics action bar has 2 categories of functions: Builder and Manager. Builder is for model building, and Manager is for background handling.

订阅：博文 (Atom)

tallrain's blog

2020年11月24日

Rebuilding SQL Server Indexes

2020年9月15日

Thingworx心得之Events Router与Expression的限制

2020年8月21日

美剧式的笑傲江湖 - 游戏《巫师3》赏析

2020年8月5日

从TikTok看数据主权

2020年6月1日

《巫师3》DLC《石之心》故事赏析

2020年5月19日

Thingworx Analytics Introduction 3/3: Time Series Data TTF Prediction

2020年5月18日

Thingworx Analytics Introduction 2/3: Non Time Series TTF Prediction

Thingworx Analytics Introduction 1/3：Analytics Function and Modules

和我联系

博客归档